UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
      • Node2Vec
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Centrality

TextRank

Overview

TextRank, derived from PageRank, is a graph-based ranking model for text processing. It can be used for various natural language processing tasks, including keyword extraction, keyphrase extraction, and text summarization.

  • R. Mihalcea, P. Tarau, TextRank: Bringing Order Into Texts (2004)

Concepts

Text as a Graph

To apply the TextRank algorithm, the text must first be represented as a graph. The structure of the graph depends on the specific application:

  • Nodes: Text units that best fit the task, such as words, collocations, or sentences, are added as nodes in the graph.
  • Edges: Relationships between text units, such as semantic similarity, co-occurrence, or contextual overlap, are used to connect nodes with edges.

Sample graph build for keyphrase extraction (Source: Original paper)

TextRank Model

TextRank computes the ranks of all text units recursively using a "recommendation" mechanism, similar to the PageRank algorithm. It incorporates edge weights through a modified formula:

where,

  • Out(v) is the set of nodes that node v points to;
  • wvu is the edge weight between nodes v and u;
  • d is the damping factor.

Compared to PageRank:

  • Undirected: All edges are treated as undirected.
  • Weighted: Edge weights influence rank propagation.
  • Unnormalized: The base rank is (1 - d) instead of (1 - d) / n, so scores do not sum to 1.

Considerations

  • The rank of isolated text units will stay the same as the value of (1 - d).
  • A self-loop acts as both a successor and a predecessor, meaning a node can pass some rank to itself. If a network has many self-loops, it will take more iterations to converge.

Example Graph

GQL
INSERT (A:default {_id: "A"}), (B:default {_id: "B"}),
       (C:default {_id: "C"}), (D:default {_id: "D"}),
       (E:default {_id: "E"}), (F:default {_id: "F"}),
       (G:default {_id: "G"}),
       (A)-[:default {weight: 3}]->(E), (B)-[:default {weight: 3}]->(A),
       (B)-[:default {weight: 2}]->(E), (C)-[:default {weight: 1}]->(A),
       (C)-[:default {weight: 4}]->(D), (D)-[:default {weight: 5}]->(E),
       (E)-[:default {weight: 2}]->(G), (F)-[:default {weight: 1}]->(B),
       (F)-[:default {weight: 3}]->(G)

Parameters

NameTypeDefaultDescription
weightSTRING or LIST/Edge property name(s) to use as weight (empty = unweighted).
dampingFLOAT0.85Damping factor (0, 1).
maxIterationsINT20Maximum number of iterations.
toleranceFLOAT0.0001Convergence tolerance. The algorithm terminates when score changes between iterations are less than this value.
limitINT-1Limits the number of results returned (-1 = all).
orderSTRING/Sorts the results by score: asc or desc.

Run Mode

Returns:

ColumnTypeDescription
nodeIdSTRINGNode identifier (_id)
scoreFLOATTextRank score
rankINTRank position (1 = highest TextRank)

TextRank for all nodes:

GQL
CALL algo.textrank({
  weight: "weight",
  damping: 0.8,
  maxIterations: 50,
  order: "desc"
}) YIELD nodeId, score, rank

Result:

nodeIdscorerank
E1.58656747839064161
D1.20340267158827772
A0.99510827796484793
B0.89797224974493734
G0.84737652040023815
C0.7415707123269086
F0.72644440052658897

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.textrank.stream({
  order: "desc",
  limit: 3
}) YIELD nodeId, score
RETURN nodeId, score

Result:

nodeIdscore
E1.4296320035955716
A1.0934842307056216
B1.0934842307056216

Stats Mode

Returns:

ColumnTypeDescription
nodeCountINTTotal number of nodes
minScoreFLOATMinimum TextRank score
maxScoreFLOATMaximum TextRank score
avgScoreFLOATAverage TextRank score
GQL
CALL algo.textrank.stats() YIELD nodeCount, minScore, maxScore, avgScore

Result:

nodeCountminScoremaxScoreavgScore
70.78560517244526371.42963200359557160.9667775447847021

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the score column in results to a property. Map: explicit column-to-property mapping (e.g., {score: 'tr_score', rank: 'tr_rank'}).

Writable columns:

ColumnTypeDescription
scoreFLOATTextRank score
rankINTRank position

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.textrank.write({damping: 0.85}, {
  db: {
    property: "tr_score"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs