UltipaDocs
Try Playground
  • Introduction
  • Managing HDC Graphs
  • Managing Distributed Projections
  • Installing Algorithms
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Graph Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • K-Hop All
    • Bipartite Graph
    • HyperANF
    • Connected Component
    • Triangle Counting
    • Induced Subgraph
    • k-Core
    • k-Truss
    • p-Cohesion
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Topological Sort
    • Schema Overview
    • Dijkstra's Single-Source Shortest Path
    • Delta-Stepping Single-Source Shortest Path
    • Shortest Path Faster Algorithm (SPFA)
    • Minimum Spanning Tree
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Louvain
    • Leiden
    • Label Propagation
    • HANP
    • k-Means
    • kNN (k-Nearest Neighbors)
    • K-1 Coloring
    • Conductance
      • Random Walk
      • Node2Vec Walk
      • Node2Vec
      • Struc2Vec Walk
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Analytics & Algorithms
  4. /
  5. Centrality

TextRank

HDC

Overview

TextRank, derived from PageRank, is a graph-based ranking model for text processing. It can be used for various natural language processing tasks, including keyword extraction, keyphrase extraction, and text summarization.

  • R. Mihalcea, P. Tarau, TextRank: Bringing Order Into Texts (2004)

Concepts

Text as a Graph

To apply the TextRank algorithm, the text must first be represented as a graph .The structure of the graph depends on the specific application:

  • Nodes: Text units that best fit the task, such as words, collocations, or sentences, are added as nodes in the graph.
  • Edges: Relationships between text units, such as semantic similarity, co-occurrence, or contextual overlap, are used to connect nodes with edges.

Sample graph build for keyphrase extraction: nodes are selected lexical units from the text, and edges are established based on co-occurrence within a defined window of words (Source: Original paper)

TextRank Model

TextRank computes the ranks of all text units recursively using a "recommendation" mechanism, similar to the PageRank algorithm. It incorporates edge weights through a modified formula that integrates them effectively:

where,

  • Out(v) is the set of nodes that node v points to;
  • wvu is the edge weight between nodes v and u;
  • d is the damping factor.

Considerations

  • The rank of isolated text unit will stay the same as the value of (1 - d).
  • A self-loop acts as both a successor and a predecessor, meaning a node can pass some rank to itself. If a network has many self-loops, it will take more iterations to converge.

Example Graph

Run the following statements on an empty graph to define its structure and insert data:

ALTER EDGE default ADD PROPERTY {
  weight int32
};
INSERT (A:default {_id: "A"}),
       (B:default {_id: "B"}),
       (C:default {_id: "C"}),
       (D:default {_id: "D"}),
       (E:default {_id: "E"}),
       (F:default {_id: "F"}),
       (G:default {_id: "G"}),       
       (A)-[:default {weight: 3}]->(E),
       (B)-[:default {weight: 3}]->(A),
       (B)-[:default {weight: 2}]->(E),
       (C)-[:default {weight: 1}]->(A),
       (C)-[:default {weight: 4}]->(D),
       (D)-[:default {weight: 5}]->(E),
       (E)-[:default {weight: 2}]->(G),
       (F)-[:default {weight: 1}]->(B),
       (F)-[:default {weight: 3}]->(G);

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

Parameters

Algorithm name: text_rank

Name
Type
Spec
Default
Optional
Description
init_valueFloat>00.2YesThe initial rank assigned to all nodes.
loop_numInteger≥15YesThe maximum number of iteration rounds. The algorithm terminates after all iterations are completed.
dampingFloat(0,1)0.8YesThe damping factor.
max_changeFloat≥00YesThe algorithm terminates when the changes in all ranks between iterations are less than the specified max_change, indicating that the result is stable. Sets to 0 to disable this criterion.
edge_schema_property[]"<@schema.?><property>"//NoNumeric edge properties as weights, summing values across the specified properties; edges without the specified properties are ignored.
return_id_uuidStringuuid, id, bothuuidYesIncludes _uuid, _id, or both values to represent nodes in the results.
limitInteger≥-1-1YesLimits the number of results returned; -1 includes all results.
orderStringasc, desc/YesSorts the results by rank.

File Writeback

algo(text_rank).params({
  projection: "my_hdc_graph",
  return_id_uuid: "id",
  init_value: 1,
  loop_num: 50,
  damping: 0.8,
  edge_schema_property: "weight",
  order: 'desc'
}).write({
  file: {
    filename: "textrank"
  }
})

Result:

File: textrank
_id,text_rank
G,0.973568
E,0.81696
A,0.3472
D,0.328
B,0.24
F,0.2
C,0.2

DB Writeback

Writes the text_rank values from the results to the specified node property. The property type is double.

algo(text_rank).params({
  projection: "my_hdc_graph",
  loop_num: 50,
  edge_schema_property: "@default.weight"
}).write({
  db:{ 
    property: "rank"
  }
})

Full Return

exec{
  algo(text_rank).params({
    return_id_uuid: "id",    
    init_value: 1,
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: "desc",
    limit: 5
  }) as TR
  return TR
} on my_hdc_graph

Result:

_idtext_rank
G0.973568
E0.81696
A0.3472
D0.328
B0.24

Stream Return

exec{
  algo(text_rank).params({
    return_id_uuid: "id",
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: "desc",
    limit: 5
  }).stream() as TR
  return TR
} on my_hdc_graph

Result:

_idtext_rank
G0.973568
E0.81696
A0.3472
D0.328
B0.24