TextRank, derived from PageRank, is a graph-based ranking model for text processing. It can be used for various natural language processing tasks, including keyword extraction, keyphrase extraction, and text summarization.
To apply the TextRank algorithm, the text must first be represented as a graph. The structure of the graph depends on the specific application.
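For keyword extraction, the original TextRank paper (Mihalcea and Tarau) builds the graph from word co-occurrence: two words are linked when they appear within a window of N words. The Python sketch below shows only this construction step, under stated assumptions. The input is already tokenized, and the part-of-speech filtering used in the paper is omitted. Counting repeated co-occurrences as an edge weight is an illustrative choice, not part of the original method. `cooccurrence_graph` is a hypothetical helper name.

```python
from collections import defaultdict


def cooccurrence_graph(tokens: list[str], window: int = 2) -> dict[tuple[str, str], int]:
    """Link every pair of distinct tokens that appear within `window` positions
    of each other. The value counts how often the pair co-occurs (an assumed weighting)."""
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for i, left in enumerate(tokens):
        # Look ahead at most window - 1 tokens, so window=2 links adjacent tokens only.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Undirected edge stored under a canonical (sorted) key.
                key = (left, right) if left < right else (right, left)
                edges[key] += 1
    return dict(edges)
```

For example, `cooccurrence_graph(["graph", "ranking", "model", "graph", "ranking"])` links `graph` and `ranking` twice, plus `ranking` with `model` and `model` with `graph` once each. Summarization follows the same pattern with sentences as nodes and a sentence-similarity score as the edge weight.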

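TextRank computes the ranks of all text units iteratively using a "recommendation" (voting) mechanism, similar to PageRank, but extends the PageRank formula with edge weights:

$$
TR(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, TR(V_j)
$$

where:

- $In(V_i)$ is the set of nodes that have edges pointing to $V_i$;
- $Out(V_j)$ is the set of nodes that $V_j$ points to;
- $w_{ji}$ is the weight of the edge from $V_j$ to $V_i$;
- $d$ is the damping factor.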
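The weighted formula can be read as an iterative procedure. The following Python sketch implements it directly under two assumptions: updates are synchronous (each round reads only the previous round's ranks), and each edge is directed from its source to its target. The parameter names mirror `init_value`, `loop_num`, `damping`, and `max_change` from the parameter table, but `text_rank` here is a hypothetical helper, not the product's implementation.

```python
def text_rank(
    edges: list[tuple[str, str, float]],
    damping: float = 0.8,
    init_value: float = 0.2,
    loop_num: int = 5,
    max_change: float = 0.0,
) -> dict[str, float]:
    """Weighted TextRank over directed (source, target, weight) edges."""
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    if not nodes:
        return {}
    # Denominator of the formula: total outgoing weight of each node.
    out_weight = {n: 0.0 for n in nodes}
    # Incoming neighbours of each node, paired with the connecting edge weight.
    in_edges: dict[str, list[tuple[str, float]]] = {n: [] for n in nodes}
    for src, dst, w in edges:
        out_weight[src] += w
        in_edges[dst].append((src, w))

    rank = {n: init_value for n in nodes}
    for _ in range(loop_num):
        # Synchronous update: every new rank uses only the previous round's ranks.
        new_rank = {}
        for n in nodes:
            votes = sum(
                w / out_weight[src] * rank[src]
                for src, w in in_edges[n]
                if out_weight[src] > 0  # guard against nodes whose out-weights sum to zero
            )
            new_rank[n] = (1 - damping) + damping * votes
        change = max(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        # With max_change = 0 this never triggers, so only loop_num stops the loop.
        if change < max_change:
            break
    return rank
```

For instance, on edges A→B (weight 1), A→C (weight 3), and B→C (weight 2) with `init_value=1` and `loop_num=50`, the ranks settle at about 0.2, 0.24, and 0.512. A node without incoming edges always ends at 1 - d.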

Run the following statements on an empty graph to define its structure and insert data:
```
ALTER EDGE default ADD PROPERTY { weight int32 };
INSERT (A:default {_id: "A"}), (B:default {_id: "B"}), (C:default {_id: "C"}),
       (D:default {_id: "D"}), (E:default {_id: "E"}), (F:default {_id: "F"}),
       (G:default {_id: "G"}),
       (A)-[:default {weight: 3}]->(E), (B)-[:default {weight: 3}]->(A),
       (B)-[:default {weight: 2}]->(E), (C)-[:default {weight: 1}]->(A),
       (C)-[:default {weight: 4}]->(D), (D)-[:default {weight: 5}]->(E),
       (E)-[:default {weight: 2}]->(G), (F)-[:default {weight: 1}]->(B),
       (F)-[:default {weight: 3}]->(G);
```
To load the entire graph into the HDC server `hdc-server-1` as `my_hdc_graph`:
```
CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}
```
Algorithm name: `text_rank`

| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| `init_value` | Float | >0 | `0.2` | Yes | The initial rank assigned to all nodes. |
| `loop_num` | Integer | ≥1 | `5` | Yes | The maximum number of iteration rounds; the algorithm stops after this many rounds unless `max_change` ends it earlier. |
| `damping` | Float | (0,1) | `0.8` | Yes | The damping factor. |
| `max_change` | Float | ≥0 | `0` | Yes | The algorithm terminates when the change in every rank between two consecutive iterations is less than `max_change`, indicating that the result is stable. Set to `0` to disable this criterion. |
| `edge_schema_property` | `[]"<@schema.?><property>"` | / | / | No | Numeric edge properties used as weights; the values of the specified properties are summed. Edges without the specified properties are ignored. |
| `return_id_uuid` | String | `uuid`, `id`, `both` | `uuid` | Yes | Represents nodes in the results by `_uuid`, `_id`, or both. |
| `limit` | Integer | ≥-1 | `-1` | Yes | Limits the number of results returned; `-1` returns all results. |
| `order` | String | `asc`, `desc` | / | Yes | Sorts the results by rank in ascending or descending order. |
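The `edge_schema_property` rule (sum the listed properties, ignore edges that lack them) can be sketched as a preprocessing step in Python. This is a hypothetical illustration: `edge_weight` and `weighted_edges` are not part of the product. The sketch assumes that an edge is dropped only when it has none of the listed properties, and that an edge carrying some of them is weighted by the sum of those present. It also ignores the `@schema.` prefix.

```python
from typing import Optional


def edge_weight(props: dict[str, float], weight_keys: list[str]) -> Optional[float]:
    """Sum the listed numeric properties; return None if the edge has none of them."""
    present = [props[key] for key in weight_keys if key in props]
    if not present:
        return None  # assumed meaning of "ignored": the edge is dropped entirely
    return float(sum(present))


def weighted_edges(
    edges: list[tuple[str, str, dict[str, float]]], weight_keys: list[str]
) -> list[tuple[str, str, float]]:
    """Turn (source, target, properties) edges into (source, target, weight) edges."""
    kept = []
    for src, dst, props in edges:
        weight = edge_weight(props, weight_keys)
        if weight is not None:
            kept.append((src, dst, weight))
    return kept
```

Under these assumptions, an edge with `{"weight": 3}` contributes weight 3.0 when `weight_keys` is `["weight"]`, and an edge without a `weight` property is excluded from the ranking graph.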
The following query writes the results to the file `textrank`:

```
algo(text_rank).params({
  projection: "my_hdc_graph",
  return_id_uuid: "id",
  init_value: 1,
  loop_num: 50,
  damping: 0.8,
  edge_schema_property: "weight",
  order: "desc"
}).write({
  file: { filename: "textrank" }
})
```
Result:

File: textrank

```
_id,text_rank
G,0.973568
E,0.81696
A,0.3472
D,0.328
B,0.24
F,0.2
C,0.2
```
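The file results can be checked by hand against the weighted formula. Assume edges are followed in the direction they were inserted. C and F then have no incoming edges, so their ranks are 1 - d = 0.2. The total outgoing weights are 3 for A, 5 for B, C, and D, 2 for E, and 4 for F. The example graph has no cycles, so the ranks stop changing after a few iterations, well within `loop_num: 50`. Each final value then follows from values already computed:

$$
\begin{aligned}
TR(B) &= 0.2 + 0.8\left(\tfrac{1}{4}\cdot 0.2\right) = 0.24\\
TR(D) &= 0.2 + 0.8\left(\tfrac{4}{5}\cdot 0.2\right) = 0.328\\
TR(A) &= 0.2 + 0.8\left(\tfrac{3}{5}\cdot 0.24 + \tfrac{1}{5}\cdot 0.2\right) = 0.3472\\
TR(E) &= 0.2 + 0.8\left(\tfrac{3}{3}\cdot 0.3472 + \tfrac{2}{5}\cdot 0.24 + \tfrac{5}{5}\cdot 0.328\right) = 0.81696\\
TR(G) &= 0.2 + 0.8\left(\tfrac{2}{2}\cdot 0.81696 + \tfrac{3}{4}\cdot 0.2\right) = 0.973568
\end{aligned}
$$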
To write the results back to the database instead, specify a node property: the `text_rank` values are written to that property (here, `rank`), and the property type is `double`.
```
algo(text_rank).params({
  projection: "my_hdc_graph",
  loop_num: 50,
  edge_schema_property: "@default.weight"
}).write({
  db: { property: "rank" }
})
```
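The following query runs the algorithm on `my_hdc_graph` and returns the five highest-ranked nodes as `TR`:

```
exec{
  algo(text_rank).params({
    return_id_uuid: "id",
    init_value: 1,
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: "desc",
    limit: 5
  }) as TR
  return TR
} on my_hdc_graph
```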
Result:
| _id | text_rank |
|---|---|
| G | 0.973568 |
| E | 0.81696 |
| A | 0.3472 |
| D | 0.328 |
| B | 0.24 |
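To stream the results instead, call `.stream()` on the algorithm. This query omits `init_value`, so the default of 0.2 is used. The ranks still match the previous result because the example graph has no cycles, so the initial values stop affecting the ranks after a few iterations.

```
exec{
  algo(text_rank).params({
    return_id_uuid: "id",
    loop_num: 50,
    damping: 0.8,
    edge_schema_property: "weight",
    order: "desc",
    limit: 5
  }).stream() as TR
  return TR
} on my_hdc_graph
```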
Result:
| _id | text_rank |
|---|---|
| G | 0.973568 |
| E | 0.81696 |
| A | 0.3472 |
| D | 0.328 |
| B | 0.24 |