## Overview

K-Hop Whole Graph algorithm calculates the number of K-hop neighbors for each node and returns those neighbors. The algorithm is widely used in scenarios such as relationship discovery, impact prediction and friend suggestion and so on.

K-Hop Whole Graph can be viewed as the batch execution of UQL command `khop()`

. Although Ultipa has optimized the K-Hop Whole Graph algorithm for high concurrency performance, massive system resources can still be consumed by the algorithm in large graphs (over tens of millions nodes and edges) or graphs with many super nodes. Therefore, it is recommended to avoid performing K-Hop Whole Graph calculation that is too deep.

In a graph which

`nodes/edges = 100`

, the theoretical computation amount required to query 5-hop neighbors of a node is 10^{5}(10 billion computational complexity) and it would take 100ms; then 1 million seconds (12 days) are needed to complete such query in a graph of 10 million nodes.

## Basic Concept

### BFS

BFS is short for Breadth First Search, together with DFS (Depth First Search), are two the most basic traversal principles. K-Hop query follows BFS principle, which means that when traversing the whole graph from one node, it first traverses all the neighbor nodes of the current node that have not arrived yet, and then traverses the neighbor nodes of these neighbor nodes in turns that have not arrived yet.

Starting from the red node and traversing by the BFS principle, the yellow, green, blue nodes are found in succession. Color of these nodes is classified by the distance to the start node, that is, the number of edges of the shortest path(s) from the start node to them. This distance is what the `K`

means in K-Hop query, which is the `K`

-hop neighbors of the start node.

Once the start node is given, K values of the other nodes in the graph are also determined. In other words, if a node is the 5th level neighbor of the start node, then it is impossible to appear in the 4th, 6th or other levels.

### Direction of K-Hop

K-Hop query in directed graph can be restrained by the direction of the edge where the neighbor is located, this is when some nodes do not have K values and not all nodes are found in the traversal result.

Starting from the red node in the directed graph on the left side, and traversing by the outbound edges and inbound edges; two distinct traversal results are obtained, the purple node does not belong to any level when traversing in the inbound direction and its K value does not exist.

## Special Case

### Lonely Node, Disconnected Graph

Lonely node will not appear in any K-Hop query result.

For disconnected graph, only nodes belong to the same connected component with the start node are involved in the K-Hop query result, while other nodes cannot be traversed.

### Self-loop Edge

Each node is traversed only once in K-Hop query, so the self-loop edge of node can be considered invalid.

### Directed Edge

When the K-Hop query is performed without specifying the direction of edges, the direction of edges is ignored, and certain K values are calculated for all nodes in the connected component where that start node is located.

## Results and Statistics

Take the transaction graph below as an example, the color of card represent its level, run the K-Hop Whole Graph algorithm against all nodes:

**Algorithm results: **Calculate 1-hop neighbors for each node and count the maximum level of the results, return `_id`

, `value`

or `_uuid`

, `value`

according to the execution method, column `value`

includes the aggregative counting result(s) and the number of neighbors

_uuid | _id | value |
---|---|---|

1 | Card1 | 3, 3 |

2 | Card2 | 3, 3 |

3 | Card3 | 2, 3 |

4 | Card4 | 3, 1 |

5 | Card5 | 2, 2 |

6 | Card6 | , 0 |

**Algorithm statistics: **N/A

## Command and Configuration

- Command:
`algo(khop_all)`

- Configurations for the parameter
`params()`

:

Name | Type | Default |
Specification | Description |
---|---|---|---|---|

ids / uuids | []`_id` / []`_uuid` |
/ | / | IDs or UUIDs of nodes to be calculated; all nodes to be calculated if not set |

k_start | int | 1 | >= 1 | The minimum K-Hop query depth, the query depth is [`k_start` , `k_end` ], `k_end` ≥ `k_start` |

k_end | int | 1 | >= 1 | The maximum K-Hop query depth, the query depth is [`k_start` , `k_end` ], `k_end` ≥ `k_start` |

src_include | int | 0 | 0 or 1 | Include the start node in the result or not, 1 means to include the start node, 0 or not set means to exclude the start node |

direction | string | / | in/out, case insensitive | Edge direction in the path; direction ignored if not set |

node_property | []`@<schema>?.<property>` |
/ | Numeric node property, LTE needed | Node property/properties to be aggregatively counted, schema can be either carried or not; must be used with `aggregate_opt` ; results without any specified property will not be aggregatively counted |

aggregate_opt | []string | / | max / min / mean / sum / var / dev, case insensitive | The method that each specified property of the results to be aggregatively counted; must be used with `node_property` and one property corresponds to one aggregative counting method; max means the maximum, min means the minimum, mean means the average, sum means the sum, var means the variance, and dev means the standard deviation |

limit | int | -1 | >=-1 | Number of results to return; return all results if sets to -1 or not set |

Example: Calculate the number of 2-hop neighbors in the outbound direction of all nodes

```
algo(khop_all).params({
k_start: 2,
k_end: 2,
direction: "out"
}) as k2
return k2
```

Example: Calculate 2-hop and 3-hop neighbors of nodes UUID = 3,4,5 and aggregatively count the maximum value of property *@card.level* and sum of property *@card.balance* of the results, also include the start node (UUID = 3,4,5) in each node's results

```
algo(khop_all).params({
uuids: [3,4,5],
k_start: 2,
k_end: 3,
src_include: 1,
node_property: [@card.level, @card.balance],
aggregate_opt: ["max", "sum"]
}) as k2
return k2
```

## Algorithm Execution

### Task Writeback

#### 1. File Writeback

Configuration | Data in Each Row | Description |
---|---|---|

filename_ids | `_id` ,`_id` |
Each result takes up one line, the first `_id` in each line represents the node that is calculated, the second `_id` represents the neighbor of the calculated node |

filename | `_id` ,`aggregate_result1` ,...,`aggregate_resultN` ,`count` |
`_id` represents the node that is calculated, the next is/are the one or multiple aggregative counting result(s) (`aggregate_result1` ~ `aggregate_resultN` ), the last `count` is the number of neighbors |

Example: Calculate 2-hop and 3-hop neighbors of all nodes and aggregatively count the maximum value of property *@card.level* and sum of property *@card.balance* of the results, write the IDs of neighbors of each node back to file named *neighbors*, write the aggregative counting results and the number of neighbors of each node back to file named *counts*

```
algo(khop_all).params({
k_start: 3,
k_end: 3,
node_property: [@card.level, @card.balance],
aggregate_opt: ["max", "sum"]
}).write({
file:{
filename_ids: "neighbors",
filename: "counts"
}
})
```

#### 2. Property Writeback

Configuration | Writeback Content | Type | Data Type |
---|---|---|---|

property | Number of neighbors | Node property | `double` |

Example: Calculate 2-hop neighbors of all nodes, write the number of neighbors back to node property named *khop2*

```
algo(khop_all).params({
k_start: 2,
k_end: 2
}).write({
db:{
property: "khop2"
}
})
```

#### 3. Statistics Writeback

This algorithm has no statistics.

### Direct Return

Alias Ordinal | Type | Description |
Column Name |
---|---|---|---|

0 | []perNode | Node and its aggregative counting result(s) and number of neighbors | `_uuid` , `value` |

Example: Calculate 3-hop neighbors of all nodes and aggregatively count the maximum value of property *@card.level* and sum of property *@card.balance* of the results, define algorithm results as alias named *nei*, and return the results

```
algo(khop_all).params({
k_start: 3,
k_end: 3,
node_property: [@card.level, @card.balance],
aggregate_opt: ["max", "sum"]
}) as nei
return nei
```

### Streaming Return

Alias Ordinal | Type | Description |
Column Name |
---|---|---|---|

0 | []perNode | Node and its aggregative counting result(s) and number of neighbors | `_uuid` , `value` |

Example: Calculate 1-hop to 3-hop neighbors of node UUID = 3 and aggregatively count the maximum value of property *@card.level* of the results, and return only the aggregative counting result and the number of neighbors

```
algo(khop_all).params({
uuids: [3],
k_start: 1,
k_end: 3,
node_property: @card.level,
aggregate_opt: max
}).stream() as results
return results.value
```

### Real-time Statistics

This algorithm has no statistics.