This section is dedicated to query results validation of 3 types of operations:
- K-hop
- Shortest-Path
- Graph Algorithms
Let’s start with K-hop queries. First, let’s be clear of the definition of K-hop, there have been 2 types of K-hop:
- Kth-Hop Neighbors, which are exactly K hops away from the source vertex
- All Neighbors from the 1st Hop all the way to the Kth Hop
No matter which type of K-hop you are looking at, 2 essential points affect the correctness of the query:
- K-hop should be implemented using BFS, instead of DFS
- Results Deduplication: The results will not contain duplicated vertices on the same hop or across different hops (Many graph DBMS do have this dedup problem as we speak.)
Some vendors use DFS to find shortest paths, this approach has 2 fatal problems:
- Outright Wrong: There is a high probability that the results are wrong as DFS can’t guarantee a vertex belongs to the right hop (depth).
- Inefficient: On large and densely populated datasets, it’s impractical to traverse all possible paths, for instance, Twitter-2010 has many hotspot nodes having millions of neighbors, any 2-hop or deeper queries mean astronomical computational complexity!
Let’s validate the 1-Hop result of vertex ID=27960125 in Twitter-2010, first, we start from the source file which shows 8 edges (rows of connecting neighbors), but, what exactly is its 1-hop?
The correct answer is 7! Because the node 27960125 has a neighbor ID=27498091 which appears twice, because there are 2 edges in between these 2 vertices. If we deduplicate, we have 7.
To expand here on K-hop queries, it is possible to launch K-hop per edge direction, the screenshots below show 4 ways for running K-hop: regular bidirectional K-hop, outbound K-hop, inbound K-hop and K-hop with all neighbors. These queries combined can help validate any K-hop query.
If we compare with publicly accessible Tigergraph’s K-hop query results, the vertex ID=27960125 has only 6 1-hop neighbors (shown in Graph 9), and the similar counting mistakes prevail throughout the entire published results on Github.