Making Good Use of the K-Hop Query - Technical

Change Password

Submit

Change Email

Submit

Change Nickname

Current Nickname:

Submit

Profile

Account ID:

Full Name:
Phone:
Company:
Company Email:

Change Password

Apply

You have no license application record.

Apply

Certificate	Issued at	Valid until	Serial No.	File

Serial No.	Valid until	File

Not having one? Apply now! >>>

Product	Created On	ID	Amount (USD)	Invoice

Product	Created On	ID	Amount (USD)	Invoice

No Invoice

Create Ultipa Account

I agree to the Privacy Policy .

Please agree to continue.

Already have an Ultipa account? Sign in now!

Forgot Password

Reset Password

Back to sign in

Making Good Use of the K-Hop Query

In the realm of graph computing, the query for neighbors of a node is also known as the K-hop query, with "K" defining the shortest distance between these neighbors and the focal node. K-Hop queries find interesting applications across industries and research fields. Impact or risk analysis, for instance, tracks the spread of a disease by identifying individuals within a certain proximity of an infected person.

A Real Test

K-hop queries yield definitive results. First of all, regardless of how many possible routes exist between nodes, K-hop query cares solely the shortest path(s). When ignoring edge weights, the number of edges in the shortest path defines the K. As the example shows, D is the 2-hop neighbor of A, but not a 3-hop neighbor.

Shortest Path

Secondly, K-hop neighbors should be de-duplicated. In cases where multiple shortest paths are found, each neighbor is still counted only once. Consequently, the 1-hop neighborhood of A is {B} rather than {B, B}.

These two points uphold the accuracy of K-hop queries. As the graph size expands to encompass millions or even billions of nodes and edges, and the required query depth increases to 6, 15 or even exceeds 20, the computational pressure loaded on the graph computing engine intensifies significantly. Manual verification of results becomes impractical, with many graph databases or graph-computing frameworks unable to guarantee returns in such scenarios, or correctness of results.

The ability to execute a 6-hop query for a node in a large graph (like Twitter-2010 dataset) within 10 seconds is indicative of a Real-time Graph System. Traversing 6 hops from most nodes in the Twitter dataset would touch over 99% of its all 1.5 billion edges and nodes, equating to traversing over 150 million nodes and edges per second.

For the K-hop query, Ultipa offers slews of optimization for the traversal and deduplication mechanisms, ensuring both accuracy and efficiency. Therefore, we recommend you use the K-hop queries whenever they can achieve the same outcome as other query types.

Example: Counting Distinct Values

Consider the following UQL query executed on a Display_Ad_Click dataset, containing over 1 million users and 0.8 million online ads. The dataset captures user behaviors of clicking ads over some days, represented by edges, resulting in approximately 1.3 million clicks recorded.

Number of nodes and edges in the dataset

find().nodes({@user.shopping_level == 1}) as u
call {
    with u
    n(u).e({@clicks}).n({@ad} as ads)
    with count(DISTINCT ads) as adsCount
    return adsCount
}
order by adsCount desc limit 5
return table(u._id, adsCount)

The objective of this query is to determine the distinct ads clicked by each user at the 1^st shopping level and rank the top 5 users based on the count of distinct ads.

u._id	adsCount
u853650	103
u341347	84
u642926	70
u74383	65
u950625	64

Results of the query

This query employs an n().e().n() path template to search for the ads clicked by individual users. The use of the operator DISTINCT is necessary when counting the ads due to the possibility of users clicking on the same ad repeatedly.

Ad-clicking records of the user u1060021

Ad-clicking records of an user

Let’s examine the query plan and time cost by adding the PROFILE prefix to the UQL statement. In my testing environment, the query took a total of 35,450ms (millisecond) to generate the final table, with a substantial 33,002ms spent on the path template queries.

Query plan and time cost for the UQL statement

Use the K-Hop Query

The requirement for distinct values makes us to consider using the performant K-hop query instead. Since the results of K-hop query is already de-duplicated, the DISTINCT used in the count() function is also saved.

find().nodes({@user.shopping_level == 1}) as u
call {
    with u
    khop().n(u).e({@clicks}).n({@ad}) as ads
    with count(ads) as adsCount
    return adsCount
}
order by adsCount desc limit 5
return table(u._id, adsCount)

This UQL yielded identical results but in just 2,030ms, with the K-hop path template query completing in only 771ms – a notable speed improvement of over 42 times. With 74,612 users at the 1^st shopping level involved, the K-hop query for each one took only about 0.01ms.

Final Remarks

The exemplary performance of Ultipa’s K-hop query is remarkable and competitive. Proper utilization of the K-hop query can achieve substantial speed improvement in the data analytics processes. When evaluating graph systems, it’s imperative to consider the effectiveness of the K-hop query.

ID
Product
Status
Cores
Maximum Shard Services
Maximum Total Cores for Shard Service
Maximum HDC Services
Maximum Total Cores for HDC Service
Applied Validity Period(days)
Effective Date
Expired Date
Mac Address
Reason for Application
Review Comment