Overlap Similarity

✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats

Overview

Overlap similarity, also called the Szymkiewicz–Simpson coefficient, is derived from Jaccard similarity. It divides the size of the intersection of two sets by the size of the smaller set to indicate how similar the two sets are.

Overlap similarity ranges from 0 to 1. A value of 1 means that one set is a subset of the other (which includes the case where the two sets are identical); a value of 0 means that the two sets have no element in common.
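The two boundary values follow directly from dividing the intersection size by the size of the smaller set. As a short worked derivation, assuming A and B are non-empty sets:

$$
A \subseteq B \;\Rightarrow\; |A \cap B| = |A| = \min(|A|, |B|) \;\Rightarrow\; \frac{|A \cap B|}{\min(|A|, |B|)} = 1
$$

$$
A \cap B = \varnothing \;\Rightarrow\; |A \cap B| = 0 \;\Rightarrow\; \frac{|A \cap B|}{\min(|A|, |B|)} = 0
$$

Because the intersection can never contain more elements than the smaller set, every other case falls strictly between these two values.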

Concepts

Overlap Similarity

Given two sets A and B, the overlap similarity between them is computed as:

$$
overlap(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}
$$

For example, given set A = {b,c,e,f,g} and set B = {a,d,b,g}, their intersection is A∩B = {b,g} and the smaller set B has 4 elements, hence the overlap similarity between A and B is `2 / 4 = 0.5`.
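The same arithmetic can be checked with a few lines of Python. This is an illustrative sketch rather than Ultipa's implementation: the function name `overlap_similarity` is hypothetical, and returning 0 when either set is empty is an assumption made here to avoid dividing by zero.

```python
def overlap_similarity(a: set, b: set) -> float:
    """Szymkiewicz-Simpson coefficient: |A ∩ B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Assumption: similarity with an empty set is treated as 0.
        return 0.0
    return len(a & b) / smaller


A = {"b", "c", "e", "f", "g"}
B = {"a", "d", "b", "g"}
print(overlap_similarity(A, B))  # 0.5
```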

Neighbor Set

In Ultipa's Overlap Similarity algorithm, the following points have to be noted when collecting the neighbor sets of two target nodes to compute their similarity:

• There are no repeated nodes in the neighbor set;
• Self-loops are ignored;
• Any edge between the two target nodes is ignored;
• Edge direction is ignored.

In the graph above, when computing the similarity between node u and node v, the neighbor sets for the two nodes are Nu = {a,b,c,d,e} and Nv = {d,e,f}, so their overlap similarity is `2 / 3 = 0.6667`.
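The four rules can be sketched in Python as follows. The helper `neighbor_set` and the edge list are hypothetical: the edges were chosen only so that they reproduce the neighbor sets quoted above, and they are not taken from the actual graph.

```python
def neighbor_set(edges, node, other):
    """Collect the neighbors of `node` when it is compared with `other`."""
    neighbors = set()                      # a set holds no repeated nodes
    for src, dst in edges:
        if src == dst:                     # self-loops are ignored
            continue
        if {src, dst} == {node, other}:    # edges between the two targets are ignored
            continue
        if src == node:                    # edge direction is ignored:
            neighbors.add(dst)             # the edge is followed both ways
        elif dst == node:
            neighbors.add(src)
    return neighbors


# Hypothetical (source, destination) pairs.
edges = [
    ("u", "a"), ("b", "u"), ("u", "c"), ("u", "d"), ("e", "u"),
    ("v", "d"), ("e", "v"), ("v", "f"),
    ("u", "v"),   # edge between the two targets
    ("u", "u"),   # self-loop
    ("u", "a"),   # repeated edge, 'a' is still counted once
]

n_u = neighbor_set(edges, "u", "v")
n_v = neighbor_set(edges, "v", "u")
similarity = len(n_u & n_v) / min(len(n_u), len(n_v))
print(sorted(n_u), sorted(n_v), round(similarity, 4))
```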

In practice, you may need to convert some node properties into node schemas in order to calculate a similarity index that is based on common neighbors, such as overlap similarity. For instance, when considering the similarity between two applications, information like the phone number, email, device IP, etc. of each application might have been stored as properties of the @application node schema; these values need to be modeled as nodes and incorporated into the graph before they can be used for comparison.
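As a rough illustration of this remodeling step, the Python sketch below turns property values into nodes and links each application to them. The records, the property names, and the helper `properties_to_edges` are all hypothetical; how the resulting nodes and edges would be loaded into a graph is outside the scope of the sketch.

```python
# Hypothetical application records with identifying details stored as properties.
applications = [
    {"_id": "app1", "phone": "555-0101", "email": "a@example.com", "ip": "10.0.0.1"},
    {"_id": "app2", "phone": "555-0101", "email": "b@example.com", "ip": "10.0.0.1"},
]


def properties_to_edges(records, keys):
    """Turn each (property, value) pair into a node and link the record to it."""
    edges = []
    for record in records:
        for key in keys:
            value = record.get(key)
            if value is not None:
                # The node ID combines property name and value so that values of
                # different properties with the same text never collide.
                edges.append((record["_id"], f"{key}:{value}"))
    return edges


edges = properties_to_edges(applications, ["phone", "email", "ip"])

# Build neighbor sets from the new edges and compare the two applications.
neighbors = {}
for app_id, value_node in edges:
    neighbors.setdefault(app_id, set()).add(value_node)

shared = neighbors["app1"] & neighbors["app2"]
similarity = len(shared) / min(len(neighbors["app1"]), len(neighbors["app2"]))
print(sorted(shared), round(similarity, 4))
```

Once the values are nodes, two applications that share a phone number or an IP address have common neighbors, which is what a neighbor-based similarity needs.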

Syntax

• Command: `algo(similarity)`
• Parameters:
| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| ids / uuids | []`_id` / []`_uuid` | / | / | No | ID/UUID of the first group of nodes to calculate |
| ids2 / uuids2 | []`_id` / []`_uuid` | / | / | Yes | ID/UUID of the second group of nodes to calculate |
| type | string | `overlap` | `cosine` | No | Type of similarity; for Overlap Similarity, keep it as `overlap` |
| limit | int | >=-1 | `-1` | Yes | Number of results to return, `-1` to return all results |
| top_limit | int | >=-1 | `-1` | Yes | Limit the length of `top_list`, `-1` to return the full `top_list` |

This algorithm has two calculation modes:

1. Pairing: when `ids/uuids` and `ids2/uuids2` are both configured, each node in the first group is paired with each node in the second group (Cartesian product) and the pair-wise similarities are computed.
2. Selection: when only `ids/uuids` is configured, pair-wise similarities are computed between each node in the group and all other nodes in the graph in order to select the most similar nodes. The returned `top_list` of a node includes all nodes that have similarity > 0 with it, ordered by descending similarity.
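The difference between the two modes can be sketched in Python. This is not how the algorithm is implemented internally; the function names and the neighbor sets are hypothetical, and the sketch assumes precomputed neighbor sets with no edges between the compared nodes.

```python
from itertools import product


def overlap(a, b):
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def pairing(neighbors, group1, group2):
    """Pairing mode: one result per element of the Cartesian product."""
    return [(x, y, overlap(neighbors[x], neighbors[y]))
            for x, y in product(group1, group2)]


def selection(neighbors, group, top_limit=-1):
    """Selection mode: rank every other node with similarity > 0, descending."""
    results = {}
    for x in group:
        scored = [(y, overlap(neighbors[x], neighbors[y]))
                  for y in neighbors if y != x]
        top_list = sorted((p for p in scored if p[1] > 0),
                          key=lambda p: p[1], reverse=True)
        # -1 keeps the full list, mirroring the top_limit parameter.
        results[x] = top_list if top_limit == -1 else top_list[:top_limit]
    return results


# Hypothetical neighbor sets, not the example graph of this page.
neighbors = {
    "n1": {"a", "b", "c", "d"},
    "n2": {"a", "e"},
    "n3": {"b"},
    "n4": {"x", "y"},
}
print(pairing(neighbors, ["n3"], ["n1", "n2", "n4"]))
print(selection(neighbors, ["n1"], top_limit=2))
```

Pairing keeps pairs with similarity 0, whereas Selection drops them from `top_list`.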

Examples

The example graph is as follows:

File Writeback

| Calculation Mode | Spec | Content |
|---|---|---|
| Pairing | filename | `node1`,`node2`,`similarity` |
| Selection | filename | `node`,`top_list` |

``````
algo(similarity).params({
ids: "userC",
ids2: ["userA", "userB", "userD"],
type: "overlap"
}).write({
file:{
filename: "sc"
}
})
``````

Results: File sc

``````
userC,userA,1
userC,userB,1
userC,userD,0
``````

``````
algo(similarity).params({
uuids: [1,2,3,4],
type: "overlap"
}).write({
file:{
filename: "list"
}
})
``````

Results: File list

``````
userA,userC:1.000000;userB:0.500000;userD:0.333333;
userB,userC:1.000000;userA:0.500000;userD:0.500000;
userC,userA:1.000000;userB:1.000000;
userD,userB:0.500000;userA:0.333333;
``````
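A downstream script could read the Selection-mode file with a small parser like the one below. It is a sketch that assumes the line layout shown in the results above (node ID, a comma, then `id:score;` entries) and that node IDs contain no commas; the function name `parse_top_list_line` is hypothetical.

```python
def parse_top_list_line(line: str):
    """Split 'node,id:score;id:score;' into (node, [(id, score), ...])."""
    node, _, rest = line.strip().partition(",")
    top_list = []
    for entry in rest.split(";"):
        if not entry:            # skip the empty piece after the trailing ';'
            continue
        neighbor, _, score = entry.rpartition(":")
        top_list.append((neighbor, float(score)))
    return node, top_list


sample = "userA,userC:1.000000;userB:0.500000;userD:0.333333;"
node, top_list = parse_top_list_line(sample)
print(node, top_list)
```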

Direct Return

| Calculation Mode | Alias Ordinal | Type | Description | Columns |
|---|---|---|---|---|
| Pairing | 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |
| Selection | 0 | []perNode | Node and its selection results | `node`, `top_list` |

``````
algo(similarity).params({
uuids: [1],
uuids2: [2,3,4],
type: "overlap"
}) as overlap
return overlap
order by overlap.similarity desc
``````

Results: overlap

| node1 | node2 | similarity |
|---|---|---|
| 1 | 3 | 1 |
| 1 | 2 | 0.5 |
| 1 | 4 | 0.333333333333333 |

``````
algo(similarity).params({
uuids: [1,2],
type: "overlap",
top_limit: 1
}) as top
``````

Results: top

| node | top_list |
|---|---|
| 1 | 3:1.000000, |
| 2 | 3:1.000000, |

Stream Return

| Calculation Mode | Alias Ordinal | Type | Description | Columns |
|---|---|---|---|---|
| Pairing | 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |
| Selection | 0 | []perNode | Node and its selection results | `node`, `top_list` |

``````
algo(similarity).params({
uuids: [3],
uuids2: [1,2,4],
type: "overlap"
}).stream() as overlap
where overlap.similarity > 0
return overlap
``````

Results: overlap

| node1 | node2 | similarity |
|---|---|---|
| 3 | 1 | 1 |
| 3 | 2 | 1 |

``````
algo(similarity).params({
uuids: [1],
type: "overlap",
top_limit: 2
}).stream() as top
``````

Results: top

| node | top_list |
|---|---|
| 1 | 3:1.000000,2:0.500000, |