# Jaccard Similarity

✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats

## Overview

Jaccard similarity, also known as the Jaccard index, was proposed by Paul Jaccard in 1901. It measures the similarity between two sets. In a graph, the neighbors of a node can be collected into a set, and two nodes are considered similar if their neighbor sets are similar.

Jaccard similarity ranges from 0 to 1: 1 means the two sets are identical, and 0 means they have no elements in common.

## Concepts

### Jaccard Similarity

Given two sets A and B, the Jaccard similarity between them is computed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

For example, if A = {b,c,e,f,g} and B = {a,d,b,g}, their intersection is A ∩ B = {b,g} and their union is A ∪ B = {a,b,c,d,e,f,g}, so the Jaccard similarity between A and B is `2 / 7 = 0.2857`.
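The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not Ultipa's implementation; the function name `jaccard` and the choice to return 0 for two empty sets are assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|.

    Returning 0.0 when both sets are empty is an assumption of this sketch.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


A = {"b", "c", "e", "f", "g"}
B = {"a", "d", "b", "g"}
print(round(jaccard(A, B), 4))  # 0.2857
```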

### Neighbor Set

In Ultipa's Jaccard Similarity algorithm, the following points have to be noted when collecting the neighbor sets of two target nodes to compute their similarity:

• The neighbor set contains no repeated nodes;
• Self-loops are ignored;
• Any edge between the two target nodes is ignored;
• Edge direction is ignored.

In the graph above, when computing the similarity between node u and node v, the neighbor sets for the two nodes are Nu = {a,b,c,d,e} and Nv = {d,e,f}, so their Jaccard similarity is `2 / 6 = 0.3333`.
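The four rules can be sketched in Python as follows. The edge list is hypothetical: it is chosen only so that it reproduces the neighbor sets quoted above, and the function name `neighbor_sets` is an assumption rather than part of Ultipa's API.

```python
from collections import defaultdict


def neighbor_sets(edges, u, v):
    """Collect the neighbor sets of u and v, following the four rules above."""
    nbrs = defaultdict(set)        # sets drop repeated neighbors
    for src, dst in edges:
        if src == dst:             # self-loops are ignored
            continue
        if {src, dst} == {u, v}:   # edges between the two target nodes are ignored
            continue
        nbrs[src].add(dst)         # edge direction is ignored, so the
        nbrs[dst].add(src)         # neighbor is recorded on both ends
    return nbrs[u], nbrs[v]


# Hypothetical edge list (illustrative only).
edges = [
    ("u", "a"), ("b", "u"), ("u", "c"), ("u", "d"), ("e", "u"),
    ("v", "d"), ("v", "e"), ("f", "v"),
    ("u", "v"),  # edge between the two target nodes
    ("u", "u"),  # self-loop
    ("u", "a"),  # repeated edge
]

n_u, n_v = neighbor_sets(edges, "u", "v")
similarity = len(n_u & n_v) / len(n_u | n_v)
print(sorted(n_u), sorted(n_v), round(similarity, 4))
# ['a', 'b', 'c', 'd', 'e'] ['d', 'e', 'f'] 0.3333
```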

In practice, you may need to convert some node properties into nodes of their own schemas in order to compute a similarity index that is based on common neighbors, such as Jaccard Similarity. For instance, when comparing two applications, information such as phone number, email, and device IP might be stored as properties of the @application node schema; these values need to be modeled as nodes and incorporated into the graph before they can be used for comparison.
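As a rough illustration of this modeling step, the sketch below treats each (property, value) pair of an application as a neighbor node and then compares the resulting neighbor sets. The records, field names, and the helper `property_neighbors` are all hypothetical.

```python
# Hypothetical application records; field names and values are illustrative only.
applications = {
    "app1": {"phone": "555-0100", "email": "x@example.com", "device_ip": "10.0.0.1"},
    "app2": {"phone": "555-0100", "email": "y@example.com", "device_ip": "10.0.0.1"},
}


def property_neighbors(record):
    """Treat each (property, value) pair as a node adjacent to the application."""
    return {(key, value) for key, value in record.items()}


n1 = property_neighbors(applications["app1"])
n2 = property_neighbors(applications["app2"])

# The two applications share the phone and device IP nodes, but not the email node.
similarity = len(n1 & n2) / len(n1 | n2)
print(similarity)  # 0.5
```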

## Syntax

• Command: `algo(similarity)`
• Parameters:

| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| ids / uuids | []`_id` / []`_uuid` | / | / | No | ID/UUID of the first group of nodes to calculate |
| ids2 / uuids2 | []`_id` / []`_uuid` | / | / | Yes | ID/UUID of the second group of nodes to calculate |
| type | string | `jaccard` | `cosine` | No | Type of similarity; for Jaccard Similarity, keep it as `jaccard` |
| limit | int | >=-1 | `-1` | Yes | Number of results to return, `-1` to return all results |
| top_limit | int | >=-1 | `-1` | Yes | In the selection mode, limit the maximum number of results returned for each node specified in `ids`/`uuids`, `-1` to return all results with similarity > 0; in the pairing mode, this parameter is invalid |

The algorithm has two calculation modes:

1. Pairing: when both `ids`/`uuids` and `ids2`/`uuids2` are configured, each node in `ids`/`uuids` is paired with each node in `ids2`/`uuids2` (a node is never paired with itself), and the pairwise similarities are computed.
2. Selection: when only `ids`/`uuids` is configured, the pairwise similarity between each target node and every other node in the graph is computed. For each target node, the results include all, or a limited number of, the nodes whose similarity with it is > 0, ordered by descending similarity.
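The difference between the two modes can be sketched in Python. This is an illustration under simplifying assumptions, not Ultipa's implementation: neighbor sets are taken as precomputed (so the rule about ignoring edges between the two target nodes is not modeled), the `limit` parameter is left out, and the names `pairing`, `selection`, and the sample data are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairing(nbrs, ids, ids2):
    """Pair every node in ids with every node in ids2, skipping identical nodes."""
    return [(x, y, jaccard(nbrs[x], nbrs[y])) for x in ids for y in ids2 if x != y]


def selection(nbrs, ids, top_limit=-1):
    """For each node in ids, rank all other nodes that have similarity > 0."""
    results = []
    for x in ids:
        scored = [(x, y, jaccard(nbrs[x], nbrs[y])) for y in nbrs if y != x]
        scored = [row for row in scored if row[2] > 0]
        scored.sort(key=lambda row: row[2], reverse=True)
        results.extend(scored if top_limit == -1 else scored[:top_limit])
    return results


# Hypothetical precomputed neighbor sets.
nbrs = {
    "n1": {"p", "q", "r"},
    "n2": {"q", "r"},
    "n3": {"r", "s"},
    "n4": {"t"},
}

print(pairing(nbrs, ["n1"], ["n1", "n2", "n4"]))   # n1 is not paired with itself
print(selection(nbrs, ["n1"], top_limit=1))        # only the best match for n1
```

Note that pairing keeps pairs whose similarity is 0, whereas selection drops them, which matches the behavior described above.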

## Examples

The example graph is as follows:

### File Writeback

| Spec | Content |
|---|---|
| filename | `node1`,`node2`,`similarity` |

``````
algo(similarity).params({
  ids: 'userC',
  ids2: ['userA', 'userB', 'userD'],
  type: 'jaccard'
}).write({
  file: {
    filename: 'sc'
  }
})
``````

Results: File sc

``````
userC,userA,0.25
userC,userB,0.5
userC,userD,0
``````

``````
algo(similarity).params({
  uuids: [1,2,3,4],
  type: 'jaccard'
}).write({
  file: {
    filename: 'list'
  }
})
``````

Results: File list

``````
userA,userC,0.25
userA,userB,0.2
userA,userD,0.166667
userB,userC,0.5
userB,userD,0.25
userB,userA,0.2
userC,userB,0.5
userC,userA,0.25
userD,userB,0.25
userD,userA,0.166667
``````

### Direct Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |

``````
algo(similarity).params({
  uuids: [1,2],
  uuids2: [2,3,4],
  type: 'jaccard'
}) as jacc
return jacc
``````

Results: jacc

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.2 |
| 1 | 3 | 0.25 |
| 1 | 4 | 0.166666666666667 |
| 2 | 3 | 0.5 |
| 2 | 4 | 0.25 |

``````
algo(similarity).params({
  uuids: [1,2],
  type: 'jaccard',
  top_limit: 1
}) as top
return top
``````

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 3 | 0.25 |
| 2 | 3 | 0.5 |

### Stream Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |

``````
algo(similarity).params({
  uuids: [3],
  uuids2: [1,2,4],
  type: 'jaccard'
}).stream() as jacc
where jacc.similarity > 0
return jacc
``````

Results: jacc

| node1 | node2 | similarity |
|---|---|---|
| 3 | 1 | 0.25 |
| 3 | 2 | 0.5 |

``````
algo(similarity).params({
  uuids: [1],
  type: 'jaccard',
  top_limit: 2
}).stream() as top
return top
``````

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 3 | 0.25 |
| 1 | 2 | 0.2 |