Overview
Cosine similarity treats the data objects in a dataset as vectors and uses the cosine of the angle between two vectors to indicate how similar they are. In a graph, N numeric properties (features) of nodes are specified to form N-dimensional vectors, and two nodes are considered similar if their vectors are similar.
Cosine similarity ranges from -1 to 1: 1 means that the two vectors point in the same direction, -1 means that they point in opposite directions, and 0 means that they are perpendicular.
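In 2-dimensional space, the cosine similarity between vectors A(a1, a2) and B(b1, b2) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\,\sqrt{b_1^2 + b_2^2}}$$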
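In 3-dimensional space, the cosine similarity between vectors A(a1, a2, a3) and B(b1, b2, b3) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2}}$$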
The following diagram shows the relationship between vectors A and B in 2D and 3D spaces, as well as the angle θ between them:
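Generalizing to N-dimensional space, the cosine similarity between vectors A(a1, ..., aN) and B(b1, ..., bN) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2}\,\sqrt{\sum_{i=1}^{N} b_i^2}}$$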
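The N-dimensional computation can be sketched in Python as the dot product of the two vectors divided by the product of their lengths. This is a minimal illustration, not the algorithm's actual implementation; the function name `cosine_similarity` and the choice to raise an error for a zero vector are assumptions made for this sketch.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero vector; raising here is a
        # choice made for this sketch only.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2], [2, 4]))        # same direction, approximately 1
print(cosine_similarity([1, 2], [-1, -2]))      # opposite direction, approximately -1
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # perpendicular vectors
```

The same function covers the 2-dimensional and 3-dimensional cases, since both are instances of the general sum over all dimensions.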
Considerations
- Theoretically, the calculation of cosine similarity between two nodes does not depend on their connectivity.
- The value of cosine similarity depends only on the direction of the vectors, not on their length.
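Both considerations can be checked with a short Python sketch. The function takes only feature vectors as input (no edges are involved), and scaling a vector by a positive factor leaves the result unchanged while reversing its direction flips the sign. The vectors below are hypothetical values chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors, not taken from any graph in this document.
a = [3.0, 4.0, 12.0]
b = [1.0, 2.0, 2.0]

base = cosine_similarity(a, b)
scaled = cosine_similarity([10 * x for x in a], b)  # same direction, 10x longer
flipped = cosine_similarity([-x for x in a], b)     # opposite direction

print(math.isclose(base, scaled))    # length does not affect the result
print(math.isclose(flipped, -base))  # reversing the direction flips the sign
```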
Syntax
- Command:
algo(similarity)
- Parameters:
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
type | string | cosine | cosine | Yes | Type of similarity; for Cosine Similarity, keep it as cosine |
node_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | No | Specify two or more node properties to form the vectors, all properties must belong to the same (one) schema |
limit | int | ≥-1 | -1 | Yes | Number of results to return, -1 to return all results |
top_limit | int | ≥-1 | -1 | Yes | In the selection mode, limit the maximum number of results returned for each node specified in ids/uuids, -1 to return all results with similarity > 0; in the pairing mode, this parameter is invalid |
The algorithm has two calculation modes:
- Pairing: when both ids/uuids and ids2/uuids2 are configured, each node in ids/uuids is paired with each node in ids2/uuids2 (a node is never paired with itself), and pair-wise similarities are computed.
- Selection: when only ids/uuids is configured, pair-wise similarities are computed between each target node in it and all other nodes in the graph. The returned results include all or a limited number of the nodes that have similarity > 0 with the target node, ordered by descending similarity.
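The two modes can be sketched in Python as follows. This is only an illustration of the behavior described above, not the algorithm's implementation: the helper names `pairing` and `selection`, the `features` dictionary, and its 2-dimensional vectors are all hypothetical, and the limit parameter is not modeled.

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairing(features, ids, ids2):
    """Pair every node in ids with every node in ids2, skipping self-pairs."""
    return [
        (n1, n2, cosine_similarity(features[n1], features[n2]))
        for n1, n2 in product(ids, ids2)
        if n1 != n2
    ]


def selection(features, ids, top_limit=-1):
    """Compare each node in ids with all other nodes; keep similarity > 0,
    sort in descending order, and apply top_limit per target node."""
    rows = []
    for n1 in ids:
        scored = [
            (n1, n2, cosine_similarity(features[n1], vec))
            for n2, vec in features.items()
            if n2 != n1
        ]
        scored = [row for row in scored if row[2] > 0]
        scored.sort(key=lambda row: row[2], reverse=True)
        if top_limit >= 0:
            scored = scored[:top_limit]
        rows.extend(scored)
    return rows


# Hypothetical node id -> feature vector mapping (made-up values).
features = {
    1: [1.0, 0.0],
    2: [1.0, 1.0],
    3: [0.0, 1.0],
    4: [-1.0, 0.0],
}

for row in pairing(features, [1, 2], [2, 3, 4]):
    print(row)
for row in selection(features, [1, 2], top_limit=1):
    print(row)
```

In this sketch, the pairing call produces five rows because the pair of node 2 with itself is skipped, and the selection call returns at most one row per target node.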
Examples
The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height:
File Writeback
Spec | Content |
---|---|
filename | node1, node2, similarity |
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height']
}).write({
  file: {
    filename: 'cs_result'
  }
})
Results: File cs_result
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine'
}).write({
  file: {
    filename: 'list'
  }
})
Results: File list
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
product2,product1,0.986529
product2,product3,0.934217
product2,product4,0.881988
product3,product2,0.934217
product3,product4,0.930153
product3,product1,0.878858
product4,product3,0.930153
product4,product2,0.881988
product4,product1,0.816876
Direct Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
algo(similarity).params({
  uuids: [1,2],
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine'
}) as cs
return cs
Results: cs
node1 | node2 | similarity |
---|---|---|
1 | 2 | 0.986529413529119 |
1 | 3 | 0.878858407519654 |
1 | 4 | 0.816876150267203 |
2 | 3 | 0.934216530725663 |
2 | 4 | 0.88198819302226 |
algo(similarity).params({
  uuids: [1,2],
  type: 'cosine',
  node_schema_property: ['price', 'weight', 'width', 'height'],
  top_limit: 1
}) as top
return top
Results: top
node1 | node2 | similarity |
---|---|---|
1 | 2 | 0.986529413529119 |
2 | 1 | 0.986529413529119 |
Stream Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
algo(similarity).params({
  uuids: [3],
  uuids2: [1,2,4],
  node_schema_property: ['@product.price', '@product.weight', '@product.width'],
  type: 'cosine'
}).stream() as cs
where cs.similarity > 0.8
return cs
Results: cs
node1 | node2 | similarity |
---|---|---|
3 | 2 | 0.883292081301959 |
3 | 4 | 0.877834381494613 |
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine',
  top_limit: 1
}).stream() as top
return top
Results: top
node1 | node2 | similarity |
---|---|---|
1 | 2 | 0.986529413529119 |
3 | 2 | 0.934216530725663 |