✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats
Overview
Cosine similarity treats the data objects in a dataset as vectors and uses the cosine of the angle between two vectors to measure how similar they are. In a graph, N numeric properties (features) of each node are specified to form an N-dimensional vector, and two nodes are considered similar if their vectors are similar.
Cosine similarity ranges from -1 to 1: 1 means the two vectors point in the same direction, -1 means they point in opposite directions, and 0 means they are orthogonal.
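In 2-dimensional space, the cosine similarity between vectors A(a1, a2) and B(b1, b2) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\,\sqrt{b_1^2 + b_2^2}}$$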
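In 3-dimensional space, the cosine similarity between vectors A(a1, a2, a3) and B(b1, b2, b3) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2}}$$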
The following diagram shows the relationship between vectors A and B in 2D and 3D spaces, as well as the angle θ between them:
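Generalizing to N-dimensional space, the cosine similarity between vectors A(a1, ..., aN) and B(b1, ..., bN) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2}\,\sqrt{\sum_{i=1}^{N} b_i^2}}$$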
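The N-dimensional definition (the dot product of the two vectors divided by the product of their lengths) can be sketched in plain Python. This is only an illustration of the formula, not the engine's implementation; the function name `cosine_similarity` is hypothetical, and raising an error for a zero vector is an assumption, since the source does not say how that case is handled.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))        # A . B
    norm_a = math.sqrt(sum(x * x for x in a))     # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))     # ||B||
    if norm_a == 0 or norm_b == 0:
        # Assumption: the angle is undefined for a zero vector, so fail loudly.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1, 2], [2, 4])        # close to 1
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])     # close to -1
orthogonal = cosine_similarity([1, 0], [0, 1])            # 0
```

The three calls cover the boundary cases described above: same direction, opposite direction, and orthogonal vectors.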
Considerations
- Theoretically, the calculation of cosine similarity between two nodes does not depend on their connectivity.
- The value of cosine similarity is independent of the lengths of the vectors; it depends only on their directions.
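The second point follows directly from the definition. As a short derivation (k is a nonzero scalar introduced here only for illustration), scaling vector A by k gives:

$$\frac{(kA) \cdot B}{\|kA\|\,\|B\|} = \frac{k\,(A \cdot B)}{|k|\,\|A\|\,\|B\|} = \frac{k}{|k|} \cdot \frac{A \cdot B}{\|A\|\,\|B\|}$$

For k > 0 the factor k/|k| equals 1, so the similarity is unchanged. For k < 0 it equals -1, so only the sign flips, which reflects the reversed direction.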
Syntax
- Command: algo(similarity)
- Parameters:
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
type | string | cosine | cosine | Yes | Type of similarity; for Cosine Similarity, keep it as cosine |
node_schema_property | []@<schema>?.<property> | Must LTE | / | No | Two or more numeric node properties must be specified to form the vector |
limit | int | >=-1 | -1 | Yes | Number of results to return, -1 to return all results |
top_limit | int | >=-1 | -1 | Yes | Limit the length of top_list, -1 to return the full top_list |
This algorithm has two calculation modes:
- Pairing: when both ids/uuids and ids2/uuids2 are configured, each node in the first group is paired with each node in the second group (Cartesian product), and the pair-wise similarities are computed.
- Selection: when only ids/uuids is configured, pair-wise similarities are computed between each node in the group and all other nodes in the graph in order to select the most similar nodes. The returned top_list includes all nodes whose similarity with that node is greater than 0, ordered by descending similarity.
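The two modes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the engine's code: the helper names `pairing` and `selection`, the in-memory `vectors` dictionary, and the sample feature values are all hypothetical, and only `top_limit` is modeled (not `limit`).

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def pairing(vectors, group1, group2):
    """Pairing mode: one result per (node1, node2) in the Cartesian product."""
    return [(n1, n2, cosine_similarity(vectors[n1], vectors[n2]))
            for n1, n2 in product(group1, group2)]


def selection(vectors, group, top_limit=-1):
    """Selection mode: for each node, rank all other nodes with similarity > 0."""
    result = {}
    for node in group:
        scored = [(other, cosine_similarity(vectors[node], vec))
                  for other, vec in vectors.items() if other != node]
        top_list = sorted((s for s in scored if s[1] > 0),
                          key=lambda s: s[1], reverse=True)
        # top_limit == -1 keeps the full list, mirroring the parameter table.
        result[node] = top_list if top_limit == -1 else top_list[:top_limit]
    return result


# Hypothetical feature vectors keyed by node UUID (not the example graph's data).
vectors = {1: [1.0, 0.0], 2: [2.0, 1.0], 3: [0.0, 1.0], 4: [-1.0, 0.0]}

pairs = pairing(vectors, [1], [2, 3])          # two (node1, node2, similarity) rows
top = selection(vectors, [1, 2], top_limit=1)  # best match for nodes 1 and 2
```

Here node 3 is orthogonal to node 1 and node 4 points the opposite way, so neither appears in node 1's top_list, which contains only node 2.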
Examples
The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height:
File Writeback
Calculation Mode | Spec | Content |
---|---|---|
Pairing | filename | node1, node2, similarity |
Selection | filename | node, top_list |
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height]
}).write({
  file:{
    filename: "cs_result"
  }
})
Results: File cs_result
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}).write({
  file:{
    filename: "list"
  }
})
Results: File list
product1,product2:0.986529;product3:0.878858;product4:0.816876;
product2,product1:0.986529;product3:0.934217;product4:0.881988;
product3,product2:0.934217;product4:0.930153;product1:0.878858;
product4,product3:0.930153;product2:0.881988;product1:0.816876;
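A file in this Selection layout can be read back with a few lines of Python. The sketch below is inferred only from the sample output above (a node, a comma, then `neighbor:similarity` entries each terminated by `;`); the function name `parse_selection_line` is hypothetical, and it assumes node identifiers contain no comma or semicolon.

```python
from typing import List, Tuple


def parse_selection_line(line: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Split 'node,n1:s1;n2:s2;' into (node, [(n1, s1), (n2, s2)])."""
    node, _, rest = line.strip().partition(",")
    top_list = []
    for entry in rest.split(";"):
        if not entry:  # skip the empty piece after the trailing ';'
            continue
        neighbor, _, score = entry.rpartition(":")
        top_list.append((neighbor, float(score)))
    return node, top_list


sample = "product1,product2:0.986529;product3:0.878858;product4:0.816876;"
node, top_list = parse_selection_line(sample)
```

The parsed `top_list` keeps the file's order, which is already sorted by descending similarity.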
Direct Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}) as cs
return cs order by cs.similarity
Results: cs
node1 | node2 | similarity |
---|---|---|
1 | 4 | 0.816876150267203 |
1 | 3 | 0.878858407519654 |
1 | 2 | 0.986529413529119 |
algo(similarity).params({
  uuids: [1,2],
  type: "cosine",
  node_schema_property: [price,weight,width,height],
  top_limit: 1
}) as top
return top
Results: top
node | top_list |
---|---|
1 | 2:0.986529, |
2 | 1:0.986529, |
Stream Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
  uuids: [3],
  uuids2: [1,2,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}).stream() as cs
where cs.similarity > 0.9
return cs
Results: cs
node1 | node2 | similarity |
---|---|---|
3 | 2 | 0.934216530725663 |
3 | 4 | 0.930152895706265 |
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: [price,weight,width,height],
  type: "cosine",
  top_limit: 1
}).stream() as top
return top
Results: top
node | top_list |
---|---|
1 | 2:0.986529, |
3 | 2:0.934217, |