✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats
Overview
The Pearson correlation coefficient is the most common measure of the strength and direction of the linear relationship between two quantitative variables. In a graph, each node is quantified by N of its numeric properties (features).
For two variables X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), the Pearson correlation coefficient (r) is defined as the ratio of their covariance to the product of their standard deviations:
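Written out in standard notation, with $\bar{x}$ and $\bar{y}$ denoting the means of X and Y (symbols introduced here for illustration), the definition reads:

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The normalization factor (1/n or 1/(n-1)) appears in both the covariance and the standard deviations, so it cancels and does not affect r.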

The Pearson correlation coefficient ranges from -1 to 1:
Pearson correlation coefficient | Correlation type | Interpretation |
---|---|---|
0 < r ≤ 1 | Positive correlation | As one variable becomes larger, the other variable becomes larger |
r = 0 | No linear correlation | (Other types of correlation may still exist) |
-1 ≤ r < 0 | Negative correlation | As one variable becomes larger, the other variable becomes smaller |
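As a minimal sketch of the definition, the following Python function computes r for two numeric sequences. It only illustrates the formula and is not the algorithm's actual implementation; the function name `pearson` and the sample vectors are assumptions made up for this example.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations (covariance up to a 1/n factor).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations
    # (standard deviations up to the same 1/n factor, which cancels).
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (dev_x * dev_y)


# Hypothetical feature vectors (not taken from any example graph).
rising = pearson([1, 2, 3, 4], [10, 20, 30, 40])  # same direction, perfectly linear
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])     # opposite direction, perfectly linear
flat = pearson([1, 2, 3, 4], [1, -1, -1, 1])      # no linear relationship
```

The three calls correspond to the three rows of the table: a positive, a negative, and a zero coefficient.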
Considerations
- The Pearson correlation coefficient between two nodes is computed from their property values only, so in theory it does not depend on whether or how the nodes are connected.
Syntax
- Command:
algo(similarity)
- Parameters:
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
ids / uuids | []_id / []_uuid | / | / | No | IDs/UUIDs of the first group of nodes to calculate |
ids2 / uuids2 | []_id / []_uuid | / | / | Yes | IDs/UUIDs of the second group of nodes to calculate |
type | string | pearson | cosine | No | Type of similarity; set it to pearson for the Pearson correlation coefficient |
node_schema_property | []@<schema>?.<property> | Must LTE | / | No | Two or more numeric node properties must be specified to quantify the nodes |
limit | int | >=-1 | -1 | Yes | Number of results to return, -1 to return all results |
top_limit | int | >=-1 | -1 | Yes | Limit on the length of top_list, -1 to return the full top_list |
This algorithm has two calculation modes:
- Pairing: when ids/uuids and ids2/uuids2 are both configured, each node in the first group is paired with each node in the second group (Cartesian product), and the similarity of every pair is computed.
- Selection: when only ids/uuids is configured, the similarity between each node in the group and every other node in the graph is computed in order to select the most similar nodes. The returned top_list includes all nodes that have similarity > 0 with that node, ordered by descending similarity.
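The two modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's implementation: the helper names `pairing` and `selection`, the node names, and the property values are all hypothetical, and `top_limit` is modeled only as described in the parameter table (-1 keeps the full list).

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def pairing(features, group1, group2):
    """Pairing mode: one (node1, node2, similarity) row per pair in group1 x group2."""
    return [(a, b, pearson(features[a], features[b]))
            for a, b in product(group1, group2)]


def selection(features, group, top_limit=-1):
    """Selection mode: for each node in group, rank all other nodes with similarity > 0."""
    result = {}
    for node in group:
        scored = [(other, pearson(features[node], features[other]))
                  for other in features if other != node]
        # Keep only positive similarities, most similar first.
        top_list = sorted((s for s in scored if s[1] > 0),
                          key=lambda s: s[1], reverse=True)
        result[node] = top_list if top_limit == -1 else top_list[:top_limit]
    return result


# Hypothetical nodes, each quantified by four made-up numeric properties.
features = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],
    "c": [4, 3, 2, 1],
    "d": [1, 3, 2, 4],
}

pairs = pairing(features, ["a"], ["b", "c", "d"])  # 1 x 3 = 3 rows
ranked = selection(features, ["a"])                # "c" is dropped (similarity < 0)
best = selection(features, ["a"], top_limit=1)     # only the most similar node
```

Pairing returns every pair regardless of sign, while selection filters out non-positive similarities before applying `top_limit`.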
Examples
The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height:

File Writeback
Calculation Mode | Spec | Content |
---|---|---|
Pairing | filename | node1, node2, similarity |
Selection | filename | node, top_list |
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "pearson"
}).write({
  file: {
    filename: "pearson"
  }
})
Results: File pearson
product1,product2,0.998785
product1,product3,0.474384
product1,product4,0.210494
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "pearson"
}).write({
  file: {
    filename: "list"
  }
})
Results: File list
product1,product2:0.998785;product3:0.474384;product4:0.210494;
product2,product1:0.998785;product3:0.507838;product4:0.253573;
product3,product2:0.507838;product1:0.474384;product4:0.474021;
product4,product3:0.474021;product2:0.253573;product1:0.210494;
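To post-process a selection-mode file, each line can be split into the node and its top_list. The sketch below assumes the layout seen in the sample output above (node, then `name:similarity` entries each terminated by `;`) and that node names contain no commas or semicolons; the function name `parse_selection_line` is hypothetical.

```python
def parse_selection_line(line):
    """Split 'node,other:score;other:score;' into (node, [(other, score), ...])."""
    node, _, rest = line.strip().partition(",")
    top_list = []
    for entry in rest.split(";"):
        if not entry:  # skip the empty piece after the trailing ';'
            continue
        other, _, score = entry.rpartition(":")
        top_list.append((other, float(score)))
    return node, top_list


# First line of the sample "list" file shown above.
node, top_list = parse_selection_line(
    "product1,product2:0.998785;product3:0.474384;product4:0.210494;"
)
```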
Direct Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "pearson"
}) as p
return p order by p.similarity
Results: p
node1 | node2 | similarity |
---|---|---|
1 | 4 | 0.210494150169583 |
1 | 3 | 0.474383803132863 |
1 | 2 | 0.998785121601255 |
algo(similarity).params({
  uuids: [1,2],
  type: "pearson",
  node_schema_property: [price,weight,width,height],
  top_limit: 1
}) as top
return top
Results: top
node | top_list |
---|---|
1 | 2:0.998785, |
2 | 1:0.998785, |
Stream Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
  uuids: [3],
  uuids2: [1,2,4],
  node_schema_property: [price,weight,width,height],
  type: "pearson"
}).stream() as p
where p.similarity > 0.5
return p
Results: p
node1 | node2 | similarity |
---|---|---|
3 | 2 | 0.50783775659896 |
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: [price,weight,width,height],
  type: "pearson",
  top_limit: 1
}).stream() as top
return top
Results: top
node | top_list |
---|---|
1 | 2:0.998785, |
3 | 2:0.507838, |