✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats
Overview
In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. In a graph, N numeric node properties (features) are specified to give each node's position in an N-dimensional Euclidean space, and the distance between two nodes is computed from those positions.
Concepts
Euclidean Distance
In 2-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1) and B(x2, y2) is:
$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
In 3-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1, z1) and B(x2, y2, z2) is:
$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
Generalized to N-dimensional space, the formula to compute the Euclidean distance is:
$$d(A, B) = \sqrt{\sum_{i=1}^{N} (x_{i1} - x_{i2})^2}$$
where $x_{i1}$ is the i-th coordinate of the first point and $x_{i2}$ is the i-th coordinate of the second point.
The Euclidean distance ranges from 0 to +∞; the smaller the value, the more similar the two nodes.
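As a minimal sketch of the N-dimensional definition, the following Python function computes the distance between two coordinate sequences. The function name `euclidean_distance` is chosen here for illustration and is not part of Ultipa.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# 2-dimensional case: the 3-4-5 right triangle
print(euclidean_distance((0, 0), (3, 4)))        # 5.0
# 3-dimensional case: differences are (2, 2, 1)
print(euclidean_distance((1, 2, 3), (3, 4, 4)))  # 3.0
```

A distance of 0 means the two points coincide; the value grows without bound as the points move apart.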
Normalized Euclidean Distance
Normalized Euclidean distance rescales the Euclidean distance into the range from 0 to 1; the closer the value is to 1, the more similar the two nodes.
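Ultipa adopts the following formula to normalize the Euclidean distance:
$$d_{norm}(A, B) = \frac{1}{1 + d(A, B)}$$
where $d(A, B)$ is the Euclidean distance between the two nodes. This form is consistent with the example results on this page: a distance of 94.3822 corresponds to a normalized value of 0.010484.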
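The example results later on this page are consistent with the mapping 1 / (1 + d), where d is the Euclidean distance. Assuming that form, a short Python sketch can reproduce the normalized values from the raw distances shown in the Direct Return example. The function name `normalize_distance` is hypothetical.

```python
def normalize_distance(d):
    """Map a distance in [0, +inf) to a similarity in (0, 1].

    Assumed form: 1 / (1 + d), inferred from the example results on this page.
    """
    if d < 0:
        raise ValueError("a distance cannot be negative")
    return 1.0 / (1.0 + d)


# Raw distances between product 1 and products 2, 3, 4 (from the examples)
for d in (94.3822017119753, 143.96180048888, 165.178691119648):
    print(round(normalize_distance(d), 6))
# 0.010484
# 0.006898
# 0.006018
```

Identical points (d = 0) map to 1, and the value approaches 0 as the distance grows, which is why a larger normalized value indicates greater similarity.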
Considerations
- Theoretically, the calculation of Euclidean distance between two nodes does not depend on their connectivity.
Syntax
- Command: algo(similarity)
- Parameters:
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
type | string | euclideanDistance, euclidean | cosine | No | Type of similarity; euclideanDistance computes the Euclidean Distance, euclidean computes the Normalized Euclidean Distance |
node_schema_property | []@<schema>?.<property> | Must LTE | / | No | Two or more numeric node properties must be specified to represent the nodes |
limit | int | >=-1 | -1 | Yes | Number of results to return, -1 to return all results |
top_limit | int | >=-1 | -1 | Yes | Limit the length of top_list, -1 to return the full top_list |
This algorithm has two calculation modes:
- Pairing: when ids/uuids and ids2/uuids2 are both configured, each node in the first group is paired with each node in the second group (Cartesian product) and the pair-wise similarities are computed.
- Selection: when only ids/uuids is configured, for each node in the group, pair-wise similarities are computed between it and all other nodes in the graph in order to select the most similar nodes. The returned top_list includes all nodes that have similarity > 0 with it, ordered by descending similarity.
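The two modes can be sketched in plain Python as follows. This is an illustration of the described behavior, not Ultipa's implementation: the `features` dictionary holds hypothetical values (not those of the example graph), the names `pairing` and `selection` are invented here, and the normalized form 1 / (1 + d) is an assumption.

```python
import math
from itertools import product

# Hypothetical feature vectors keyed by node UUID (illustrative values only)
features = {
    1: (0.0, 0.0),
    2: (3.0, 4.0),
    3: (6.0, 8.0),
}


def similarity(a, b):
    """Normalized Euclidean distance, assuming the form 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)


def pairing(ids, ids2):
    """Pairing mode: one result per (node1, node2) in the Cartesian product."""
    return [
        (n1, n2, similarity(features[n1], features[n2]))
        for n1, n2 in product(ids, ids2)
    ]


def selection(ids, top_limit=-1):
    """Selection mode: for each node, rank all other nodes by similarity."""
    results = {}
    for n in ids:
        scored = [
            (m, similarity(features[n], features[m]))
            for m in features
            if m != n
        ]
        # Keep only similarity > 0, ordered by descending similarity.
        scored = [(m, s) for m, s in scored if s > 0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[n] = scored if top_limit == -1 else scored[:top_limit]
    return results


print(pairing([1], [2, 3]))
print(selection([1, 3], top_limit=1))
```

With these values, node 2 is the closest neighbor of both node 1 and node 3, so it heads both top lists. How the engine treats a node that appears in both groups in pairing mode is not modeled here.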
Examples
The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height.
File Writeback
Calculation Mode | Spec | Content |
---|---|---|
Pairing | filename | node1, node2, similarity |
Selection | filename | node, top_list |
algo(similarity).params({
uuids: [1],
uuids2: [2,3,4],
node_schema_property: [price,weight,width,height],
type: "euclideanDistance"
}).write({
file:{
filename: "ed"
}
})
Results: File ed
product1,product2,94.3822
product1,product3,143.962
product1,product4,165.179
algo(similarity).params({
uuids: [1,2,3,4],
node_schema_property: [price,weight,width,height],
type: "euclidean"
}).write({
file:{
filename: "ed_list"
}
})
Results: File ed_list
product1,product2:0.010484;product3:0.006898;product4:0.006018;
product2,product3:0.018082;product4:0.013309;product1:0.010484;
product3,product4:0.024091;product2:0.018082;product1:0.006898;
product4,product3:0.024091;product2:0.013309;product1:0.006018;
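For downstream processing, a line of the selection-mode file can be split into the node and its top_list. The sketch below assumes the line layout shown in the results above (node, a comma, then `neighbor:score;` items) and that node identifiers contain no commas; `parse_top_list_line` is a hypothetical helper, not an Ultipa function.

```python
def parse_top_list_line(line):
    """Split 'node,n1:s1;n2:s2;' into (node, [(n1, s1), (n2, s2)])."""
    node, _, rest = line.strip().partition(",")
    top_list = []
    for item in rest.split(";"):
        if not item:
            continue  # skip the empty piece after the trailing ';'
        neighbor, _, score = item.rpartition(":")
        top_list.append((neighbor, float(score)))
    return node, top_list


line = "product1,product2:0.010484;product3:0.006898;product4:0.006018;"
print(parse_top_list_line(line))
# ('product1', [('product2', 0.010484), ('product3', 0.006898), ('product4', 0.006018)])
```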
Direct Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
uuids: [1],
uuids2: [2,3,4],
node_schema_property: [price,weight,width,height],
type: "euclideanDistance"
}) as distance
return distance
order by distance.similarity desc
Results: distance
node1 | node2 | similarity |
---|---|---|
1 | 4 | 165.178691119648 |
1 | 3 | 143.96180048888 |
1 | 2 | 94.3822017119753 |
algo(similarity).params({
uuids: [1,2],
type: "euclidean",
node_schema_property: [price,weight,width,height],
top_limit: 1
}) as top
return top
Results: top
node | top_list |
---|---|
1 | 2:0.010484, |
2 | 3:0.018082, |
Stream Return
Calculation Mode | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|
Pairing | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
Selection | 0 | []perNode | Node and its selection results | node, top_list |
algo(similarity).params({
uuids: [3],
uuids2: [1,2,4],
node_schema_property: [price,weight,width,height],
type: "euclidean"
}).stream() as distance
where distance.similarity > 0.01
return distance
Results: distance
node1 | node2 | similarity |
---|---|---|
3 | 2 | 0.0180816471945529 |
3 | 4 | 0.0240910110982062 |
algo(similarity).params({
uuids: [1,3],
node_schema_property: [price,weight,width,height],
type: "euclideanDistance",
top_limit: 1
}).stream() as top
return top
Results: top
node | top_list |
---|---|
1 | 4:165.178691, |
3 | 1:143.961800, |