Overview
In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. In a graph, N numeric properties (features) of a node can be specified to locate the node in an N-dimensional Euclidean space.
Concepts
Euclidean Distance
In 2-dimensional space, the formula to compute the Euclidean distance between points $A(x_1, y_1)$ and $B(x_2, y_2)$ is:
$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
In 3-dimensional space, the formula to compute the Euclidean distance between points $A(x_1, y_1, z_1)$ and $B(x_2, y_2, z_2)$ is:
$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
Generalized to N-dimensional space, the formula to compute the Euclidean distance is:
$$d = \sqrt{\sum_{i=1}^{N} (x_{i1} - x_{i2})^2}$$
where $x_{i1}$ is the i-th coordinate of the first point and $x_{i2}$ is the i-th coordinate of the second point.
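The N-dimensional formula translates directly into code. Below is a minimal Python sketch; the function name euclidean_distance is illustrative and not part of Ultipa. Python 3.8+ also provides math.dist, which computes the same value.

```python
import math
from typing import Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two N-dimensional points p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared per-dimension differences (x_i1 - x_i2)^2, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# 2-dimensional case: A(0, 0) and B(3, 4)
print(euclidean_distance([0, 0], [3, 4]))        # 5.0
# 3-dimensional case: A(1, 2, 3) and B(4, 6, 3)
print(euclidean_distance([1, 2, 3], [4, 6, 3]))  # 5.0
```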
The Euclidean distance ranges from 0 to +∞; the smaller the value, the more similar the two nodes.
Normalized Euclidean Distance
Normalized Euclidean distance scales the Euclidean distance into the range from 0 to 1; the closer to 1, the more similar the two nodes.
Ultipa adopts the following formula to normalize the Euclidean distance:
$$d_{norm} = \frac{1}{1 + d}$$
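The normalization can be checked against the example results. Assuming the formula is 1 / (1 + d), which reproduces the paired values in the examples (for instance, 1 / (1 + 94.3822017119753) ≈ 0.0104841362649574 for product1 and product2), a minimal Python sketch is shown below; the function name normalized_euclidean is illustrative.

```python
def normalized_euclidean(distance: float) -> float:
    """Map a Euclidean distance in [0, +inf) to a similarity in (0, 1], assuming 1 / (1 + d)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


# Distance between product1 and product2 as reported in the Direct Return example
print(round(normalized_euclidean(94.3822017119753), 6))  # 0.010484
# Identical vectors have distance 0 and therefore the maximum similarity
print(normalized_euclidean(0.0))                          # 1.0
```

Because 1 / (1 + d) decreases monotonically as d grows, ranking nodes by descending normalized value is the same as ranking them by ascending raw distance.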
Considerations
- Theoretically, the calculation of Euclidean distance between two nodes does not depend on their connectivity.
Syntax
- Command:
algo(similarity)
- Parameters:
Name | Type | Spec | Default | Optional | Description |
---|---|---|---|---|---|
ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
type | string | euclideanDistance, euclidean | cosine | No | Type of similarity; euclideanDistance computes the Euclidean distance, euclidean computes the normalized Euclidean distance |
node_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | No | Two or more node properties that form the vectors; all properties must belong to the same (one) schema |
limit | int | ≥-1 | -1 | Yes | Number of results to return; -1 returns all results |
top_limit | int | ≥-1 | -1 | Yes | In the selection mode, the maximum number of results returned for each node specified in ids/uuids; -1 returns all results with similarity > 0. In the pairing mode, this parameter has no effect |
The algorithm has two calculation modes:
- Pairing: when both ids/uuids and ids2/uuids2 are configured, each node in ids/uuids is paired with each node in ids2/uuids2 (a node is not paired with itself), and pairwise similarities are computed.
- Selection: when only ids/uuids is configured, for each target node in it, pairwise similarities are computed between it and all other nodes in the graph. The results include all, or a limited number of, the nodes that have similarity > 0 with the target node, ordered by descending similarity.
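The two modes can be sketched in Python as follows. This is a minimal illustration, not Ultipa's implementation: node vectors are held in a plain dict keyed by hypothetical node IDs, the helpers pairing and selection are invented for this example, the limit parameter is not modeled, and the metric is passed in as a function (raw distance for euclideanDistance, and 1 / (1 + d) for euclidean, the latter assumed from the example results).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Row = Tuple[int, int, float]  # (node1, node2, similarity)
Metric = Callable[[Vector, Vector], float]


def distance(p: Vector, q: Vector) -> float:
    """Raw Euclidean distance (type euclideanDistance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def normalized(p: Vector, q: Vector) -> float:
    """Normalized Euclidean distance (type euclidean), assumed to be 1 / (1 + d)."""
    return 1.0 / (1.0 + distance(p, q))


def pairing(vectors: Dict[int, Vector], ids: List[int], ids2: List[int],
            metric: Metric) -> List[Row]:
    """Pair each node in ids with each node in ids2, skipping a node paired with itself."""
    return [(a, b, metric(vectors[a], vectors[b]))
            for a in ids for b in ids2 if a != b]


def selection(vectors: Dict[int, Vector], ids: List[int], metric: Metric,
              top_limit: int = -1) -> List[Row]:
    """Compare each target node in ids with every other node in the graph."""
    rows: List[Row] = []
    for a in ids:
        scored = [(a, b, metric(vectors[a], vectors[b])) for b in vectors if b != a]
        scored = [r for r in scored if r[2] > 0]       # keep similarity > 0 only
        scored.sort(key=lambda r: r[2], reverse=True)  # descending similarity
        rows.extend(scored if top_limit == -1 else scored[:top_limit])
    return rows


# Hypothetical 2-dimensional vectors for three nodes
vectors = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (9.0, 12.0)}
print(pairing(vectors, [1, 2], [2, 3], distance))
# [(1, 2, 5.0), (1, 3, 15.0), (2, 3, 10.0)]
print(selection(vectors, [1, 2], normalized, top_limit=1))
# [(1, 2, 0.16666666666666666), (2, 1, 0.16666666666666666)]
```

Note that the pair (2, 2) is skipped in pairing mode, mirroring the rule that a node is not paired with itself.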
Examples
The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height.
File Writeback
Spec | Content |
---|---|
filename | node1, node2, similarity |
algo(similarity).params({
uuids: [1],
uuids2: [2,3,4],
node_schema_property: ['price', 'weight', 'width', 'height'],
type: 'euclideanDistance'
}).write({
file:{
filename: 'ed'
}
})
Results: File ed
product1,product2,94.3822
product1,product3,143.962
product1,product4,165.179
algo(similarity).params({
uuids: [1,2,3,4],
node_schema_property: ['price', 'weight', 'width', 'height'],
type: 'euclidean'
}).write({
file:{
filename: 'ed_list'
}
})
Results: File ed_list
product1,product2,0.010484
product1,product3,0.006898
product1,product4,0.006018
product2,product3,0.018082
product2,product4,0.013309
product2,product1,0.010484
product3,product4,0.024091
product3,product2,0.018082
product3,product1,0.006898
product4,product3,0.024091
product4,product2,0.013309
product4,product1,0.006018
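Since the writeback file is plain comma-separated text with the columns node1, node2, similarity, it can be read back with standard CSV tooling. The Python sketch below embeds the ed_list content above, assumes there is no header row (as in the listing), and checks two properties visible in that output: each target node's rows are in descending order of similarity, and the value for (a, b) equals the value for (b, a). The helper names are illustrative.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Content of the ed_list file shown above: node1,node2,similarity with no header row
ED_LIST = """\
product1,product2,0.010484
product1,product3,0.006898
product1,product4,0.006018
product2,product3,0.018082
product2,product4,0.013309
product2,product1,0.010484
product3,product4,0.024091
product3,product2,0.018082
product3,product1,0.006898
product4,product3,0.024091
product4,product2,0.013309
product4,product1,0.006018
"""

Results = Dict[str, List[Tuple[str, float]]]


def load_results(text: str) -> Results:
    """Group rows by node1, preserving file order."""
    grouped: Results = defaultdict(list)
    for node1, node2, similarity in csv.reader(io.StringIO(text)):
        grouped[node1].append((node2, float(similarity)))
    return dict(grouped)


def is_descending(results: Results) -> bool:
    """True if every target node's rows are sorted by descending similarity."""
    return all([s for _, s in rows] == sorted((s for _, s in rows), reverse=True)
               for rows in results.values())


def is_symmetric(results: Results) -> bool:
    """True if the value for (a, b) equals the value for (b, a) whenever both appear."""
    lookup = {(a, b): s for a, rows in results.items() for b, s in rows}
    return all(lookup.get((b, a), s) == s for (a, b), s in lookup.items())


results = load_results(ED_LIST)
print(results["product3"][0])                         # ('product4', 0.024091)
print(is_descending(results), is_symmetric(results))  # True True
```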
Direct Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
algo(similarity).params({
uuids: [1,2],
uuids2: [2,3,4],
node_schema_property: ['price', 'weight', 'width', 'height'],
type: 'euclideanDistance'
}) as distance
return distance
Results: distance
node1 | node2 | similarity |
---|---|---|
1 | 2 | 94.3822017119753 |
1 | 3 | 143.96180048888 |
1 | 4 | 165.178691119648 |
2 | 3 | 54.3046959295419 |
2 | 4 | 74.1350119714025 |
algo(similarity).params({
uuids: [1,2],
type: 'euclidean',
node_schema_property: ['price', 'weight', 'width', 'height'],
top_limit: 1
}) as top
return top
Results: top
node1 | node2 | similarity |
---|---|---|
1 | 2 | 0.0104841362649574 |
2 | 3 | 0.0180816471945529 |
Stream Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
algo(similarity).params({
uuids: [3],
uuids2: [1,2,4],
node_schema_property: ['@product.price', '@product.weight', '@product.width'],
type: 'euclidean'
}).stream() as distance
where distance.similarity > 0.01
return distance
Results: distance
node1 | node2 | similarity |
---|---|---|
3 | 2 | 0.0180816471945529 |
3 | 4 | 0.0240910110982062 |
algo(similarity).params({
uuids: [1,3],
node_schema_property: ['price', 'weight', 'width', 'height'],
type: 'euclideanDistance',
top_limit: 1
}).stream() as top
return top
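In this last example, top_limit: 1 returns product4 for product1 and product1 for product3, which are the largest distances in the earlier results. This is consistent with results being ordered by descending value: with type euclideanDistance the top result is the farthest (least similar) node, whereas with type euclidean it is the nearest one. The Python sketch below reproduces both top_limit examples from the distances reported above. The product3-product4 distance does not appear directly, so it is back-computed from the normalized value 0.024091 in ed_list under the assumed 1 / (1 + d) normalization. The function names are illustrative.

```python
from typing import Dict, FrozenSet

# Euclidean distances between products, taken from the examples above.
# The product3-product4 value is back-computed as 1 / 0.024091 - 1 (assumed normalization).
DISTANCES: Dict[FrozenSet[int], float] = {
    frozenset({1, 2}): 94.3822017119753,
    frozenset({1, 3}): 143.96180048888,
    frozenset({1, 4}): 165.178691119648,
    frozenset({2, 3}): 54.3046959295419,
    frozenset({2, 4}): 74.1350119714025,
    frozenset({3, 4}): 1 / 0.024091 - 1,
}
NODES = (1, 2, 3, 4)


def top1_by_distance(target: int) -> int:
    """euclideanDistance with top_limit 1: descending raw distance picks the farthest node."""
    return max((n for n in NODES if n != target),
               key=lambda n: DISTANCES[frozenset({target, n})])


def top1_by_normalized(target: int) -> int:
    """euclidean with top_limit 1: descending 1 / (1 + d) picks the nearest node."""
    return max((n for n in NODES if n != target),
               key=lambda n: 1 / (1 + DISTANCES[frozenset({target, n})]))


print(top1_by_distance(1), top1_by_distance(3))      # 4 1
print(top1_by_normalized(1), top1_by_normalized(2))  # 2 3
```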
Results: top
node1 | node2 | similarity |
---|---|---|
1 | 4 | 165.178691119648 |
3 | 1 | 143.96180048888 |