Overview
The k-Nearest Neighbors (kNN) algorithm is a classification technique that classifies the target node based on the classifications of its k nearest (most similar) nodes. kNN was proposed by T.M. Cover and P.E. Hart in 1967 and has since become one of the simplest and widely used classification algorithms:
- T.M. Cover, P.E. Hart, Nearest Neighbor Pattern Classification (1967)
Although containing the word neighbor in its name, kNN does not explicitly consider the edges between nodes when calculating similarity. Instead, it focuses solely on node properties.
Concepts
Similarity Metric
Ultipa's kNN algorithm computes pair-wise cosine similarity between the target node and all other nodes in the graph, then selects the k nodes with the highest similarity to the target node.
Vote on Classification
One node property is selected as the class label. After finding the nearest k nodes for the target node, assign the majority label among the k nodes to the target node.
If multiple labels occur with the highest frequency, the label of the node with the highest similarity will be selected.
Syntax
- Command:
algo(knn)
- Parameters:
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
node_id | _uuid |
/ | / | No | UUID of the target node |
node_schema_property | []@<schema>?.<property> |
Numeric type, must LTE, must be of the same schema as the target node | / | No | Two or more node properties to compute the cosine similarity |
top_k | int | >0 | / | No | The number of the nearest nodes to select |
target_schema_property | @<schema>?.<property> |
Numeric/String type, must LTE, must be of the same schema as the target node | / | No | Node property to use as the class label |
Examples
The example graph has 6 image nodes (edges are ignored), and each node has properties d1, d2, d3, d4 and type:
File Writeback
Spec |
Content |
Description |
---|---|---|
filename | First row: attribute_value Second row and later: _id ,similarity |
First row: The elected class label Second row and later: ID of the nearest node and its cosine similarity with the target node |
algo(knn).params({
node_id: 1,
node_schema_property: ['d1', 'd2', 'd3', 'd4'],
top_k: 4,
target_schema_property: @image.type
}).write({
file:{
filename: "knn"
}
})
Results: File knn
Gold
top k : image4,0.538975
image3,0.705072
image6,0.841922
image2,0.85516
Direct Return
Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|
0 | KV | The elected class label and its number of occurrences among the k nearest neighbors | attribute_value , count |
1 | []perNode | The nearest node and its cosine similarity with the target node | node , similarity |
algo(knn).params({
node_id: 1,
node_schema_property: ['d1', 'd2', 'd3', 'd4'],
top_k: 4,
target_schema_property: @image.type
}) as a1, a2
return a1, a2
Results: a1 and a2
attribute_value | count |
---|---|
Gold | 2 |
node | similarity |
---|---|
4 | 0.538974677919475 |
3 | 0.705071517140301 |
6 | 0.841922130134788 |
2 | 0.855159652306166 |
Stream Return
Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|
0 | KV | The elected class label and its number of occurrences among the k nearest neighbors | attribute_value , count |
algo(knn).params({
node_id: 2,
node_schema_property: ['@image.d1', '@image.d2', '@image.d3', '@image.d4'],
top_k: 5,
target_schema_property: @image.type
}).stream() as label
find().nodes({_uuid == 2}) as target
return case
when target.type == label.attribute_value then 'True'
else 'false'
end
Results: false
find().nodes({@image}) as images
call {
with images._uuid as target
algo(knn).params({
node_id: target,
node_schema_property: ['d1', 'd2', 'd3', 'd4'],
top_k: 3,
target_schema_property: 'type'
}).stream() as label
return label
}
return table(images._id, label.attribute_value)
Results: table(images._id, label.attribute_value)
images._id | label.attribute_value |
---|---|
image1 | Silver |
image2 | Silver |
image3 | Gold |
image4 | Silver |
image5 | Gold |
image6 | Gold |