Overview
k-Nearest Neighbor (kNN), also known as the nearest neighbor algorithm, classifies a given sample according to the classifications of the K samples most similar to it (similarity measured by, e.g., cosine similarity). Proposed in 1967 by IEEE members T. M. Cover and P. E. Hart, this algorithm is one of the simplest data classification techniques.
Although its name contains the word "neighbor", kNN does not rely on relationships between samples when calculating similar samples. In graph terms, kNN has nothing to do with the edges between nodes.
Basic Concept
Select Similar Nodes
Ultipa's kNN algorithm treats each node in the graph as a sample and uses the cosine similarity between nodes (see the Cosine Similarity chapter) as the similarity between samples. The K nodes y that are most similar to the given node x are selected as the K nearest neighbors K_x of x.
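The selection step can be sketched in plain Python. This is only an illustration of the idea, not Ultipa's implementation: the function names, the sample (age, grade) vectors, and the handling of all-zero vectors are assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

def k_nearest(features, x, k):
    """Return [(node_id, similarity)] for the k nodes most similar to node x.

    Only node properties are used; edges play no part in the selection.
    """
    scored = [
        (node, cosine_similarity(features[x], vec))
        for node, vec in features.items()
        if node != x
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical (age, grade) property vectors keyed by node UUID.
features = {
    1: [20.0, 3.0],
    2: [21.0, 3.0],
    3: [35.0, 1.0],
    4: [19.0, 4.0],
}
neighbors = k_nearest(features, 1, 2)
```

With this sample data, nodes 2 and 4 come out as the two nearest neighbors of node 1, because their property vectors point in nearly the same direction as that of node 1.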
Vote on the Classification
One property of the node serves as its classification label. The labels of the K selected nearest neighbors are counted, and the label with the most occurrences is assigned to x.
If more than one label ties for the most occurrences, the tied label carried by the node with the highest similarity is chosen.
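A minimal Python sketch of the voting rule, including the tie-break, might look as follows. The function name `vote`, the input shapes, and the sample similarities and labels are hypothetical; the tie-break is read here as "among the tied labels, take the one on the most similar neighbor".

```python
from collections import Counter

def vote(neighbors, labels):
    """Elect a label from the K nearest neighbors.

    neighbors: [(node_id, similarity)], labels: {node_id: label}.
    Returns (elected_label, count), where count is the number of
    neighbors carrying the elected label.
    """
    counts = Counter(labels[node] for node, _ in neighbors)
    top = max(counts.values())
    tied = {label for label, c in counts.items() if c == top}
    # Tie-break: among the tied labels, pick the one on the most similar neighbor.
    best_node, _ = max(
        ((node, sim) for node, sim in neighbors if labels[node] in tied),
        key=lambda pair: pair[1],
    )
    return labels[best_node], top

# Hypothetical neighbors of x with their similarities and labels.
neighbors = [(2, 0.99), (4, 0.97), (3, 0.95), (5, 0.90)]
labels = {2: "low", 3: "high", 4: "high", 5: "low"}
elected = vote(neighbors, labels)
```

Here "low" and "high" each occur twice, so the tie is resolved in favor of "low", the label of node 2, which has the highest similarity.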
Special Case
Lonely Node, Disconnected Graph
In theory, the calculation of kNN does not depend on the existence of edges in the graph. Whether the node to be calculated is a lonely node, or whichever connected component it belongs to, the result of kNN is not affected.
Self-loop Edge
The calculation of kNN has nothing to do with edges.
Directed Edge
The calculation of kNN has nothing to do with edges.
Command
algo(knn).params(<>)
Configuration Item | Type | Default Value | Specification | Description |
---|---|---|---|---|
node_id | _uuid | 0 | >0 | The UUID of the node to be calculated |
node_schema_property | []@<schema>?.<property> | / | Numeric node property, LTE needed; at least two properties are needed, and the schema must be the same as that of node node_id | Properties of the node that participate in the calculation of cosine similarity |
top_k | int | / | >0 | The number of most similar nodes that vote on the classification label |
target_schema_property | @<schema>?.<property> | / | LTE needed; the schema must be the same as that of node node_id | The node property that holds the classification label |
Example: Estimate the value of salary for the node (UUID = 1) according to its properties age and grade
algo(knn).params({
  node_id: 1,
  node_schema_property: [@student.age, @student.grade],
  top_k: 10,
  target_schema_property: salary
})
File Writeback
.write({file: {<>}})
Parameter | Type | Default Value | Specification | Description |
---|---|---|---|---|
filename | string | / | / | Name or path of the file to be written back. The first row of the file holds attribute_value; the rows from the second onward have the columns _id, similarity |
Property Writeback
(Not supported)
Statistics Writeback
(Not supported)
Direct Return
as <alias>, <alias>, ... return <alias>, <alias>, ...
Alias Number | Type | Description | Column Name |
---|---|---|---|
0 | KV | The elected label and the number of times it occurs among the K nearest neighbors | attribute_value, count |
1 | []perNode | The K nearest neighbors that are selected and their similarities | node , similarity |
Streaming Return
.stream() as <alias> return <alias>
Alias Number | Type | Description | Column Name |
---|---|---|---|
0 | KV | The elected label and the number of times it occurs among the K nearest neighbors | attribute_value, count |
Real-time Statistics
(Not supported)