
    k-NearestNeighbor

      Advanced  

    Overview

    k-NearestNeighbor (kNN), also known as the nearest neighbor algorithm, classifies a given sample according to the classifications of the K samples most similar to it (measured by, e.g., cosine similarity). Proposed in 1967 by IEEE members T. M. Cover and P. E. Hart, it is one of the simplest data classification techniques.

    Although its name contains the word "neighbor", kNN does not rely on relationships between samples when finding similar samples. In graph terms, kNN has nothing to do with the edges between nodes.

    Related material of the algorithm is as below:

    Basic Concept

    Select Similar Nodes

    Ultipa's kNN algorithm treats each node in the graph as a sample and uses the cosine similarity between nodes (see the Cosine Similarity chapter) as the similarity between samples. For a given node x, it selects the K nodes y most similar to x as the K nearest neighbors Kx.
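The selection step can be sketched in Python as follows. This is a minimal illustration, not Ultipa's implementation: the function names, the sample vectors, the exclusion of the target node from its own neighbors, and the treatment of zero vectors are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def k_nearest(
    target_id: int, vectors: Dict[int, Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return the k nodes most similar to target_id as (node_id, similarity)."""
    x = vectors[target_id]
    scored = [
        (node_id, cosine_similarity(x, vec))
        for node_id, vec in vectors.items()
        if node_id != target_id  # assumption: x is not its own neighbor
    ]
    # Highest similarity first; edges play no part in the ranking.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical property vectors, e.g. (age, grade) for each node.
vectors = {
    1: (20.0, 3.0),
    2: (21.0, 3.0),
    3: (40.0, 1.0),
    4: (19.0, 4.0),
}
neighbors = k_nearest(1, vectors, k=2)
```

With these made-up vectors, nodes 2 and 4 point in nearly the same direction as node 1 and are selected ahead of node 3.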

    Vote on the Classification

    Use one property of the node as its classification label. Count the labels of the K nearest neighbors selected; the label that occurs most often is assigned to x.

    If more than one label shares the highest count, choose the label of the node with the highest similarity.
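The voting rule, including the tie-break, might look like the Python sketch below. The function name `vote` and the sample data are hypothetical. The sketch also assumes that the tie is resolved among the tied labels only, by taking the label of the most similar neighbor that carries one of them; the text above does not spell this out.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def vote(neighbors: List[Tuple[Hashable, float]]) -> Tuple[Hashable, int]:
    """Elect a label from (label, similarity) pairs of the K nearest neighbors.

    Returns (label, count). When several labels share the highest count,
    the label of the most similar neighbor among them wins (assumption).
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    counts = Counter(label for label, _ in neighbors)
    top_count = max(counts.values())
    tied = {label for label, c in counts.items() if c == top_count}
    # Among neighbors carrying a tied label, take the most similar one.
    best_label, _ = max(
        ((label, sim) for label, sim in neighbors if label in tied),
        key=lambda pair: pair[1],
    )
    return best_label, top_count


# Clear majority: "B" occurs twice, "A" once.
majority = vote([("A", 0.99), ("B", 0.90), ("B", 0.80)])

# Tie on count: "A" and "B" both occur twice; "A" has the closest neighbor.
tie = vote([("A", 0.99), ("B", 0.97), ("B", 0.95), ("A", 0.90)])
```

The returned pair has the same shape as the elected label and count described later under Direct Return.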

    Special Case

    Lonely Node, Disconnected Graph

    Theoretically, the calculation of kNN does not depend on the existence of edges in the graph. Whether the node to be calculated is a lonely node, or whichever connected component it belongs to, the result of kNN is not affected.

    Self-loop Edge

    The calculation of kNN has nothing to do with edges.

    Directed Edge

    The calculation of kNN has nothing to do with edges.

    Command

    algo(knn).params(<>)

    | Configuration Item | Type | Default Value | Specification | Description |
    |---|---|---|---|---|
    | node_id | _uuid | 0 | >0 | The UUID of the node to be calculated |
    | node_schema_property | []@<schema>?.<property> | / | Numeric node properties, LTE needed; at least two properties are needed, and the schema must be the same as the schema of node node_id | Properties of the node that participate in the calculation of cosine similarity |
    | top_k | int | / | >0 | The number of most similar nodes that vote on the classification label |
    | target_schema_property | @<schema>?.<property> | / | LTE needed, and the schema must be the same as the schema of node node_id | The node property that holds the label |

    Example: Estimate the value of salary for the node with UUID = 1, based on its properties age and grade

    algo(knn).params({
      node_id: 1,
      node_schema_property: [@student.age, @student.grade],
      top_k: 10,
      target_schema_property: salary
    })
    

    File Writeback

    .write({file: {<>}})

    | Parameter | Type | Default Value | Specification | Description |
    |---|---|---|---|---|
    | filename | string | / | / | Path of the file to write back to. The first row of the file holds attribute_value; each row from the second row onward holds the columns _id, similarity |
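A written-back file with this layout could be read with a short Python sketch like the one below. The comma delimiter, the absence of a header row, the function name `parse_knn_file`, and the sample contents are all assumptions for illustration; check them against an actual output file before relying on this.

```python
import csv
import io
from typing import List, Tuple


def parse_knn_file(text: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Split file contents into the elected label and the neighbor rows.

    Assumes comma-separated values with no header row: the first row holds
    attribute_value, every later row holds _id, similarity.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        raise ValueError("empty file")
    attribute_value = rows[0][0].strip()
    neighbors = [(row[0].strip(), float(row[1])) for row in rows[1:]]
    return attribute_value, neighbors


# Made-up file contents, not real algorithm output.
sample = "5000\n2,0.999976\n4,0.998283\n"
label, neighbors = parse_knn_file(sample)
```

Values are kept as strings except for the similarity, since the type of the label property depends on the schema.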

    Property Writeback

    (Not supported)

    Statistics Writeback

    (Not supported)

    Direct Return

    as <alias>, <alias>, ... return <alias>, <alias>, ...

    | Alias Number | Type | Description | Column Name |
    |---|---|---|---|
    | 0 | KV | The elected label and the number of times it occurs among the K nearest neighbors | attribute_value, count |
    | 1 | []perNode | The K nearest neighbors selected and their similarities | node, similarity |

    Streaming Return

    .stream() as <alias> return <alias>

    | Alias Number | Type | Description | Column Name |
    |---|---|---|---|
    | 0 | KV | The elected label and the number of times it occurs among the K nearest neighbors | attribute_value, count |

    Real-time Statistics

    (Not supported)
