    v4.0

    Cosine Similarity

    Overview

    Cosine similarity measures how similar two N-dimensional vectors are by the cosine of the angle between them in vector space. To compare two nodes in a graph, N numeric properties of each node are used to form the two N-dimensional vectors.

    In general, cosine similarity lies in [-1, 1]; when all property values are non-negative, it lies in [0, 1]. The larger the value, the more similar the two nodes are.

    Basic Concept

    Vector

    A vector is a basic concept in advanced mathematics. Vectors in low-dimensional spaces are relatively easy to visualize and express. The following diagram shows vectors A and B relative to the coordinate axes in 2- and 3-dimensional space, together with the angle θ between them:

    When comparing two nodes in a graph, N properties of each node are used to form the two N-dimensional vectors.

    Cosine Similarity

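    In 2-dimensional space, for A = (a_1, a_2) and B = (b_1, b_2), the formula to calculate the cosine similarity is:

    \[ \cos\theta = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\,\sqrt{b_1^2 + b_2^2}} \]

    In 3-dimensional space, for A = (a_1, a_2, a_3) and B = (b_1, b_2, b_3), the formula to calculate the cosine similarity is:

    \[ \cos\theta = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2}} \]

    Generalized to n-dimensional space, for A = (a_1, ..., a_n) and B = (b_1, ..., b_n), the formula to calculate the cosine similarity is:

    \[ \cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}} \]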
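
    The n-dimensional definition (dot product divided by the product of the two vector lengths) can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm's actual implementation; the function name cosine_similarity and the choice to raise an error for a zero vector are assumptions made for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined for a zero vector; raising is one possible
        # policy (an assumption of this sketch, not documented behavior).
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# 2-dimensional case: A = (1, 0) and B = (1, 1) form a 45-degree angle.
sim_2d = cosine_similarity([1.0, 0.0], [1.0, 1.0])

# 3-dimensional case: B is a positive multiple of A, so the angle is 0.
sim_3d = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Opposite directions give the minimum value.
sim_opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

    Only the direction of the vectors matters: scaling either vector by a positive factor leaves the result unchanged.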

    Special Case

    Lonely Node, Disconnected Graph

    In theory, the cosine similarity between two nodes does not depend on the existence of edges in the graph. Whether the two nodes are lonely (isolated) nodes, or whether they belong to the same connected component, has no effect on the result.

    Self-loop Edge

    The calculation of cosine similarity has nothing to do with edges.

    Directed Edge

    The calculation of cosine similarity has nothing to do with edges.

    Results and Statistics

    The graph below has 4 product nodes (edges are ignored); the properties price, weight, width and height are used to form the vectors:

    Algorithm results: cosine similarity between product 1 and each of the other 3 products, returned as node1, node2 and similarity

    node1 node2 similarity
    1 2 0.9865294135291195
    1 3 0.8788584075196542
    1 4 0.8168761502672031

    Algorithm statistics: N/A
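
    The shape of this result (one row per node pair, with columns node1, node2 and similarity) can be sketched in Python as follows. The property values below are hypothetical, since the figure with the real values is not reproduced here, so the printed numbers will not match the table above. The names products, to_vector and similarity_rows are also assumptions made for the example.

```python
import math

# Hypothetical property values for 4 product nodes (NOT the values from the
# documentation's figure, so results differ from the table in the text).
products = {
    1: {"price": 50.0, "weight": 160.0, "width": 20.0, "height": 152.0},
    2: {"price": 42.0, "weight": 90.0, "width": 30.0, "height": 90.0},
    3: {"price": 24.0, "weight": 50.0, "width": 55.0, "height": 70.0},
    4: {"price": 38.0, "weight": 20.0, "width": 32.0, "height": 66.0},
}

# The property order fixes the meaning of each vector dimension.
PROPERTIES = ["price", "weight", "width", "height"]


def to_vector(node):
    """Form an N-dimensional vector from N node properties."""
    return [node[name] for name in PROPERTIES]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_rows(source_id, target_ids):
    """Return (node1, node2, similarity) for the source against each target."""
    source_vec = to_vector(products[source_id])
    return [
        (source_id, t, cosine_similarity(source_vec, to_vector(products[t])))
        for t in target_ids
    ]


rows = similarity_rows(1, [2, 3, 4])
for node1, node2, similarity in rows:
    print(node1, node2, similarity)
```

    Edges play no part in the sketch: only the property values of the two nodes in each pair are read.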

    Command and Configuration

    • Command: algo(similarity)
    • Configurations for the parameter params():
    | Name | Type | Default | Specification | Description |
    |---|---|---|---|---|
    | ids / uuids | []_id / []_uuid | / | Mandatory | IDs or UUIDs of the first set of nodes to be calculated; only one of the two needs to be configured |
    | ids2 / uuids2 | []_id / []_uuid | / | Mandatory | IDs or UUIDs of the second set of nodes to be calculated; only one of the two needs to be configured |
    | type | string | cosine | jaccard / overlap / cosine / pearson / euclideanDistance / euclidean | Similarity measure to compute: jaccard calculates Jaccard similarity, overlap calculates overlap similarity, cosine calculates cosine similarity, pearson calculates the Pearson correlation coefficient, euclideanDistance calculates Euclidean distance, euclidean calculates normalized Euclidean distance |
    | node_schema_property | []@<schema>?.<property> | / | Numeric node property, LTE needed | When type is cosine / pearson / euclideanDistance / euclidean, two or more node properties must be specified to form the vector; the schema may be included or omitted. When type is jaccard / overlap, this parameter does not apply |
    | limit | int | -1 | >=-1 | Number of results to return; returns all results if set to -1 or not set |
    | top_limit | int | -1 | -1 or >=0 | Only available when ids2 and uuids2 are omitted; limits the length of the selection results top_list of each node; returns the full top_list if set to -1 or not set |

    Example: Calculate the cosine similarity between the nodes with UUID = 1, 2 and the nodes with UUID = 3, 4, using the properties price and weight

    algo(similarity).params({
      uuids: [1,2],
      uuids2: [3,4],
      node_schema_property: [price, weight],
      type: "cosine"
    }) as p
    return p
    

    Algorithm Execution

    Task Writeback

    1. File Writeback

    | Configuration | Data in Each Row |
    |---|---|
    | filename | node1,node2,similarity |

    Example: Calculate the cosine similarity between the node with UUID = 1 and the other nodes, using the properties price, weight, width and height, and write the algorithm results back to a file named cs_result

    algo(similarity).params({
      uuids: [1], 
      uuids2: [2,3,4],
      node_schema_property: [price,weight,width,height]
    }).write({
      file:{ 
        filename: "cs_result"
      }
    })
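
    To illustrate the row layout node1,node2,similarity, the following Python sketch writes the three result rows listed under Results and Statistics to a file. It assumes a comma-separated file with no header row; the actual delimiter, header and storage location used by the file writeback are not specified here, and the helper name write_similarity_file is hypothetical.

```python
import csv
import os
import tempfile


def write_similarity_file(path, rows):
    """Write one 'node1,node2,similarity' line per node pair (no header)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for node1, node2, similarity in rows:
            writer.writerow([node1, node2, similarity])


# Result rows taken from the Results and Statistics table.
rows = [
    (1, 2, 0.9865294135291195),
    (1, 3, 0.8788584075196542),
    (1, 4, 0.8168761502672031),
]

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cs_result")
    write_similarity_file(path, rows)
    with open(path) as f:
        written_lines = f.read().splitlines()

for line in written_lines:
    print(line)
```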
    

    2. Property Writeback

    Not supported by this algorithm.

    3. Statistics Writeback

    This algorithm has no statistics.

    Direct Return

    | Alias Ordinal | Type | Description | Column Name |
    |---|---|---|---|
    | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |

    Example: Calculate the cosine similarity between the node with UUID = 1 and the other nodes, using the properties price, weight, width and height; define the algorithm results as an alias named similarity and return the results

    algo(similarity).params({
      uuids: [1], 
      uuids2: [2,3,4],
      node_schema_property: [price,weight,width,height],
      type: "cosine"
    }) as similarity 
    return similarity
    

    Streaming Return

    | Alias Ordinal | Type | Description | Column Name |
    |---|---|---|---|
    | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |

    Example: Calculate the cosine similarity between the node with UUID = 1 and the other nodes, using the properties price, weight, width and height; define the algorithm results as an alias named similarity and return 2 results

    algo(similarity).params({
      uuids: [1], 
      uuids2: [2,3,4],
      node_schema_property: [price,weight,width,height],
      type: "cosine"
    }).stream() as similarity 
    return similarity limit 2
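
    The idea of consuming a stream of results and stopping after the first 2 can be sketched with a Python generator. This only illustrates the consumer side and says nothing about how stream() is implemented; the vectors are hypothetical, and it is an assumption of the sketch that pairs are produced in the order of the second node list.

```python
import math
from itertools import islice

# Hypothetical property vectors (price, weight, width, height) keyed by UUID.
vectors = {
    1: [50.0, 160.0, 20.0, 152.0],
    2: [42.0, 90.0, 30.0, 90.0],
    3: [24.0, 50.0, 55.0, 70.0],
    4: [38.0, 20.0, 32.0, 66.0],
}

# Records which pairs were actually computed, to show the laziness.
computed = []


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def stream_similarities(source_id, target_ids):
    """Yield (node1, node2, similarity) one pair at a time."""
    source_vec = vectors[source_id]
    for target_id in target_ids:
        computed.append(target_id)
        yield (source_id, target_id, cosine_similarity(source_vec, vectors[target_id]))


# Take only the first 2 results, analogous to "limit 2" on the streamed alias.
first_two = list(islice(stream_similarities(1, [2, 3, 4]), 2))
for row in first_two:
    print(row)
```

    Because the generator is lazy, the pair (1, 4) is never computed once the consumer stops after two rows.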
    

    Real-time Statistics

    This algorithm has no statistics.
