UltipaDocs

Cosine Similarity

✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats

Overview

Cosine similarity treats the data objects in a dataset as vectors and uses the cosine of the angle between two vectors to indicate how similar they are. In a graph, N numeric node properties (features) are specified to form N-dimensional vectors, and two nodes are considered similar if their vectors are similar.

Cosine similarity ranges from -1 to 1: 1 means that the two vectors have the same direction, 0 means that they are perpendicular, and -1 means that they have opposite directions.

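In 2-dimensional space, the cosine similarity between vectors A(a1, a2) and B(b1, b2) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2}{\sqrt{a_1^2 + a_2^2}\,\sqrt{b_1^2 + b_2^2}}$$

In 3-dimensional space, the cosine similarity between vectors A(a1, a2, a3) and B(b1, b2, b3) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{a_1 b_1 + a_2 b_2 + a_3 b_3}{\sqrt{a_1^2 + a_2^2 + a_3^2}\,\sqrt{b_1^2 + b_2^2 + b_3^2}}$$

The following diagram shows the relationship between vectors A and B in 2D and 3D spaces, as well as the angle θ between them:

Generalized to N-dimensional space, the cosine similarity between vectors A(a1, ..., aN) and B(b1, ..., bN) is computed as:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2}\,\sqrt{\sum_{i=1}^{N} b_i^2}}$$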
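
As an illustration outside of UQL, the N-dimensional computation (the dot product of the two vectors divided by the product of their lengths) can be sketched in a few lines of Python. The function name cosine_similarity and the choice to raise an error for a zero vector are assumptions made for this sketch; they do not describe how the algorithm is implemented in Ultipa.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: the angle is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [1, 0]))    # same direction -> 1.0
print(cosine_similarity([1, 0], [-1, 0]))   # opposite direction -> -1.0
print(cosine_similarity([1, 0], [0, 1]))    # perpendicular -> 0.0
print(cosine_similarity([1, 2, 3], [4, 5, 6]))  # 3-dimensional case
```

The same function covers the 2-dimensional and 3-dimensional formulas, since they are the N = 2 and N = 3 cases of the general sum.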

Considerations

  • Theoretically, the calculation of cosine similarity between two nodes does not depend on their connectivity.
  • The value of cosine similarity depends only on the direction of the vectors, not on their length.
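
Both considerations can be checked with a small Python sketch. The helper takes only the property vectors as input (no edges are involved), and the vectors a and b are hypothetical values chosen for illustration, not data from the example graph.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors of two nodes.
a = [3.0, 4.0]
b = [4.0, 3.0]

base = cosine_similarity(a, b)
scaled = cosine_similarity([10 * x for x in a], b)   # same direction, 10x longer
flipped = cosine_similarity([-x for x in a], b)      # opposite direction

print(math.isclose(base, scaled))    # True: length does not matter
print(math.isclose(flipped, -base))  # True: reversing the direction flips the sign
```

One practical consequence is that two nodes whose property values differ by a constant positive factor receive a similarity of 1, so cosine similarity alone does not distinguish them.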

Syntax

  • Command: algo(similarity)
  • Parameters:

| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
| ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
| type | string | cosine | cosine | Yes | Type of similarity; for Cosine Similarity, keep it as cosine |
| node_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | No | Specify two or more node properties to form the vectors; all properties must belong to the same (one) schema |
| limit | int | ≥-1 | -1 | Yes | Number of results to return, -1 to return all results |
| top_limit | int | ≥-1 | -1 | Yes | In the selection mode, limit the maximum number of results returned for each node specified in ids/uuids, -1 to return all results with similarity > 0; in the pairing mode, this parameter is invalid |

The algorithm has two calculation modes:

  1. Pairing: when both ids/uuids and ids2/uuids2 are configured, each node in ids/uuids is paired with each node in ids2/uuids2 (a node is never paired with itself), and the similarity of each pair is computed.
  2. Selection: when only ids/uuids is configured, the similarity between each target node in it and every other node in the graph is computed. The returned results include all, or a limited number of, the nodes that have similarity > 0 with the target node, ordered by descending similarity.
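
The difference between the two modes can be sketched in Python. This is a rough model of the behavior described above under stated assumptions, not Ultipa's implementation: the function names pairing and selection, the vectors dictionary, and its values are all hypothetical, and every vector is assumed to be non-zero.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int, float]  # (node1, node2, similarity)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # assumes non-zero vectors


def pairing(ids: List[int], ids2: List[int],
            vectors: Dict[int, Sequence[float]]) -> List[Pair]:
    """Pair each node in ids with each node in ids2, skipping self-pairs."""
    return [(u, v, cosine_similarity(vectors[u], vectors[v]))
            for u in ids for v in ids2 if u != v]


def selection(ids: List[int], vectors: Dict[int, Sequence[float]],
              top_limit: int = -1) -> List[Pair]:
    """Compare each target node with every other node in the graph."""
    results: List[Pair] = []
    for u in ids:
        scored = [(u, v, cosine_similarity(vectors[u], vectors[v]))
                  for v in vectors if v != u]
        scored = [p for p in scored if p[2] > 0]         # keep similarity > 0
        scored.sort(key=lambda p: p[2], reverse=True)     # descending similarity
        if top_limit >= 0:                                # -1 means no per-node cap
            scored = scored[:top_limit]
        results.extend(scored)
    return results


# Hypothetical 2-dimensional feature vectors keyed by node UUID.
vectors = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0], 4: [-1.0, 0.0]}

print(pairing([1, 2], [2, 3, 4], vectors))   # 5 pairs; (2, 2) is skipped
print(selection([1], vectors))               # only node 2 has similarity > 0
print(selection([2], vectors, top_limit=1))  # at most one result for node 2
```

In this sketch, pairing returns every pair regardless of sign, while selection drops results with similarity of 0 or less before applying top_limit to each target node.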

Examples

The example graph has 4 products (edges are ignored). Each product has the properties price, weight, width and height:

File Writeback

| Spec | Content |
|---|---|
| filename | node1,node2,similarity |
UQL
algo(similarity).params({
  uuids: [1], 
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height']
}).write({
  file:{ 
    filename: 'cs_result'
  }
})

Results: File cs_result

File
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
UQL
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine'
}).write({
  file:{ 
    filename: 'list'
  }
})

Results: File list

File
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
product2,product1,0.986529
product2,product3,0.934217
product2,product4,0.881988
product3,product2,0.934217
product3,product4,0.930153
product3,product1,0.878858
product4,product3,0.930153
product4,product2,0.881988
product4,product1,0.816876

Direct Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
UQL
algo(similarity).params({
  uuids: [1,2], 
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine'
}) as cs
return cs

Results: cs

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.986529413529119 |
| 1 | 3 | 0.878858407519654 |
| 1 | 4 | 0.816876150267203 |
| 2 | 3 | 0.934216530725663 |
| 2 | 4 | 0.88198819302226 |
UQL
algo(similarity).params({
  uuids: [1,2],
  type: 'cosine',
  node_schema_property: ['price', 'weight', 'width', 'height'],
  top_limit: 1
}) as top
return top

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.986529413529119 |
| 2 | 1 | 0.986529413529119 |

Stream Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
UQL
algo(similarity).params({
  uuids: [3], 
  uuids2: [1,2,4],
  node_schema_property: ['@product.price', '@product.weight', '@product.width'],
  type: 'cosine'
}).stream() as cs
where cs.similarity > 0.8
return cs

Results: cs

| node1 | node2 | similarity |
|---|---|---|
| 3 | 2 | 0.883292081301959 |
| 3 | 4 | 0.877834381494613 |
UQL
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'cosine',
  top_limit: 1
}).stream() as top
return top

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.986529413529119 |
| 3 | 2 | 0.934216530725663 |