UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
      • Node2Vec
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Similarity

Cosine Similarity

Overview

In cosine similarity, data objects in a dataset are treated as vectors, and it uses the cosine value of the angle between two vectors to indicate the similarity between them. In the graph, N numeric node properties (features) are specified to form N-dimensional vectors; two nodes are considered similar if their vectors are similar.

Cosine similarity ranges from -1 to 1, where 1 indicates that the two vectors point in the same direction, and -1 indicates they point in opposite directions.

Concepts

Cosine Similarity

In 2-dimensional space, the cosine similarity between vectors A = [a1, a2] and B = [b1, b2] is computed as:

In 3-dimensional space, the cosine similarity between vectors A = [a1, a2, a3] and B = [b1, b2, b3] is computed as:

The following diagram shows the relationship between vectors A and B in 2D and 3D spaces, as well as the angle θ between them:

Generalized to N-dimensional space, cosine similarity is computed as:

Considerations

  • The calculation of cosine similarity between two nodes is independent of their connectivity in the graph.
  • The value of cosine similarity is independent of the length of the vectors, but only the direction of the vectors.

Example Graph

GQL
INSERT (:product {_id: "product1", price: 50, weight: 160, width: 20, height: 152}),
       (:product {_id: "product2", price: 42, weight: 90, width: 30, height: 90}),
       (:product {_id: "product3", price: 24, weight: 50, width: 55, height: 70}),
       (:product {_id: "product4", price: 38, weight: 20, width: 32, height: 66})

Parameters

NameTypeDefaultDescription
typeSTRINGjaccardType of similarity to compute: cosine.
idsLIST/First group of node _ids. If empty, all nodes are used.
ids2LIST/Second group of node _ids for pairing mode. If empty, selection mode is used.
node_propertyLIST/Required. Numeric node properties to form a vector for each node.
degreeCutoffINT0Minimum degree to include a node (0 = no cutoff).
orderSTRING/Sorts results by similarity: asc or desc.
limitINT-1Maximum total results returned (-1 = all).
top_limitINT-1Maximum results per source node in selection mode (-1 = all).

Supports three computation modes:

  • All-pairs: When both ids and ids2 are empty, computes similarity between all node pairs in the graph.
  • Pairing: When both ids and ids2 are specified, computes similarity between each node in ids and each node in ids2.
  • Selection: When only ids is specified (no ids2), computes similarity between each node in ids and all other nodes. Use top_limit to limit results per source node.

Run Mode

Returns:

ColumnTypeDescription
node1STRINGFirst node identifier (_id)
node2STRINGSecond node identifier (_id)
similarityFLOATComputed cosine similarity score

Cosine similarity in pairing mode:

GQL
CALL algo.similarity({
  type: "cosine",
  ids: ["product1"],
  ids2: ["product2", "product3", "product4"],
  node_property: ["price", "weight", "width", "height"]
}) YIELD node1, node2, similarity

Result:

node1node2similarity
product1product20.9865294135291195
product1product30.8788584075196542
product1product40.8168761502672031

Cosine similarity in selection mode (top 1 per source node):

GQL
CALL algo.similarity({
  type: "cosine",
  ids: ["product1", "product3"],
  node_property: ["price", "weight", "width", "height"],
  top_limit: 1
}) YIELD node1, node2, similarity

Result:

node1node2similarity
product1product20.9865294135291195
product3product20.9342165307256634

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.similarity.stream({
  type: "cosine",
  ids: ["product1"],
  node_property: ["price", "weight", "width", "height"],
  order: "desc"
}) YIELD node1, node2, similarity
RETURN node1, node2, similarity

Result:

node1node2similarity
product1product20.9865294135291195
product1product30.8788584075196542
product1product40.8168761502672031

Stats Mode

Returns:

ColumnTypeDescription
pairCountINTNumber of node pairs computed
minSimilarityFLOATMinimum similarity score
maxSimilarityFLOATMaximum similarity score
avgSimilarityFLOATAverage similarity score
GQL
CALL algo.similarity.stats({
  type: "cosine",
  node_property: ["price", "weight", "width", "height"]
}) YIELD pairCount, minSimilarity, maxSimilarity, avgSimilarity

Result:

pairCountminSimilaritymaxSimilarityavgSimilarity
120.81687615026720310.98652941352911950.9047702651283608

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the similarity column in results to a property. Map: explicit column-to-property mapping (e.g., {similarity: 'cos_score'}).

Writable columns:

ColumnTypeDescription
similarityFLOATComputed cosine similarity score

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.similarity.write({
  type: "cosine",
  ids: ["product1", "product2"],
  node_property: ["price", "weight", "width", "height"]
}, {
  db: {
    property: "sim_score"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs