UltipaDocs
Try Playground
  • Introduction
  • Managing HDC Graphs
  • Managing Distributed Projections
  • Installing Algorithms
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Graph Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • K-Hop All
    • Bipartite Graph
    • HyperANF
    • Connected Component
    • Triangle Counting
    • Induced Subgraph
    • k-Core
    • k-Truss
    • p-Cohesion
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Topological Sort
    • Schema Overview
    • Dijkstra's Single-Source Shortest Path
    • Delta-Stepping Single-Source Shortest Path
    • Shortest Path Faster Algorithm (SPFA)
    • Minimum Spanning Tree
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Louvain
    • Leiden
    • Label Propagation
    • HANP
    • k-Means
    • kNN (k-Nearest Neighbors)
    • K-1 Coloring
    • Conductance
      • Random Walk
      • Node2Vec Walk
      • Node2Vec
      • Struc2Vec Walk
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Analytics & Algorithms
  4. /
  5. Similarity

Cosine Similarity

HDC

Overview

In cosine similarity, data objects in a dataset are treated as vectors, and it uses the cosine value of the angle between two vectors to indicate the similarity between them. In the graph, N numeric node properties (features) are specified to form N-dimensional vectors; two nodes are considered similar if their vectors are similar.

Cosine similarity ranges from -1 to 1, where 1 indicates that the two vectors point in the same direction, and -1 indicates they point in opposite directions.

In 2-dimensional space, the cosine similarity between vectors A(a1, a2) and B(b1, b2) is computed as:

In 3-dimensional space, the cosine similarity between vectors A(a1, a2, a3) and B(b1, b2, b3) is computed as:

The following diagram shows the relationship between vectors A and B in 2D and 3D spaces, as well as the angle θ between them:

Generalized to N-dimensional space, cosine similarity is computed as:

Considerations

  • Theoretically, the calculation of cosine similarity between two nodes is independent of their connectivity in the graph.
  • The value of cosine similarity is independent of the length of the vectors, but only the direction of the vectors.

Example Graph

Run the following statements on an empty graph to define its structure and insert data:

ALTER GRAPH CURRENT_GRAPH ADD NODE {
  product ({price int32, weight int32, width int32, height int32})
};
INSERT (:product {_id:"product1", price:50, weight:160, width:20, height:152}),
       (:product {_id:"product2", price:42, weight:90, width:30, height:90}),
       (:product {_id:"product3", price:24, weight:50, width:55, height:70}),
       (:product {_id:"product4", price:38, weight:20, width:32, height:66});

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

Parameters

Algorithm name: similarity

NameTypeSpecDefaultOptionalDescription
ids/uuids_id/_uuid
/
/
YesSpecifies the first group of nodes by their _id or _uuid. If unset, all nodes in the graph are used as the first group of nodes. The algorithm supports two calculation modes:

  • Pairing mode: When both ids/uuids and ids2/uuids2 are set, each node in ids/uuids is paired with each node in ids2/uuids2 (excluding self-pairs), and their pairwise similarities are computed.
  • Selection mode: When only ids/uuids is set, the algorithm computes similarities between each specified node and all other nodes in the graph. Results include all (or a limited number of) nodes with a similarity > 0, sorted in descending order.
ids2/uuids2_id/_uuid
/
/
YesSpecifies the second group of nodes for pairwise similarity by their _id or _uuid. If only ids2/uuids2 is set (and ids/uuids is not), the algorithm returns no result.
typeStringcosinecosineNoSpecifies the type of similarity to compute; for Cosine Similarity, keep it as cosine.
node_schema_property[]"<@schema.?><property>"
/
/
NoSpecifies numeric node properties to form a vector for each node; all specified properties must belong to the same label (schema).
return_id_uuidStringuuid,id,bothuuidYesIncludes _uuid, _id, or both to represent nodes in the results.
orderStringasc,desc
/
YesSorts the results by similarity.
limitInteger≥-1-1YesLimits the number of results returned. Set to -1 to include all results.
top_limitInteger≥-1-1YesLimits the number of results returned for each node specified with ids/uuids in selection mode. Set to -1 to include all results with a similarity greater than 0. This parameter is invalid in pairing mode.

File Writeback

CALL algo.similarity.write("my_hdc_graph", {
  return_id_uuid: "id",
  ids: "product1",
  ids2: ["product2", "product3", "product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "cosine"
}, {
  file: {
    filename: "cosine"
  }
})

Result:

File: cosine
_id1,_id2,similarity
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876

Full Return

CALL algo.similarity.run("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1","product2"], 
  ids2: ["product2","product3","product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "cosine"
}) YIELD cs
RETURN cs

Result:

_id1_id2similarity
product1product20.986529
product1product30.878858
product1product40.816876
product2product30.934217
product2product40.881988

Stream Return

CALL algo.similarity.stream("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1", "product3"], 
  node_schema_property: ["price", "weight", "width", "height"],
  type: "cosine",
  top_limit: 1    
}) YIELD top
RETURN top

Result:

_id1_id2similarity
product1product20.986529
product3product20.934217