UltipaDocs
Try Playground
  • Introduction
  • Managing HDC Graphs
  • Managing Distributed Projections
  • Installing Algorithms
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Graph Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • K-Hop All
    • Bipartite Graph
    • HyperANF
    • Connected Component
    • Triangle Counting
    • Induced Subgraph
    • k-Core
    • k-Truss
    • p-Cohesion
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Topological Sort
    • Schema Overview
    • Dijkstra's Single-Source Shortest Path
    • Delta-Stepping Single-Source Shortest Path
    • Shortest Path Faster Algorithm (SPFA)
    • Minimum Spanning Tree
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Louvain
    • Leiden
    • Label Propagation
    • HANP
    • k-Means
    • kNN (k-Nearest Neighbors)
    • K-1 Coloring
    • Conductance
      • Random Walk
      • Node2Vec Walk
      • Node2Vec
      • Struc2Vec Walk
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Analytics & Algorithms
  4. /
  5. Similarity

Euclidean Distance

HDC

Overview

In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. In the graph, N numeric node properties (features) are specified to represent each node's position in an N-dimensional Euclidean space.

Concepts

Euclidean Distance

In 2-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1) and B(x2, y2) is:

In 3-dimensional space, the formula to compute the Euclidean distance between points A(x1, y1, z1) and B(x2, y2, z2) is:

Generalized to N-dimensional space, the formula to compute the Euclidean distance is:

where xi1 represents the i-th dimensional coordinates of the first point, and xi2 represents the i-th dimensional coordinates of the second point.

Euclidean distance ranges from 0 to +∞; smaller values indicate greater similarity between the two nodes.

Normalized Euclidean Distance

Normalized Euclidean distance scales the Euclidean distance into range from 0 to 1; the closer to 1, the more similar the two nodes.

Ultipa adopts the following formula to normalize the Euclidean distance:

Considerations

  • Theoretically, the calculation of Euclidean distance between two nodes is independent of their connectivity.

Example Graph

Run the following statements on an empty graph to define its structure and insert data:

ALTER GRAPH CURRENT_GRAPH ADD NODE {
  product ({price int32, weight int32, width int32, height int32})
};
INSERT (:product {_id:"product1", price:50, weight:160, width:20, height:152}),
       (:product {_id:"product2", price:42, weight:90, width:30, height:90}),
       (:product {_id:"product3", price:24, weight:50, width:55, height:70}),
       (:product {_id:"product4", price:38, weight:20, width:32, height:66});

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

Parameters

Algorithm name: similarity

NameTypeSpecDefaultOptionalDescription
ids/uuids_id/_uuid
/
/
YesSpecifies the first group of nodes by their _id or _uuid. If unset, all nodes in the graph are used as the first group of nodes. The algorithm supports two calculation modes:

  • Pairing mode: When both ids/uuids and ids2/uuids2 are set, each node in ids/uuids is paired with each node in ids2/uuids2 (excluding self-pairs), and their pairwise similarities are computed.
  • Selection mode: When only ids/uuids is set, the algorithm computes similarities between each specified node and all other nodes in the graph. Results include all (or a limited number of) nodes with a similarity > 0, sorted in descending order.
ids2/uuids2_id/_uuid
/
/
YesSpecifies the second group of nodes for pairwise similarity by their _id or _uuid. If only ids2/uuids2 is set (and ids/uuids is not), the algorithm returns no result.
typeStringeuclideanDistance,euclideancosineNoSpecifies the type of similarity to compute; use euclideanDistance to compute Euclidean Distance, and use euclidean to compute Normalized Euclidean Distance.
node_schema_property[]"<@schema.?><property>"
/
/
NoSpecifies numeric node properties to form a vector for each node; all specified properties must belong to the same label (schema).
return_id_uuidStringuuid,id,bothuuidYesIncludes _uuid, _id, or both to represent nodes in the results.
orderStringasc,desc
/
YesSorts the results by similarity.
limitInteger≥-1-1YesLimits the number of results returned. Set to -1 to include all results.
top_limitInteger≥-1-1YesLimits the number of results returned for each node specified with ids/uuids in selection mode. Set to -1 to include all results with a similarity greater than 0. This parameter is invalid in pairing mode.

File Writeback

Computes similarities in pairing mode:

CALL algo.similarity.write("my_hdc_graph", {
  return_id_uuid: "id",
  ids: "product1",
  ids2: ["product2", "product3", "product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "euclideanDistance"
}, {
  file: {
    filename: "euclideanDistance"
  }
})

Result:

File: euclideanDistance
_id1,_id2,similarity
product1,product2,94.3822
product1,product3,143.962
product1,product4,165.179

Full Return

CALL algo.similarity.run("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1","product2"], 
  ids2: ["product2","product3","product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "euclideanDistance"
}) YIELD distance
RETURN distance

Result:

_id1_id2similarity
product1product294.382202
product1product3143.961807
product1product4165.178696
product2product354.304695
product2product474.135010

Stream Return

CALL algo.similarity.stream("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1", "product3"], 
  node_schema_property: ["price", "weight", "width", "height"],
  type: "euclideanDistance",
  top_limit: 1
}) YIELD top
RETURN top

Result:

_id1_id2similarity
product1product4165.178696
product3product1143.961807