UltipaDocs
Try Playground
  • Introduction
  • Managing HDC Graphs
  • Managing Distributed Projections
  • Installing Algorithms
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Graph Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • K-Hop All
    • Bipartite Graph
    • HyperANF
    • Connected Component
    • Triangle Counting
    • Induced Subgraph
    • k-Core
    • k-Truss
    • p-Cohesion
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Topological Sort
    • Schema Overview
    • Dijkstra's Single-Source Shortest Path
    • Delta-Stepping Single-Source Shortest Path
    • Shortest Path Faster Algorithm (SPFA)
    • Minimum Spanning Tree
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Louvain
    • Leiden
    • Label Propagation
    • HANP
    • k-Means
    • kNN (k-Nearest Neighbors)
    • K-1 Coloring
    • Conductance
      • Random Walk
      • Node2Vec Walk
      • Node2Vec
      • Struc2Vec Walk
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Analytics & Algorithms
  4. /
  5. Similarity

Pearson Correlation Coefficient

HDC

Overview

The Pearson correlation coefficient is the most common way of measuring the strength and direction of the linear relationship between two quantitative variables. In the graph, nodes are quantified by N numeric properties (features) of them.

For two variables X= (x1, x2, ..., xn) and Y = (y1, y2, ..., yn) , Pearson correlation coefficient (r) is defined as the ratio of the covariance of them to the product of their standard deviations:

The Pearson correlation coefficient ranges from -1 to 1:

Pearson correlation coefficient
Correlation type
Interpretation
0 < r ≤ 1Positive correlationAs one variable becomes larger, the other variable becomes larger
r = 0No linear correlation(May exist some other types of correlation)
-1 ≤ r < 0Negative correlationAs one variable becomes larger, the other variable becomes smaller

Considerations

  • Theoretically, the calculation of Pearson correlation coefficient between two nodes is independent of their connectivity.

Example Graph

Run the following statements on an empty graph to define its structure and insert data:

ALTER GRAPH CURRENT_GRAPH ADD NODE {
  product ({price int32, weight int32, width int32, height int32})
};
INSERT (:product {_id:"product1", price:50, weight:160, width:20, height:152}),
       (:product {_id:"product2", price:42, weight:90, width:30, height:90}),
       (:product {_id:"product3", price:24, weight:50, width:55, height:70}),
       (:product {_id:"product4", price:38, weight:20, width:32, height:66});

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

Parameters

Algorithm name: similarity

NameTypeSpecDefaultOptionalDescription
ids/uuids_id/_uuid
/
/
YesSpecifies the first group of nodes by their _id or _uuid. If unset, all nodes in the graph are used as the first group of nodes. The algorithm supports two calculation modes:

  • Pairing mode: When both ids/uuids and ids2/uuids2 are set, each node in ids/uuids is paired with each node in ids2/uuids2 (excluding self-pairs), and their pairwise similarities are computed.
  • Selection mode: When only ids/uuids is set, the algorithm computes similarities between each specified node and all other nodes in the graph. Results include all (or a limited number of) nodes with a similarity > 0, sorted in descending order.
ids2/uuids2_id/_uuid
/
/
YesSpecifies the second group of nodes for pairwise similarity by their _id or _uuid. If only ids2/uuids2 is set (and ids/uuids is not), the algorithm returns no result.
typeStringpearsoncosineNoSpecifies the type of similarity to compute; for Pearson Correlation Coefficient, keep it as pearson.
node_schema_property[]"<@schema.?><property>"
/
/
NoSpecifies numeric node properties to form a vector for each node; all specified properties must belong to the same label (schema).
return_id_uuidStringuuid,id,bothuuidYesIncludes _uuid, _id, or both to represent nodes in the results.
orderStringasc,desc
/
YesSorts the results by similarity.
limitInteger≥-1-1YesLimits the number of results returned. Set to -1 to include all results.
top_limitInteger≥-1-1YesLimits the number of results returned for each node specified with ids/uuids in selection mode. Set to -1 to include all results with a similarity greater than 0. This parameter is invalid in pairing mode.

File Writeback

CALL algo.similarity.write("my_hdc_graph", {
  return_id_uuid: "id",
  ids: "product1",
  ids2: ["product2", "product3", "product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "pearson"
}, {
  file: {
    filename: "pearson"
  }
})

Result:

File: pearson
_id1,_id2,similarity
product1,product2,0.998785
product1,product3,0.474384
product1,product4,0.210494

Full Return

CALL algo.similarity.run("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1","product2"], 
  ids2: ["product2","product3","product4"],
  node_schema_property: ["price", "weight", "width", "height"],
  type: "pearson"
}) YIELD p
RETURN p

Result:

_id1_id2similarity
product1product20.998785
product1product30.474384
product1product40.210494
product2product30.507838
product2product40.253573

Stream Return

CALL algo.similarity.stream("my_hdc_graph", {
  return_id_uuid: "id",
  ids: ["product1", "product3"], 
  node_schema_property: ["price", "weight", "width", "height"],
  type: "pearson",
  top_limit: 1    
}) YIELD top
RETURN top

Result:

_id1_id2similarity
product1product20.998785
product3product20.507838