UltipaDocs

Pearson Correlation Coefficient

Overview

The Pearson correlation coefficient is the most common measure of the strength and direction of the linear relationship between two quantitative variables. In a graph, each node is represented by a vector of N numeric properties (features), and the coefficient is computed between the vectors of two nodes.

For two variables X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), the Pearson correlation coefficient (r) is defined as the ratio of their covariance to the product of their standard deviations.
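Written out in symbols, the definition takes the standard textbook form below. The notation for the means of X and Y, and for the covariance and standard deviations, is introduced here for illustration and is not taken from the original page:

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
$$

The 1/n factors in the covariance and in the two standard deviations cancel, which is why only sums of deviations appear in the second form.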

The Pearson correlation coefficient ranges from -1 to 1:

| Pearson correlation coefficient | Correlation type | Interpretation |
|---|---|---|
| 0 < r ≤ 1 | Positive correlation | As one variable becomes larger, the other variable becomes larger |
| r = 0 | No linear correlation | Other, non-linear types of correlation may still exist |
| -1 ≤ r < 0 | Negative correlation | As one variable becomes larger, the other variable becomes smaller |

Considerations

  • In principle, the Pearson correlation coefficient between two nodes is independent of their connectivity; it depends only on the values of the selected node properties.

Example Graph

GQL
INSERT (:product {_id:"product1", price:50, weight:160, width:20, height:152}),
       (:product {_id:"product2", price:42, weight:90, width:30, height:90}),
       (:product {_id:"product3", price:24, weight:50, width:55, height:70}),
       (:product {_id:"product4", price:38, weight:20, width:32, height:66})

Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| type | STRING | jaccard | Type of similarity to compute. Set to pearson for the Pearson correlation coefficient. |
| ids | LIST | / | First group of node _ids. If empty, all nodes are used. |
| ids2 | LIST | / | Second group of node _ids for pairing mode. If empty, selection mode is used. |
| node_property | LIST | / | Required. Numeric node properties that form the vector for each node. |
| degreeCutoff | INT | 0 | Minimum degree required to include a node (0 = no cutoff). |
| order | STRING | / | Sorts results by similarity: asc or desc. |
| limit | INT | -1 | Maximum total number of results returned (-1 = all). |
| top_limit | INT | -1 | Maximum number of results per source node in selection mode (-1 = all). |

The algorithm supports three computation modes:

  • All-pairs: When both ids and ids2 are empty, computes similarity between all node pairs in the graph.
  • Pairing: When both ids and ids2 are specified, computes similarity between each node in ids and each node in ids2.
  • Selection: When only ids is specified (no ids2), computes similarity between each node in ids and all other nodes. Use top_limit to limit results per source node.

Run Mode

GQL
CALL algo.similarity({
  type: "pearson",
  ids: ["product1", "product2"],
  ids2: ["product2", "product3", "product4"],
  node_property: ["price", "weight", "width", "height"],
  order: "desc"
}) YIELD node1, node2, similarity

Result:

| node1 | node2 | similarity |
|---|---|---|
| product1 | product2 | 0.9987851216012547 |
| product2 | product3 | 0.5078377565989604 |
| product1 | product3 | 0.4743838031328631 |
| product2 | product4 | 0.25357307126950623 |
| product1 | product4 | 0.21049415016958328 |
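
The scores in this result can be cross-checked outside the database. The following is a minimal Python sketch, not the library's implementation: the `pearson` helper and the `products` dictionary are names introduced here for illustration, and skipping self-pairs is an assumption inferred from the absence of a product2/product2 row in the result.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Covariance over the product of standard deviations; the 1/n factors cancel.
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Property vectors in the order price, weight, width, height (from the example graph).
products = {
    "product1": [50, 160, 20, 152],
    "product2": [42, 90, 30, 90],
    "product3": [24, 50, 55, 70],
    "product4": [38, 20, 32, 66],
}

ids = ["product1", "product2"]
ids2 = ["product2", "product3", "product4"]

# Pairing mode: each node in ids against each node in ids2.
# Assumption: a node is not paired with itself, as the result table suggests.
pairs = [
    (a, b, pearson(products[a], products[b]))
    for a in ids
    for b in ids2
    if a != b
]
pairs.sort(key=lambda row: row[2], reverse=True)  # mirrors order: "desc"

for node1, node2, similarity in pairs:
    print(node1, node2, similarity)
```

The helper divides by zero if a vector has no variance (all properties equal); a production version would need to decide how to treat that case.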

Stream Mode

GQL
CALL algo.similarity.stream({
  type: "pearson",
  ids: ["product1", "product3"],
  node_property: ["price", "weight", "width", "height"],
  top_limit: 1,
  order: "desc"
}) YIELD node1, node2, similarity
RETURN node1, node2, similarity

Result:

| node1 | node2 | similarity |
|---|---|---|
| product1 | product2 | 0.9987851216012547 |
| product3 | product2 | 0.5078377565989604 |
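
The effect of top_limit in selection mode can be illustrated with a short Python sketch. It assumes that top_limit keeps the highest-scoring matches for each source node, which is consistent with the result above but is not confirmed by the documentation; `pearson`, `products`, and `select_top` are hypothetical names used only for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Property vectors in the order price, weight, width, height (from the example graph).
products = {
    "product1": [50, 160, 20, 152],
    "product2": [42, 90, 30, 90],
    "product3": [24, 50, 55, 70],
    "product4": [38, 20, 32, 66],
}

def select_top(ids, top_limit):
    """Hypothetical stand-in for selection mode: score each source node
    against every other node and keep its top_limit best matches."""
    rows = []
    for source in ids:
        scored = [
            (source, other, pearson(products[source], products[other]))
            for other in products
            if other != source
        ]
        scored.sort(key=lambda row: row[2], reverse=True)
        rows.extend(scored[:top_limit])
    # Final ordering across all sources, mirroring order: "desc".
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

rows = select_top(["product1", "product3"], top_limit=1)
for node1, node2, similarity in rows:
    print(node1, node2, similarity)
```

Under these assumptions, product2 is the closest match for both product1 and product3, in line with the two rows returned by the stream call.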

Stats Mode

Returns:

| Column | Type | Description |
|---|---|---|
| pairCount | INT | Number of node pairs computed |
| minSimilarity | FLOAT | Minimum similarity score |
| maxSimilarity | FLOAT | Maximum similarity score |
| avgSimilarity | FLOAT | Average similarity score |

GQL
CALL algo.similarity.stats({
  type: "pearson",
  node_property: ["price", "weight", "width", "height"]
}) YIELD pairCount, minSimilarity, maxSimilarity, avgSimilarity

Result:

| pairCount | minSimilarity | maxSimilarity | avgSimilarity |
|---|---|---|---|
| 12 | 0.21049415016958328 | 0.9987851216012547 | 0.4865158473633962 |
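
These aggregate values can be reproduced with a small Python sketch over the four example products. The pairCount of 12 matches the number of ordered pairs among four nodes (4 × 3), so the sketch counts each pair in both directions; that reading is an inference from the numbers, not a documented behavior, and the helper names are hypothetical.

```python
import math
from itertools import permutations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Property vectors in the order price, weight, width, height (from the example graph).
products = {
    "product1": [50, 160, 20, 152],
    "product2": [42, 90, 30, 90],
    "product3": [24, 50, 55, 70],
    "product4": [38, 20, 32, 66],
}

# All-pairs mode, assuming ordered pairs: (a, b) and (b, a) are both counted.
scores = [
    pearson(products[a], products[b])
    for a, b in permutations(products, 2)
]

pair_count = len(scores)
min_similarity = min(scores)
max_similarity = max(scores)
avg_similarity = sum(scores) / pair_count

print(pair_count, min_similarity, max_similarity, avg_similarity)
```

Because the coefficient is symmetric, counting ordered rather than unordered pairs doubles pairCount but leaves the minimum, maximum, and average unchanged.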

Write Mode

Computes results and writes them back to node properties.

Write parameters:

| Name | Type | Description |
|---|---|---|
| db.property | STRING or MAP | Node property to write results to. |

Returns:

| Column | Type | Description |
|---|---|---|
| task_id | STRING | Task identifier for tracking via SHOW TASKS |
| nodesWritten | INT | Number of nodes with properties written |
| computeTimeMs | INT | Time spent computing the algorithm (milliseconds) |
| writeTimeMs | INT | Time spent writing properties to storage (milliseconds) |

GQL
CALL algo.similarity.write({
  type: "pearson",
  ids: ["product1"],
  node_property: ["price", "weight", "width", "height"]
}, {
  db: {
    property: "sim_score"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs