
Pearson Correlation Coefficient

✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats

Overview

The Pearson correlation coefficient is the most common measure of the strength and direction of the linear relationship between two quantitative variables. In a graph, each node is quantified by N of its numeric properties (features), which together form the node's vector.

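For two variables X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), the Pearson correlation coefficient (r) is defined as the ratio of their covariance to the product of their standard deviations:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of X and Y.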
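The definition can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the engine's implementation; the function name `pearson` and its error handling are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel out, so plain sums are enough.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (norm_x * norm_y)

# y doubles x, so the relationship is perfectly linear and increasing.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
# y falls as x rises, again perfectly linearly.
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

Up to floating-point rounding, `r_up` is 1 and `r_down` is -1. A constant vector has zero standard deviation, which is why the sketch rejects it instead of dividing by zero.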

The Pearson correlation coefficient ranges from -1 to 1:

| Pearson correlation coefficient | Correlation type | Interpretation |
|---|---|---|
| 0 < r ≤ 1 | Positive correlation | As one variable becomes larger, the other variable becomes larger |
| r = 0 | No linear correlation | (Other types of correlation may exist) |
| -1 ≤ r < 0 | Negative correlation | As one variable becomes larger, the other variable becomes smaller |

Considerations

  • In theory, computing the Pearson correlation coefficient between two nodes does not depend on their connectivity; only their property values are used.

Syntax

  • Command: algo(similarity)
  • Parameters:
| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| ids / uuids | []_id / []_uuid | / | / | No | ID/UUID of the first group of nodes to calculate |
| ids2 / uuids2 | []_id / []_uuid | / | / | Yes | ID/UUID of the second group of nodes to calculate |
| type | string | pearson | cosine | No | Type of similarity; for Pearson Correlation Coefficient, keep it as pearson |
| node_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | No | Specify two or more node properties to form the vectors; all properties must belong to the same (one) schema |
| limit | int | ≥-1 | -1 | Yes | Number of results to return, -1 to return all results |
| top_limit | int | ≥-1 | -1 | Yes | In the selection mode, limit the maximum number of results returned for each node specified in ids/uuids, -1 to return all results with similarity > 0; in the pairing mode, this parameter is invalid |

The algorithm has two calculation modes:

  1. Pairing: when both ids/uuids and ids2/uuids2 are configured, each node in ids/uuids is paired with each node in ids2/uuids2 (a node is never paired with itself), and the pair-wise similarities are computed.
  2. Selection: when only ids/uuids is configured, pair-wise similarities are computed between each target node in it and all other nodes in the graph. For each target node, the results include all, or a limited number of, nodes that have similarity > 0 with it, ordered by descending similarity.

Examples

The example graph has 4 products (edges are ignored); each product has the properties price, weight, width and height.

File Writeback

| Spec | Content |
|---|---|
| filename | node1,node2,similarity |
UQL
algo(similarity).params({
  uuids: [1], 
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'pearson'
}).write({
  file:{ 
    filename: 'pearson'
  }
})

Results: File pearson

File
product1,product2,0.998785
product1,product3,0.474384
product1,product4,0.210494
UQL
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'pearson'
}).write({
  file:{ 
    filename: 'list'
  }
})

Results: File list

File
product1,product2,0.998785
product1,product3,0.474384
product1,product4,0.210494
product2,product1,0.998785
product2,product3,0.507838
product2,product4,0.253573
product3,product2,0.507838
product3,product1,0.474384
product3,product4,0.474021
product4,product3,0.474021
product4,product2,0.253573
product4,product1,0.210494
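The rows of the file list can be loaded and sanity-checked with a short Python sketch. It assumes the file is plain comma-separated text without a header row, as displayed above; the helper `load_similarities` is a hypothetical name used only here.

```python
import csv
import io

# Contents of the result file "list" as shown above (node1,node2,similarity).
LIST_FILE = """\
product1,product2,0.998785
product1,product3,0.474384
product1,product4,0.210494
product2,product1,0.998785
product2,product3,0.507838
product2,product4,0.253573
product3,product2,0.507838
product3,product1,0.474384
product3,product4,0.474021
product4,product3,0.474021
product4,product2,0.253573
product4,product1,0.210494
"""

def load_similarities(text):
    """Parse 'node1,node2,similarity' rows into a list of (node1, node2, float)."""
    reader = csv.reader(io.StringIO(text))
    return [(a, b, float(s)) for a, b, s in (row for row in reader if row)]

rows = load_similarities(LIST_FILE)
lookup = {(a, b): s for a, b, s in rows}

# Pearson correlation is symmetric, so (a, b) and (b, a) must carry the same value.
is_symmetric = all(lookup[(b, a)] == s for (a, b), s in lookup.items())

# Within each node1 group, rows should appear in descending similarity.
is_sorted_per_node = all(
    s1 >= s2
    for (a1, _, s1), (a2, _, s2) in zip(rows, rows[1:])
    if a1 == a2
)
```

Both checks hold for the file above: every pair appears in both directions with the same value, and each node's rows are listed from most to least similar.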

Direct Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
UQL
algo(similarity).params({
  uuids: [1,2], 
  uuids2: [2,3,4],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'pearson'
}) as p
return p

Results: p

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.998785121601255 |
| 1 | 3 | 0.474383803132863 |
| 1 | 4 | 0.210494150169583 |
| 2 | 3 | 0.50783775659896 |
| 2 | 4 | 0.253573071269506 |
UQL
algo(similarity).params({
  uuids: [1,2],
  type: 'pearson',
  node_schema_property: ['price', 'weight', 'width', 'height'],
  top_limit: 1
}) as top
return top

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.998785121601255 |
| 2 | 1 | 0.998785121601255 |

Stream Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity |
UQL
algo(similarity).params({
  uuids: [3], 
  uuids2: [1,2,4],
  node_schema_property: ['@product.price', '@product.weight', '@product.width'],
  type: 'pearson'
}).stream() as p
where p.similarity > 0
return p

Results: p

| node1 | node2 | similarity |
|---|---|---|
| 3 | 1 | 0.167101674410905 |
| 3 | 2 | 0.181677473801374 |
UQL
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: ['price', 'weight', 'width', 'height'],
  type: 'pearson',
  top_limit: 1
}).stream() as top
return top

Results: top

| node1 | node2 | similarity |
|---|---|---|
| 1 | 2 | 0.998785121601255 |
| 3 | 2 | 0.50783775659896 |