      Pearson Correlation Coefficient

      Overview

      The Pearson correlation coefficient measures the linear correlation between two variables. In a graph, the coefficient between two nodes is calculated from two N-dimensional vectors, each formed from N properties of a node.

      Basic Concept

      Vector

      Vectors are a basic concept in advanced mathematics, and vectors in low-dimensional spaces are relatively easy to understand and express. The following diagram shows the relationship between vectors A and B and the coordinate axes in 2- and 3-dimensional spaces respectively, as well as the angle θ between them:

      When comparing two nodes in a graph, N properties of each node are used to form the two N-dimensional vectors.

      Pearson Correlation Coefficient

      The Pearson correlation coefficient takes values in the range [-1,1]. Let r denote the Pearson correlation coefficient; then:

      • r > 0 indicates positive correlation, i.e. as one variable becomes larger, the other variable also becomes larger;
      • r < 0 indicates negative correlation, i.e. as one variable becomes larger, the other variable becomes smaller;
      • r = 1 or r = -1 indicates that the two variables can be described by a linear equation, i.e. they fall on the same line;
      • r = 0 indicates that there is no linear correlation (although some other kind of correlation may exist).
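The cases above can be checked numerically. The following is a minimal sketch in plain Python, not Ultipa's implementation; the function name `pearson` and the sample values are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Illustrative values only (not taken from the example graph).
x = [-2, -1, 0, 1, 2]
r_pos = pearson(x, [2 * v + 1 for v in x])    # points on a rising line
r_neg = pearson(x, [-3 * v + 4 for v in x])   # points on a falling line
r_zero = pearson(x, [v * v for v in x])       # y = x^2: related, but not linearly

print(r_pos, r_neg, r_zero)
```

The last case illustrates the caveat on r = 0: y is fully determined by x, yet the linear correlation vanishes.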

      For two variables X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), the Pearson correlation coefficient (r) is defined as the ratio of their covariance to the product of their standard deviations:
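Written out in LaTeX, the standard form of this definition is sketched below. The symbols \bar{x} and \bar{y} (the means of X and Y) and \sigma_X, \sigma_Y (their standard deviations) are notation introduced here for illustration and do not appear elsewhere on this page.

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \; \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```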

      Special Case

      Isolated Node, Disconnected Graph

      In theory, the Pearson Correlation Coefficient between two nodes does not depend on the edges in the graph. Whether the two nodes are isolated, and whether they belong to the same connected component, has no effect on the result.

      Self-loop Edge

      The calculation of Pearson Correlation Coefficient has nothing to do with edges.

      Directed Edge

      The calculation of Pearson Correlation Coefficient has nothing to do with edges.

      Command and Configuration

      • Command: algo(similarity)
      • Configurations for the parameter params():
      Name | Type | Default | Specification | Description
      ids / uuids | []_id / []_uuid | / | Mandatory | IDs or UUIDs of the first set of nodes to be calculated
      ids2 / uuids2 | []_id / []_uuid | / | Optional | IDs or UUIDs of the second set of nodes to be calculated
      type | string | cosine | jaccard / overlap / cosine / pearson / euclideanDistance / euclidean | Measurement of the similarity: jaccard: Jaccard Similarity; overlap: Overlap Similarity; cosine: Cosine Similarity; pearson: Pearson Correlation Coefficient; euclideanDistance: Euclidean Distance; euclidean: Normalized Euclidean Distance
      node_schema_property | []@<schema>?.<property> | / | Numeric node property; LTE needed; schema can be either carried or not | When type is cosine / pearson / euclideanDistance / euclidean, two or more node properties must be specified to form the vector; when type is jaccard / overlap, this parameter is invalid
      limit | int | -1 | >=-1 | Number of results to return; returns all results if set to -1
      top_limit | int | -1 | >=-1 | Only available in selection mode; limits the length of the selection results (top_list) of each node; returns the full top_list if set to -1

      Calculation Mode

      This algorithm has two calculation modes:

      1. Pairing mode: when two sets of valid nodes are configured, each node in the first set is paired with each node in the second set (Cartesian product), and the similarity is calculated for every node pair.
      2. Selection mode: when only one set (the first) of valid nodes is configured, for each node in the set, its similarity with every other node in the graph is calculated; only results with similarity > 0 are returned, ordered by descending similarity.
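The two modes can be sketched in plain Python as follows. This is an illustration of the behaviour described above under stated assumptions, not Ultipa's implementation: the helper names `pearson`, `pairing_mode` and `selection_mode` are hypothetical, and the property vectors are made-up values rather than data from the example graph.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical property vectors keyed by node UUID (illustrative values only).
nodes = {
    1: [1.0, 2.0, 3.0, 4.0],
    2: [2.0, 4.0, 6.0, 8.5],
    3: [4.0, 3.0, 2.0, 1.0],
    4: [1.0, 3.0, 2.0, 4.0],
}

def pairing_mode(nodes, ids, ids2):
    """Score every (first-set, second-set) pair: the Cartesian product."""
    return [(a, b, pearson(nodes[a], nodes[b])) for a, b in product(ids, ids2)]

def selection_mode(nodes, ids, top_limit=-1):
    """For each node, rank all other nodes with similarity > 0, best first."""
    results = {}
    for a in ids:
        scored = [(b, pearson(nodes[a], vec)) for b, vec in nodes.items() if b != a]
        top = sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
        # top_limit == -1 keeps the full list, mirroring the parameter table.
        results[a] = top if top_limit == -1 else top[:top_limit]
    return results

pairs = pairing_mode(nodes, [1], [2, 3, 4])
selected = selection_mode(nodes, [1])
best_only = selection_mode(nodes, [1], top_limit=1)
```

With these made-up vectors, node 3 is negatively correlated with node 1, so it appears in the pairing results but is filtered out of the selection results.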

      Examples

      Example Graph

      The example graph has four nodes, product1, product2, product3 and product4 (with UUIDs 1, 2, 3 and 4 respectively; edges are ignored). Each product node has the properties price, weight, width and height:

      Task Writeback

      1. File Writeback

      Calculation Mode | Configuration | Data in Each Row
      Pairing mode | filename | node1,node2,similarity
      Selection mode | filename | node,top_list

      Example: Calculate the Pearson correlation coefficient between product UUID = 1 and products UUID = 2,3,4 through properties price, weight, width and height, and write the algorithm results back to a file

      algo(similarity).params({
        uuids: [1], 
        uuids2: [2,3,4],
        node_schema_property: [price,weight,width,height],
        type: "pearson"
      }).write({
        file:{ 
          filename: "pearson"
        }
      })
      

      Results: File pearson

      product1,product2,0.998785
      product1,product3,0.474384
      product1,product4,0.210494
      

      Example: Calculate the Pearson correlation coefficient between each of products UUID = 1,2,3,4 and all other products in the graph through properties price, weight, width and height, and write the algorithm results back to a file

      algo(similarity).params({
        uuids: [1,2,3,4],
        node_schema_property: [price,weight,width,height],
        type: "pearson"
      }).write({
        file:{ 
          filename: "list"
        }
      })
      

      Results: File list

      product1,product2:0.998785;product3:0.474384;product4:0.210494;
      product2,product1:0.998785;product3:0.507838;product4:0.253573;
      product3,product2:0.507838;product1:0.474384;product4:0.474021;
      product4,product3:0.474021;product2:0.253573;product1:0.210494;
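A row of the selection-mode file can be read back with a few lines of Python. This is a minimal sketch that assumes the row layout is exactly as shown in the results above (node, a comma, then `name:similarity` items each terminated by `;`); the function name `parse_selection_row` is hypothetical.

```python
def parse_selection_row(row):
    """Split a 'node,top_list' row into the node name and (name, similarity) pairs."""
    node, _, top_list = row.strip().partition(",")
    pairs = []
    for item in top_list.split(";"):
        if not item:  # the trailing ';' leaves one empty item
            continue
        name, _, score = item.rpartition(":")
        pairs.append((name, float(score)))
    return node, pairs

# First row of the file "list" shown above.
row = "product1,product2:0.998785;product3:0.474384;product4:0.210494;"
node, pairs = parse_selection_row(row)
print(node, pairs)
```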
      

      2. Property Writeback

      Not supported by this algorithm.

      3. Statistics Writeback

      This algorithm has no statistics.

      Direct Return

      Calculation Mode | Alias Ordinal | Type | Description | Column Name
      Pairing mode | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity
      Selection mode | 0 | []perNode | Node and its selection results | node, top_list

      Example: Calculate the Pearson correlation coefficient between product UUID = 1 and products UUID = 2,3,4 through properties price, weight, width and height, and order the results by ascending similarity

      algo(similarity).params({
        uuids: [1], 
        uuids2: [2,3,4],
        node_schema_property: [price,weight,width,height],
        type: "pearson"
      }) as p
      return p order by p.similarity
      

      Results:

      node1 node2 similarity
      1 4 0.210494150169583
      1 3 0.474383803132863
      1 2 0.998785121601255

      Example: For each of products UUID = 1,2, select the product with the highest Pearson correlation coefficient, calculated through properties price, weight, width and height

      algo(similarity).params({
        uuids: [1,2],
        type: "pearson",
        node_schema_property: [price,weight,width,height],
        top_limit: 1
      }) as top
      return top
      

      Results:

      node top_list
      1 2:0.998785,
      2 1:0.998785,

      Streaming Return

      Calculation Mode | Alias Ordinal | Type | Description | Column Name
      Pairing mode | 0 | []perNodePair | Node pair and its similarity | node1, node2, similarity
      Selection mode | 0 | []perNode | Node and its selection results | node, top_list

      Example: Calculate the Pearson correlation coefficient between product UUID = 3 and products UUID = 1,2,4 through properties price, weight, width and height, and only return results with similarity above 0.5

      algo(similarity).params({
        uuids: [3], 
        uuids2: [1,2,4],
        node_schema_property: [price,weight,width,height],
        type: "pearson"
      }).stream() as p
      where p.similarity > 0.5
      return p
      

      Results:

      node1 node2 similarity
      3 2 0.50783775659896

      Example: For each of products UUID = 1,3, select the product with the highest Pearson correlation coefficient

      algo(similarity).params({
        uuids: [1,3],
        node_schema_property: [price,weight,width,height],
        type: "pearson",
        top_limit: 1
      }).stream() as top
      return top
      

      Results:

      node top_list
      1 2:0.998785,
      3 2:0.507838,

      Real-time Statistics

      This algorithm has no statistics.
