# Cosine Similarity

## Overview

Cosine similarity measures how similar two N-dimensional vectors are by the cosine of the angle between them. To compute the cosine similarity of two nodes in a graph, N properties of each node are used to form an N-dimensional vector for that node.

Cosine similarity ranges from -1 to 1, and from 0 to 1 when all property values are non-negative; the larger the value, the more similar the two nodes are.

## Basic Concept

### Vector

The vector is one of the basic concepts of advanced mathematics, and vectors in low-dimensional spaces are relatively easy to understand and express. The following diagram shows vectors A and B relative to the coordinate axes in 2- and 3-dimensional space, respectively, together with the angle `θ` between them:

When comparing two nodes in a graph, N properties of each node are used to form two N-dimensional vectors.
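The sketch below shows how a node's properties can be read in a fixed order to form its vector. It is a minimal plain-Python illustration, not Ultipa's API. The helper `node_vector` and the property values are hypothetical, and only the property names follow the example later on this page. Both nodes must use the same property order so that their i-th components refer to the same property.

```python
# Hypothetical sketch: build an N-dimensional vector from node properties.
# Property names mirror the example on this page; the values are made up.
PROPERTIES = ["price", "weight", "width", "height"]


def node_vector(props: dict, keys: list) -> list:
    """Return the values of `keys` from `props`, in the order given by `keys`."""
    missing = [k for k in keys if k not in props]
    if missing:
        raise KeyError(f"missing properties: {missing}")
    return [float(props[k]) for k in keys]


# Two hypothetical product nodes (illustrative values only).
node_a = {"price": 10, "weight": 2, "width": 3, "height": 4, "name": "A"}
node_b = {"height": 8, "width": 6, "weight": 4, "price": 20}

vec_a = node_vector(node_a, PROPERTIES)
vec_b = node_vector(node_b, PROPERTIES)
print(vec_a)  # [10.0, 2.0, 3.0, 4.0]
print(vec_b)  # [20.0, 4.0, 6.0, 8.0]
```

`node_b` stores its keys in a different order, and the result is unaffected because `keys` fixes the component order. Non-selected properties such as `name` are ignored.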

### Cosine Similarity

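In 2-dimensional space, with $A = (x_1, y_1)$ and $B = (x_2, y_2)$, the formula to calculate the cosine similarity is:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$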

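In 3-dimensional space, with $A = (x_1, y_1, z_1)$ and $B = (x_2, y_2, z_2)$, the formula to calculate the cosine similarity is:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{\sqrt{x_1^2 + y_1^2 + z_1^2}\,\sqrt{x_2^2 + y_2^2 + z_2^2}}$$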

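Generalized to N-dimensional space, with $A = (A_1, \dots, A_N)$ and $B = (B_1, \dots, B_N)$, the formula to calculate the cosine similarity is:

$$\cos\theta = \frac{\sum_{i=1}^{N} A_i B_i}{\sqrt{\sum_{i=1}^{N} A_i^2}\,\sqrt{\sum_{i=1}^{N} B_i^2}} = \frac{A \cdot B}{\lVert A \rVert\,\lVert B \rVert}$$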
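In N dimensions, cosine similarity is the dot product of A and B divided by the product of their lengths, and this translates directly into code. The function below is a plain-Python sketch of that formula, not Ultipa's implementation; the name `cosine_similarity` is our own. It returns `None` for a zero vector because the angle is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (both of dimension N)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))      # numerator: sum of A_i * B_i
    norm_a = math.sqrt(sum(x * x for x in a))   # length of A
    norm_b = math.sqrt(sum(y * y for y in b))   # length of B
    if norm_a == 0 or norm_b == 0:
        return None  # angle undefined for a zero vector
    return dot / (norm_a * norm_b)


# 2-D: perpendicular vectors, then vectors pointing the same way
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
print(cosine_similarity([1, 2], [2, 4]))  # ~1.0 (floating-point rounding)

# 3-D worked example: A = (1, 2, 3), B = (4, 5, 6)
# dot = 4 + 10 + 18 = 32, |A| = sqrt(14), |B| = sqrt(77)
print(round(cosine_similarity([1, 2, 3], [4, 5, 6]), 6))  # 0.974632
```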

## Special Case

### Isolated Node, Disconnected Graph

The cosine similarity of two nodes is computed from their properties alone and does not depend on edges. Whether the two nodes are isolated or lie in different connected components has no effect on the result.

### Self-loop Edge

Self-loop edges do not affect cosine similarity, which is computed from node properties only.

### Directed Edge

Edge direction does not affect cosine similarity, which is computed from node properties only.

## Command and Configuration

- Command: `algo(similarity)`
- Configurations for the parameter `params()`:

| Name | Type | Default | Specification | Description |
|---|---|---|---|---|
| ids / uuids | []`_id` / []`_uuid` | / | Mandatory | IDs or UUIDs of the first set of nodes to be calculated |
| ids2 / uuids2 | []`_id` / []`_uuid` | / | Optional | IDs or UUIDs of the second set of nodes to be calculated |
| type | string | cosine | `jaccard` / `overlap` / `cosine` / `pearson` / `euclideanDistance` / `euclidean` | Similarity measure: `jaccard` (Jaccard Similarity), `overlap` (Overlap Similarity), `cosine` (Cosine Similarity), `pearson` (Pearson Correlation Coefficient), `euclideanDistance` (Euclidean Distance), `euclidean` (Normalized Euclidean Distance) |
| node_schema_property | []`@<schema>?.<property>` | / | Numeric node property; LTE needed; the schema may be omitted | When `type` is cosine / pearson / euclideanDistance / euclidean, two or more node properties must be specified to form the vector; when `type` is jaccard / overlap, this parameter is not used |
| limit | int | -1 | >=-1 | Number of results to return; all results are returned if set to -1 |
| top_limit | int | -1 | >=-1 | Only available in selection mode; limits the length of each node's selection results (`top_list`); the full `top_list` is returned if set to -1 |

## Calculation Mode

This algorithm has two calculation modes:

1. Pairing mode: when two sets of valid nodes are configured, each node in the first set is paired with each node in the second set (Cartesian product), and the similarity is calculated for every node pair.
2. Selection mode: when only one set (the first) of valid nodes is configured, the similarity between each node in the set and every other node in the graph is calculated; results with similarity > 0 are returned, ordered by descending similarity.
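The two modes can be sketched in plain Python as follows. This illustrates the behavior described above and is not Ultipa's implementation. The node vectors are hypothetical, and `pairing_mode`, `selection_mode` and their `top_limit` argument (which mirrors the parameter of the same name) are our own names. Treating a zero-length vector as similarity 0.0 is also an assumption made here.

```python
import math
from itertools import product


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector, by assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairing_mode(vectors, ids1, ids2):
    """Cartesian product ids1 x ids2: one (node1, node2, similarity) row per pair."""
    return [(a, b, cosine(vectors[a], vectors[b])) for a, b in product(ids1, ids2)]


def selection_mode(vectors, ids1, top_limit=-1):
    """Compare each node in ids1 with every other node, keep similarity > 0,
    sort by descending similarity, then truncate to top_limit (-1 keeps all)."""
    results = []
    for node in ids1:
        scores = [(other, cosine(vectors[node], vec))
                  for other, vec in vectors.items() if other != node]
        top_list = sorted((s for s in scores if s[1] > 0),
                          key=lambda s: s[1], reverse=True)
        if top_limit != -1:
            top_list = top_list[:top_limit]
        results.append((node, top_list))
    return results


# Hypothetical 2-D vectors keyed by UUID.
vectors = {1: [1.0, 0.0], 2: [2.0, 1.0], 3: [0.0, 1.0], 4: [-1.0, 0.0]}

for row in pairing_mode(vectors, [1], [2, 3]):
    print(row)
for node, top_list in selection_mode(vectors, [1, 2], top_limit=1):
    print(node, top_list)
```

In selection mode, node 1 drops node 3 (similarity 0.0) and node 4 (similarity -1.0) because only positive similarities are kept.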

## Examples

### Example Graph

The example graph contains four products, product1, product2, product3 and product4 (UUIDs 1, 2, 3 and 4, respectively; edges are ignored). Each product node has the properties price, weight, width and height:

#### 1. File Writeback

| Calculation Mode | Configuration | Data in Each Row |
|---|---|---|
| Pairing mode | filename | `node1`,`node2`,`similarity` |
| Selection mode | filename | `node`,`top_list` |

Example: Calculate the cosine similarity between product UUID = 1 and products UUID = 2, 3, 4 using the properties price, weight, width and height, and write the results back to a file

``````
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height]
}).write({
  file:{
    filename: "cs_result"
  }
})
``````

Results: File cs_result

``````
product1,product2,0.986529
product1,product3,0.878858
product1,product4,0.816876
``````

Example: Calculate the cosine similarity between each of products UUID = 1, 2, 3, 4 and all other products in the graph using the properties price, weight, width and height, and write the results back to a file

``````
algo(similarity).params({
  uuids: [1,2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}).write({
  file:{
    filename: "list"
  }
})
``````

Results: File list

``````
product1,product2:0.986529;product3:0.878858;product4:0.816876;
product2,product1:0.986529;product3:0.934217;product4:0.881988;
product3,product2:0.934217;product4:0.930153;product1:0.878858;
product4,product3:0.930153;product2:0.881988;product1:0.816876;
``````
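Each row of a selection-mode file has the form `node,other1:similarity1;other2:similarity2;...`. The rows of the `list` file can be parsed in plain Python to check two properties. First, each `top_list` is in descending order. Second, similarity is symmetric, so the value for (product1, product2) equals the value for (product2, product1). This is a sketch, and the helper `parse_selection_row` is our own, not an Ultipa tool.

```python
def parse_selection_row(line):
    """Split 'node,a:0.9;b:0.8;' into ('node', [('a', 0.9), ('b', 0.8)])."""
    node, rest = line.strip().split(",", 1)
    entries = []
    for item in rest.split(";"):
        if item:  # skip the empty string after the trailing ';'
            other, score = item.split(":")
            entries.append((other, float(score)))
    return node, entries


# Contents of the file "list" from the example above.
rows = """product1,product2:0.986529;product3:0.878858;product4:0.816876;
product2,product1:0.986529;product3:0.934217;product4:0.881988;
product3,product2:0.934217;product4:0.930153;product1:0.878858;
product4,product3:0.930153;product2:0.881988;product1:0.816876;"""

table = dict(parse_selection_row(r) for r in rows.splitlines())

for node, entries in table.items():
    scores = [s for _, s in entries]
    assert scores == sorted(scores, reverse=True), node  # descending order
    for other, s in entries:
        assert dict(table[other])[node] == s  # sim(a, b) == sim(b, a)
print("all rows sorted and symmetric")
```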

#### 2. Property Writeback

Not supported by this algorithm.

#### 3. Statistics Writeback

This algorithm has no statistics.

### Direct Return

| Calculation Mode | Alias Ordinal | Type | Description | Column Name |
|---|---|---|---|---|
| Pairing mode | 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |
| Selection mode | 0 | []perNode | Node and its selection results | `node`, `top_list` |

Example: Calculate the cosine similarity between product UUID = 1 and products UUID = 2, 3, 4 using the properties price, weight, width and height, and order the results by ascending similarity

``````
algo(similarity).params({
  uuids: [1],
  uuids2: [2,3,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}) as cs
return cs order by cs.similarity
``````

Results:

| node1 | node2 | similarity |
|---|---|---|
| 1 | 4 | 0.816876150267203 |
| 1 | 3 | 0.878858407519654 |
| 1 | 2 | 0.986529413529119 |

Example: For each of products UUID = 1 and 2, select the product with the highest cosine similarity, using the properties price, weight, width and height

``````
algo(similarity).params({
  uuids: [1,2],
  type: "cosine",
  node_schema_property: [price,weight,width,height],
  top_limit: 1
}) as top
return top
``````

Results:

| node | top_list |
|---|---|
| 1 | 2:0.986529, |
| 2 | 1:0.986529, |

### Streaming Return

| Calculation Mode | Alias Ordinal | Type | Description | Column Name |
|---|---|---|---|---|
| Pairing mode | 0 | []perNodePair | Node pair and its similarity | `node1`, `node2`, `similarity` |
| Selection mode | 0 | []perNode | Node and its selection results | `node`, `top_list` |

Example: Calculate the cosine similarity between product UUID = 3 and products UUID = 1, 2, 4 using the properties price, weight, width and height, and return only the results with similarity above 0.9

``````
algo(similarity).params({
  uuids: [3],
  uuids2: [1,2,4],
  node_schema_property: [price,weight,width,height],
  type: "cosine"
}).stream() as cs
where cs.similarity > 0.9
return cs
``````

Results:

| node1 | node2 | similarity |
|---|---|---|
| 3 | 2 | 0.934216530725663 |
| 3 | 4 | 0.930152895706265 |
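The `where cs.similarity > 0.9` clause keeps only the streamed rows whose similarity exceeds the threshold. The plain-Python sketch below mimics that filter on the pairing-mode rows for product 3. It uses the six-decimal values from the selection-mode file results above and illustrates only the filtering logic, not how Ultipa evaluates the query.

```python
# Pairing-mode rows (node1, node2, similarity) for product 3 vs products 1, 2, 4,
# with values rounded to six decimals as in the file results above.
stream = [(3, 1, 0.878858), (3, 2, 0.934217), (3, 4, 0.930153)]

THRESHOLD = 0.9  # mirrors `where cs.similarity > 0.9`
kept = [row for row in stream if row[2] > THRESHOLD]

for node1, node2, similarity in kept:
    print(node1, node2, similarity)
# 3 2 0.934217
# 3 4 0.930153
```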

Example: For each of products UUID = 1 and 3, select the product with the highest cosine similarity, using the properties price, weight, width and height

``````
algo(similarity).params({
  uuids: [1,3],
  node_schema_property: [price,weight,width,height],
  type: "cosine",
  top_limit: 1
}).stream() as top