  # Change Nickname

Current Nickname:
Search
v2.x
v2.x

# Algorithm

Ultipa offers an ever-growing rich set of algorithms, including various centrality, ranking, similarity, community-detection (such as LPA, Louvain, etc.), graph embedding (random walks, Node2Vec, Struc2Vec etc.) and graph traversal algorithms. This chapter will explain the details of invoking each algorithm.

• To list currently active algorithms:

``````from ultipa import Connection,ULTIPA_REQUEST
ret = conn.listAlgo()
print(ret.toJSON())
``````

## 10.1 Degree Calculation

In Ultipa Graph, all per-node degree operations are conducted in a pure real-time fashion. Only whole-graph degree operations are invoked as tasks (asynchronously) due to computational complexity, especially on large graphs.

`[Parameter]` of degree calculations:

Name Data Type Specification Description
node_id int Ultipa ID (Optional) To calculate the degree of a specific node (per-node degree)
edge_property_name string edge property (numeric type) (Optional) To calculate the degree of node using the specified property as the weight factor of edge

### Out Degree

Out Degree is a per-node degree operation that calculates the total number of edges pointing away from a node, or the sum of a certain edge property.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_out_degree(ALGO_REQUEST.Out_Degree(node_id=1))
print(ret.toJSON())
``````

### Out Degree All

Out Degree All is a whole-graph degree to calculate graph-wide out degrees by ignoring the parameter `node_id`, and writes back the calculation of each node to property `#out_degree_all`.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_out_degree_all(ALGO_REQUEST.Out_Degree_All())
print(ret.toJSON())
``````

### In Degree

In Degree is comparable to Out Degree, but with calculations done against the inbound edges (left pointing) of node.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_in_degree(ALGO_REQUEST.In_Degree(node_id=1))
print(ret.toJSON())
``````

### In Degree All

In Degree All calculates the graph-wide in degrees by ignoring the parameter `node_id`, and writes back the calculation of each node to property `#in_degree_all`.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_in_degree_all(ALGO_REQUEST.In_Degree_All())
print(ret.toJSON())
``````

### Degree

Degree is the summed total of in_degree + out_degree of a specific node.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_degree(ALGO_REQUEST.Degree(node_id=1))
print(ret.toJSON())
``````

### Degree All

Degree All is the summed total of in_degree + out_degree that applied to all the nodes, and writes back the calculation of each node to property `#degree_all`.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_degree_all(ALGO_REQUEST.Degree_All())
print(ret.toJSON())
``````

## 10.2 Centrality Calculation

`[Parameter]` of centrality calculations:

Name Data Type Specification Description
node_id int Ultipa ID (Optional) To calculate the centrality of a specific node, not to be used for betweenness calculation

### Closeness

Closeness of a node is a centrality that calculates the reciprocal of sum of distance (lengths of the shortest path) from all other nodes to the subject node.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_closeness(ALGO_REQUEST.Closeness(node_id=1))
print(ret.toJSON())
``````

### Out Closeness

Out Closeness is an out-degree closeness of which the calculation envolves the shortest-paths formed by out-edges only.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_out_closeness(ALGO_REQUEST.Out_Closeness(node_id=1))
print(ret.toJSON())
``````

### In Closeness

In Closeness, in contrast to Out Closeness, is an in-degree closeness of which the calculation envolves the shortest-paths formed by in-edges only.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_in_closeness(ALGO_REQUEST.In_Closeness(node_id=1))
print(ret.toJSON())
``````

### Graph Centrality

Graph Centrality in Ultipa Graph is calculated by finding out the longest shortest-path from the subject node to all other nodes within the connected component first, and take a multiplicative inverse (reciprocal) of the length of that longest shortest-path, the ending result (a floating value) indicates the level of centrality of the subject node.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_graph_centrality(ALGO_REQUEST.Graph_Centrality(node_id=1))
print(ret.toJSON())
``````

### Betweenness Centrality

Betweenness Centrality represents the degree to which nodes stand between each other. It calculates the graph-wide shortest paths passing the subject node, hence consumes lots of computational resources and is very time-consuming in highly connected graphsets. It writes back the calculation results to the database as a node property `#betweenness_centrality`.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_betweenness_centrality(ALGO_REQUEST.Betweenness_Centrality())
print(ret.toJSON())
``````

## 10.3 General Graph Algorithm

### Graph-Wide K-Hop

Whole-graph K-Hop calculates the number of K-Hop neighbors against all nocdes of the entire graphset, and writes back the results into node property `#khop_k`. For example, a graph-wide 3-hop calculation will generate a node property `#khop_3`.

`[Parameter]` of graph-wide K-Hop calculations:

Name Data Type Specification Description
depth int >0 A reasonable value, e.g., starting from 1 and grow incrementally
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_k_hop(ALGO_REQUEST.Khop(depth=3))
print(ret.toJSON())
``````

### Connected Component

A connected component is a maximal subset of interconnected nodes via edges of a graph. This algorithm calculates the number of connected components of a graph.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_connected_component(ALGO_REQUEST.Connected_Component())
print(ret.toJSON())
``````

### Triangle Counting

Triangle counting can be used to measure the cohesiveness of communities within social networks and the stability of a graph, and measure how closely connected are the accounts in the banking system.

``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_triangle_counting(ALGO_REQUEST.Triangle_Counting())
print(ret.toJSON())
``````

### Common Neighbours

This algorithm calculates the shared (common) neighbors between two nodes.

`[Parameter]` of common neigobours calculations:

Name Data Type Specification Description
node_id1 int >0 /
node_id1 int >0 /
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_common_neighbours(ALGO_REQUEST.Common_neighbours(node_id1=12,node_id2=21))
print(ret.toJSON())
``````

### Subgraph

This algorithm calculates the subgraph formed by the specified nodes.

`[Parameter]` of subgraph calculations:

Name Data Type Specification Description
`<node_ids>` []int Ultipa ID The nodes to form subgraph
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_subgraph(ALGO_REQUEST.Subgraph(node_ids=[12,21,1,123,33445,544]))
print(ret.toJSON())
``````

### hyperANF

The algorithm hyperANF is short for Approximating the Neighbourhood Function of very large graphs on a budget, which is invented by Italian scientists Paolo Boldi, Marco Rosa, Sebastiano Vigna in their 2011 paper.

`[Parameter]` of hyperANF calculations:

Name Data Type Specification Description
loop_num int >0 The number of loops or iterations
register_num int [4, 30] default: 10
``````from ultipa import Connection,ALGO_REQUEST
print(ret.toJSON())
``````

### kNN

The kNN algorithm is short for k-nearest neighbors which is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. KNN works by finding the distances between a query against the subject node and all the examples in the data, selecting the specified number of nodes (K) closest to the queried node, then votes for the most frequent label (in the case of classification) or averages the labels (in the case of regression).

`[Parameter]` of KNN algorithm:

Name Data Type Specification Description
node_id int Ultipa ID The node to calculate
node_property_names []string node property A list of node properties that involve in the similarity computation
top_k int >0 The number of similar nodes to be used for calculation
target_property_name string node property The node property to be calculated
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_knn(ALGO_REQUEST.Knn(node_id=12,node_property_names=['name','age'],
top=50,target_property_name='name'))
print(ret.toJSON())
``````

### k-Core

A k-Core of a graph G is a maximal connected subgraph of G in which all vertices have a degree of at least k.

`[Parameter]` of k-Core algorithm:

Name Data Type Specification Description
k int / The minimum number of degrees of nodes in the resulting subgraphs
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_k_core(ALGO_REQUEST.Kcore(k=10))
print(ret.toJSON())
``````

### MST

MST, the short for Minimum Spanning Tree, connects all nodes in the graph to form a sub-graph that has minimum aggregated weight.

`[Parameter]` of MST algorithm:

Name Data Type Specification Description
start_node_id int Ultipa ID The startig node
edge_property_name string / The edge property to use. Default is 1 if unprovided
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_mst(ALGO_REQUEST.Mst(start_node_id=12,edge_property_name='rank'))
print(ret.toJSON())
``````

### k-Means

K-means clustering is a method of vector quantization, that partitions n observations into k clusters with each observation belong to the cluster with the nearest mean (cluster’s geometric center). K-means clustering tends to find comparable spatial extent. There are two distance types supported by Ultipa’s K-Means Algorithm: Euclidean distance or Cosine Similarity (Cosine Distance).

`[Parameter]` of k-Means algorithm:

Name Data Type Specification Description
k int >0 The number of clusters
start_ids []int Ultipa ID Appoint k nodes as the starting nodes. If empty, k start ids will be picked automatically
node_property_names []string node property A list of node properties that involve in the calculation
distance_type int 1; 2 1: Euclidean distance
2: Cosine Similarity (distance)
loop_num int >0 The number of loops or iterations, keep it small if possible
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_k_means(ALGO_REQUEST.K_Means(start_ids=[12,21,2,23,1],
k=2,node_property_names=['name','age'],distance_type=0,loop_num=3))
print(ret.toJSON())
``````

### Clustering Coefficient

Clustering Coefficient is a measure of the degree to which nodes in a graph tend to cluster together.

`[Parameter]` of Clustering Coefficient algorithm:

Name Data Type Specification Description
node_id int Ultipa ID The node to calculate
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_clustering_coefficient(ALGO_REQUEST.Clustering_Coefficient(node_id=12))
print(ret.toJSON())
``````

## 10.5 Community Detection & Propagation Algorithm

### Page Rank

Page Rank works by counting the number and quality of links to a page to determine a rough estimate of the importance of a website. Page Rank is usually run against the entire graph and it's invoked as a task.

`[Parameter]` of Page Rank algorithm:

Name Data Type Specification Description
loop_num int >0 The number of loops or iterations
damping float (0, 1) Damping factor
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_page_rank(ALGO_REQUEST.Page_Rank(loop_num=3,damping=0.5))
print(ret.toJSON())

``````

### Sybil Rank

Sybil stands for fake and malicious accounts of OSN (Online Social Networks), which tend to have limited social connections to real users(accounts or nodes). Sybil Rank algorithnm is invented to identify these Sybil accounts.

`[Parameter]` of Sybil Rank algorithm:

Name Data Type Specification Description
loop_num int >0 The number of loops or iterations
sybil_num int >0 The minimal number of sybils (nodes) to return
total_trust int >0 The maximal number of trusted nodes
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_sybil_rank(ALGO_REQUEST.Sybil_Rank(loop_num=3,sybil_num=3,
trust_seeds=[12,21],total_trust=10))
print(ret.toJSON())
``````

### LPA

LPA is short for Label Propagation Algorithm that predicts an unmarked node's label based on existing labels. LPA is widely adopted in many practical scenarios, such as virtual community mining, multi-media information categorization, etc.

`[Parameter]` of LPA:

Name Data Type Specification Description
loop_num int >0 The number of loops or iterations
node_property_name string / The property name that the label points to
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_lpa(ALGO_REQUEST.Lpa(loop_num=5,node_property_name='name'))
print(ret.toJSON())
``````

### HANP

HANP is short for Hop Attenuation & Node Preference. As an extension to LPA, HANP requires several more input parameters, notably the m and delta, which correspond to degree-of-preference to a current node and how far a particular label can spread.

`[Parameter]` of HANP algorithm:

Name Data Type Specification Description
loop_num int >0 The number of loops or iterations
edge_property_name string edge property The edge property to use
node_property_name string node property The node property to use
m float / Correspondent to a node’s neighbors;
m>0: more preference is given to node with more neighbors
m<0: more preference is given to node with less neighbors
delta float (0, 1) To govern how far a label can spread from its origin, the bigger delta value the faster
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_hanp(ALGO_REQUEST.Hanp(loop_num=3,edge_property_name='rank',
node_property_name='name',m=1,delta=1.5))
print(ret.toJSON())
``````

### Louvain

Louvain is a modularity-based community detection algorithm. It was named after the place where it was invented: University of Louvain, back in 2008 – Modularity is a scale value that measures the relative density of edges inside communities with respect to edges outside communities. Optimizing modularity (theoretically) results in high grouping of nodes in a given network (or graph), for instance, a modularity value of 0.92 is significantly higher than 0.75, therefore indicating that communities are finely grouped.

Louvain algorithm will write back the community id that each node belongs to both the database as a new node attribute (if it's not already created) as well as a disk file that contains all nodes and community ids.

`[Parameter]` of Louvain algorithm:

Name Data Type Specification Description
phase1_loop_num int >0 The number of loops or iterations, starting from 1 and grow incrementally
min_modularity_increase float >0 Start from 0.01 or 0.001
edge_property_name string edge property The edge property to use
write_back bool default: False Whether to write back to the database
visualization bool default: False Whether to visualize the result
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_louvain(ALGO_REQUEST.Louvain(phase1_loop_num=3,
min_modularity_increase=0.5,edge_property_name='rank'))
print(ret.toJSON())
``````

### Louvain Visualization

Louvain visualization works on the calculated results from Louvain algorithm and displays them inside Ultipa Manager.

`[Parameter]` of Louvain visualization:

Name Data Type Specification Description
id int / The id of Algorithm
algo_name string / The name of algorithm
top int (0,10] Top N communities
total int (0,10000] The nubmer of nodes to sample
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_dv(ALGO_REQUEST.Dv(algo_name='louvain',id=7,top=10))
print(ret.toJSON())
``````

## 10.6 Similarity

### Jaccard

Jaccard similarity measures the similarity between finite samples sets (for instance, 2 nodes), and is defined as the size of the intersection divided by the size of the union of the sample sets. Obviously, the larger the coefficient is, the higher the similarity.

`[Parameter]` of Jaccard similarity:

Item Data Type Specification Description
`<node_id1>` int Ultipa ID The subject node
`<node_id2>` int Ultipa ID (1/2, optional) To calculate the similarity of node1 and node2, does not coexist with `<top>`
`<top>` int >0 (1/2, optional) To find <top> nodes that are most similar to node1, does not coexist with `<node_id2>`
``````from ultipa import Connection,ALGO_REQUEST
# Calculate the similarity of two nodes:
ret = conn.algo_jaccard(ALGO_REQUEST.Jaccard(node_id1=12,node_id2=21))
print(ret.toJSON())
# Calculate N nodes that are similar to the subject node:
ret = conn.algo_jaccard(ALGO_REQUEST.Jaccard(node_id1=12,top=10))
print(ret.toJSON())
``````

### Cosine Similarity

Cosine similarity measures the similarity between two non-zero vectors of inner product space. In the context of a graphset, the two non-zero vectors are comprised of the two sets of properties of two nodes. The result of computed cosine similarity ranges from 0 to 1, where 1 means 100% similar, 0 means no-similarity at all.

`[Parameter]` of Cosine similarity:

Name Data Type Specification Description
`<node_id1>` int Ultipa ID The node1
`<node_id2>` int Ultipa ID The node2
`<node_property_names>` []string comma (,) separated, at least two properties The node properties to be used for calculation
``````  from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_cosine_similarity(ALGO_REQUEST.Cosine_Similarity(
node_id1=12,
node_id2=21,
node_property_names=['name','age']
))
print(ret.toJSON())
``````

## 10.7 Graph Embedding

Graph embedding is to extract high-dimensional data and morph it into low-dimensional data, the work is usually done against and to represent a node or a struct.

Randon Walk is a graph embedding algorithm that starts from a random node in the graph, and randomly walks on the graph by following certain rule(s) and eventually forms a set of paths, which can be used as a training sample for embedding. Ultipa offers multiple random walking mechanism:

• Random Walk
• node2vec random walk
• path-template based random walk
• struc2vec random walk

### Random Walk

`[Parameter]` of Random Walk:

Item Data Type Specification Description
`<walk_num>` int >0 Number of walks
`<walk_length>` int >0 Length per walk
`<edge_property_name>` string edge property (Optional) The edge property to Random-walk along
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_random_walk(ALGO_REQUEST.Random_Walk(walk_num=2,
walk_length=3,edge_property_name='rank'))
print(ret.toJSON())
``````

### Node2Vec Random Walk

`[Parameter]` of Node2Vec random walk:

Name Data Type Specification Description
`<walk_num>` int >0 Number of walks
`<walk_length>` int >0 Length per walk
`<p>` float >0 The bigger the value, the lower the probability to walk backward
`<q>` float >0 The probability to walk away (farther or deeper)
>1：same-depth walk
<1：deeper
`<edge_property_name>` string edge property (Optional) The edge property to Random-walk along
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_random_walk_node2vec(ALGO_REQUEST.Random_Walk_Node2vec(
walk_num=3,walk_length=2,p=0.1,q=0.1,edge_property_name='rank'))
print(ret.toJSON())
``````

### Struc2Vec Random Walk

`[Parameter]` of Struc2Vec random walk:

Name Data Type Specification Description
`<walk_num>` int >0 Number of walks
`<walk_length>` int >0 Length per walk
`<k>` int >0 The number of layer(depth). Use a number smaller or equal to graph-wide avg. shortest path amongst all nodes.Bigger k means much longer time to process! Start from 1 and grow incrementally
`<stay_probability>` float (0, 1) The probability to stay at the current layer(depth)
``````from ultipa import Connection,ALGO_REQUEST
ret = conn.algo_random_walk_struc2vec(ALGO_REQUEST.Random_Walk_Stuc2vec(
walk_num=3,walk_length=2,k=3,stay_probability=0.1))
print(ret.toJSON())
``````

### Node2Vec Embedding

Node2Vec is a graph embedding method that combines DFS neighborhood and BFS neighborhood - by leveraging both DFS and BFS random walk to generate training samples, and using skip-gram and SDG’s word2vec training model to assign embedding vector to vertices in the graph.

Node2Vec relies on Node2Vec random walk algorithm, more specifically, node2vec random walk is considered the 1st step of execution during the overall node2vec execution process. And this is true to Struc2Vec vs. Struc2Vec Random Walk as well.

`[Parameter]` of Node2Vec embedding:

Name Data Type Specification Description
`<walk_num>` int >0 Number of walks
`<walk_length>` int >0 Length per walk
`<p>` float >0 The bigger the value, the lower the probability to walk backward
`<q>` float >0 The probability to walk away (farther or deeper)
>1：same-depth walk
<1：deeper
`<window_size>` int >0 The window size
`<dimension>` int >0 example: 100
`<learning_rate>` double (0, 1) Recommend: 0.025
`<min_learning_rate>` double (0, 1) example: learning_rate * 0.0001
`<sub_sample_alpha>` double / (Optional) -1 as default,can be 0.001 in word2vec, the equation is (sqrt(x / 0.001) + 1) * (0.001 / x)
`<resolution>` int / Recommend: 10 or 100
`<neg_num>` int (0, 10) (Optional) The number of words in negative sampling phrase
`<iter_num>` int (0, 100) The loop number
`<min_frequency>` int >=0 (Optional) To remove nodes that occur fewer than 'min_frequency'
`<edge_property_name>` string edge property (numeric type) (Optional) The edge property to Random-walk along
``````from ultipa import Connection,ALGO_REQUEST
ret=conn.algo_node2vec(ALGO_REQUEST.Node2vec(walk_num=2,loop_num=2,
walk_length=2,p=0.1,q=0.1,context_size=2,dimension=10,learning_rate=0.3
min_learning_rate=0.3,sub_sample_alpha=0.001,resolution=2,
neg_num=2,iter_num=2,min_frequency=0,edge_property_name='rank'))
print(ret.toJSON())
``````

### Struc2Vec Embedding

Struc2Vec embedding uses a novel and flexible framework for learning latent representations for the structural identity of nodes. It uses a hierarchy to measure node similarity at different scales, and constructs a multilayer graph to encode structural similarities and generate a structural context for nodes.

`[Parameter]` of Struc2Vec embedding:

Name Data Type Specification Description
`<walk_num>` int >0 Number of walks
`<walk_length>` int >0 Length per walk
`<k>` int >0 The number of layer(depth). Use a number smaller or equal to graph-wide avg. shortest path amongst all nodes.Bigger k means much longer time to process! Start from 1 and grow incrementally
`<stay_probability>` float (0, 1) The probability to stay at the current layer(depth)
`<window_size>` int > 0 window size
`<dimension>` int > 0 example: 100
`<learning_rate>` double (0, 1) Recommend: 0.025
`<min_learning_rate>` double (0, 1) example: learning * rate * 0.0001
`<sub_sample_alpha>` double / (Optional) -1 as default,can be 0.001 in word2vec, the equation is (sqrt(x / 0.001) + 1) _ (0.001 / x)
`<resolution>` int Recommend: 10 or 100
`<neg_num>` int >0 (Optional) number of words in negative sampling phrase
`<loop_num>` int >0 loop number
`<min_frequency>` int >= 0 (Optional) remove nodes that occur fewer than 'min_frequency'
``````from ultipa import Connection,ALGO_REQUEST
ret=conn.algo_struc2Vec(ALGO_REQUEST.Struc2Vec(walk_num=3,
walk_length=2,k=2,stay_probability=0.2,context_size=0.5,
dimension=10,learning_rate=0.025,min_learning_rate=0.001,
sub_sample_alpha=0.001,resolution=10,neg_num=2,loop_num=3,
min_frequency=3))
print(ret.toJSON())
``````

### LINE (Large Information Network Embedding)

LINE is also a graph-embedding training model, it leverages BFS information (1-hop or 2-hop similarity) to construct training samples, and uses a training method that's similar to Node2Vec's to generate node's embedding vector.

`[Parameter]` of LINE embedding:

Name Data Type Specification Description
`<edge_property_name>` string edge property edge property name
`<dimension>` int >0 dimension of the embedded vector, 10~n10 (e.g : 100)
`<resolution>` int >0 Recommend: 10 or 100
`<start_alpha>` double > 0 0.025 recommended
`<neg_num>` int >0 (Optional) number of words in negative sampling phrase
`<total_sample>` int (0,10) number of samples
`<train_order>` int 1; 2 (Optional) 1: first order proximity
2: second order proximity
``````from ultipa import Connection,ALGO_REQUEST