
Fast Random Projection

✓ File Writeback ✕ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats

Overview

Fast Random Projection (or FastRP) is a scalable and performant algorithm for learning node representations in a graph. It computes node embeddings iteratively in two stages: first, node similarity matrix construction; second, dimension reduction by random projection.

FastRP was proposed by H. Chen et al. in 2019:

  • H. Chen, S.F. Sultan, Y. Tian, M. Chen, S. Skiena, Fast and Accurate Network Embeddings via Very Sparse Random Projection (2019)

FastRP Framework

The authors of FastRP propose that most network embedding methods essentially consist of two components: (1) construct a node similarity matrix or implicitly sample node pairs from it, and then (2) apply dimension reduction techniques to this matrix.

For dimension reduction, many popular algorithms, such as Node2Vec, rely on time-consuming methods like Skip-gram. The authors argue that the success of these methods should be attributed to the proper construction of the similarity matrix, rather than to the dimension reduction method employed.

Therefore, FastRP utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction.

Very Sparse Random Projection

Random projection is a dimension reduction method that roughly preserves the relative pairwise distances between data points when embedding them from the high-dimensional space into a lower-dimensional space. The theoretical guarantees of random projection are mainly derived from the Johnson-Lindenstrauss Lemma.
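The distance-preserving behavior described above can be observed empirically. Below is a minimal NumPy sketch (point counts, dimensions, and seed are invented for illustration) that projects random high-dimensional points down with a Gaussian random matrix and compares pairwise distances before and after:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1000))                 # 50 points in 1000-dim space
R = rng.normal(size=(1000, 64)) / np.sqrt(64)   # random projection to 64 dims
Y = X @ R                                       # projected points

def pdist(M):
    """All pairwise Euclidean distances between the rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

iu = np.triu_indices(50, 1)
ratio = pdist(Y)[iu] / pdist(X)[iu]
print(ratio.mean())  # close to 1: pairwise distances are roughly preserved
```

The ratio clusters around 1, which is exactly the guarantee the Johnson-Lindenstrauss Lemma formalizes.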

The idea behind random projection is simple: to reduce a matrix Mn×m (for graph data, we have n = m) to a matrix Nn×d where d ≪ m, we simply multiply M with a random projection matrix Rm×d: N = M ⋅ R.

The matrix R is generated by sampling its entries independently from a random distribution. Specifically, FastRP adopts the very sparse random projection method for dimension reduction, where each entry of R is sampled from the following distribution, with s = √m:

Rij = +√s with probability 1/(2s), 0 with probability 1 − 1/s, −√s with probability 1/(2s)

Very sparse random projection has superior computational efficiency, as it only requires matrix multiplication and, in expectation, (1 − 1/s) of the entries of R are zero.
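As a sketch of how such a matrix could be sampled (dimensions and seed are invented for illustration; this is not the Ultipa implementation), using NumPy:

```python
import numpy as np

def sparse_projection_matrix(m, d, seed=0):
    """Sample an m x d matrix from the very sparse distribution with s = sqrt(m):
    entries are +sqrt(s) or -sqrt(s) each with probability 1/(2s), else 0."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(m)
    return rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
                      size=(m, d),
                      p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])

m, d = 10_000, 128
R = sparse_projection_matrix(m, d)
# Roughly (1 - 1/s) of the entries are zero, so R is very sparse.
print(1 - np.count_nonzero(R) / R.size)  # close to 1 - 1/sqrt(10000) = 0.99
```

In practice a sparse matrix format would be used so the multiplication cost scales with the number of nonzero entries.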

Additionally, Ultipa supports initializing the matrix R with a combination of selected numeric node properties (features). In this case, each row of R is formed by concatenating two parts:

  • Elements d1 ~ dx are generated by the very sparse random projection method.
  • Elements p1 ~ py are a linear combination of the selected node property values.
  • x + y equals the dimension of the final embeddings, where y is referred to as the property dimension.
  • The number of selected node features doesn't have to equal the property dimension.
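
A minimal sketch of how one such row could be assembled, assuming x random dimensions and y property dimensions (all names, feature values, and the combination weights here are invented; the random part is simplified to ±1 entries rather than the scaled entries used above):

```python
import numpy as np

rng = np.random.default_rng(7)
x, y = 5, 3                            # random dims + property dimension; x + y = final dim
node_features = np.array([0.4, 1.2])   # values of the selected node properties (invented)
W = rng.normal(size=(y, node_features.size))  # one linear combination per property dimension

random_part = rng.choice([1.0, 0.0, -1.0], size=x, p=[0.1, 0.8, 0.1])
property_part = W @ node_features      # y linear combinations of the feature values
row = np.concatenate([random_part, property_part])
print(row.shape)  # (8,) -- the final embedding dimension x + y
```

Note that the two selected features feed all y property dimensions, which is why their count need not match the property dimension.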

Node Similarity Matrix Construction

FastRP follows two key principles when constructing the node similarity matrix:

  1. Preserve high-order proximity in the graph
  2. Perform element-wise normalization on the matrix

Let S be the adjacency matrix of graph G = (V, E), D be the degree matrix, and A = D⁻¹S be the transition matrix. See the example below, where the number on each edge represents the edge weight:

In matrix A, entry Aij is the probability of reaching node j from node i in a 1-step random walk. Furthermore, raising A to the k-th power yields matrix A^k, whose entry (A^k)ij is the probability of reaching node j from node i in a k-step random walk. Hence, matrix A^k preserves high-order proximity of the graph.
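This property is easy to verify numerically. A small NumPy sketch (the toy graph and its weights are invented for illustration):

```python
import numpy as np

# Toy weighted adjacency matrix S for a 3-node graph (values are illustrative).
S = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
D_inv = np.diag(1.0 / S.sum(axis=1))   # inverse degree matrix D^-1
A = D_inv @ S                          # transition matrix: each row sums to 1

A3 = np.linalg.matrix_power(A, 3)      # (A^3)_ij = P(reach j from i in 3 steps)
print(A3.sum(axis=1))                  # each row is still a probability distribution
```

Because every power of a row-stochastic matrix is itself row-stochastic, each row of A^k remains a probability distribution over destination nodes.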

However, as k → ∞, it can be proved that (A^k)ij → dj/2m, where m = |E| and dj is the degree of node j. Since many real-world graphs are scale-free, the entries of A^k follow a heavy-tailed distribution. This implies that the pairwise distances between rows of A^k are predominantly influenced by the columns with exceptionally large values, making them less meaningful and posing challenges for the subsequent dimension reduction step.

To address this, FastRP further normalizes A^k by multiplying it with a diagonal normalization matrix L, where Ljj = (dj/2m)^β and β is the normalization strength. The normalization strength controls the impact of node degree on the embeddings: when β is negative, the importance of high-degree neighbors is weakened; the opposite holds when β is positive.

Algorithm Process

The FastRP algorithm follows these steps:

  1. Generate the random projection matrix R.
  2. Reduce the dimension of the normalized 1-step transition matrix: N1 = A ⋅ L ⋅ R
  3. Reduce the dimension of the normalized 2-step transition matrix: N2 = A ⋅ (A ⋅ L ⋅ R) = A ⋅ N1
  4. Repeat step 3 until obtaining the normalized k-step transition matrix: Nk = A ⋅ Nk-1
  5. Generate the weighted combination of different powers of A: N = α1N1 + ... + αkNk, where α1,...,αk are the iteration weights.
NOTE

The efficiency of FastRP also benefits from the associativity of matrix multiplication: it never computes A^k explicitly, since each Nk is obtained iteratively from Nk-1.
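
The steps above can be sketched in dense NumPy for illustration only (the toy graph, β, and weights are invented; a real implementation would use sparse matrices, and this is not the Ultipa implementation):

```python
import numpy as np

def fastrp_sketch(S, d, beta=-0.5, weights=(0.0, 1.0, 1.0), seed=0):
    """Follow the FastRP steps: sample R, then iterate N_k = A @ N_{k-1}
    and accumulate the weighted combination of the N_k."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    degrees = S.sum(axis=1)
    A = S / degrees[:, None]                      # transition matrix A = D^-1 S
    L = np.diag((degrees / S.sum()) ** beta)      # normalization, L_jj = (d_j / 2m)^beta

    s = np.sqrt(n)                                # very sparse projection matrix R
    R = rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)], size=(n, d),
                   p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])

    N = L @ R                                     # so the first product is N1 = A @ (L @ R)
    embedding = np.zeros((n, d))
    for alpha in weights:                         # N_k = A @ N_{k-1}, weighted by alpha_k
        N = A @ N
        embedding += alpha * N
    return embedding

S = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
emb = fastrp_sketch(S, d=4)
print(emb.shape)  # (3, 4)
```

Note how A^k never appears: each iteration multiplies A into the previous n × d result, keeping every product at n × d cost.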

Considerations

  • The FastRP algorithm ignores the direction of edges and treats all edges as undirected.

Syntax

  • Command: algo(fastRP)
  • Parameters:
| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| dimension | int | ≥2 | / | No | Dimensionality of the embeddings |
| normalizationStrength | float | / | / | Yes | Normalization strength β |
| iterationWeights | []float | >0 | [0.0,1.0,1.0] | Yes | Weight for each iteration; the size of the array is the number of iterations |
| edge_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | Yes | Edge property(-ies) to be used as edge weight(s), where the values of multiple properties are summed up |
| node_schema_property | []@<schema>?.<property> | Numeric type, must LTE | / | Yes | Node property(-ies) to be used as feature(s); propertyDimension and node_schema_property should be configured together or both ignored |
| propertyDimension | int | (0,dimension] | / | Yes | Length of the property dimension; propertyDimension and node_schema_property should be configured together or both ignored |
| limit | int | ≥-1 | -1 | Yes | Number of results to return; -1 to return all results |

Example

File Writeback

| Spec | Content |
|---|---|
| filename | _id, embedding_result |
UQL
algo(fastRP).params({
  dimension : 10,
  normalizationStrength: 1.5,
  iterationWeights: [0, 0.8, 0.2, 1],
  edge_schema_property: ['w1', 'w2'],
  node_schema_property: ['@default.f1', '@default.f2'],
  propertyDimension: 3
}).write({
  file:{
    filename: 'embedding'
  }
})

Property Writeback

| Spec | Content | Write to | Data Type |
|---|---|---|---|
| property | embedding_result | Node Property | string |
UQL
algo(fastRP).params({
  dimension : 10,
  normalizationStrength: 1.5,
  iterationWeights: [0, 0.8, 0.2, 1],
  edge_schema_property: ['w1', 'w2']
}).write({
  db:{
    property: 'embedding'
  }
})

Direct Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNode | Node and its embeddings | _uuid, embedding_result |
UQL
algo(fastRP).params({
  dimension : 10,
  normalizationStrength: 1.5,
  iterationWeights: [0, 0.8, 0.2, 1]
}) as embeddings
return embeddings

Stream Return

| Alias Ordinal | Type | Description | Columns |
|---|---|---|---|
| 0 | []perNode | Node and its embeddings | _uuid, embedding_result |
UQL
algo(fastRP).params({
  dimension : 10,
  normalizationStrength: 1.5,
  iterationWeights: [0, 0.8, 0.2, 1]
}).stream() as embeddings
return embeddings