Struc2Vec - Graph Analytics & Algorithms

Struc2Vec

HDC

Overview

Struc2Vec stands for "structure to vector". It is a graph embedding algorithm that generates node vectors which preserve structural identity: nodes whose neighborhoods have similar topology receive similar embeddings, regardless of how far apart they are in the graph.

L. Ribeiro, P. Saverese, D. Figueiredo, struc2vec: Learning Node Representations from Structural Identity (2017)

While Node2Vec captures a certain degree of structural similarity among nodes, it is limited by the depth of random walks used during the generation process. On the other hand, Struc2Vec overcomes this limitation in its framework. It ensures that nodes with similar structural characteristics are represented close to each other in the embedding space.

The choice between Node2Vec and Struc2Vec depends on the nature of downstream tasks:

Node2Vec suits tasks prioritizing node homophily, capturing similarity in attributes and connections.
Struc2Vec excels when tasks demand a focus on local topology similarity, preserving the structural relationships among nodes.

Concepts

Structural Similarity

In many networks, nodes have distinct structural identities shaped by their functions or roles. Nodes performing similar functions naturally belong to the same class, which signifies their structural similarity. For instance, in a company's social network, all interns might exhibit similar roles.

Structural similarity among nodes implies that their neighborhood topologies are homogenous or symmetrical. This indicates that nodes with similar functions have analogous connections and relationships with their neighboring nodes.

For example, consider two nodes u and v with degrees 5 and 4, connected to 3 and 2 triangles, and each connected to the rest of the network by 2 nodes. They are structurally similar even though they have no direct link or shared neighbor and may be very far apart in the network.

When the distance between nodes exceeds the depth of random walks, it becomes challenging to generate similar representations for them using methods like Node2Vec. This limitation is effectively addressed by the Struc2Vec algorithm.

Struc2Vec Framework

1. Measure structural similarity

Intuitively, two nodes that have the same degree are considered structurally similar, and if their neighbors also have the same degrees, they are even more structurally similar.

Consider an undirected, unweighted graph G = (V, E) whose diameter is denoted as k*. Let R_k(u) denote the set of nodes located at an exact distance (hop count) of k ∈ [0, k*] from node u within G. Let s(S) denote the ordered degree sequence of a node set S ⊂ V.
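As an illustration of these definitions, the following Python sketch computes R_k(u) and s(R_k(u)) for node C of the example graph defined later on this page. The function names are hypothetical, and the adjacency model assumes a simple graph without self-loops or parallel edges.

```python
from collections import deque

# Edge list copied from the Example Graph section of this page.
EDGES = [("A", "B"), ("A", "C"), ("D", "C"), ("D", "F"), ("E", "C"),
         ("E", "F"), ("F", "G"), ("G", "J"), ("H", "G"), ("H", "I")]

def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def rings(adj, source):
    """Return {k: R_k(source)}, the nodes at exactly k hops, using BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    result = {}
    for node, d in dist.items():
        result.setdefault(d, set()).add(node)
    return result

def degree_sequence(adj, nodes):
    """s(S): the degrees of the nodes in S, sorted in ascending order."""
    return sorted(len(adj[n]) for n in nodes)

adj = build_adjacency(EDGES)
r = rings(adj, "C")
for k in sorted(r):
    print(k, sorted(r[k]), degree_sequence(adj, r[k]))
```

For node C this gives R_1(C) = {A, D, E} with degree sequence (2, 2, 2) and R_2(C) = {B, F} with degree sequence (1, 3).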

Let f_k(u,v) denote the structural distance between u and v when considering their k-hop neighborhoods (all nodes at distance less than or equal to k):
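Following the definition in the struc2vec paper cited above, this distance is given by the recursion below, with the base case f_{-1}(u,v) = 0:

$$
f_k(u,v) = f_{k-1}(u,v) + g\big(s(R_k(u)),\ s(R_k(v))\big), \quad k \ge 0 \ \text{and} \ |R_k(u)|, |R_k(v)| > 0
$$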

where function g() ≥ 0 measures the distance between two degree sequences. Note that f_k(u,v) is non-decreasing in k and is defined only when both u and v have neighbors at distance k.

To assess the distance between sequences s(R_k(u)) and s(R_k(v)), which can be of different sizes, Dynamic Time Warping (DTW) or any other applicable function can be adopted. Note that if the k-hop neighborhoods of nodes u and v are isomorphic, then f_{k-1}(u,v) = 0.
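A minimal Python sketch of this step is given below. It uses the textbook DTW recurrence with the element cost max(a, b) / min(a, b) − 1 proposed in the struc2vec paper, and it accumulates the distance ring by ring, which assumes the paper's recursive form f_k = f_{k-1} + g(...). The function names are hypothetical, and whether Ultipa's implementation uses exactly this cost is not stated here.

```python
def cost(a, b):
    """Element distance from the struc2vec paper; degrees must be >= 1."""
    return max(a, b) / min(a, b) - 1.0

def dtw(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            table[i][j] = step + min(table[i - 1][j],      # advance in seq_a
                                     table[i][j - 1],      # advance in seq_b
                                     table[i - 1][j - 1])  # advance in both
    return table[n][m]

def structural_distances(rings_u, rings_v):
    """Return [f_0, f_1, ...] for the rings that exist for both nodes."""
    out, total = [], 0.0
    for seq_u, seq_v in zip(rings_u, rings_v):  # zip stops at the shorter list
        total += dtw(seq_u, seq_v)
        out.append(total)
    return out

# Ordered degree sequences of rings 0..2 around nodes C and F
# of the example graph on this page.
rings_c = [[3], [2, 2, 2], [1, 3]]
rings_f = [[3], [2, 2, 3], [1, 2, 3]]
print(structural_distances(rings_c, rings_f))
```

Because every term added is non-negative, the returned list is non-decreasing in k, matching the property noted above.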

2. Construct a multilayer weighted graph

Struc2Vec constructs a multilayer weighted graph M that encodes the structural similarity between nodes, where layer k is defined using the k-hop neighborhoods of the nodes.

Each layer k is a weighted, undirected complete graph on node set V, and thus has $\frac{|V|(|V|-1)}{2}$ edges. The edge weight between nodes u and v decreases as their structural distance grows, as given by:
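In the struc2vec paper, this weight takes the exponential form below, so a structural distance of 0 yields the maximum weight of 1:

$$
w_k(u,v) = e^{-f_k(u,v)}, \quad k = 0, \ldots, k^*
$$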

Note that edges are defined only if f_k(u,v) is defined.

Layers are connected by directed edges. Every node is connected to its corresponding node in the layer above and the layer below (where such a layer exists), and the edge weights between layers are as follows:
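As defined in the struc2vec paper, the weight of the upward edge depends on the node, while every downward edge has unit weight:

$$
w(u_k, u_{k+1}) = \log\big(\Gamma_k(u) + e\big), \quad k = 0, \ldots, k^* - 1
$$

$$
w(u_k, u_{k-1}) = 1, \quad k = 1, \ldots, k^*
$$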

where Γ_k(u) is the number of edges incident to u that have weight larger than the average edge weight of the complete graph in layer k. Γ_k(u) actually measures the similarity of node u to other nodes in layer k. Note that if node u has many similar nodes in layer k, then it should change to higher layers to obtain a more refined context.
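The quantities for a single layer can be sketched in Python as follows. The sketch assumes the forms used in the struc2vec paper, w_k(u,v) = e^{-f_k(u,v)} within a layer and log(Γ_k(u) + e) for the edge to the layer above. The function names and the input distances are hypothetical.

```python
import math

def layer_weights(distances):
    """Map each unordered node pair to w_k(u, v) = exp(-f_k(u, v))."""
    return {pair: math.exp(-f) for pair, f in distances.items()}

def gamma(weights, node):
    """Count edges incident to `node` that are heavier than the layer average."""
    average = sum(weights.values()) / len(weights)
    return sum(1 for pair, w in weights.items() if node in pair and w > average)

def up_weight(weights, node):
    """Weight of the directed edge from `node` to its copy in the next layer."""
    return math.log(gamma(weights, node) + math.e)

# Hypothetical structural distances in one layer of a 4-node graph:
# a, b and c are mutually similar, d differs from all of them.
distances = {
    frozenset("ab"): 0.0, frozenset("ac"): 0.0, frozenset("ad"): 2.0,
    frozenset("bc"): 0.0, frozenset("bd"): 2.0, frozenset("cd"): 2.0,
}
weights = layer_weights(distances)
for node in "abcd":
    print(node, gamma(weights, node), round(up_weight(weights, node), 4))
```

Node d has no above-average edges, so its upward weight equals the downward weight of 1, whereas a, b and c each have two such edges and are therefore more likely to move up when the walk changes layer.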

3. Generate context for nodes

Struc2Vec uses random walks to generate sequences of nodes that determine the context of a given node.

Consider a biased random walk that moves in graph M. Each node starts the walk in its corresponding node in layer 0, and when it reaches node u in layer k (denoted as u_k), the random walk first decides if it will (1) stay in the current layer, or (2) change layer:

(1) With probability q the random walk stays in the current layer: the probability of moving to v_k is proportional to w_k(u,v). Note that the random walk will prefer to step onto nodes that are structurally more similar to the current node.

(2) With probability 1 − q, the random walk changes layer: the probabilities of moving to u_{k+1} or u_{k-1} are proportional to w(u_k, u_{k+1}) and w(u_k, u_{k-1}), respectively. In this case, node u is recorded only once in the random walk sequence.

The random walks have a fixed and relatively short depth (number of steps), and the process is repeated a certain number of times.
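The walk described in this step can be sketched in Python as below. The data layout is an assumption made for illustration: `same_layer[k][u]` maps each other node v to w_k(u,v), `up[k][u]` holds the weight of the edge from u in layer k to u in layer k+1, and every downward edge has weight 1 as in the struc2vec paper. All names and numbers are hypothetical.

```python
import random

def walk_step(node, layer, same_layer, up, q, rng):
    """Return (next_node, next_layer) after one step in the multilayer graph."""
    top_layer = len(same_layer) - 1
    if top_layer == 0 or rng.random() < q:
        # Stay in the layer: pick a node with probability proportional to w_k.
        candidates = same_layer[layer][node]
        nodes = list(candidates)
        weights = [candidates[v] for v in nodes]
        return rng.choices(nodes, weights=weights, k=1)[0], layer
    # Change layer: choose up or down in proportion to the two edge weights.
    w_up = up[layer][node] if layer < top_layer else 0.0
    w_down = 1.0 if layer > 0 else 0.0
    go_up = rng.random() < w_up / (w_up + w_down)
    return node, (layer + 1 if go_up else layer - 1)

def random_walk(start, walk_length, same_layer, up, q, rng):
    """Collect walk_length nodes, starting in layer 0; requires q > 0."""
    walk, node, layer = [start], start, 0
    while len(walk) < walk_length:
        nxt, nxt_layer = walk_step(node, layer, same_layer, up, q, rng)
        if nxt_layer == layer:
            walk.append(nxt)  # a layer change does not record the node again
        node, layer = nxt, nxt_layer
    return walk

# Hypothetical two-layer graph over three nodes.
same_layer = [
    {"a": {"b": 1.0, "c": 0.5}, "b": {"a": 1.0, "c": 0.5}, "c": {"a": 0.5, "b": 0.5}},
    {"a": {"b": 0.9, "c": 0.1}, "b": {"a": 0.9, "c": 0.1}, "c": {"a": 0.1, "b": 0.1}},
]
up = [{"a": 1.3, "b": 1.3, "c": 1.0}]
print(random_walk("a", 10, same_layer, up, q=0.4, rng=random.Random(7)))
```

Here `q` plays the role of the `stay_probability` parameter and `walk_length` counts recorded nodes, which follows the parameter descriptions on this page.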

4. Train the model

The node sequences obtained from the random walks serve as input to the Skip-gram model. SGD is used to optimize the model's parameters based on the prediction error, and the model is optimized by techniques such as negative sampling and subsampling.
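To make the input to Skip-gram concrete, the Python sketch below turns one walk into (center, context) training pairs using a fixed symmetric window. This is the generic Skip-gram windowing scheme; whether Ultipa's implementation applies a fixed or randomly shrunk window is not documented here, and the function name is hypothetical.

```python
def skipgram_pairs(walk, window_size):
    """Pair every node with the nodes at most window_size positions away."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

print(skipgram_pairs(["A", "C", "D"], window_size=1))
```

Each pair is a positive sample; negative sampling then draws `neg_num` nodes per positive pair as contrast.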

Considerations

When considering the degree of a node, any self-loop is counted twice.
The Struc2Vec algorithm ignores the direction of edges and treats them as undirected.
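Both considerations can be seen in a small Python sketch of degree counting. It assumes that parallel edges are counted separately, which this page does not state, and the function name is hypothetical.

```python
from collections import Counter

def undirected_degrees(edges):
    """Count degrees while ignoring edge direction."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1  # for a self-loop (src == dst) this adds 2 in total
    return degree

# A->B and B->A both count for A and B; the self-loop on A counts twice.
print(undirected_degrees([("A", "B"), ("B", "A"), ("A", "A")]))
```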

Example Graph

To create this graph:

// Run each row separately, in order, in an empty graphset
insert().into(@default).nodes([{_id:"A"},{_id:"B"},{_id:"C"},{_id:"D"},{_id:"E"},{_id:"F"},{_id:"G"},{_id:"H"},{_id:"I"},{_id:"J"}])
insert().into(@default).edges([{_from:"A", _to:"B"}, {_from:"A", _to:"C"}, {_from:"D", _to:"C"}, {_from:"D", _to:"F"}, {_from:"E", _to:"C"}, {_from:"E", _to:"F"}, {_from:"F", _to:"G"}, {_from:"G", _to:"J"}, {_from:"H", _to:"G"}, {_from:"H", _to:"I"}])

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as hdc_struc2vec:

CALL hdc.graph.create("hdc-server-1", "hdc_struc2vec", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
})

hdc.graph.create("hdc_struc2vec", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
}).to("hdc-server-1")

Parameters

Algorithm name: struc2vec

Name	Type	Spec	Default	Optional	Description
`ids`	[]`_id`	/	/	Yes	Specifies nodes to start random walk by their `_id`; computes for all nodes if it is unset.
`uuids`	[]`_uuid`	/	/	Yes	Specifies nodes to start random walk by their `_uuid`; computes for all nodes if it is unset.
`walk_length`	Integer	≥1	`1`	Yes	Depth of each walk, i.e., the number of nodes to visit.
`walk_num`	Integer	≥1	`1`	Yes	Number of walks to perform for each specified node.
`k`	Integer	[1, 10]	/	No	Number of layers in the constructed multilayer weighted graph, which should not exceed the diameter of the original graph.
`stay_probability`	Float	(0,1]	/	No	The probability that the random walk stays in the current layer at each step.
`window_size`	Integer	≥1	/	No	The maximum size of context.
`dimension`	Integer	≥2	/	No	Dimensionality of the embeddings.
`loop_num`	Integer	≥1	/	No	Number of training iterations.
`learning_rate`	Float	(0,1)	/	No	Learning rate used initially for training the model, which decreases after each training iteration until it reaches `min_learning_rate`.
`min_learning_rate`	Float	(0,`learning_rate`)	/	No	Minimum threshold for the learning rate as it is gradually reduced during the training.
`neg_num`	Integer	≥1	`5`	Yes	Number of negative samples to produce for each positive sample; a value between 1 and 10 is suggested.
`resolution`	Integer	≥1	`1`	Yes	The parameter used to enhance negative sampling efficiency; a higher value offers a better approximation to the original noise distribution; it is suggested to set as 10, 100, etc.
`sub_sample_alpha`	Float	/	`0.001`	Yes	The factor affecting the probability of down-sampling frequent nodes; a higher value increases this probability; a value ≤0 means not to apply subsampling.
`min_frequency`	Integer	/	/	No	Nodes that appear fewer times than this threshold in the training "corpus" will be excluded from the "vocabulary" and disregarded in the embedding training; a value ≤0 means to keep all nodes.
`return_id_uuid`	String	`uuid`, `id`, `both`	`uuid`	Yes	Includes `_uuid`, `_id`, or both values to represent nodes in the results.
`limit`	Integer	≥-1	`-1`	Yes	Limits the number of results returned; `-1` includes all results.

File Writeback

CALL algo.struc2vec.write("hdc_struc2vec", {
  params: {
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 5,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  },
  return_params: {
    file: {
      filename: "embeddings"
    }
  }
})

algo(struc2vec).params({
  projection: "hdc_struc2vec",
  return_id_uuid: "id",
  walk_length: 10,
  walk_num: 20,
  k: 10,
  stay_probability: 0.4,
  window_size: 5,
  dimension: 5,
  loop_number: 10,
  learning_rate: 0.01,
  min_learning_rate: 0.0001,
  neg_number: 9,
  resolution: 100,
  sub_sample_alpha: 0.001,
  min_frequency: 3
}).write({
  file: {
    filename: 'embeddings'
  }
})

DB Writeback

Writes the embedding_result values from the results to the specified node property. The property type is float[].

CALL algo.struc2vec.write("hdc_struc2vec", {
  params: {
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 4,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  },
  return_params: {
    db: {
      property: "vector"
    }
  }
})

algo(struc2vec).params({
  projection: "hdc_struc2vec",
  return_id_uuid: "id",
  walk_length: 10,
  walk_num: 20,
  k: 10,
  stay_probability: 0.4,
  window_size: 5,
  dimension: 4,
  loop_number: 10,
  learning_rate: 0.01,
  min_learning_rate: 0.0001,
  neg_number: 9,
  resolution: 100,
  sub_sample_alpha: 0.001,
  min_frequency: 3
}).write({
  db: {
    property: "vector"
  }
})

Full Return

CALL algo.struc2vec("hdc_struc2vec", {
  params: {
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 4,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  },
  return_params: {}
}) YIELD embeddings
RETURN embeddings

exec{
  algo(struc2vec).params({
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 4,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  }) as embeddings
  return embeddings
} on hdc_struc2vec

Stream Return

CALL algo.struc2vec("hdc_struc2vec", {
  params: {
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 5,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  },
  return_params: {
    stream: {}
  }
}) YIELD embeddings
RETURN embeddings

exec{
  algo(struc2vec).params({
    return_id_uuid: "id",
    walk_length: 10,
    walk_num: 20,
    k: 10,
    stay_probability: 0.4,
    window_size: 5,
    dimension: 5,
    loop_number: 10,
    learning_rate: 0.01,
    min_learning_rate: 0.0001,
    neg_number: 9,
    resolution: 100,
    sub_sample_alpha: 0.001,
    min_frequency: 3
  }) as embeddings
  return embeddings
} on hdc_struc2vec
