Change Password

Please enter the password.
Please enter the password. Between 8-64 characters. Not identical to your email address. Contain at least 3 of uppercase, lowercase, numbers, and special characters (such as @*&#).
Please enter the password.
Submit

Change Nickname

Current Nickname:
Submit

v4.2
Search
中文EN
v4.2

    Struc2Vec

    Overview

    Struc2Vec graph embedding algorithm was first published in 2017 SIGMOD conference. Struc2vec literally means 'structure to vector'. This algorithm establishes a framework for generating node vectors from graph while preserving the graph structure (i.e., topological similarities). Although Node2Vec results also reflect structural similarity between nodes to some extent, they are still limited to the random walk depth. Nodes with similar structures may be far apart in the network, Struc2Vec ensures that those nodes are close in the embedding space.

    In general, Node2Vec is adequate if the downstream tasks (e.g., classification) value the homophily of nodes more. However, if the downstream tasks are to perform on the similarity of the local topology of nodes, Struc2Vec is more applicable.

    Related material of the algorithm:

    Basic Concept

    Structural Identity

    In almost all networks, nodes tend to have different structural identities determined by their functions. Nodes perform similar functions are partitioned into the same class, that is, have structural similarity. For example, in the social network of a company, the roles of all interns are found similar.

    Structural similarity means that the neighborhood topology of two nodes is homogeneous or symmetrical. As shown in the graph below, nodes v and u are structural similar (degrees 5 and 4, connected to 3 and 2 triangles, connected to the rest of the network by two nodes), but very far apart in the network, they neither have direct connection nor any common neighbor.

    Key Ideas of Struc2Vec

    When generating node sequences from random walks similar to Node2Vec, if two nodes have similar sequences, their embedding results are also similar. It is easy to think that nodes that are closer together in the graph are more likely to produce similar walk sequences, while nodes that are farther apart (beyond the walk depth) are almost impossible to produce similar sequences, even if they have similar structures.

    Struc2Vec overcomes this limitation, and the key ideas of Struc2Vec are:

    • Assess structural similarity between nodes independently of node and edge properties as well as their position in the network.
    • Establish a hierarchy to measure structural similarity: at the bottom of the hierarchy, structural similarity between nodes depend only on their degrees; while at the top of the hierarchy, similarity depends on the entire network.
    • Generate random walk sequences for nodes, nodes with similar sequences have similar structure.

    Struc2Vec Framework

    1. Measuring Structural Similarity

    Intuitively, two nodes that have the same degrees are structurally similar, but if their neighbors also have the same degree, then they are even more structurally similar.

    Let Rk(u) denote the set of nodes at exactly distance (hop) k from node u in graph (0 ≤ k ≤ graph diameter), s(S) denote the ordered degree sequence of a node set S, below is an example:

    When neighbors at distance k both exist for node u and v, let fk(u,v) denote the structural distance between u and v when considering their k-hop neighborhoods (all nodes at distance less than or equal to k):

    where g() ≥ 0 measures the distance between two sequences. For nodes u and v, sequences s(Rk(u)) and s(Rk(v)) can be of different sizes. To assess distance between two sequences of different sizes, Dynamic Time Wrapping (DTW), or any other appliable function can be adopted. Note that if the k-hop neighborhoods of node u and v are isomorphic, then fk-1(u,v) = 0.

    2. Constructing Multilayer Weighted Graph

    For graph G = (V,E), construct a multilayer weighted graph M as follows:

    Each layer k of M is formed by a weighted undirected complete graph, the edge weight between every node pair u and v is inversely proportional to the structural distance between them, as defined below:

    Note that edges are defined only if fk(u,v) is defined.

    Every node u in layer k is connected to the corresponding node u in layer k+1 and k-1. The directed edge weight between layers are as follows:

    where Γk(u) is the number of edges incident to u that have weight larger than the average edge weight of the complete graph in layer k. Γk(u) actually measures the similarity of node u to other nodes in layer k. Note that if node u has many similar nodes in layer k, then it should change to some higher layer to obtain a more refined context.

    3. Generating Context For Nodes

    Each node performs random walk in the multilayer weighted graph M to generate structural context, that is, the sequence of nodes. Each node starts random walks in its corresponding node in layer 0. When reaches uk:

    • The random walk first decides if it will change layers or walk in the current layer k, the probability of staying in the current layer k is given by the parameter stay_probability (i.e. p in the graph below)
    • If stays in the current layer, the probability of uk jumping to each neighbor node is proportional to the edge weights between them; that is, the random walk will prefer to step onto nodes that are structurally more similar to the current node.
    • With probability 1 − p, the random walk decides to change layers, and moves to the corresponding node either in layer k+1 or layer k−1 with probability proportional to the edge weights.

    4. Training the Language Model

    Struc2Vec uses the Skip-gram model used by Word2Vec to learn the embeddings of nodes.

    To learn more about Skip-gram Model, please read Skip-gram Model and Skip-gram Model Optimization.

    Special Case

    Lonely Node, Disconnected Graph

    Lonely node does not have any neighbor, thus the structural distance cannot be defined from 1-hop neighborhood. Thus, from layer 1 to each higher layer in the multilayer weighted graph, the edge weight between lonely node and other nodes are extremely small.

    Connectivity of the graph does not affect the structural similarity of nodes, nodes that are highly structural similar might locate in different connected component.

    Self-loop Edge

    The way Struc2Vec algorithm counts the degree of self-loop edge is the same with the Degree algorithm algo(degree), each self-loop edge is counted twice.

    Directed Edge

    For directed edges, Struc2Vec algorithm ignores the direction of edges but calculates them as undirected edges.

    Command and Configuration

    • Command:algo(struc2vec)
    • Configurations for the parameter params():
    Name Type
    Default
    Specification
    Description
    ids / uuids []_id / []_uuid / / IDs or UUIDs of nodes to start the walk; all nodes to be selected if not set
    walk_length int 1 >=1 Depth of each walk, i.e. the number of nodes walking through
    walk_num int 1 >=1 Number of walks
    k int / [1,10] Number of layers of the multilayer graph, it should be equal or lower than the diameter of the graph
    stay_probability float / [0,1] The probability of staying at the current level
    window_size int / >=1 Size of the sliding window, to sample window_size nodes on left and window_size nodes on right
    dimension int / >=1 Dimension of node embedding vector
    learning_rate float / (0,1) Initial learning rate, it decreases until reaches min_learning_rate with increase in training iterations
    min_learning_rate float / (0,learning_rate) Minimum learning rate
    min_frequency int / >=0 Minimum number of occurrences of a node in the walk sequences to be included in the model, <=1 means to keep all nodes
    sub_sample_alpha float / / Threshold for sub sampling, <=0 means no sub sampling
    resolution int / >=1 Such as 10, 100
    neg_num int / >=0 Number of negative samples, it is suggested 0~10
    loop_num int / >=1 Number of training iterations (epochs)
    limit int -1 >=-1 Number of results to return; return all results if sets to -1 or not set

    Example: Run Struc2Vec in the whole graph

    algo(struc2vec).params({
        walk_length: 3,
        walk_num: 2,
        k: 2,
        stay_probability: 0.5,
        window_size: 5,
        dimension: 5,
        learning_rate: 0.025,
        min_learning_rate: 0.00025,
        min_frequency: -1,
        sub_sample_alpha: -1,
        resolution: 2,
        neg_num: 0,
        loop_num: 2,
        limit: -1
    }) as results
    return results
    

    Algorithm Execution

    Task Writeback

    1. File Writeback

    Configuration Data in Each Row
    filename _id,embedding_result

    2. Property Writeback

    Not supported by this algorithm.

    3. Statistics Writeback

    This algorithm has no statistics.

    Direct Return

    Alias Ordinal Type
    Description
    Column Name
    0 []perNode Array of UUIDs and embeddings of nodes _uuid, embedding_result

    Streaming Return

    Alias Ordinal Type
    Description
    Column Name
    0 []perNode Array of UUIDs and embeddings of nodes _uuid, embedding_result

    Real-time Statistics

    This algorithm has no statistics.

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写