Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit
Search
v2.x
    v4.0

    Label Propagation

      Advanced     Whole Graph     Community Detection     Label Propagation  

    Overview

    LPA (Label Propagation Algorithm) is a community recognition algorithm based on label propagation, which is an iterative process of aggregating nodes with the goal of stablizing the distribution of labels. Although the results of this algorithm have a certain degree of randomness, it is very suitable for large complex networks which owes to its low time complexity and it dispenses with the need to specify the number of communities in advance, and it is often used as benchmark in academia and industry.

    Related material of the algorithm is as below:

    Basic Concept

    Label

    Selecting the value of a property of node as label, when the label of a node is valid, the label can be propagated to the 1-step neighbors of the node; whether the label is valid or not, the node can receive valid labels from its 1-step neighbors. When dividing the community, the label represents the community to which the node belongs, the propagation of label is the expansion of the community scope.

    As shown in the graph above, the red and yellow nodes hold valid labels a and b respectively, while other nodes have no labels, or interpreted as labels of other nodes are invalid. During the first round of propagation, label a and b are propagated to the 1-step neighbors of the red and yellow nodes respectively; the blue node, as the common 1-step neighbor of the two nodes, will decide which label to receive based on the weight of label a and b (see the explanation later); the self-loop edge of the red node propagates and receives the label a of itself at the same time; the green node will not receive any label during the first propagation, but will receive label b from the node on its left (its only neighbor) during the second propagation.

    Ultipa's LPA supports full label propagation (all nodes' labels are valid), if there is a need to specify the initial valid labels of some nodes in the graph, please contact Ultipa team for algorithm custimization.

    Label Weight

    For node i, the weight of the label l on its neighbor node j equals to the product of the weight of j and the sum of the weights of all edges between i and j; when i has multiple neighbor nodes with label l, the weights of each l need to be summed:

    where j is any neighbor node of node i and it holds label l , w(j) is the weight of j, A(ij) is the sum of weights of edges between i and j.

    As shown in the graph above, for the blue node, the weight of label a is 3 * 0.8 + 2 * 0.3 = 3, the weight of label b is 1 * (0.5 + 1.2) = 1.7, if the blue node can receive only one label, it will be label a with a larger weight. For the red node, the weight of label a is 3 * (0.7 * 2) = 4.2, that is, for the node that has self-loop edge, the weight of the label to the node itself is the product of the weight of the node and the doubled weight of the self-loop edge.

    Label Selection

    The most common label propagation is single-label propagation, which is to select one label of the largest weight from the neighbor labels as the new label for the node. When there are multiple neighbor labels with the largest weight at the same time, in principle, one of them will be randomly selected as the new label, but if the current label of the node is contained in these labels, i.e. the randomly selected label might equal to the current label of the node, then the current label will continue to be used without updating.

    Executing single-label propagation on the graph above: Assuming that the weights of all nodes and edges are 1, for the red node, weights of label a and b are both the largest weight 3, since the current label of the red node is c, the red node will randomly select one label from a and b; for the yellow node, weights of label a and c are both the largest weight 2, since the current label of the yellow node is a, the yellow node will keep label a.

    Similar to single-label propagation, multi-label propagation refers to select k labels of the largest weights from the neighbor labels as the new labels for the node. When there are more than k neighbor labels with the largest weights at the same time, in principle, the random selection will be done among the labels of the least weight, but if the randomly selected label might be equal to the current label of the node, the current label will continue to be used without updating.

    Executing multi-label propagation on the graph above and k = 2: Assuming that the weights of all nodes and edges are 1, for the red node, weight of label a is the largest weight 4, weight of labels b and c is the second largest weight 3, if randomly selects c from b and c, together with a the label combination is the same with the current labels of the red node, thus the red node will keep labels a and c; for the yellow node, weight of label a is the largest weight 4, weight of labels c and d is the second largest weight 2, thus it will randomly select a label from c and d to combine with label a as the new labels.

    Ultipa's LPA supoorts both single-label propagation and multi-label propagation.

    Algorithm Process

    Formula

    For any node i in the graph, l is the valid label of i, w(l) is the weight of l, the new label l of i can be denoted as:

    Under the circumstances of multi-label propagation, the algorithm normalizes the weights of node labels, that is, converts the weight of each node label into a weight ratio probability. For example, for a certain node i, the weights of its valid neighbor labels a, b, c are 7, 10, 3 respectively, the weight ratios after the normalization are 0.35, 0.5, 0.15 respectively.

    Iteration

    In each iteration, it calculates for each node whether labels that are different from its current labels can be selected from its neighbor labels, and after all nodes are calculated, updates the nodes which can select new labels. Iterating and looping by this rule until no node could select new labels, or the number of iterations reaches the limit.

    Ultipa's LPA follows the synchronous update principle when updating node labels, the algorithm has a corresponding interrupt mechanism against the the resulting label oscillations (mostly found in bipartite graph).

    Example

    Executing single-label propagation in the graph below, the weights of all nodes and edges are 1.

    a b c d e f g Result of Label Propagation
    1st Iteration a a a b d e e
    2nd Iteration Same as above Same as above Same as above a e Same as above Same as above
    3rd Iteration Same as above Same as above Same as above Same as above Same as above Same as above Same as above Same as above

    Conclusion: The original graph is divided into two communities, {a,b,c,d} and {e,f,g}

    Special Case

    Lonely Node, Disconnected Graph

    For disconnected graph, there is no edge between its lonely nodes and various disconnected areas to propagate labels, and connected components that do not contain the same initial labels will not be divided into the same community.

    Self-loop Edge

    LPA treats the self-loop edge in the same way with the Degree algorithm algo(degree), which counts each self-loop edge twice.

    Directed Edge

    For directed edges, LPA ignores the direction of edges but calculates them as undirected edges.

    Command

    algo(lpa).params({<>})

    Configuration Item Type Default Value Specification Description
    loop_num int 5 >=1 The maximum number of iterations of the algorithm
    node_label_property @<schema>.<property> (Random unique number) Numeric or string class node property, LTE needed Name of the node schema and property where the label locates; nodes without this property will not participate in the label propagation calculation
    node_weight_property @<schema>.<property> (Weight as 1) Numeric node property, LTE needed Name of the node schema and property where the node weight locates; nodes without this property will not participate in the label propagation calculation
    edge_weight_property @<schema>.<property> (Weight as 1) Numeric edge property, LTE needed Name of the edge schema and property where the edge weight locates; edges without this property will not participate in the label propagation calculation
    k int 1 1~10 The k value in multi-label propagation, i.e. the maximum number of labels that kept for each node

    Example: Run LPA within 10 iterations, property of label is @account.type, and keep two labels for each node the maximum

    algo(lpa).params({loop_num: 10, node_label_property: @account.type, k: 2})
    

    File Writeback

    .write({file: {<>}})

    Parameter Type Default Value Specification Description
    filename string / / Name of the file path to be written back. Columns of the file are: _idlabel_1probability_1,... label_kprobability_k

    Property Writeback

    .write({db: {<>}})

    Parameter Type Default Value Specification Description
    property string "" / Name of the node property to be written back. Write the multiple labels and ratios back to: <property>_1, probability_1, ... <property>_k, probability_k

    Statistics Writeback

    .write() and when executing other writeback operations:

    Name Type Description
    label_count int Number of labels

    Note: The number of labels means the number of valid labels after the algorithm is finished; when k = 1, since each node only keeps one label, thus the number of labels is the number of communities, the same below.

    Direct Return

    as <alias>, <alias>, ... return <alias>, <alias>, ...

    Alias Number Type Description Column Name
    0 []perNode Node and its label, ratio _uUid, label_1, probability_1, ... label_k, probability_k
    1 KV Number of labels label_count

    Streaming Return

    .stream() as <alias> return <alias>

    Alias Number Type Description Column Name
    0 []perNode Node and its label, ratio _uuid, label_1, probability_1, ... label_k, probability_k

    Real-time Statistics

    .stats() as <alias> return <alias>

    Alias Number Type Description Column Name
    0 KV Number of labels label_count

    Algorithm Efficiency

    Consistentency of Results

    Affected by factors such as the order of nodes, the random selection of labels with the same weights, the parallel calculation and so on, the community division results of LPA may be different.

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写