Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit
Search
v4.0
    v4.0

    Label Propagation

    Overview

    Label Propagation Algorithm (LPA) is a community detection algorithm based on label propagation; it is an iterative process of nodes aggregation with the goal of label distribution stabilization. Although the results of the algorithm have a certain degree of randomness, it is very suitable for large complex networks which owes to its low time complexity and no need to specify the number of communities in advance. It is often included in the benchmark testing report in both academia and industry.

    Related material of the algorithm:

    Basic Concept

    Label

    Label is the value of a specified property of node. Label can be propagated to the 1-step neighbors of the node, nodes with label can adopt label from its 1-step neighbors. When detecting the communities, label represents the community of the node, the propagation of label is the expansion of the community.

    Consider the propagation of label a and b of the red and yellow nodes in the graph above:

    • Blue node is the common neighbor of the two nodes, it will decide which label to adopt based on the weight of label a and b, or to adopt both labels if it is allowed.
    • Red node has self-loop edge, so label a also propagates to itself.
    • Label b cannot propagate to the green node during one round of propagation; if the purple node (green node's only neighbor) adopts label b, the green node will adopt label b during the next round of propagation.

    Ultipa's LPA supports full label propagation, that is, all initial labels are valid and able to propagate. In case there is a need to specify part of the labels as valid, please contact Ultipa team for algorithm customization.

    Label Probability

    If a node can only adopt only one label from its neighbors, it is called single-label propagation, while if multiple labels are allowed to adopt, it is called multi-label propagation.

    In multi-label propagation, after a node adopts multiple labels, each label will be given a probability which is proportional to the label weight, and the sum of all label probabilities of a node equals to 1. For single-label propagation, the case is much easier as the node's only label takes a probability of 1.

    For example, if a node only adopts one label, the label probability is 1; if it adopts labels a and b with weights 1.5 and 1 respectively, the probabilities of a and b are 1.5/(1+1.5) = 0.6 and 1/(1+1.5) = 0.4 respectively.

    Label Weight

    Node j has label l, when it propagates to its neighbor node i, the weight of label l equals to the product of the probability of l, the weight of j, and the total weight of all edges between i and j; when i has multiple neighbor nodes with label l, the weight of each l needs to be summed:

    where j is any neighbor node of i that has label l , p(l) is the probability of l, w(j) is the weight of j, A(ij) is the total weight of all edges between i and j.

    It is needed to specify a numeric property of node as node weight, and a numeric property of edge as edge weight.

    Label Adoption

    The principle of label adoption is to keep the top one or several labels with the largest weights among the neighbor labels, and to calculate the label probabilities. When some labels have the same probabilities which affects the adoption decision, the system would select randomly from these parallel labels.

    For any node i in the graph, l is the neighbor label of i, w(l) is the weight of l, the new label l of i can be denoted as:

    As mentioned previously, under the circumstances of multi-label propagation, the algorithm normalizes the label weight, that is, converts label weight into a weight probability.

    When the algorithm begins, initial labels are assigned to nodes based on the algorithm parameters; during each iteration, it calculates for each node whether there is any label different from its current labels can be adopted from its neighbors, and after all nodes are calculated, updates the nodes which can adopt new labels. Iterating and looping by this rule until no node could adopt new labels, or the number of iterations reaches the limit.

    Ultipa's LPA follows the synchronous update principle when updating node labels, the algorithm has a corresponding interrupt mechanism against the label oscillations (mostly found in bipartite graph) that might occur.

    Special Case

    Lonely Node, Disconnected Graph

    For disconnected graph, there is no edge between its lonely nodes and connected components to propagate labels, connected components do not contain the same initial labels will not be divided into the same community.

    Self-loop Edge

    LPA treats the self-loop edge in the same way with the Degree algorithm algo(degree), which counts each self-loop edge twice.

    Directed Edge

    For directed edges, LPA ignores the direction of edges but calculates them as undirected edges.

    Results and Statistics

    Run LPA in the graph below, all node and edge weights are 1, the letter wrote in the node is the initial label of node, node 16 has no label, the algorithm iterates 5 rounds:

    Algorithm results 1: Keep maximum one label for each node, return _uuid, label_1 and probability_1

    _uuid label_1 probability_1
    1 C 1.000000
    2 C 1.000000
    3 C 1.000000
    4 C 1.000000
    5 C 1.000000
    6 H 1.000000
    7 H 1.000000
    8 H 1.000000
    9 H 1.000000
    10 H 1.000000
    11 N 1.000000
    12 N 1.000000
    13 N 1.000000
    14 N 1.000000
    15 O 1.000000
    16

    Algorithm statistics 1: Number of labels label_count

    label_count
    4

    Algorithm results 2: Keep maximum two labels for each node, return _uuid, label_1, probability_1, label_2 and probability_2

    _uuid label_1 probability_1 label_2 probability_2
    1 C 0.705578 A 0.294422
    2 C 0.658854 A 0.341145
    3 A 0.508263 C 0.491737
    4 C 0.688864 E 0.311136
    5 A 0.572178 C 0.427822
    6 H 0.723903 A 0.276097
    7 H 0.797693 I 0.202307
    8 H 0.645654 I 0.354346
    9 H 0.779780 A 0.220220
    10 H 0.617815 I 0.382185
    11 N 0.611782 M 0.388218
    12 N 0.611782 M 0.388218
    13 N 0.766861 M 0.233139
    14 N 0.663694 M 0.336306
    15 O 1.000000
    16

    Algorithm statistics 2: Number of labels label_count

    label_count
    8

    Command and Configuration

    • Command: algo(lpa)
    • Configurations for the parameter params():
    Name Type
    Default
    Specification Description
    node_label_property @<schema>.<property> / Numeric or string class node property, LTE needed Name of the node schema and property as label; node without this property will not participate in the calculation; random number is used as label if not set
    node_weight_property @<schema>.<property> / Numeric node property, LTE needed Name of the node schema and property as node weight; node without this property will not participate in the calculation; weight is 1 if not set
    edge_weight_property @<schema>.<property> / Numeric edge property, LTE needed Name of the edge schema and property as edge weight; edge without this property will not participate in the calculation; weight is 1 if not set
    k int 1 1~10 The maximum number of labels each node can keep; labels are ordered by probability
    loop_num int 5 >=1 Number of iterations

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than two labels for each node

    algo(lpa).params({
      node_label_property: @default.name,
      k: 2,
      loop_num: 5
      }) as a,b return a,b
    

    Algorithm Execution

    Task Writeback

    1. File Writeback

    Configuration
    Data in Each Row
    filename _id,label_1,probability_1,...label_k,probability_k

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than two labels for each node, write the algorithm results back to file named lpa

    algo(lpa).params({
      node_label_property: @default.name,
      k: 2,
      loop_num: 5
    }).write({
      file:{
        filename: "lpa"
      }
    })
    

    2. Property Writeback

    Configuration
    Writeback Content
    Type
    Data Type
    property label_1, probability_1, ... label_k, probability_k Node property Data type of label is string, data type of label probability is float

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than two labels for each node, write the algorithm results back to node properties named newName_1, probability_1, newName_2 and probability_2

    algo(lpa).params({
      node_label_property: @default.name,
      k: 2,
      loop_num: 5
    }).write({
      db:{
        property: "newName"
      }
    })
    

    3. Statistics Writeback

    Name Data Type Description
    label_count int Number of labels

    The number of labels means the number of labels left after the algorithm is finished; when k = 1, since each node only keeps one label, the number of labels is the number of communities.

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than two labels for each node, write the algorithm statistics back to task information

    algo(lpa).params({
      node_label_property: @default.name,
      k: 2,
      loop_num: 5
    }).write()
    

    Direct Return

    Alias Ordinal
    Type
    Description Column Name
    0 []perNode Node and its label, label probability _uuid, label_1, probability_1, ... label_k, probability_k
    1 KV Number of labels label_count

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than one label for each node, define algorithm results and statistics as aliases named results and count, and return the results and statistics

    algo(lpa).params({
      node_label_property: @default.name,
      k: 1
    }) as results, count 
    return results, count
    

    Streaming Return

    Alias Ordinal
    Type
    Description Column Name
    0 []perNode Node and its label, label probability _uuid, label_1, probability_1, ... label_k, probability_k

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than one label for each node, and return the results ordered by the ascending label name and node UUID

    algo(lpa).params({
      node_label_property: @default.name,
      k: 1,
      loop_num: 5
    }).stream() as lpa 
    return lpa order by lpa.label_1, lpa._uuid
    

    Real-time Statistics

    Alias Ordinal
    Type
    Description Column Name
    0 KV Number of labels label_count

    Example: Run LPA, label locates on node property @default.name, iterate 5 rounds and keep no more than one label for each node, and return the number of communities

    algo(lpa).params({
      node_label_property: @default.name,
      k: 1,
      loop_num: 5
    }).stats() as count 
    return count
    

    Consistency of Results

    Affected by factors such as the order of nodes, the random selection of labels with the same weights, the parallel calculation and so on, the community division results of LPA may be different.

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写