Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit
Search
v2.x
    v4.0

    Louvain Community Recognition

      Advanced     Whole Graph     Community Detection  

    Overview

    Louvain algorithm is a community recognition algorithm based on modularity calculations, which is an iterative process of aggregating nodes with the goal of maximizing the modularity. Proposed in 2008 by Vincent D.Blondel et al. of Université catholique de Louvain in Belgium, this algorithm has become the most mentioned and used community recognition algorithm in recent years because of its high efficiency as well as the satisfying outcomes.

    Related materials of the algorithm are as below:

    Basic Concept

    Weighted Degree

    Weighted degree is the caculation of degree that takes the weight of edge into consideration. Louvain algorithm uses the concepts of weighted degree of node and weighted degree of community when calculating modularity.

    Weighted degree of node refers to the sum of the weights of all edges related to a node (the node as an endpoint), including adjacent edges (connect to other nodes) and self-loop edges (connect to the node itself) of the node.

    As shown in the graph above, the red node has 3 adjacent edges and 1 self-loop edge, thus the weighted degree of the node is 1 + 0.5 + 3 + 1.5 = 6 (Note: the weight of self-loop edge is counted only once).

    Weighted degree of community refers to the sum of weighted degrees of all nodes in a community.

    As shown in the graph above, the weighted degree of the red node is 1 + 0.5 + 3 + 1.5 = 6, the weighted degree of the green node is 1 + 1.7 = 2.7, the weighted degree of the blue node is 0.5 + 0.3 + 2 = 2.8, the weighted degree of the yellow node is 3, thus the weighted degree of community I is 6 + 2.7 + 2.8 + 3 = 14.5.

    Weighted degree inside community refers to when calculating the weighted degree of a community, only to consider the edges whose both endpoints are in the community; or, weighted degree inside community can be obtained by deducting the weights of edges between the community and other communities from the weighted degree of community.

    As shown in the previous graph, there are 3 edges between community I and the other two communities, with weights of 1.7, 0.3 and 2 respectively, thus the weighted degree inside community I is 14.5 - 1.7 - 0.3 - 2 = 10.5.

    Note that the weighted degree inside community is not the sum of weights of edges whose both endpoints are in the community, but is the doubled sum of weights of non-self-loop edges plus the sum of weights of self-loop edges. The reason is that the two endpoints of a non-self-loop edge would cause the edge to be counted twice. In other words, inside the community, except the self-loop edges, which are only counted once, other edges are counted twice - because each edge connects two nodes, according to the definition of the weighted degree inside community, the weight of edge needs to x2.

    Using the previous graph to verify, the weighted degree inside community I can be calculated as (1 + 0.5 + 3) * 2 + 1.5 = 10.5.

    Weighted degree of whole graph refers to the sum of weighted degrees of all nodes in the graph. If divides the whole graph into multiple communities, weighted degree of whole graph also equals to the sum of weighted degrees of those communities, since each node belongs to one - and only one - community.

    As shown in the graph above, the weighted degree of community I is 14.5, the weighted degree of community II is 0.7 * 3 * 2 + 1.7 = 5.9, the weighted degree of community III is 1 * 6 * 2 + 0.3 + 2 = 14.3, thus the weighted degree of whole graph is 14.5 + 5.9 + 14.3 = 34.7.

    If the whole graph is regarded as one community, the weighted degree of whole graph can also be understood as the weighted degrees inside the community. Also verified with the above graph, the weighted degree of whole graph is (1 + 0.5 + 3 + 1.7 + 0.7 * 3 + 0.3 + 2 + 1 * 6) * 2 + 1.5 = 34.7.

    The results of the above two calculation methods can be corroborated by each other and should be consistent.

    Community Compression

    Louvain algorithm uses a large amount of community compression to improve the subsequent (iteration) computation speed by minimizing the number of nodes and edges in the graph, without changing the weighted degree of the local or the whole graph; after compression, nodes in the community will be calculated as a whole, but no longer separated, for modularity optimaization, thus achieving a hierarchical (iterative) community division effect.

    Community compression is to use one aggregated node to represent all nodes in every community, the weighted degree inside community is the weight of the self-loop edge of the aggregated node, and the sum of weights of edges between every two communities is the weight of the edge between the two corresponding aggregated nodes.

    As shown in the graph above, after compressing the three communities on the left, the weight of the self-loop edge of community I is the weighted degree inside community I, which is 10.5; the weight of the edge between it and community II is 1.7, and the weight of the edge between it and community III is 0.3 + 2 = 2.3; the weight of the self-loop edge of community II is 0.7 * 3 * 2 = 4.2, the weight of the self-loop edge of community III is 1 * 6 * 2 = 12; thus the weighted degree of whole graph after compression is 10.5 + 4.2 + 12 + (1.7 + 2.3) * 2 = 34.7, same as before compression.

    Modularity

    Geometrically, modularity attempts to calculate the weighted degree to compare the closeness of node connections within and between communities.

    Let 2m denote the weighted degree of the whole graph, if C is any community in the graph, Σ(tot) is the weighted degree of C, Σ(in) is the weighted degree inside C, then the modularity Q can be computed by:

    The value range of modularity is [-1, 1]; for connected graph (or connected subgraph), the value range of modularity is [-1/2, 1].

    Modularity has been used to measure the quality of the community division, the higher the value of modularity, the more reasonable the community division. Comparing the two methods of community division in the following graph, the difference is to which community the red node belongs, intuitively we might think that the division on the left side is more reasonable.

    Assuming that all weights of edges in the graph above are 1, then the weighted degree of whole graph is 1 + 8 * 2 = 17, and

    • on the left side, the weighted degree of community I is 10, the weighted degree inside community I is 9; the weighted degree of community II is 7, the weighted degree inside community II is 6; the modularity after division is 0.3668;
    • on the right side, the weighted degree of community I is 13, the weighted degree inside community I is 11; the weighted degree of community II is 4, the weighted degree inside community II is 2; the modularity after division is 0.1246.

    The result of the calculation is in line with visual expectations.

    Gain of Modularity

    Gain of modularity refers to how much the modularity has increased by changing the community division. When adjusting the community that a node belongs to, Louvain algorithm decides whether to move the node by evaluating the gain of modularity.

    Let i denote a node in the graph and c a community which does not have i; k(i) is the weighted degree of i, which is the contribution of i to the weighted degree of community when it joins the community; k(i,in) is 2 times of the sum of weights of edges between i and a community, which is the contribution of i to the weighted degree inside community when it joins that community; if i does not belong to any community, the gain of modularity ∆Q obtained by moving i into community c can be computed by:

    Let a and b denote two communities that do not contain i, the gain of modularity obtained by moving i from community {a,i} to community b can be computed by:

    Reviewing the previous graph, calculate the gain of modularity produced by moving the red node from community a to community b:

    As shown in the graph, the weighted degree of whole graph 2m is 17, k(i) of the red node is 3; Σ(tot) of community a is 10, k(i,in) is 2; Σ(tot) of community b is 4, k(i,in) is 4; the gain of modularity obtained by the formula is (4 - 2) / 17 - 2 * 3 * (4 - 10) / (17 * 17) = 0.2422, same with the previous calculation result of 0.3668 - 0.1246 = 0.2422.

    Algorithm Process

    Formula

    When programming the algorithm, the modularity can be calculated using the following formula:

    where 2m is the weighted degree of whole graph, i and j are any two nodes in the graph, δ is 1 when i and j belong to the same community, otherwise δ is 0, i.e. the summation calcualtion in the formula only performs when i and j belong to the same community:

    • when i and j are differnet, Aij is the sum of weights of edges between i and j, and since Aij = Aji, the sum of Aij is 2 times of the sum of weights of non-self-loop edges in the community;
    • when i and j are the same, Aij is the sum of weights of self-loop edges of the node, the sum of Aij after is the sum of weights of self-loop edges in the community;
    • combining the above two cases, the sum of Aij is the weighted degree inside community Σin;

    k is the weighted degree of a node, the sum of ki·kj can be disassembled into the production of the summation result of ki and the summation result of kj, that is, the squre of the weighted degree of community (Σtot)^2; so far, the above formula is fully equivalent to the previously defined modularity.

    Iteration

    Modularity optimation is an NP-Hard problem. Louvain algorithm adopts the heuristic algorithm to optimize the modularity in multiple rounds of compound iterations. Each large cycle is divided into two phases:

    • 1st phase: Iteration. At the beginning of this phrase, each node is considered as an independent community; in each round of iteration, calculate for each node whether there is a community of its neighbors, a maximum and positive ΔQ can be obtained by placing the node to that cummunity (i.e. to maximize the modularity); if found, move the node to the new community; the same calculation and adjustment are appied to the next node and all nodes, then enter the next round of iteration. The first phrase iteratively loops by this rule until no nodes can be reallocated, or the number of iterations reaches the limit.
    • 2nd phase: Community compression. Compress the communities divided in the first phase to get a new graph. If the new graph after compression is consistent with the graph structure at the beginning of the current round of large cycle, that is, the modularity is not improved, then the algorithm ends, otherwise the new graph is used as the initial graph of the next round of large cycle.

    Considering the computational convergence of large graphs, the program introduces a threshold of the gain of modularity when judging whether the modularity has improved, which is a floating-point data greater than 0, and the effect is to determine that the modularity has not improved when ΔQ is below the threshold.

    Example

    Executing Louvain community recognition on the graph below, the weight of each edge is 1, and the threshold of the gain of modularity is 0.

    • 1st Cycle, 1st phase, 1st iteration:
    Current Node Move to ΔQ Selection Community Adjustment Result
    a →{b}
    →{c}
    22/(17*17)
    22/(17*17)

     

    b →{d} 0 Same as above
    c →{a,b}
    →{d}
    14/(17*17)
    22/(17*17)


    d →{a,b}
    →{e}
    -18/(17*17)
    -6/(17*17)
    Same as above
    e →{c,d}
    →{f}
    →{g}
    4/(17*17)
    22/(17*17)
    22/(17*17)


     

    f →{g} 4/(17*17)
    g →{e} -4/(17*17) Same as above
    • 1st Cycle, 1st phase, 2nd iteration:
    Current Node Move to ΔQ Selection Community Adjustment Result
    a →{c,d} -18/(17*17) Same as above
    b →{c,d} -8/(17*17) Same as above
    c →{a,b} -8/(17*17) Same as above
    d →{a,b}
    →{e}
    -18/(17*17)
    -6/(17*17)
    Same as above
    e →{c,d}
    →{f,g}
    4/(17*17)
    44/(17*17)


    f N/A
    g N/A
    • 1st Cycle, 1st phase, 3rd iteration: no node can be moved, the process is omitted.

    • 1st Cycle, 2nd phase:

    • 2nd Cycle, 1st phase, 1st iteration:
    Current Node Move to ΔQ Selection Community Adjustment Result
    A →{B} 18/(17*17)
    B →{C} -54/(17*17) Same as above
    C →{A.B} -106/(17*17)) Same as above
    • 2nd Cycle, 1st phase, 2nd iteration: no node can be moved, the process is omitted.

    • 2nd Cycle, 2nd phase:

    • 3rd Cycle, 1st phase, 1st iteration: no node can be moved, the process is omitted.

    • 3rd Cycle, 2nd phase: Before and after the community compression, the structure of the graph is the same, so the algorithm ends.

    Conclusion: The original graph is divided into two communities, {a,b,c,d} and {e,f,g}

    Special Case

    Lonely Node, Disconnected Graph

    If lonely node exists in the graph, the lonely node must form a community of its own, and it can not be combined with other nodes no matter how many rounds of cyclic iterations it undergoes. The reason is that lonely node has no adjacent edge, means that k(i,in) of lonely node to any other community or node is 0, the ΔQ that generated when it is moved to other community is negative.

    For disconnected graph, there is no edge between different disconnected areas, and nodes from different areas can not be combined, thus each disconnected area is an independent community, the community division of Louvain algorithm only meaningful inside the connected area.

    Self-loop Edge

    The way Louvain algorithm counts the weighted degree of self-loop edge is different from the Degree algorithm algo(degree). In Degree algorithm, each self-loop edge is counted twice, while each self-loop edge is counted only once in Louvain algorithm.

    As shown in the graph above, the weighted degree of the red node in Degree algorithm is 1 + 0.5 + 3 + 1.5 * 2 = 7.5, however, the weighted degree of the node in Louvain algorithm is 1 + 0.5 + 3 + 1.5 = 6.

    Directed Edge

    For directed edges, Louvain algorithm ignores the direction of edges but calculates them as undirected edges.

    Command

    algo(louvain).params(<>)

    Configuration Item Type Default Value Specification Description
    phase1_loop_num int 5 1~20 The maximum number of iterations in the first phase of the algorithm
    min_modularity_increase float 0.01 0.01~1 The threshold of the gain of modularity
    edge_schema_property []@<schema>?.<property> (Weight of 1) Numeric edge property, LTE needed Name of the property where the edge weight locates, schema can be either carried or not, multiple properties can be specified; edges without this property will not participate in the community recognition calculation
    limit int -1 -1 or >=0 Number of results to return, -1 means to return all results
    order string / ASC or DESC, case insensitive The sorting rule under the "community:id/count" mode

    Example: Run Louvain, set the threshold of the gain of modularity as 0.1, the rounds of iterations in the first phase of the algorithm as 3

    algo(louvain).params({ phase1_loop_num: 3, min_modularity_increase: 0.1 })
    

    File Writeback

    .write({file: {<>}})

    Parameter Type Default Value Specification Description
    filename_community_id string / / Name of the file path to be written back. Columns of the file are: _idcommunity_id
    filename_ids string / / Name of the file path to be written back. Columns of the file are: community_id , _id, _id, ...
    filename_num string / / Name of the file path to be written back. Columns of the file are: community_id, count

    Example: Run Louvain and write back two files, the community ID of each point and the number of nodes in each community

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 })
      .write({ 
        file:{
          filename_community_id: "my_community.csv",
          filename_num: "my_community_stats.csv"
        }
      })
    

    Property Writeback

    .write({db: {<>}})

    Parameter Type Default Value Specification Description
    property string "" / Name of the node property to be written back. Write the community ID back to: <property>

    Example: Run Louvain and write the community ID back to the property community_id of all nodes

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 })
      .write({ 
        db:{
          property: "community_id"
        }
      })
    

    Statistics Writeback

    .write() and when executing other writeback operations:

    Name Type Description
    community_count int Number of communities
    modularity double Modularity

    Direct Return

    as <alias>, <alias>, ... return <alias>, <alias>, ...

    Alias Number Type Description Column Name
    0 []perNode Node and its community ID _uuid, community_id
    1 KV Number of communities, modularity community_count, modularity

    Example: Run Louvain and return the community IDs in real-time

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 }) as row
    return row._uuid, row.community_id limit 10
    

    Example: Run Louvain and return the number of communities in real-time

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 }) as row, stats
    return stats.community_count, stats.modularity limit 10
    

    Streaming Return

    .stream() as <alias> return <alias>

    Configuration Item Type Default Value Specification Description
    mode int 1 1 or 2 1: Return the community ID of each node
    2: Return the number of nodes in each community
    Alias Number Type Description Column Name
    0 []perNode or
    []perCommunity
    Node and its community ID or
    Community and its number of nodes
    _uuid, community_id or
    community_id, count

    Example: Run Louvain and return the statistics of community ID and the number of nodes in each community in the form of homologous data streams in real-time

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 })
      .stream({ 
        mode: 2
      }) as row
    return row.community_id, row.count limit 10
    

    Note: When the lengths of homologous data streams are different, the lengths of various streams stay unchanged when following return, but are trimmed by the length of the shortest stream when following with. About the difference between homologous and non-homologous data stream, please read the related chapter Data Stream, Clause in UQL documentation.

    Real-time Statistics

    .stats() as <alias> return <alias>

    Alias Number Type Description Column Name
    0 KV Number of communities, modularity community_count, modularity

    Example: Run Louvain and return modularity and number of communities in real-time

    algo(louvain).params({ phase1_loop_num: 5, min_modularity_increase: 0.01 })
      .stats() as row
    return row.modularity, row.community_count
    

    Algorithm Efficiency

    Louvain community recognition algorithm achieves lower time complexity than previous community division algorithms through its imporved greedy optimization, which is usually regarded as O(N*LogN), where N is the number of nodes in the graph, and the results are more intuitive. That is to say, in theory, on the graph with 10,000 nodes, the complexity of Louvain algorithm is O(40000), while on the connected graph with 100 million nodes, the algorithm complexity is O(800000000). In fact, according to the above detailed break down of the algorithms steps, the complexity of Louvain algorithm depends on both the number of nodes and the number of edges, roughly speaking, it should be O(N * E / N) = O(E), where E is the number of edges in the graph - because the dominant algorithm logic in Louvain community recognition is to calcualte the weights of edges related with each node (Refer to the Algorithm Process - Example above).

    The table below shows the compared performance of Louvain algorithm and other community recognition algorithms, it is given by the original author of Louvain algorithm in his paper. It can be seen that a significant (exponential) increase has been achieved in both modulariy and efficiency:

    louvain comlexity.png

    The architecture, data structure, programming lanaguage, etc. of diversed systems would cause huge difference in the final result and efficiency of Louvain algorithm, even though they are all implementations of Louvain algorithm. For example, the serial Louvain community recognition achieved in Python can cost hours on just 10,000 grade small graphs; besides, the difference in data structure also leads to significant performance gaps when calculating the degree of nodes and the weight of edges frequently. The native Louvain algorithm adopts C++, but it is a serial implementation, so the time consumption can be reduced by using parallel computation as much as possible, thereby improving the efficiency of the algorithm.

    For medium-sized graphset with tens of millions of nodes and edges, Ultipa's Louvain algorithm can be completed in real-time literally; for large graphs above 100 million, it can be implemented in seconds to minutes. In addition, operations such as whether to writeback to database property and whether to write disk file, also have impact on the time consumption of the entire algorithm. The table below demonstrates the modularity and execution time of Louvain algorithm running on a graphset with 5 million nodes and 100 million edges, note that the computation process costs ~1 minute, and the need to writeback to the database and generate disk file costs additional ~1 minute.

    louvain execution time.png

    Consistentency of Results

    Affected by factors such as the order of nodes, the parallel computation, and the execution logic of the heuristic algorithm on part of the data, the community division results of Louvain algorithm may vary with each execution. But the overall trend will not change much (as indicated in the table above).

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写