Change Password

Please enter the password.
Please enter the password. Between 8-64 characters. Not identical to your email address. Contain at least 3 of: uppercase, lowercase, numbers, and special characters.
Please enter the password.
Submit

Change Nickname

Current Nickname:
Submit

Apply New License

License Detail

Please complete this required field.

  • Ultipa Graph V4

Standalone

Please complete this required field.

Please complete this required field.

The MAC address of the server you want to deploy.

Please complete this required field.

Please complete this required field.

Cancel
Apply
ID
Product
Status
Cores
Applied Validity Period(days)
Effective Date
Excpired Date
Mac Address
Apply Comment
Review Comment
Close
Profile
  • Full Name:
  • Phone:
  • Company:
  • Company Email:
  • Country:
  • Language:
Change Password
Apply

You have no license application record.

Apply
Certificate Issued at Valid until Serial No. File
Serial No. Valid until File

Not having one? Apply now! >>>

Product Created On ID Amount (USD) Invoice
Product Created On ID Amount (USD) Invoice

No Invoice

Search
    English

      Louvain

      Overview

      Louvain algorithm is a community detection algorithm based on modularity calculations, which is an iterative process of nodes aggregation with the goal of maximizing the modularity. Proposed in 2008 by Vincent D.Blondel et al. of Université catholique de Louvain in Belgium, this algorithm has become the most mentioned and used community detection algorithm in recent years because of its high efficiency and satisfying outcomes.

      Related materials of the algorithm:

      Basic Concept

      Weighted Degree

      Weighted degree is the degree that takes the weight of edge into consideration. Louvain algorithm uses the concepts of weighted degree of node and weighted degree of community when calculating modularity.

      • Weighted degree of node refers to the sum of the weights of edges attached to the node, including adjacent edges (connect to other nodes) and self-loop edges (connect to the node itself).
      • Weighted degree of community refers to the sum of weighted degrees of all nodes in a community.
      • Weighted degree inside community refers to when calculating the weighted degree of a community, only to consider the edges whose both endpoints are in the community; or, weighted degree inside community can be obtained by deducting the weights of edges between the community and other communities from the weighted degree of community.
      • Weighted degree of whole graph refers to the sum of weighted degrees of all nodes in the graph. If divides the whole graph into multiple communities, weighted degree of whole graph also equals to the sum of weighted degrees of those communities, since each node belongs to one and only one community.

      Community Compression

      Louvain algorithm uses a large amount of community compression to improve the subsequent (iteration) computing speed by minimizing the number of nodes and edges in the graph without changing the weighted degree of the local or the whole graph. After compression, nodes in the community will be calculated as a whole, but no longer separated, for modularity optimization, thus achieving a hierarchical (iterative) community division effect.

      Community compression is to use one aggregated node to represent all nodes in a community, the weighted degree inside community is the weight of the self-loop edge of the aggregated node, and the sum of weights of edges between every two communities is the weight of the edge between the two corresponding aggregated nodes.

      Modularity

      Geometrically, modularity attempts to calculate the weighted degree to compare the closeness of nodes within and between communities.

      Let 2m denote the weighted degree of the whole graph, if C is any community in the graph, Σ(tot) is the weighted degree of C, Σ(in) is the weighted degree inside C, then the modularity Q can be calculated by:

      The value range of modularity is [-1, 1]; for connected graph (or connected subgraph), the value range of modularity is [-1/2, 1]. Modularity has been used to measure the quality of the community division, the higher the modularity, the more reasonable the community division.

      When programming the algorithm, the modularity can be calculated using the following formula:

      where 2m is the weighted degree of whole graph; i and j are any two nodes in the graph; δ is 1 when i and j belong to the same community, otherwise δ is 0.

      Gain of Modularity

      Gain of modularity refers to how much the modularity has increased by changing the community division. When adjusting the community of a node, Louvain algorithm decides whether to move the node by evaluating the gain of modularity. The threshold of the gain of modularity is a float number greater than 0, the modularity is regarded as not improved when ΔQ is under the threshold.

      Modularity optimization is an NP-Hard problem. Louvain algorithm adopts the heuristic algorithm to optimize the modularity in multiple rounds of compound iterations. Each large cycle is divided into two phases:

      • 1st phase: Iteration. At the beginning of this phrase, each node is considered as an independent community; in each round of iteration, calculate for each node whether there is a community of its neighbors, a maximum and positive ΔQ can be obtained by placing the node to that community (i.e. to maximize the modularity); if found, move the node to the new community; the same calculation and adjustment are applied sequentially to all nodes, then enter the next round of iteration. The first phase iteratively loops by this rule until no node can be reallocated, or the number of iterations reaches the limit.
      • 2nd phase: Community compression. Compress the communities divided in the first phase to get a new graph. If the structure of the new compressed graph is consistent with the graph at the beginning of the current round of large cycle, that is, the modularity is not improved, then the algorithm ends, otherwise the new graph is used as the initial graph of the next round of large cycle.

      Special Case

      Lonely Node, Disconnected Graph

      If lonely node exists in the graph, the lonely node must form a community of its own, it can not be combined with other nodes no matter how many rounds of cyclic iterations it undergoes. The reason is that lonely node has no adjacent edge, means that k(i,in) of lonely node to any other community or node is 0, the ΔQ that generated when it is moved to other community is negative.

      For disconnected graph, there is no edge between different disconnected areas, nodes in different components cannot be combined, thus each disconnected area is an independent community, the community division by the Louvain algorithm is only meaningful inside the connected component.

      Self-loop Edge

      The way Louvain algorithm counts the weighted degree of self-loop edge is different from the Degree algorithm algo(degree). In Degree algorithm, each self-loop edge is counted twice, while each self-loop edge is counted only once in Louvain algorithm.

      As shown in the graph above, the weighted degree of the red node in Degree algorithm is 1 + 0.5 + 3 + 1.5 * 2 = 7.5, the weighted degree of the red node in Louvain algorithm is 1 + 0.5 + 3 + 1.5 = 6.

      Directed Edge

      For directed edges, Louvain algorithm ignores the direction of edges but calculates them as undirected edges.

      Results and Statistics

      Run Louvain algorithm on the graph below, the weight of each edge is 1, the threshold of the gain of modularity is 0.01, and the maximum number of iterations of the 1st phase is 5:

      Algorithm results 1: Return node and its community ID, i.e. _uuid, community

      _uuid community_id
      11 2
      12 2
      13 2
      14 2
      6 13
      7 13
      8 13
      9 13
      10 13
      1 11
      2 11
      3 11
      4 11
      5 11

      Algorithm results 2: Return community and its number of nodes, i.e. community_id, count

      community_id count
      2 4
      13 5
      11 5

      Algorithm statistics: The number pf communities community_count and modularity modularity

      community_count modularity
      3 0.434948979591837

      Command and Configuration

      • Command: algo(louvain)
      • Configurations for the parameter params():
      Name
      Type
      Default
      Specification
      Description
      phase1_loop_num int 5 >=1 The maximum number of iterations in the first phase of the algorithm
      min_modularity_increase float 0.01 0~1 The threshold of the gain of modularity
      edge_schema_property []@<schema>?.<property> / Numeric edge property, LTE needed Edge weight property/properties, schema can be either carried or not; edge without any specified property will not participate in the calculation; all weights are 1 if not set
      limit int -1 >=-1 Number of results to return; return all results if sets to -1 or not set
      order string / ASC or DESC, case insensitive The sorting rule, only effective with the Streaming Return execution method when the mode set to 2 (community:id/count)

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 3

      algo(louvain).params({ 
        phase1_loop_num: 3, 
        min_modularity_increase: 0.1 
      }) as com
      return com
      

      Algorithm Execution

      Task Writeback

      1. File Writeback

      Configuration Data in Each Row
      filename_community_id _id``community_id
      filename_ids community_id,_id,_id,...
      filename_num community_id,count

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, write the algorithm results back to files named communityID, ids and num

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1 
      }).write({
        file:{
          filename_community_id: "communityID",
          filename_ids: "ids",
          filename_num: "num"
        }
      })
      

      2. Property Writeback

      Configuration Writeback Content Type Data Type
      property community_id Node property int64

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, write the algorithm results back to node property named communityID

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1 
      }).write({
        db:{
          property: "communityID"
        }
      })
      

      3. Statistics Writeback

      Name Data Type Description
      community_count int Number of communities
      modularity double Modularity

      Example: Run Louvain algorithm in the graph, use the default threshold of the gain of modularity as 0.01 and 5 iterations in the first phase, write the algorithm statistics back to task information

      algo(louvain).params().write()
      

      Direct Return

      Alias Ordinal
      Type
      Description Column Name
      0 []perNode Node and its community ID _uuid, community_id
      1 KV Number of communities, modularity community_count, modularity

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, define algorithm results and statistics as aliases named results and stats, and return the results and statistics

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1 
      }) as results, stats
      return results, stats
      

      Streaming Return

      Configurations for the parameter stream():

      Name
      Type
      Default
      Specification
      Description
      mode int 1 1 or 2 1 means to return the community ID of each node, 2 means to return the number of nodes in each community
      Alias Ordinal
      Type Description Column Name
      0 []perNode or
      []perCommunity
      Node and its community ID or Community and its number of nodes _uuid, community_id or community_id, count

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, define algorithm results (community and its number of nodes) as alias named results, and return the results ordered by the descending number of nodes in the community

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1,
        order: "desc"
      }).stream({
        mode: 2
      }) as results
      return results
      

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, define algorithm results (node and its community ID) as alias named results, and return the results ordered by the ascending UUID of nodes

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1 
      }).stream() as results
      return results order by results._uuid
      

      Real-time Statistics

      Alias Ordinal
      Type
      Description Column Name
      0 KV Number of communities, modularity community_count, modularity

      Example: Run Louvain algorithm in the graph, set the threshold of the gain of modularity as 0.1, the number of iterations in the first phase as 5, define algorithm statistics as alias named stats, and return the statistics

      algo(louvain).params({ 
        phase1_loop_num: 5, 
        min_modularity_increase: 0.1 
      }).stats() as stats
      return stats
      

      Algorithm Efficiency

      Louvain algorithm achieves lower time complexity than previous community detection algorithms through its improved greedy optimization, which is usually regarded as O(N*LogN), where N is the number of nodes in the graph, and the result is more intuitive. That is to say, in theory, in a graph with 10,000 nodes, the complexity of Louvain algorithm is O(40000); in a connected graph with 100 million nodes, the algorithm complexity is O(800000000). In fact, according to the above detailed breakdown of the algorithm process, we could see that the complexity of Louvain algorithm depends on both the number of nodes and the number of edges, roughly speaking, it should be O(N * E/N) = O(E), where E is the number of edges in the graph - because the dominant algorithm logic of Louvain is to calculate the weights of edges attached to each node.

      The table below shows the performance comparison of Louvain algorithm and other community detection algorithms, it is given by the original author of Louvain algorithm in his paper. It can be seen that Louvain achieves some significant (exponential) increase in both modularity and efficiency:

      This table gives the performances of the algorithm of Clauset, Newman and Moore, of Pons and Latapy, of Wakita and Tsurumi and of Louvain for community detection in networks of various sizes. For each method/network, the table displays the modularity that is achieved and the computation time. Empty cells correspond to a computation time over 24 hours.

      The system architecture, data structure, programming language, etc. would cause huge difference in the final result and efficiency of Louvain algorithm, even though they are all implementations of Louvain algorithm. For example, the serial Louvain algorithm achieved by Python can cost hours in just 10,000 grade small graphs; besides, the difference in data structure also leads to significant performance gaps as the algorithm calculates the degree of nodes and the weight of edges frequently. The native Louvain algorithm adopts C++, but it is a serial implementation, so the time consumption can be reduced by using parallel computation as much as possible, thereby improving the efficiency of the algorithm.

      For medium-sized graphset with tens of millions of nodes and edges, Ultipa's Louvain algorithm can be completed in real time literally; for large graphs with over 100 million nodes and edges, it can be implemented in seconds to minutes. In addition, operations such as whether to writeback to database property or to disk file also have impact on the time consumption of the entire algorithm. The table below demonstrates the modularity and execution time of Louvain algorithm running in a graphset with 5 million nodes and 100 million edges, note that the computation process costs ~1 minute, and operations to writeback to the database or generate disk file costs additional ~1 minute.

      Consistency of Results

      Affected by factors such as the order of nodes, the parallel computation, and the execution logic of the heuristic algorithm on local data, the community division results of Louvain algorithm may vary with each execution. But the overall trend will not change much (as indicated in the table above).

      Please complete the following information to download this book
      *
      公司名称不能为空
      *
      公司邮箱必须填写
      *
      你的名字必须填写
      *
      你的电话必须填写