Louvain

Change Password

Submit

Change Email

Submit

Change Nickname

Current Nickname:

Submit

Profile

Account ID:

Full Name:
Phone:
Company:
Company Email:

Change Password

Apply

You have no license application record.

Apply

Certificate	Issued at	Valid until	Serial No.	File

Serial No.	Valid until	File

Not having one? Apply now! >>>

Product	Created On	ID	Amount (USD)	Invoice

Product	Created On	ID	Amount (USD)	Invoice

No Invoice

Create Ultipa Account

I agree to the Privacy Policy and the

Data Processing Agreement .

Please agree to continue.

Already have an Ultipa account? Sign in now!

Forgot Password

Reset Password

Back to sign in

Louvain

HDC

Overview

The Louvain algorithm is a widely recognized and extensively used algorithm for community detection in graphs. It is named after the location of its authors - Vincent D. Blondel et al. from Université catholique de Louvain in Belgium. The primary objective of the algorithm is to maximize the modularity of the graph, and it has gained popularity due to its high efficiency and the quality of its results.

V.D. Blondel, J. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks (2008)
H. Lu, Parallel Heuristics for Scalable Community Detection (2014)

Concepts

Modularity

In many networks, nodes tend to naturally form groups or communities, characterized by dense connections within a community and relatively sparse connections between communities.

Consider an equivalent network G' to G, where G' remains the same community partition and the same number of edges as in G, but the edges are placed randomly. If G exhibits a good community structure, the ratio of intra-community edges to the total number of edges in G should be higher than the expected ratio in G'. A larger disparity between the actual ratio and the expected value indicates a more pronounced presence of a community structure in G. This forms the original concept of modularity. The modularity is one of the widely used methods to evaluate the quality of a community partition, the Louvain algorithm is designed to find partitions that maximize the modularity.

The modularity ranges from -1 to 1. A value close to 1 indicates a stronger community structure, while negative values suggest that the partition is not meaningful. For a connected graph, the modularity value ranges from -0.5 to 1.

Considering the weights of edges in the graph, the modularity (Q) is defined as

where,

m is the total sum of edge weights in the graph;
A_ij is the sum of edge weights between nodes i and j, and 2m = ∑_ijA_ij;
k_i is the sum of weights of all edges attached to node i;
C_i represents the community to which node iis assigned, δ(C_i,C_j) is 1 if C_i= C_j, and 0 otherwise.

Note, $\frac{k_{i} k_{j}}{2m}$ is the expected sum of weights of edges between nodes i and j if edges are placed at random. Both A_ij and $\frac{k_{i} k_{j}}{2m}$ are divided by 2m because each pair of distinct nodes in a community is considered twice, such as A_ab = A_ba, $\frac{k_{a} k_{b}}{2m}$ = $\frac{k_{b} k_{a}}{2m}$ .

We can also write the above formula as the following:

where,

$\sum_{in}^{c}$ is the sum of weights of edges inside community C, i.e., the intra-community weight;
$\sum_{tot}^{c}$ is the sum of weights of edges incident to nodes in community C, i.e, the total-community weight;
m has the same meaning as above, and 2m = ∑_c $\sum_{tot}^{c}$ .

Nodes in this graph are assigned into 3 communities, take community C₁ as example:

$\sum_{in}^{C_{1}}$ = A_aa + A_ab + A_ac + A_ad + A_ba + A_ca + A_da = 1.5 + 1 + 0.5 + 3 + 1 + 0.5 + 3 = 10.5
( $\sum_{tot}^{C_{1}}$ )² = k_ak_a + k_ak_b + k_ak_c + k_ak_d + k_bk_a + k_bk_b + k_bk_c + k_bk_d + k_ck_a + k_ck_b + k_ck_c + k_ck_d + k_dk_a + k_dk_b + k_dk_c + k_dk_d + = (k_a + k_b + k_c + k_d)² = (6 + 2.7 + 2.8 + 3)² = 14.5²

The Louvain algorithm starts from a singleton partition, in which each node is in its own community. Then algorithm iteratively runs through passes, and each pass is made of two phases.

First Phase: Modularity Optimization

For each node i, consider its neighbors j of i, compute the gain of modularity (ΔQ) that would take place by reassigning i from its current community to the community of j.

Node i is then placed in the community that offers the maximum ΔQ, but only if ΔQ is greater than a predefined positive threshold. If no such gain is possible, node i stays in its original community.

Take the above graph as example, nodes in the same community are denoted in the same color. If now considers node d, the respective gains of modularity of moving it to the community {a,b}, {c}, and {e,f} are:

ΔQ_d→{a,b} = Q_{a,b,d} - (Q_{a,b} + Q_{d}) = 52/900
ΔQ_d→{c} = Q_{c,d} - (Q_{c} + Q_{d}) = 72/900
ΔQ_d→{e,f} = Q_{d,e,f} - (Q_{e,f} + Q_{d}) = 36/900

If ΔQ_d→{c} is greater than the predefined threshold of ΔQ, node d will be moved to community {c}, otherwise it stays in its original community.

This process is sequentially applied for all nodes and repeated until no individual move can improve the modularity or the maximum loop number is reached, completing the first phase.

Second Phase: Community Aggregation

In the second phase, each community is aggregated into a node, each aggregated node has a self-loop with weight corresponds to the intra-community weight. The weights of edges between these new nodes are given by the sum of weights of the edges between nodes in the corresponding two communities.

Community aggregation reduces the number of nodes and edges in the graph without altering the weight of the local or the entire graph. After compression, nodes within a community are treated as a whole, but they are no longer isolated for modularity optimization, achieving a hierarchical (iterative) community division effect.

Once this second phase is completed, another pass is applied to the aggregate graph. The passes are iterated until there are no more changes, and a maximum modularity is attained.

Considerations

If node i has any self-loop, when calculating k_i, the weight of self-loop is counted only once.
The Louvain algorithm ignores the direction of edges but calculates them as undirected edges.
The output of the Louvain algorithm may vary with each execution, depending on the order in which the nodes are considered. However, it does not have a significant influence on the modularity that is obtained.

Example Graph

To create this graph:

// Runs each row separately in order in an empty graphset
create().edge_property(@default, "weight", float)
insert().into(@default).nodes([{_id:"A"}, {_id:"B"}, {_id:"C"}, {_id:"D"}, {_id:"E"}, {_id:"F"}, {_id:"G"}, {_id:"H"},{_id:"I"},{_id:"J"},{_id:"K"},{_id:"L"},{_id:"M"},{_id:"N"}])
insert().into(@default).edges([{_from:"A", _to:"B", weight:1}, {_from:"A", _to:"C", weight:1.7}, {_from:"A", _to:"D", weight:0.6}, {_from:"A", _to:"E", weight:1}, {_from:"B", _to:"G", weight:3}, {_from:"F", _to:"A", weight:1.6}, {_from:"F", _to:"H", weight:0.3}, {_from:"F", _to:"J", weight:2}, {_from:"F", _to:"K", weight:0.5}, {_from:"G", _to:"F", weight:2}, {_from:"I", _to:"F", weight:1}, {_from:"K", _to:"A", weight:0.3}, {_from:"K", _to:"M", weight:1.2}, {_from:"K", _to:"N", weight:2}, {_from:"K", _to:"L", weight:0.8}])

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as hdc_louvain:

CALL hdc.graph.create("hdc-server-1", "hdc_louvain", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
})

hdc.graph.create("hdc_louvain", {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static",
  query: "query",
  default: false
}).to("hdc-server-1")

Parameters

Algorithm name: louvain

Name	Type	Spec	Default	Optional	Description
`phase1_loop_num`	Integer	≥1	`5`	Yes	The maximum number of loops in the first phase during each pass.
`min_modularity_increase`	Float	[0,1]	`0.01`	Yes	The minimum gain of modularity (ΔQ) to move a node to another community.
`edge_schema_property`	[]"`<@schema.?><property>`"	/	/	Yes	Numeric edge properties used as weights, summing values across the specified properties; edges without this property are ignored.
`return_id_uuid`	String	`uuid`, `id`, `both`	`uuid`	Yes	Includes `_uuid`, `_id`, or both to represent nodes in the results.
`limit`	Integer	≥-1	`-1`	Yes	Limits the number of results returned; `-1` includes all results.
`order`	String	`asc`, `desc`	/	Yes	Sorts the results by `count`; this option is only valid in Stream Return when `mode` is set to `2`.

File Writeback

This algorithm can generate three files:

Spec	Content
`filename_community_id`	`_id`/`_uuid`: The node. `community_id`: ID of the community the node belongs to.
`filename_ids`	`community_id`: ID of the community. `_ids`/`_uuids`: Nodes belonging to the community.
`filename_num`	`community_id`: ID of the community. `count`: Number of nodes in the community.

CALL algo.louvain.write("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 5, 
    min_modularity_increase: 0.1,
    edge_schema_property: 'weight'
  },
  return_params: {
    file: {
      filename_community_id: "f1.txt",
      filename_ids: "f2.txt",
      filename_num: "f3.txt"
    }
  }
})

algo(louvain).params({
  projection: "hdc_louvain",
  return_id_uuid: "id",
  phase1_loop_num: 5, 
  min_modularity_increase: 0.1,
  edge_schema_property: 'weight'
}).write({
  file: {
    filename_community_id: "f1.txt",
    filename_ids: "f2.txt",
    filename_num: "f3.txt"
  }
})

Result:

_id,community_id
I,5
G,7
J,5
D,13
N,11
F,5
H,5
B,7
L,11
A,13
E,13
K,11
M,11
C,13

community_id,_ids
13,D;A;E;C;
5,I;J;F;H;
7,G;B;
11,N;L;K;M;

community_id,count
13,4
5,4
7,2
11,4

DB Writeback

Writes the community_id values from the results to the specified node property. The property type is uint32.

CALL algo.louvain.write("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 5, 
    min_modularity_increase: 0.1,
    edge_schema_property: 'weight'
  },
  return_params: {
    db: {
      property: 'communityID'
    }
  }
})

algo(louvain).params({
  projection: "hdc_louvain",
  return_id_uuid: "id",
  phase1_loop_num: 5, 
  min_modularity_increase: 0.1,
  edge_schema_property: 'weight'
}).write({
  db: {
    property: 'communityID'
  }
})

Stats Writeback

CALL algo.louvain.write("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 5, 
    min_modularity_increase: 0.1,
    edge_schema_property: 'weight'
  },
  return_params: {
    stats: {}
  }
})

algo(louvain).params({
  projection: "hdc_louvain",
  return_id_uuid: "id",
  phase1_loop_num: 5, 
  min_modularity_increase: 0.1,
  edge_schema_property: 'weight'
}).write({
  stats: {}
})

Result:

community_count	modularity
4	0.464280

Full Return

CALL algo.louvain("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 5, 
    min_modularity_increase: 0.1
  },
  return_params: {}
}) YIELD r
RETURN r

exec {
  algo(louvain).params({
    return_id_uuid: "id",
    phase1_loop_num: 5, 
    min_modularity_increase: 0.1
  }) as r
  return r
} on hdc_louvain

Result:

_id	community_id
I	5
G	7
J	5
D	9
N	11
F	5
H	5
B	7
L	11
A	9
E	9
K	11
M	11
C	9

Stream Return

This Stream Return supports two modes:

Item	Spec	Columns
`mode`	`1` (Default)	`_id`/`_uuid`: The node. `community_id`: ID of the community the node belongs to.
`mode`	`2`	`community_id`: ID of the community. `count`: Number of nodes in the community.

CALL algo.louvain("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1
  },
  return_params: {
  	stream: {}
  }
}) YIELD r
RETURN r

exec{
  algo(louvain).params({
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1
  }).stream() as r
  return r
} on hdc_louvain

Result:

_id	community_id
I	5
G	7
J	5
D	9
N	11
F	5
H	5
B	7
L	11
A	9
E	9
K	11
M	11
C	9

CALL algo.louvain("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1,
    order: "asc"
  },
  return_params: {
    stream: {
      mode: 2
    }
  }
}) YIELD r
RETURN r

exec{
  algo(louvain).params({
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1,
    order: "asc"
  }).stream({
    mode: 2
  }) as r
  return r
} on hdc_louvain

Result:

community_id	count
7	2
5	4
9	4
11	4

Stats Return

CALL algo.louvain("hdc_louvain", {
  params: {
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1
  },
  return_params: {
  	stats: {}
  }
}) YIELD s
RETURN s

exec{
  algo(louvain).params({
    return_id_uuid: "id",
    phase1_loop_num: 6, 
    min_modularity_increase: 0.1
  }).stats() as s
  return s
} on hdc_louvain

Result:

community_count	modularity
4	0.397778

Algorithm Efficiency

The Louvain algorithm achieves lower time complexity than previous community detection algorithms through its improved greedy optimization, which is usually regarded as O(N*logN), where N is the number of nodes in the graph, and the result is more intuitive. For instance, in a graph with 10,000 nodes, the complexity of the Louvain algorithm is around O(40000); in a connected graph with 100 million nodes, the algorithm complexity is around O(800000000).

However, upon closer inspection of the algorithm process breakdown, we can see that the complexity of the Louvain algorithm depends not only on the number of nodes but also on the number of edges. Roughly speaking, it can be approximated as O(N * E/N) = O(E), where E is the number of edges in the graph. This is because the dominant algorithm logic of Louvain is to calculate the weights of edges attached to each node.

The table below shows the performance of the community detection algorithms of Clauset, Newman and Moore, of Pons and Latapy, of Wakita and Tsurumi, and Louvain, in networks of various sizes. For each algorithm/network, it gives the modularity that is gained and the computation time. Empty record indicates a computation time that is over 24 hours. It clearly demonstrates that Louvain achieves a significant (exponential) increase in both modularity and efficiency.

The choice of system architecture and programming language can significantly impact the efficiency and final results of the Louvain algorithm. For example, a serial implementation of the Louvain algorithm in Python may result in hours of computation time for small graphs with around 10,000 nodes. Additionally, the data structure used can influence performance, as the algorithm frequently calculates node degrees and edge weights.

The native Louvain algorithm adopts C++, but it is a serial implementation. The time consumption can be reduced by using parallel computation as much as possible, thereby improving the efficiency of the algorithm.

For medium-sized graphset with tens of millions of nodes and edges, Ultipa's Louvain algorithm can be completed literally in real time. For large graphs with over 100 million nodes and edges, it can be implemented in seconds to minutes. Furthermore, the efficiency of the Louvain algorithm can be affected by other factors, such as whether data is written back to the database property or disk file. These operations can impact the overall computation time of the algorithm.

This is the records of the modularity and execution time of the Louvain algorithm running on a graph with 5 million nodes and 100 million edges. The computation process takes approximately 1 minute, and additional operations, such as writing back to the database or generating a disk file, add around 1 minute to the total execution time.

ID
Product
Status
Cores
Maximum Shard Services
Maximum Total Cores for Shard Service
Maximum HDC Services
Maximum Total Cores for HDC Service
Applied Validity Period(days)
Effective Date
Expired Date
Mac Address
Reason for Application
Review Comment