**✓ File Writeback** **✓ Property Writeback** **✓ Direct Return** **✓ Stream Return** **✓ Stats**

## Overview

The Louvain algorithm, proposed by Vincent D. Blondel et al. from Université catholique de Louvain in Belgium in 2008, is a widely recognized and extensively used algorithm for community detection in graphs. It aims to maximize the modularity of the graph and has gained popularity due to its high efficiency and the quality of its results.

- V.D. Blondel, J. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks (2008)
- H. Lu, Parallel Heuristics for Scalable Community Detection (2014)

## Concepts

### Modularity

Modularity is a concept used to assess the quality of a graph partition into communities. It quantifies the density of edges inside communities as compared to edges between communities. The modularity (Q) is defined as

where,

*m*is the total sum of edge weights in the graph;*A*is the sum of edge weights between nodes_{ij}*i*and*j*, and*2m = ∑*;_{ij}A_{ij}*k*is the sum of weights of all edges attached to node_{i}*i*;*C*represents the community to which node_{i}*i*is assigned,*δ(C*,C_{i}_{j}) is 1 if*C*= C_{i}_{j}, and 0 otherwise.

This formula is equivalent to the following:

where,

*∑*is the sum of weights of edges inside community_{in}*C*, i.e., the**intra-community weight**;*∑*is the sum of weights of edges incident to nodes in community_{tot}*C*, i.e, the**total-community weight**;*m*has the same meaning as above, and*2m = ∑*._{c}(∑_{tot})

Nodes in this graph are assigned into 3 communities, take community *C1* as example:

- ∑
_{in}= A_{aa}+ A_{ab}+ A_{ac}+ A_{ad}+ A_{ba}+ A_{ca}+ A_{da}= 1.5 + 1 + 0.5 + 3 + 1 + 0.5 + 3 = 10.5 - (∑
_{tot})^{2}= k_{a}k_{a}+ k_{a}k_{b}+ k_{a}k_{c}+ k_{a}k_{d}+ k_{b}k_{a}+ k_{b}k_{b}+ k_{b}k_{c}+ k_{b}k_{d}+ k_{c}k_{a}+ k_{c}k_{b}+ k_{c}k_{c}+ k_{c}k_{d}+ k_{d}k_{a}+ k_{d}k_{b}+ k_{d}k_{c}+ k_{d}k_{d}+ = (k_{a}+ k_{b}+ k_{c}+ k_{d})^{2}= (6 + 2.7 + 2.8 + 3)^{2}= 14.5^{2}

The modularity ranges from -1 to 1. A value close to 1 indicates a strong community structure, while negative values suggest that the partition is not meaningful. For a connected graph, the modularity value ranges from -0.5 to 1.

Many popular community detection algorithms, including Louvain, are designed to find partitions that maximize the modularity of the resulting communities.

### Louvain

Initially, each node in the graph is assigned a different community. The Louvain algorithm iteratively runs through passes, each pass is made of two phases:

#### First Phase: Modularity optimization

For each node *i*, consider its neighbors *j* of *i*, evaluate the **gain of modularity** (ΔQ) that would take place by reassigning *i* from its current community to the community of *j*.

Node *i* is then placed in the community that offers the maximum ΔQ, but only if ΔQ is greater than a predefined positive threshold. If no such gain is possible, node *i* stays in its original community.

This process is sequentially applied for all nodes and repeated until no individual move can improve the modularity or the maximum loop number is reached, completing the first phase.

#### Second Phase: Community Aggregation

In the second phase, community aggregation is performed to represent each community found after the first phase with a single aggregated node. Each aggregated node has a self-loop with weight corresponds to the intra-community weight. The weights of edges between these new nodes are given by the sum of weights of the edges between nodes in the corresponding two communities.

Below is the result of community aggregation of the above example:

Community aggregation reduces the number of nodes and edges in the graph without altering the weight of the local or the entire graph. After compression, nodes within a community are treated as a whole, but they are no longer isolated for modularity optimization, achieving a hierarchical (iterative) community division effect.

Once this second phase is completed, another pass is applied to the resulting weighted network. The passes are iterated until there are no more changes, and a maximum modularity is attained.

## Considerations

- If node
*i*has any self-loop, when calculating*k*, the weight of self-loop is counted only once._{i} - The Louvain algorithm ignores the direction of edges but calculates them as undirected edges.
- The output of the Louvain algorithm may vary with each execution, depending on the order in which the nodes are considered. However, it does not have a significant influence on the modularity that is obtained.

## Syntax

- Command:
`algo(louvain)`

- Parameters:

Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|

phase1_loop_num | int | ≥1 | `5` |
Yes | The maximum loop number of the first phase during each pass |

min_modularity_increase | float | [0,1] | `0.01` |
Yes | The minimum gain of modularity (ΔQ) to move a node to another community |

edge_schema_property | []`@<schema>?.<property>` |
Numeric type, must LTE | / | Yes | Edge properties to use as weights, where the values of multiple properties are summed up; all edge weights are considered as 1 if not set |

limit | int | ≥-1 | `-1` |
Yes | Number of results to return, `-1` to return all results |

order | string | `asc` , `desc` |
/ | Yes | Sort communities by the number of nodes in it (only valid in mode 2 of the `stream()` execution) |

## Examples

The example graph is as follows:

### File Writeback

Spec | Content | Description |
---|---|---|

filename_community_id | `_id` ,`community_id` |
Node and its community ID |

filename_ids | `community_id` ,`_id` ,`_id` ,... |
Community ID and the ID of nodes in it |

filename_num | `community_id` ,`count` |
Community ID and the number of nodes in it |

```
algo(louvain).params({
phase1_loop_num: 5,
min_modularity_increase: 0.1,
edge_schema_property: 'weight'
}).write({
file:{
filename_community_id: "communityID",
filename_ids: "ids",
filename_num: "num"
}
})
```

Statistics: community_count = 4, modularity = 0.464280

Results: Files *communityID*, *ids*, *num*

```
M,2
N,2
K,2
L,2
J,8
I,8
G,13
H,8
F,8
C,12
E,12
D,12
A,12
B,13
```

```
8,J,I,H,F,
12,C,E,D,A,
2,M,N,K,L,
13,G,B,
```

```
8,4
12,4
2,4
13,2
```

### Property Writeback

Spec | Content | Write to | Data Type |
---|---|---|---|

property | `community_id` |
Node property | `uint32` |

```
algo(louvain).params({
phase1_loop_num: 5,
min_modularity_increase: 0.1,
edge_schema_property: 'weight'
}).write({
db:{
property: "communityID"
}
})
```

Statistics: community_count = 4, modularity = 0.464280

Results: The community ID of each node is written to a new property named *communityID*

### Stats Writeback

```
algo(louvain).params({
phase1_loop_num: 5,
min_modularity_increase: 0.1,
edge_schema_property: 'weight'
}).write()
```

Statistics: community_count = 4, modularity = 0.464280

### Direct Return

Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|

0 | []perNode | Node and its community ID | `_uuid` , `community_id` |

1 | KV | Number of communities, modularity | `community_count` , `modularity` |

```
algo(louvain).params({
phase1_loop_num: 6,
min_modularity_increase: 0.5,
edge_schema_property: 'weight'
}) as results, stats
return results, stats
```

Results: *results* and *stats*

_uuid | community_id |
---|---|

13 | 2 |

14 | 2 |

11 | 2 |

12 | 2 |

10 | 8 |

9 | 8 |

7 | 13 |

8 | 8 |

6 | 8 |

3 | 12 |

5 | 12 |

4 | 12 |

1 | 12 |

2 | 13 |

community_count | modularity |
---|---|

4 | 0.46428 |

### Stream Return

Spec | Content | Alias Ordinal | Type | Description | Columns |
---|---|---|---|---|---|

mode | `1` or if not set |
0 | []perNode | Node and its community ID | `_uuid` , `community_id` |

`2` |
[]perCommunity | Community and the number of nodes in it | `community_id` , `count` |

```
algo(louvain).params({
phase1_loop_num: 6,
min_modularity_increase: 0.5,
edge_schema_property: 'weight'
}).stream() as results
group by results.community_id
return table(results.community_id, max(results._uuid))
```

Results: *table(results.community_id, max(results._uuid))*

results.community_id | max(results._uuid) |
---|---|

12 | 5 |

13 | 7 |

2 | 14 |

8 | 10 |

```
algo(louvain).params({
phase1_loop_num: 5,
min_modularity_increase: 0.1,
order: "desc"
}).stream({
mode: 2
}) as results
return results
```

Results: *results*

community_id | count |
---|---|

8 | 4 |

12 | 4 |

2 | 4 |

13 | 2 |

### Stats Return

Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|

0 | KV | Number of communities, modularity | `community_count` , `modularity` |

```
algo(louvain).params({
phase1_loop_num: 5,
min_modularity_increase: 0.1
}).stats() as stats
return stats
```

Results: *stats*

community_count | modularity |
---|---|

4 | 0.397778 |

## Algorithm Efficiency

The Louvain algorithm achieves lower time complexity than previous community detection algorithms through its improved greedy optimization, which is usually regarded as *O(N*logN)*, where *N* is the number of nodes in the graph, and the result is more intuitive. For instance, in a graph with 10,000 nodes, the complexity of the Louvain algorithm is around *O(40000)*; in a connected graph with 100 million nodes, the algorithm complexity is around *O(800000000)*.

However, upon closer inspection of the algorithm process breakdown, we can see that the complexity of the Louvain algorithm depends not only on the number of nodes but also on the number of edges. Roughly speaking, it can be approximated as *O(N * E/N) = O(E)*, where *E* is the number of edges in the graph. This is because the dominant algorithm logic of Louvain is to calculate the weights of edges attached to each node.

The table below shows the performance of the community detection algorithms of Clauset, Newman and Moore, of Pons and Latapy, of Wakita and Tsurumi, and Louvain, in networks of various sizes. For each algorithm/network, it gives the modularity that is gained and the computation time. Empty record indicates a computation time that is over 24 hours. It clearly demonstrates that Louvain achieves a significant (exponential) increase in both modularity and efficiency.

The choice of system architecture and programming language can significantly impact the efficiency and final results of the Louvain algorithm. For example, a serial implementation of the Louvain algorithm in Python may result in hours of computation time for small graphs with around 10,000 nodes. Additionally, the data structure used can influence performance, as the algorithm frequently calculates node degrees and edge weights.

The native Louvain algorithm adopts C++, but it is a serial implementation. The time consumption can be reduced by using parallel computation as much as possible, thereby improving the efficiency of the algorithm.

For medium-sized graphset with tens of millions of nodes and edges, Ultipa's Louvain algorithm can be completed literally in real time. For large graphs with over 100 million nodes and edges, it can be implemented in seconds to minutes. Furthermore, the efficiency of the Louvain algorithm can be affected by other factors, such as whether data is written back to the database property or disk file. These operations can impact the overall computation time of the algorithm.

This is the records of the modularity and execution time of the Louvain algorithm running on a graph with 5 million nodes and 100 million edges. The computation process takes approximately 1 minute, and additional operations, such as writing back to the database or generating a disk file, add around 1 minute to the total execution time.