Modularity

Overview

The Modularity algorithm evaluates the quality of an existing community partition by computing its modularity score. Unlike community detection algorithms that find communities, this algorithm measures how good a given partition is.

It is typically used after running a community detection algorithm (such as Louvain or Leiden) to assess the quality of the detected communities.

Concepts

Modularity

In many networks, nodes tend to naturally form groups or communities, characterized by dense connections within a community and relatively sparse connections between communities.

Consider a network G' equivalent to G: G' keeps the same community partition and the same number of edges as G, but its edges are placed at random. If G has a strong community structure, the ratio of intra-community edges to the total number of edges in G should be higher than the expected ratio in G'. The greater the gap between the actual and expected ratios, the more prominent the community structure in G. This idea forms the basis of modularity, one of the most widely used measures of the quality of a community partition. The Louvain algorithm is designed to find partitions that maximize modularity.

Modularity is a value that ranges from -1 to 1. A value close to 1 indicates a strong community structure, while negative values imply that the partitioning is likely not meaningful. For a connected graph, the modularity generally falls within the range of -0.5 to 1.

Taking edge weights into account, the modularity (Q) is defined as

$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_{i} k_{j}}{2m} \right] \delta(C_{i}, C_{j})$$

where,

m is the total sum of edge weights in the graph;
A_ij is the sum of edge weights between nodes i and j, and 2m = ∑_ijA_ij;
k_i is the sum of weights of all edges attached to node i;
C_i is the community to which node i is assigned; δ(C_i, C_j) is 1 if C_i = C_j, and 0 otherwise.

Note that $\frac{k_{i} k_{j}}{2m}$ is the expected sum of weights of edges between nodes i and j if edges are placed at random. Both A_ij and $\frac{k_{i} k_{j}}{2m}$ are divided by 2m rather than m because the sum visits each pair of distinct nodes in a community twice: A_ab = A_ba and $\frac{k_{a} k_{b}}{2m}$ = $\frac{k_{b} k_{a}}{2m}$.
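The pairwise definition can be checked by hand on a small graph. The following is a minimal Python sketch, not the engine's implementation: the function name `modularity_pairwise` and the toy graph (two triangles joined by one edge, all weights 1, no self-loops) are assumptions made for illustration.

```python
from itertools import product

def modularity_pairwise(edges, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(C_i, C_j)."""
    # Symmetric weight table: an undirected edge (u, v, w) contributes
    # w to both A[u, v] and A[v, u].
    A = {}
    for u, v, w in edges:
        A[(u, v)] = A.get((u, v), 0.0) + w
        A[(v, u)] = A.get((v, u), 0.0) + w

    nodes = list(community)
    # k_i: sum of weights of all edges attached to node i.
    k = {i: sum(A.get((i, j), 0.0) for j in nodes) for i in nodes}
    two_m = sum(k.values())  # 2m = sum_ij A_ij

    q = 0.0
    for i, j in product(nodes, repeat=2):
        if community[i] == community[j]:  # delta(C_i, C_j) = 1
            q += A.get((i, j), 0.0) - k[i] * k[j] / two_m
    return q / two_m

# Hypothetical toy graph: triangles {a, b, c} and {d, e, f} joined by c-d.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("c", "d", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity_pairwise(edges, community)
print(round(q, 4))  # 0.3571, i.e. 5/14
```

Here 2m = 14, and each triangle contributes (6 - 49/14) / 14 = 5/28, so Q = 5/14.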

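The formula above can also be written as a sum over communities:

$$Q = \sum_{c} \left[ \frac{\sum_{in}^{c}}{2m} - \left( \frac{\sum_{tot}^{c}}{2m} \right)^{2} \right]$$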

where,

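$\sum_{in}^{c}$ is the sum of A_ij over all nodes i and j inside community C, i.e., the intra-community weight (each edge between two distinct nodes is counted in both directions);
$\sum_{tot}^{c}$ is the sum of weights of edges incident to nodes in community C (the sum of k_i over its nodes), i.e., the total-community weight;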
m has the same meaning as above, and 2m = ∑_c $\sum_{tot}^{c}$ .
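The per-community form needs only one pass over the edges. The sketch below is an illustration under stated assumptions rather than the engine's implementation: the name `modularity_by_community` and the toy graph (two triangles joined by one edge, all weights 1, no self-loops) are hypothetical.

```python
from collections import defaultdict

def modularity_by_community(edges, community):
    """Q = sum_c [ sigma_in/(2m) - (sigma_tot/(2m))**2 ]."""
    sigma_in = defaultdict(float)   # intra-community weight per community
    sigma_tot = defaultdict(float)  # total weight incident to each community
    two_m = 0.0
    for u, v, w in edges:
        two_m += 2 * w
        sigma_tot[community[u]] += w
        sigma_tot[community[v]] += w
        if community[u] == community[v]:
            # An internal edge is counted twice: as A_uv and as A_vu.
            sigma_in[community[u]] += 2 * w
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
               for c in sigma_tot)

# Hypothetical toy graph: triangles {a, b, c} and {d, e, f} joined by c-d.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
         ("c", "d", 1.0),
         ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0)]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q = modularity_by_community(edges, community)
# Per community: sigma_in = 6, sigma_tot = 7, 2m = 14.
print(round(q, 4))  # 0.3571, i.e. 2 * (6/14 - (7/14)**2) = 5/14
```

Nodes without any edge do not appear in the loop; they add nothing to either sum, so leaving them out does not change Q.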

The nodes in this graph are assigned to 3 communities. Taking community C₁ as an example:

$\sum_{in}^{C_{1}}$ = A_aa + A_ab + A_ac + A_ad + A_ba + A_ca + A_da = 1.5 + 1 + 0.5 + 3 + 1 + 0.5 + 3 = 10.5
( $\sum_{tot}^{C_{1}}$ )² = k_ak_a + k_ak_b + k_ak_c + k_ak_d + k_bk_a + k_bk_b + k_bk_c + k_bk_d + k_ck_a + k_ck_b + k_ck_c + k_ck_d + k_dk_a + k_dk_b + k_dk_c + k_dk_d = (k_a + k_b + k_c + k_d)² = (6 + 2.7 + 2.8 + 3)² = 14.5²
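The arithmetic for C₁ can be re-checked with a few lines of Python. This sketch uses only the numbers quoted above; the variable names are hypothetical, and comparisons use `math.isclose` because decimal values such as 2.7 are not exact in floating point.

```python
import math
from itertools import product

# A_ij terms listed for community C1 (values as given in the worked example).
a_terms = {("a", "a"): 1.5, ("a", "b"): 1.0, ("a", "c"): 0.5, ("a", "d"): 3.0,
           ("b", "a"): 1.0, ("c", "a"): 0.5, ("d", "a"): 3.0}
# Weighted degrees k_i of the four nodes in C1.
k = {"a": 6.0, "b": 2.7, "c": 2.8, "d": 3.0}

sigma_in = sum(a_terms.values())
sigma_tot = sum(k.values())
# Sum of k_i * k_j over all 16 ordered pairs (i, j) in C1.
pair_sum = sum(k[i] * k[j] for i, j in product(k, repeat=2))

print(math.isclose(sigma_in, 10.5))            # True
print(math.isclose(sigma_tot, 14.5))           # True
print(math.isclose(pair_sum, sigma_tot ** 2))  # True
```

The last check is the identity used above: summing k_i k_j over every ordered pair of nodes in a community equals the square of the community's total weight.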

Considerations

The algorithm treats all edges as undirected.
Community assignments are read from a node property specified by communityProperty. If not specified, each node is treated as its own community.
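Both considerations can be illustrated with a small Python sketch. It models the behavior described above under stated assumptions and is not the engine's implementation: the function `modularity_undirected`, the unweighted toy graph, and the way the default partition is built are all hypothetical.

```python
from collections import defaultdict

def modularity_undirected(directed_edges, community=None):
    """Modularity of an unweighted graph, ignoring edge direction."""
    nodes = {n for edge in directed_edges for n in edge}
    if community is None:
        # Assumed default: every node forms its own community.
        community = {n: n for n in nodes}
    two_m = 2 * len(directed_edges)
    sigma_in = defaultdict(int)
    sigma_tot = defaultdict(int)
    for u, v in directed_edges:
        # Direction is dropped: (u -> v) counts equally for both endpoints.
        sigma_tot[community[u]] += 1
        sigma_tot[community[v]] += 1
        if community[u] == community[v]:
            sigma_in[community[u]] += 2
    return sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
               for c in sigma_tot)

# Hypothetical toy graph: two triangles joined by the edge c -> d.
forward = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
           ("d", "e"), ("e", "f"), ("d", "f")]
reversed_edges = [(v, u) for u, v in forward]
two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

q_forward = modularity_undirected(forward, two_triangles)
q_reversed = modularity_undirected(reversed_edges, two_triangles)
q_singletons = modularity_undirected(forward)  # no community mapping given

print(round(q_forward, 4), round(q_reversed, 4))  # 0.3571 0.3571
print(round(q_singletons, 4))                     # -0.1735
```

With one community per node there are no intra-community edges, so every term is negative and Q = -∑ k_i² / (2m)². A negative score is therefore what to expect if the community property is left unset.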

Example Graph

GQL
INSERT (A:default {_id: "A", comm_id: 0}), (B:default {_id: "B", comm_id: 0}),
       (C:default {_id: "C", comm_id: 0}), (D:default {_id: "D", comm_id: 1}),
       (E:default {_id: "E", comm_id: 1}), (F:default {_id: "F", comm_id: 1}),
       (G:default {_id: "G", comm_id: 2}), (H:default {_id: "H", comm_id: 2}),
       (A)-[:default]->(B), (A)-[:default]->(C),
       (B)-[:default]->(C), (A)-[:default]->(D),
       (D)-[:default]->(E), (D)-[:default]->(F),
       (E)-[:default]->(F), (G)-[:default]->(D),
       (G)-[:default]->(H)
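As a cross-check outside the database, the modularity of this example graph can be computed by hand from the formula in the Concepts section. The Python sketch below copies the nodes, `comm_id` values, and edges from the `INSERT` statement above and assumes unweighted, undirected edges. Its variable names are hypothetical, and it is not the engine's implementation.

```python
from collections import defaultdict

# Community assignments and edges copied from the INSERT statement.
comm_id = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1, "G": 2, "H": 2}
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("G", "D"), ("G", "H")]

two_m = 2 * len(edges)        # 2m = 18 for 9 unweighted edges
sigma_in = defaultdict(int)   # intra-community weight per community
sigma_tot = defaultdict(int)  # total degree per community
for u, v in edges:
    sigma_tot[comm_id[u]] += 1
    sigma_tot[comm_id[v]] += 1
    if comm_id[u] == comm_id[v]:
        sigma_in[comm_id[u]] += 2  # counted as A_uv and A_vu

modularity = sum(sigma_in[c] / two_m - (sigma_tot[c] / two_m) ** 2
                 for c in sigma_tot)
community_count = len(set(comm_id.values()))
node_count = len(comm_id)

print(round(modularity, 4), community_count, node_count)  # 0.4012 3 8
```

The three communities have (Σ_in, Σ_tot) = (6, 7), (6, 8), and (2, 3), which gives Q = 14/18 - 122/324 = 65/162 ≈ 0.4012.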

Parameters

Name	Type	Default	Description
`communityProperty`	`STRING`	/	Node property storing community assignments. If not specified, each node is treated as its own community.

Run Mode

Returns:

Column	Type	Description
`modularity`	`FLOAT`	Overall modularity score Q
`communityCount`	`INT`	Number of communities

GQL
CALL algo.modularity({
  communityProperty: "comm_id"
}) YIELD modularity, communityCount

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.modularity.stream({
  communityProperty: "comm_id"
}) YIELD modularity, communityCount
RETURN modularity, communityCount

Stats Mode

Returns:

Column	Type	Description
`nodeCount`	`INT`	Total number of nodes
`modularity`	`FLOAT`	Overall modularity score Q
`communityCount`	`INT`	Number of communities

GQL
CALL algo.modularity.stats({
  communityProperty: "comm_id"
}) YIELD nodeCount, modularity, communityCount