UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
      • Node2Vec
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Community Detection

Leiden

Overview

The Leiden algorithm is a community detection method designed to optimize modularity while addressing some of the limitations of the widely used Louvain algorithm. Unlike Louvain, which may produce poorly connected or even disconnected communities, Leiden guarantees well-connected communities. Additionally, the Leiden algorithm is faster. It is named after the city of Leiden, where it was developed.

References:

  • V.A. Traag, L. Waltman, N.J. van Eck, From Louvain to Leiden: guaranteeing well-connected communities (2019)
  • V.A. Traag, P. Van Dooren, Y. Nesterov, Narrow scope for resolution-limit-free community detection (2011)

Concepts

Modularity

The Leiden algorithm optimizes modularity with an additional resolution parameter γ (gamma):

The parameter γ controls the granularity of the detected communities by modulating the balance between intra-community and inter-community connections:

  • When γ > 1, the algorithm favors more and smaller communities that are tightly connected internally.
  • When 0 < γ < 1, it favors fewer and larger communities that may be less densely connected internally.

Leiden

When the Leiden algorithm starts, each node is placed in its own community. The algorithm then iteratively proceeds through passes, each consisting of three phases:

Phase 1: Fast Modularity Optimization

In the first phase of Louvain, the algorithm repeatedly visits all nodes in the graph until no further node movements can increase the modularity. The Leiden algorithm improves efficiency by only visiting all nodes once initially, and afterwards, only revisiting nodes whose neighborhoods have changed.

To achieve this, the Leiden algorithm maintains a queue, initializes it with all nodes in the graph in a random order, then repeats the following steps until the queue is empty:

  • Remove the first node v from the front of the queue.
  • Reassign node v to a different community C that provides the maximum gain of modularity (ΔQ), or keep v in its current community if there is no positive gain.
  • If v is moved to a new community C, add to the rear of the queue all neighbors of v that do not belong to C and that are not already in the queue.

Phase 2: Refinement

This phase produces a refined partition Prefined based on the partition P obtained from the first phase. Initially, Prefined is set as a singleton partition, where each node—either from the original graph or the aggregated graph—is placed in its own community. Then, each community C ∈ P is refined individually as follows:

1. Find each node v ∈ C that is well-connected within C by this formula:

where,

  • W(v,C-v) is the sum of edge weights between node v and the other nodes in C.
  • kv is the total edge weights between node v and the other nodes in the graph.
  • ∑totc is the sum of k of all nodes in C.
  • m is the sum of all edge weights in the graph.

Take community C1 in the graph above as an example, where

  • m = 18.1
  • ∑totC1 = ka + kb + kc + kd = 6 + 2.7 + 2.8 + 3 = 14.5

Set γ as 1.2, then:

  • W(a, C1) - γ/m ⋅ ka ⋅ (∑totC1 - ka) = 4.5 - 1.2/18.1*6*(14.5 - 6) = 1.12
  • W(b, C1) - γ/m ⋅ kb ⋅ (∑totC1 - kb) = 1 - 1.2/18.1*2.7*(14.5 - 2.7) = -1.11
  • W(c, C1) - γ/m ⋅ kc ⋅ (∑totC1 - kc) = 0.5 - 1.2/18.1*2.8*(14.5 - 2.8) = -1.67
  • W(d, C1) - γ/m ⋅ kd ⋅ (∑totC1 - kd) = 3 - 1.2/18.1*3*(14.5 - 3) = 0.71

Therefore, nodes a and d are considered well-connected in C1.

2. Visit each node v. If it remains in its own singleton community in Prefined, randomly merge it into a community C' ∈ Prefined that increases the modularity. The merge is allowed only if C' is well-connected with C, determined by the following condition:

Note that each node v is not necessarily merged greedily with the community that yields the maximum gain of modularity. Instead, the larger the modularity gain, the more likely that community is to be selected. The degree of randomness in selecting a community C' is determined by a parameter θ (theta) as:

Randomness in the selection of a community allows the partition space to be explored more broadly.

Phase 3: Community Aggregation

The aggregate graph is constructed based on the Prefined obtained from the previous phase. This aggregation process is the same as in Louvain. Note that each node is a single community in the aggregate graph in Louvain. However, the aggregate graph in Leiden is partitioned based on P, so multiple nodes may belong to the same community.

P is denoted by color blocks, Prefined is denoted by node colors

Once this third phase is completed, another pass is applied to the aggregate graph. These passes are iterated until no further changes occur in the node communities, and the modularity reaches its maximum.

Considerations

  • If node v has any self-loop, when calculating kv, the weight of self-loop is counted only once.
  • The Leiden algorithm treats all edges as undirected, ignoring their original direction.

Example Graph

GQL
INSERT (A:default {_id: "A"}), (B:default {_id: "B"}),
       (C:default {_id: "C"}), (D:default {_id: "D"}),
       (E:default {_id: "E"}), (F:default {_id: "F"}),
       (G:default {_id: "G"}), (H:default {_id: "H"}),
       (I:default {_id: "I"}), (J:default {_id: "J"}),
       (K:default {_id: "K"}), (L:default {_id: "L"}),
       (M:default {_id: "M"}), (N:default {_id: "N"}),
       (A)-[:default {weight: 1}]->(B), (A)-[:default {weight: 1.7}]->(C),
       (A)-[:default {weight: 0.6}]->(D), (A)-[:default {weight: 1}]->(E),
       (B)-[:default {weight: 3}]->(G), (F)-[:default {weight: 1.6}]->(A),
       (F)-[:default {weight: 0.3}]->(H), (F)-[:default {weight: 2}]->(J),
       (F)-[:default {weight: 0.5}]->(K), (G)-[:default {weight: 2}]->(F),
       (I)-[:default {weight: 1}]->(F), (K)-[:default {weight: 0.3}]->(A),
       (K)-[:default {weight: 0.8}]->(L), (K)-[:default {weight: 1.2}]->(M),
       (K)-[:default {weight: 2}]->(N)

Parameters

NameTypeDefaultDescription
maxIterationsINT10Maximum number of optimization passes. Each pass consists of 3 phases.
resolutionFLOAT1.0Resolution parameter γ. Values > 1 favor smaller, tighter communities; values < 1 favor larger communities.
thetaFLOAT0.01Controls randomness during refinement phase. Set to 0 to disable randomness and always select the maximum modularity gain.
weightSTRING/Numeric edge property to use as weight. If unset, all edges have unit weight.
limitINT-1Limits the number of results returned (-1 = all).
orderSTRING/Sorts the results by community size: asc or desc.

Run Mode

Returns:

ColumnTypeDescription
nodeIdSTRINGNode identifier (_id)
communityINTCommunity identifier
iterationINTNumber of optimization passes completed before convergence
modularityFLOATFinal modularity score
GQL
CALL algo.leiden({
  weight: "weight"
}) YIELD nodeId, community, level, modularity

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.leiden.stream({
  weight: "weight"
}) YIELD nodeId, community
RETURN community, COLLECT(nodeId) AS members
GROUP BY community

Stats Mode

Returns:

ColumnTypeDescription
nodeCountINTTotal number of nodes
communityCountINTNumber of communities detected
largestCommunitySizeINTSize of the largest community
smallestCommunitySizeINTSize of the smallest community
modularityFLOATFinal modularity score
GQL
CALL algo.leiden.stats({
  weight: "weight"
}) YIELD nodeCount, communityCount, largestCommunitySize, smallestCommunitySize, modularity

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the community column in results to a property. Map: explicit column-to-property mapping (e.g., {community: 'comm_id', level: 'lvl'}).

Writable columns:

ColumnTypeDescription
communityINTCommunity identifier
iterationINTNumber of optimization passes completed before convergence
modularityFLOATFinal modularity score

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.leiden.write({weight: "weight"}, {
  db: {
    property: "community_id"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs