UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
    • Node2Vec
    • Struc2Vec
    • LINE
    • FastRP
    • GraphSAGE
    • HashGNN
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Graph Structure

p-Cohesion

Overview

The p-Cohesion algorithm computes a per-node cohesion value that measures how well each node is embedded within a densely connected neighborhood. The concept of p-cohesion was first proposed by S. Morris in a contagion model describing interactions within large populations:

  • S. Morris, Contagion. The Review of Economic Studies, 67(1), 57–78 (2000)

Concepts

p-Cohesion

One natural measure of the cohesion of a group is the relative frequency of ties among its members compared to non-members. Let cohesion be a constant p ∈ (0,1). A p-Cohesion is a connected subgraph in which every node has, at least, a proportion p of its neighbors within the subgraph. In other words, each node has at most, a proportion (1 − p) of its neighbors outside the subgraph.

The p-Cohesion model offers two key advantages over other cohesive subgraph models:

  • With a high p value, p-Cohesion ensures both strong internal cohesiveness and sparse connections to outside nodes.
  • It considers the proportion of neighbors within the group rather than a fixed number (such as the k value in k-Core), making it better suited for graphs with varying node degrees.

The example graph below illustrates this. Suppose p = 0.6. A grey label next to each node shows the minimum number of internal neighbors required for the node to remain in a p-Cohesion.

Below are the minimal p-Cohesion subgraphs, in terms of node count, that include node a and node j, respectively.

Cohesion Value

Finding the maximum p for which a node belongs to a p-cohesive subgraph is computationally hard, so this algorithm uses a tractable proxy based on k-Core decomposition:

  1. Compute each node's coreness: the largest k such that the node belongs to a k-core (a subgraph where every member has at least k neighbors inside).
  2. Normalize by the node's total degree: cohesion(v) = coreness(v) / degree(v).

The resulting value lies in [0, 1] and reflects the proportion of v's neighbors that remain with v in the deepest k-core containing it. A higher value indicates the node is embedded in a more tightly connected neighborhood.

NOTE

The cohesion value is a per-node ranking metric — it does not name the specific subgraph the node belongs to, and it differs from the literal maximum p of the ratio-based p-Cohesion definition above (it's a lower bound on that quantity). For most analytical use cases the ranking it produces is what matters.

Considerations

  • The algorithm treats all edges as undirected.

Example Graph

GQL
INSERT (A:default {_id: "A"}), (B:default {_id: "B"}),
       (C:default {_id: "C"}), (D:default {_id: "D"}),
       (E:default {_id: "E"}), (F:default {_id: "F"}),
       (G:default {_id: "G"}), (H:default {_id: "H"}),
       (I:default {_id: "I"}), (J:default {_id: "J"}),
       (K:default {_id: "K"}), (L:default {_id: "L"}),
       (K)-[:default]->(J), (K)-[:default]->(L),
       (J)-[:default]->(L), (L)-[:default]->(C),
       (C)-[:default]->(A), (A)-[:default]->(B),
       (C)-[:default]->(B), (A)-[:default]->(D),
       (B)-[:default]->(G), (B)-[:default]->(D),
       (D)-[:default]->(C), (C)-[:default]->(E),
       (C)-[:default]->(F), (D)-[:default]->(E),
       (E)-[:default]->(F), (D)-[:default]->(F),
       (D)-[:default]->(H), (I)-[:default]->(H),
       (F)-[:default]->(I)

Parameters

NameTypeDefaultDescription
pFLOAT/Cohesion threshold (0 < p ≤ 1). When set, only nodes with cohesion ≥ p are returned.

Run Mode

Returns:

ColumnTypeDescription
nodeIdSTRINGNode identifier (_id)
cohesionFLOATMaximal p-cohesion value for this node
GQL
CALL algo.pcohesion({
  p: 0.7
}) YIELD nodeId, cohesion

Result:

nodeIdcohesion
E1
G1
F0.75
A1
B0.75
I1
H1
K1
J1

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.pcohesion.stream({
  p: 0.8
}) YIELD nodeId, cohesion
RETURN nodeId, cohesion

Result:

nodeIdcohesion
E1
G1
A1
I1
H1
K1
J1

Stats Mode

Returns:

ColumnTypeDescription
nodeCountINTTotal number of nodes
minCohesionFLOATMinimum cohesion value
maxCohesionFLOATMaximum cohesion value
avgCohesionFLOATAverage cohesion value
GQL
CALL algo.pcohesion.stats({
  p: 0.8
}) YIELD nodeCount, minCohesion, maxCohesion, avgCohesion

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the cohesion column in results to a property. Map: explicit column-to-property mapping (e.g., {cohesion: 'p_cohesion'}).

Writable columns:

ColumnTypeDescription
cohesionFLOATMaximal p-cohesion value

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.pcohesion.write({}, {
  db: {
    property: "p_cohesion"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs