UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
      • Node2Vec
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Centrality

SybilRank

Overview

The SybilRank algorithm ranks the trust of nodes by early-terminated random walks in the network, typically Online Social Network (OSN). The surge in popularity of OSNs has accompanied by the a rise in Sybil attacks, in which a malicious attacker creates multiple fake accounts (Sybils) to send spam, distribute malware, manipulate votes, inflate view counts for niche content, and so on.

SybilRank was proposed by Qiang Cao et al. in 2012, it is computationally efficient and can scale to large graphs.

  • Q. Cao, M. Sirivianos, X. Yang, T. Pregueiro, Aiding the Detection of Fake Accounts in Large Scale Social Online Services (2012)

Concepts

Threat Model and Trust Seeds

SybilRank models an OSN as an undirected graph, where each node represents a user in the network, and each edge represents a mutual social relationship.

In the threat model of SybilRank, all nodes are divided into two disjoint sets: non-Sybils H, and Sybils S. Denote the non-Sybil region GH as the subgraph induced by the set H, which includes all non-Sybils and edges among them. Similarly, the Sybil region GS is the subgraph induced by S. GH and GS are connected by attack edges between Sybils and non-Sybils.

Some nodes identified as non-Sybils are designated as trust seeds for the operation of SybilRank. Seeding trust on multiple nodes makes SybilRank robust to seed selection errors, as incorrectly designating a node that is Sybil or close to Sybils as a seed causes only a small fraction of the total trust to be initialized and propagated in the Sybil region.

Below is an example of the threat model with trust seeds:

NOTE

An important assumption of SybilRank is that the number of attack edges is limited. Since SybilRank is designed for large scale attacks, where fake accounts are crafted and maintained at a low cost, and are thus unable to befriend many real users. It results in a sparse cut between GH and GS.

Early-Terminated Random Walk

In an undirected graph, if a random walk's transition probability to a neighbor node is uniformly distributed, when the number of steps is sufficient, the probability of landing at each node would converge to be proportional to its degree. The number of steps that a random walk needs to reach the stationary distribution is called the graph's mixing time.

SybilRank relies on the observation that an early-terminated random walk starting from a non-Sybil node (trust seed) has higher landing probability to land at a non-Sybil node than a Sybil node, as the walk is unlikely to traverse one of the relatively few attack edges. That is to say, there is a significant difference between the mixing time of the non-Sybil region GH and the entire graph.

SybilRank refers to the landing probability of each node as the node's trust. SybilRank ranks nodes according to their trust scores; nodes with low trust scores are ranked higher, indicating they are potential Sybil (fake) users.

Trust Propagation via Power Iteration

SybilRank uses the technique of power iteration to efficiently calculate the landing probability of random walks in large graphs. Power iteration involves successive matrix multiplications where each element of the matrix represents the random walk transition probability from one node to a neighbor node. Each iteration computes the landing probability distribution over all nodes as the random walk proceeds by one step.

In an undirected graph G = (V, E), initially a total trust TG is evenly distributed among all trust seeds. During each power iteration, a node first evenly distributes its trust to its neighbors; it then collects trust distributed by its neighbors and updates its own trust accordingly. The trust of node v in the i-th iteration is:

where node u belongs to the neighbor set of node v, deg(u) is the degree of node u. The total amount of trust TG remains unchanged all the time.

With sufficient power iterations, the trust of all nodes would converge to the stationary distribution:

However, SybilRank terminates the power iteration after a fixed number of steps, without waiting for full convergence, and it is suggested to be set as log2(|V|). This number of iterations is sufficient to reach an approximately stationary distribution of trust over the fast-mixing non-Sybil region GH, but limits the trust escaping to the Sybil region GS, thus non-Sybils will be ranked higher than Sybils.

NOTE

In practice, the mixing time of GH is affected by many factors, so log2(|V|) is only a reference, but it must be less than the mixing time of the whole graph.

Considerations

  • Each self-loop adds two degrees to its subject node.
  • The algorithm treats edges as undirected.
  • SybilRank's computational cost is O(n log n). This is because each power iteration costs O(n), and it iterates O(log n) times.

Example Graph

GQL
INSERT (H1:user {_id: "H1"}), (H2:user {_id: "H2"}),
       (H3:user {_id: "H3"}), (H4:user {_id: "H4"}),
       (H5:user {_id: "H5"}), (H6:user {_id: "H6"}),
       (H7:user {_id: "H7"}), (H8:user {_id: "H8"}),
       (H9:user {_id: "H9"}), (H10:user {_id: "H10"}),
       (S1:user {_id: "S1"}), (S2:user {_id: "S2"}),
       (S3:user {_id: "S3"}), (S4:user {_id: "S4"}),
       (S2)-[:link]->(H4), (S3)-[:link]->(H6),
       (S4)-[:link]->(S2), (S4)-[:link]->(S3),
       (S4)-[:link]->(H9), (H1)-[:link]->(H9),
       (H2)-[:link]->(H7), (H2)-[:link]->(H10),
       (H3)-[:link]->(H1), (H3)-[:link]->(H5),
       (H4)-[:link]->(H3), (H4)-[:link]->(H6),
       (H5)-[:link]->(H1), (H6)-[:link]->(H1),
       (H6)-[:link]->(H3), (H6)-[:link]->(H5),
       (H7)-[:link]->(H10), (H8)-[:link]->(H7)

Parameters

NameTypeDefaultDescription
trustedNodesSTRING/Comma-separated _ids of trusted seed nodes. Required.
iterationsINT0Number of iterations. 0 = auto (ceil(log2(n))).
limitINT-1Limits the number of results returned (-1 = all).
orderSTRING/Sorts the results by trust: asc or desc.

Run Mode

Returns:

ColumnTypeDescription
nodeIdSTRINGNode identifier (_id)
trustFLOATTrust score (lower = more likely sybil)
rankINTRank position (1 = most trusted)

SybilRank with H2, H3, and H5 as trust seeds:

GQL
CALL algo.sybilrank({
  trustedNodes: "H2,H3,H5",
  order: "desc"
}) YIELD nodeId, trust, rank

Result:

nodeIdtrustrank
H60.148726851851851861
H30.13356481481481482
H10.111072530864197523
H50.099652777777777784
H40.075347222222222235
H70.069444444444444456
H20.066358024691358017
H90.0591820987654320958
S30.054783950617283959
S20.05401234567901233610
H100.0524691358024691311
S40.04143518518518518612
H80.03395061728395061513
S1014

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.sybilrank.stream({
  trustedNodes: "H2,H3,H5",
  order: "asc",
  limit: 4
}) YIELD nodeId, trust
RETURN nodeId, trust

Result:

nodeIdtrust
S10
H80.033950617283950615
S40.041435185185185186
H100.05246913580246913

Stats Mode

Returns:

ColumnTypeDescription
nodeCountINTTotal number of nodes
trustedCountINTNumber of trusted seed nodes
minTrustFLOATMinimum trust score
maxTrustFLOATMaximum trust score
avgTrustFLOATAverage trust score
GQL
CALL algo.sybilrank.stats({
  trustedNodes: "H2,H3,H5"
}) YIELD nodeCount, trustedCount, minTrust, maxTrust, avgTrust

Result:

nodeCounttrustedCountminTrustmaxTrustavgTrust
14300.148726851851851830.07142857142857142

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the trust column in results to a property. Map: explicit column-to-property mapping (e.g., {trust: 'trust_score', rank: 'trust_rank'}).

Writable columns:

ColumnTypeDescription
trustFLOATTrust score
rankINTRank position

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.sybilrank.write({trustedNodes: "H2,H3,H5"}, {
  db: {
    property: "trust_score"
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs