UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Eccentricity Centrality
    • Betweenness Centrality
    • Bridges
    • Articulation Points
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • KNN
    • Vector Similarity
    • Bipartite Graph
    • HyperANF
    • Weakly Connected Components (WCC)
    • Strongly Connected Components (SCC)
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Triangle Count
    • Clique Count
    • k-Core
    • k-Truss
    • p-Cohesion
    • Induced Subgraph
    • Topological Sort
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Dijkstra's Shortest Path
    • A* Shortest Path
    • Yen's K-Shortest Paths
    • Shortest Path (BFS)
    • Delta-Stepping SSSP
    • Shortest Path Faster Algorithm (SPFA)
    • All-Pairs Shortest Path (APSP)
    • Minimum Spanning Tree (MST)
    • K-Spanning Tree
    • Steiner Tree
    • Prize-Collecting Steiner Tree (PCST)
    • Minimum Cost Flow
    • Maximum Flow
    • K-Hop Fast
    • Longest Path (DAG)
    • Random Walk
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Same Community
    • Louvain
    • Leiden
    • Modularity Optimization
    • Label Propagation
    • HANP
    • SLPA
    • k-Means
    • HDBSCAN
    • K-1 Coloring
    • Modularity
    • Conductance
    • Max k-Cut
      • Node2Vec
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Algorithms
  4. /
  5. Centrality

HITS

Overview

The HITS (Hyperlink-Induced Topic Search) algorithm was developed by L.M. Kleinberg in 1999 with the purpose of improving the quality of search methods on the World Wide Web (WWW). HITS makes use of the mutual reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities.

  • L.M. Kleinberg, Authoritative Sources in a Hyperlinked Environment (1999)

Concepts

Authority and Hub

In WWW, hyperlinks represent some latent human judgment: the creator of page p, by including a link to page q, has in some measure conferred authority on q. Instructively, a node with large in-degree is viewed as an authority.

If a node points to a considerable number of authoritative nodes, it is referred to as a hub.

As illustrated in the graph below, red nodes represent good authorities, while green nodes represent good hubs.

Hubs and authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.

Compute Authorities and Hubs

HITS algorithm operates on the whole graph iteratively to compute the authority weight (denoted as x) and hub weight (denoted as y) for each node through the link structure. Nodes with larger x-values and y-values are viewed as better authorities and hubs respectively.

In a directed graph G = (V, E), all nodes are initialized with x = 1 and y = 1. In each iteration, for each node p ∈ V, update its x and y values as follows:

Here is an example:

At the end of one iteration, normalize all x values and all y values to meet the invariant below:

The algorithm iterates until the changes in all x and y values converge within a specific tolerance, or until the maximum number of iterations is reached. In the experiments of the original author, the convergence is quite rapid, 20 iterations are normally sufficient.

Considerations

  • In HITS algorithm, self-loops are ignored.
  • Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0.

Example Graph

GQL
INSERT (A:default {_id: "A"}), (B:default {_id: "B"}),
       (C:default {_id: "C"}), (D:default {_id: "D"}),
       (E:default {_id: "E"}), (F:default {_id: "F"}),
       (G:default {_id: "G"}), (H:default {_id: "H"}),
       (A)-[:default]->(F), (B)-[:default]->(A),
       (C)-[:default]->(A), (C)-[:default]->(B),
       (D)-[:default]->(A), (D)-[:default]->(F),
       (E)-[:default]->(A), (E)-[:default]->(G),
       (F)-[:default]->(H), (G)-[:default]->(F)

Parameters

NameTypeDefaultDescription
maxIterationsINT20Maximum number of iterations.
toleranceFLOAT0.000001Convergence tolerance. The algorithm terminates when changes in all authority and hub weights between iterations are less than this value.
limitINT-1Limits the number of results returned (-1 = all).
orderSTRING/Sorts the results by authScore: asc or desc.

Run Mode

Returns:

ColumnTypeDescription
nodeIdSTRINGNode identifier (_id)
hubScoreFLOATHub score
authScoreFLOATAuthority score

HITS for all nodes:

GQL
CALL algo.hits({
  maxIterations: 50,
  tolerance: 0.001,
  order: "desc"
}) YIELD nodeId, hubScore, authScore

Result:

nodeIdhubScoreauthScore
A0.191085204393699220.8524670163199872
F2.2374583080847388e-70.42727940782846513
G0.191085204393699220.21299330239705236
B0.381234927468788230.21299330239705236
H05.003107718113052e-7
E0.47648845005223320
D0.57232013186248740
C0.47648845005223320

Stream Mode

Returns the same columns as run mode, streamed for memory efficiency.

GQL
CALL algo.hits.stream({
  order: "desc"
}) YIELD nodeId, hubScore
RETURN nodeId, hubScore
ORDER BY hubScore DESC

Result:

nodeIdhubScore
D0.5720777504129628
E0.4767310977570136
C0.4767310977570136
B0.38138491402847763
A0.19069283638448514
G0.19069283638448514
F4.5823157782445074e-15
H0

Stats Mode

Returns:

ColumnTypeDescription
nodeCountINTTotal number of nodes
minHubScoreFLOATMinimum hub score
maxHubScoreFLOATMaximum hub score
minAuthScoreFLOATMinimum authority score
maxAuthScoreFLOATMaximum authority score
GQL
CALL algo.hits.stats() YIELD nodeCount, minHubScore, maxHubScore, minAuthScore, maxAuthScore

Result:

nodeCountminHubScoremaxHubScoreminAuthScoremaxAuthScore
800.572077750412962800.8528025933604596

Write Mode

Computes results and writes them back to node properties. The write configuration is passed as a second argument map.

Write parameters:

NameTypeDescription
db.propertySTRING or MAPNode property to write results to. String: writes the authScore column in results to a property. Map: explicit column-to-property mapping (e.g., {hubScore: 'hub', authScore: 'auth'}).

Writable columns:

ColumnTypeDescription
hubScoreFLOATHub score
authScoreFLOATAuthority score

Returns:

ColumnTypeDescription
task_idSTRINGTask identifier for tracking via SHOW TASKS
nodesWrittenINTNumber of nodes with properties written
computeTimeMsINTTime spent computing the algorithm (milliseconds)
writeTimeMsINTTime spent writing properties to storage (milliseconds)
GQL
CALL algo.hits.write({}, {
  db: {
    property: {hubScore: "hub", authScore: "auth"}
  }
}) YIELD task_id, nodesWritten, computeTimeMs, writeTimeMs