UltipaDocs
Try Playground
  • Introduction
  • Managing HDC Graphs
  • Managing Distributed Projections
  • Installing Algorithms
  • Running Algorithms
    • Degree Centrality
    • Closeness Centrality
    • Harmonic Centrality
    • Graph Centrality
    • Betweenness Centrality
    • Eigenvector Centrality
    • Katz Centrality
    • CELF
    • PageRank
    • ArticleRank
    • TextRank
    • HITS
    • SybilRank
    • Jaccard Similarity
    • Overlap Similarity
    • Cosine Similarity
    • Pearson Correlation Coefficient
    • Euclidean Distance
    • K-Hop All
    • Bipartite Graph
    • HyperANF
    • Connected Component
    • Triangle Counting
    • Induced Subgraph
    • k-Core
    • k-Truss
    • p-Cohesion
    • k-Edge Connected Components
    • Local Clustering Coefficient
    • Topological Sort
    • Schema Overview
    • Dijkstra's Single-Source Shortest Path
    • Delta-Stepping Single-Source Shortest Path
    • Shortest Path Faster Algorithm (SPFA)
    • Minimum Spanning Tree
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
    • Adamic-Adar Index
    • Common Neighbors
    • Preferential Attachment
    • Resource Allocation
    • Total Neighbors
    • Louvain
    • Leiden
    • Label Propagation
    • HANP
    • k-Means
    • kNN (k-Nearest Neighbors)
    • K-1 Coloring
    • Conductance
      • Random Walk
      • Node2Vec Walk
      • Node2Vec
      • Struc2Vec Walk
      • Struc2Vec
      • LINE
      • Fast Random Projection
      • Summary of Graph Embedding
      • Gradient Descent
      • Backpropagation
      • Skip-gram
      • Skip-gram Optimization
  1. Docs
  2. /
  3. Graph Analytics & Algorithms
  4. /
  5. Centrality

HITS

HDC

Overview

The HITS (Hyperlink-Induced Topic Search) algorithm was developed by L.M. Kleinberg in 1999 with the purpose of improving the quality of search methods on the World Wide Web (WWW). HITS makes use of the mutual reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities.

  • L.M. Kleinberg, Authoritative Sources in a Hyperlinked Environment (1999)

Concepts

Authority and Hub

In WWW, hyperlinks represent some latent human judgment: the creator of page p, by including a link to page q, has in some measure conferred authority on q. Instructively, a node with large in-degree is viewed as an authority.

If a node points to a considerable number of authoritative nodes, it is referred to as a hub.

As illustrated in the graph below, red nodes represent good authorities, while green nodes represent good hubs.

Hubs and authorities exhibit a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.

Compute Authorities and Hubs

HITS algorithm operates on the whole graph iteratively to compute the authority weight (denoted as x) and hub weight (denoted as y) for each node through the link structure. Nodes with larger x-values and y-values are viewed as better authorities and hubs respectively.

In a directed graph G = (V, E), all nodes are initialized with x = 1 and y = 1. In each iteration, for each node p ∈ V, update its x and y values as follows:

Here is an example:

At the end of one iteration, normalize all x values and all y values to meet the invariant below:

The algorithm iterates until the changes in all x and y values converge within a specific tolerance, or until the maximum number of iterations is reached. In the experiments of the original author, the convergence is quite rapid, 20 iterations are normally sufficient.

Considerations

  • In HITS algorithm, self-loops are ignored.
  • Nodes with no in-links are assigned an authority weight of 0, while nodes with no out-links are assigned a hub weight of 0.

Example Graph

Run the following statements on an empty graph to define its structure and insert data:

INSERT (A:default {_id: "A"}),
       (B:default {_id: "B"}),
       (C:default {_id: "C"}),
       (D:default {_id: "D"}),
       (E:default {_id: "E"}),
       (F:default {_id: "F"}),
       (G:default {_id: "G"}),
       (H:default {_id: "H"}),
       (A)-[:default]->(F),
       (B)-[:default]->(A),
       (C)-[:default]->(A),
       (C)-[:default]->(B),
       (D)-[:default]->(A),
       (D)-[:default]->(F),
       (E)-[:default]->(A),
       (E)-[:default]->(G),
       (F)-[:default]->(H),
       (G)-[:default]->(F);

Creating HDC Graph

To load the entire graph to the HDC server hdc-server-1 as my_hdc_graph:

CREATE HDC GRAPH my_hdc_graph ON "hdc-server-1" OPTIONS {
  nodes: {"*": ["*"]},
  edges: {"*": ["*"]},
  direction: "undirected",
  load_id: true,
  update: "static"
}

Parameters

Algorithm name: hits_centrality

Name
Type
Spec
Default
Optional
Description
max_loop_numInteger≥120YesThe maximum number of iteration rounds. The algorithm will terminate after completing all rounds.
toleranceFloat(0,1)0.001YesThe algorithm terminates when the changes in all authority and hub weights between iterations are less than the specified tolerance, indicating that the result is stable.
return_id_uuidStringuuid, id, bothuuidYesIncludes _uuid, _id, or both values to represent nodes in the results.
limitInteger≥-1-1YesLimits the number of results returned. Set to -1 to include all results.

File Writeback

algo(hits_centrality).params({
  return_id_uuid: "id",
  projection: "my_hdc_graph"
}).write({
  file: {
    filename: "ranks"
  }
})

Result:

File: ranks
_id,authority,hub
D,0,0.572083
F,0.42642,1.43197e-11
H,3.20199e-11,0
B,0.213196,0.381382
A,0.852796,0.190701
E,0,0.476726
C,0,0.476726
G,0.213196,0.190701

DB Writeback

Writes the authority and hub values from the results to the specified node property. The property types are both double.

algo(hits_centrality).params({
  projection: "my_hdc_graph",
  max_loop_num: 20,
  tolerance: 0.0001
}).write({
  db: {
    authority: 'auth',
    hub: 'hub'
  }
})

Full Return

exec{
  algo(hits_centrality).params({
    return_id_uuid: "id"
  }) as r
  return r
} on my_hdc_graph

Result:

_idauthorityhub
D00.572083
F0.426420
H00
B0.2131960.381382
A0.8527960.190701
E00.476726
C00.476726
G0.2131960.190701

Stream Return

exec{
  algo(hits_centrality).params({
    return_id_uuid: "id",
    max_loop_num: 20,
    tolerance: 0.0001
  }).stream() as r
  return r._id, r.hub order by r.hub
} on my_hdc_graph

Result:

r._idr.hub
D0.572083
E0.476726
C0.476726
B0.381382
A0.190701
G0.190701
F0
H0