Overview
The HITS (Hyperlink-Induced Topic Search) algorithm was developed by L.M. Kleinberg in 1999 with the purpose of improving the quality of search methods on the World Wide Web (WWW). HITS makes use of the mutual reinforcing relationship between authorities and hubs to evaluate and rank a set of linked entities.
- L.M. Kleinberg, Authoritative Sources in a Hyperlinked Environment (1999)
Concepts
Authority and Hub
In WWW, hyperlinks represent some latent human judgment: the creator of page p, by including a link to page q, has in some measure conferred authority on q. Instructively, a node with large in-degree is viewed as an authority.
If a node points to considerable number of authoritative nodes, it is referred to as a hub.
As illustrated in the graph below, red nodes are good authorities, green nodes are good hubs.
Hubs and authorities exhibit what could be called a mutually reinforcing relationship: a good hub points to many good authorities; a good authority is pointed to by many good hubs.
Compute Authorities and Hubs
HITS algorithm operates on the whole graph iteratively to compute the authority weight (denoted as x) and hub weight (denoted as y) for each node through the link structure. Nodes with larger x-values and y-values are viewed as better authorities and hubs respectively.
In a directed graph G = (V, E), all nodes are initialized with x = 1 and y = 1. In each iteration, for each node p ∈ V, update its x and y values as follows:
Here is an example:
At the end of one iteration, normalize all x values and all y values to meet the invariant below:
The algorithm continues until the change of all x values and y values converges to within some tolerance, or the maximum iteration rounds is met. In the experiments of the original author, the convergence is quite rapid, 20 iterations are normally sufficient.
Considerations
- In HITS algorithm, self-loops are ignored.
- Authority weight of nodes with no in-links is 0, hub weight of nodes with out-links is 0.
Syntax
- Command:
algo(hits_centrality)
- Parameters:
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
max_loop_num | int | >=1 | 20 |
Yes | Maximum rounds of iterations; the algorithm ends after running for all rounds, even though the condition of tolerance is not met |
tolerance | float | (0,1) | 0.001 |
Yes | When all authority weights and hub weights change less than the tolerance between iterations, the result is considered stable and the algorithm ends |
limit | int | ≥-1 | -1 |
Yes | Number of results to return, -1 to return all results |
Examples
The example graph is as follows:
File Writeback
Spec | Content |
---|---|
filename | _id ,authority ,hub |
algo(hits_centrality).params({}).write({
file: {
filename: 'rank'
}
})
Results: File rank
H,0.000000,0.000000
G,0.213196,0.190701
F,0.426420,0.000000
E,0.000000,0.476726
D,0.000000,0.572083
C,0.000000,0.476726
B,0.213196,0.381382
A,0.852796,0.190701
Property Writeback
Spec | Content | Write to | Data Type |
---|---|---|---|
authority | authority |
Node property | double |
hub | hub |
Node property | double |
algo(hits_centrality).params({
max_loop_num: 20,
tolerance: 0.0001
}).write({
db: {
authority: 'auth',
hub: 'hub'
}
})
Results: Authority weight for each node is written to a new property named auth, hub weight for each node is written to a new property named hub
Direct Return
Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|
0 | []perNode | Node and its authority and hub weight | _uuid , authority , hub |
algo(hits_centrality).params() as rank
return rank
Results: rank
_uuid | authority | hub |
---|---|---|
8 | 3.20199049138017e-11 | 0 |
7 | 0.213196444093741 | 0.190700611234451 |
6 | 0.426419530029166 | 1.43197368054726e-11 |
5 | 0 | 0.476726292571473 |
4 | 0 | 0.572082555485605 |
3 | 0 | 0.476726292571473 |
2 | 0.213196444093741 | 0.381381944251153 |
1 | 0.852795952652963 | 0.190700611234451 |
Stream Return
Alias Ordinal |
Type |
Description | Columns |
---|---|---|---|
0 | []perNode | Node and its authority and hub weight | _uuid , authority , hub |
algo(hits_centrality).params({
max_loop_num: 20,
tolerance: 0.0001
}).stream() as rank
find().nodes({_uuid == rank._uuid}) as nodes
order by rank.hub desc
return table(nodes._id, rank.hub)
Results: table(nodes._id, rank.hub)
nodes._id | rank.hub |
---|---|
D | 0.572082555485605 |
E | 0.476726292571473 |
C | 0.476726292571473 |
B | 0.381381944251153 |
G | 0.190700611234451 |
A | 0.190700611234451 |
F | 1.43197368054726e-11 |
H | 0 |