✓ File Writeback ✓ Property Writeback ✓ Direct Return ✓ Stream Return ✕ Stats
Overview
PageRank was originally proposed in the context of World Wide Web (WWW), it takes advantage of the link structure of WWW to produce a global objective 'importance' ranking of webpages that can be used by search engines. This algorithm was proposed in 1997-1998 by Google co-founders Larry Page and Sergey Brin.
- L. Page, S Brin, R. Motwani, T. Winograd, The PageRank Citation Ranking: Bringing Order to The Web (1998)
With the development of technology and the emergence of enormous correlation data, PageRank has been adopted in many other fields too.
Concepts
Link Structure and PageRank
In WWW, hypertexts contained in webpages create links between webpages. Every webpage (node) can have some forward links (via out-edges) and backlinks (via in-edges). In the following graph, A and B are backlinks of C, D is a forward link of C.
Webpages vary greatly in terms of the number of backlinks they have. Naturally, webpages that are more important, authoritative or of high quality are likely to receive more or more important backlinks.
PageRank can be described as this: a page has high rank if the sum of the ranks of its backlinks is high. This covers both the case when a page has many backlinks and when a page has a few highly ranked backlinks.
Rank Propagation
The ranks (scores) of all pages are computed in a recursive way by starting with any set of ranks and iterating the computation until it converges. In each iteration, a page gives out its rank to all its forward links evenly to contribute to the ranks of the pages it points to; meanwhile every page receives ranks from its backlinks, so the rank of page u after one iteration is:
where Bu is the backlink set of u.
Below shows a steady state of a set of pages:
Damping Factor
Consider the following kinds of webpages:
- Webpages with no backlinks. The rank they receive is 0, but they still need to be browsed in the Internet.
- Webpages with no forward links. Their ranks are lost from the system.
- A group of webpages that only point to pages within the group, but not any page outside the group.
To overcome these problems, a damping factor, whose value is between 0 and 1, is introduced. It gives each webpage a base rank while weakening the ranks passed from backlinks. The rank of page u after one iteration becomes:
where d is the damping factor. For example, when d is 0.7, if a webpage receives 8 ranks in total from backlinks, then the rank of this webpage is updated to 0.7*8 + (1-0.7) = 5.9
.
Damping factor can also be understood as the probability that a web surfer randomly jump to a webpage that is not one of the forward links of the current webpage.
Considerations
- The rank of isolated webpage will stay the same as the value of (1 - d).
- Self-loop is regarded as a forward link and a backlink, a webpage would pass some rank to itself through self-loop. If a network has many self-loops, it will take more iterations to converge.
Syntax
- Command:
algo(page_rank)
- Parameters:
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
init_value | float | >0 | 0.2 |
Yes | The same initial rank for all nodes |
loop_num | int | >=1 | 5 |
Yes | Number of iterations |
damping | float | (0,1) | 0.8 |
Yes | Damping factor |
weaken | int | 1 , 2 |
1 |
Yes | For PageRank, keep it as 1 ; 2 means to run ArticleRank |
limit | int | ≥-1 | -1 |
Yes | Number of results to return, -1 to return all results |
order | string | asc , desc |
/ | Yes | Sort nodes by the rank |
Examples
The example graph is as follows:
File Writeback
Spec | Content |
---|---|
filename | _id ,rank |
algo(page_rank).params({
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 1,
order: 'desc'
}).write({
file: {filename: 'rank'}
})
Results: File rank
E,3.96235
F,1.61052
N,1.48175
G,1.25663
I,1.25663
B,0.844209
L,0.844209
K,0.702651
M,0.48106
J,0.36
H,0.333333
A,0.333333
C,0.333333
D,0.2
Property Writeback
Spec | Content | Write to | Data Type |
---|---|---|---|
property | rank |
Node property | float |
algo(page_rank).params({
loop_num: 50,
weaken: 1
}).write({
db:{property: 'PR'}
})
Results: Rank for each node is written to a new property named PR
Direct Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNode | Node and its rank | _uuid , rank |
algo(page_rank).params({
init_value: 1,
loop_num: 50,
damping: 0.8,
weaken: 1,
order: 'desc',
limit: 5
}) as PR
return PR
Results: PR
_uuid | rank |
---|---|
5 | 3.9623489 |
6 | 1.6105210 |
14 | 1.4817491 |
7 | 1.2566270 |
9 | 1.2566270 |
Stream Return
Alias Ordinal | Type | Description | Columns |
---|---|---|---|
0 | []perNode | Node and its rank | _uuid , rank |
algo(page_rank).params({
loop_num: 50,
damping: 0.8,
weaken: 1,
order: 'desc',
limit: 5
}).stream() as PR
find().nodes({_uuid == PR._uuid}) as nodes
return table(nodes._id, PR.rank)
Results: table(nodes._id, PR.rank)
nodes._id | PR.rank |
---|---|
E | 3.9623020 |
F | 1.6104970 |
N | 1.4817290 |
G | 1.2566110 |
I | 1.2566110 |