Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit
Search
v2.x
    v4.0

    PageRank

      Basic  

    Overview

    PageRank algorithm is an iterative algorithm that passes the scores of nodes along the direction of the directed edges in directed graph until the score distribution of the whole graph converges, the underlying assumption is that "more important web pages are likely to receive more or more important backlinks from other web pages". This algorithm was proposed from 1997 to 1998 by Google co-founders Larry Page and Sergey Brin with the purpose to calculate the popularity and importance of web pages, so as to provide a basis for ranking search results. With the development of technology and the emergence of enormous correlation data, the usage of PageRank has been drived into many other fields.

    Related materials of the algorithm are as below:

    Basic Concept

    Forward Link, Backlink

    The 1-step neighbor of a node that is connected with an outbound edge of that node is called the forward link of that node; on the contrary, it's called the backlink of that node if it's connected with an inbound edge of the node. When using node to represent web page, forward links of node are the web pages that are referenced by that web page, and backlinks represent the web pages that reference that web page.

    Graph: A and B are backlinks of C, D is forward link of C

    PageRank

    In the PageRank algorithm, all nodes are assigned an initial score; the score is evenly divided among all outbound edges of the node and passed to each forward link of that node; in the meantime, the node receives score from its various backlinks, and the sum of these received scores is the node's score in this round of transfer:

    In the formula above, node j is any backlink of node i, D(j) is the outbound degree of j. For example:

    When the above scores become steady after multiple rounds of transfer, it can be used to represent the popularity and importance of the web page:

    ArticleRank

    ArticleRank is an extended application of the PageRank algorithm. The reference of publications is very similar to the general web page reference, but there are also differences. Since a publication cannot reference itself, two publications cannot reference each other, and the references in a published publication do not change, a reference graph of publications has the following features: node does not have self-loop edge, a node cannot be both the forward link and the backlink of another node, and the forward links (outbound degree) of node will not change.

    In comparison with PageRank, Ultipa's ArticleRank divides the score of node equally, not by the outbound degree (number of edges in the outbound direction) of that node, but "the sum of the outbound degree of that node and the average outbound degree of nodes in the whole graph" (which also differs from the original ArticleRank). Intuitively, this change will greatly weaken the score transfer capability of nodes with the outbound degree that is much lower than the average outbound degree of the whole graph.

    In the formula above, D(avg) is the average outbound degree of the whole graph. Let's say D(avg) is 2, and using this method to re-calculate the previous score transfer process:

    There is no cycle in an ideal publication reference graph, thus the total score of the whole graph would decrease as the number of transfer rounds increases, and the score of ArticleRank will never become steady without a damping coefficient.

    Damping Coefficient

    Taking PageRank as an example, consider those web pages that are not referenced by any other sites (nodes with inbound degree of 0 and no backlink), such as those lonely web pages, although the score they receive is always 0, they still need to be browsed in the real network; or those web pages that do not reference any other sites (nodes with outbound degree of 0 and no forward link), the score transfer of the whole graph should not become meaningless due to the score loss of these nodes.

    To deal with the two cases above as well as the problem that ArticleRank can not converge, damping coefficient - a numeric value between 0 to 1 - is introduced to give each node a floor score while weakening the scores passed from the backlinks (if has) of the node. Take PageRank, for example, when the damping coefficient is 0.7, each node gets a floor score of 1 - 0.7 = 0.3, let's say the score a node receives is 8, it will be weakened to 8 * 0.7 = 5.6 instead, and the sum of the two parts is the score of the node in this round: 0.3 + 5.6 = 5.9.

    The figure below shows a steady state of PageRank calculation with damping coefficient equals to 0.8, and the initial socre of each node is 1 (please see Algorithm Process - Example below for the result of each iteration):

    Algorithm Process

    Formula

    Using c to represent the damping coefficient, the score of node after each round of transfer is:

    Ultipa supports both PageRank and ArticleRank score calculation methods, which can be configured by algorithm parameters.

    Iteration

    At the beginning of the algorithm, the initial score of all points is set according to the algorithm parameters; in each iteration, the score is calculated and updated for each node. Iterating and looping by this rule until the score of nodes in the whole graph no longer changes, or the number of iterations reaches the limit.

    Example

    Verify the steady state of PageRank calculation with a damping coefficient above, the initial score of nodes in the graph is 1 and damping coefficient is 0.8.

    Score Division and Transfer Transfer Result
    Initial State
    1st Iteration
    2nd Iteration
    3rd Iteration
    4th Iteration
    5th Iteration

    Special Case

    Lonely Node, Disconnected Graph

    With the introduction of damping coefficient c, the score lonely node gets is always 1-c (except for the initial state).

    There is no score transfer between different connected components in disconnected graph, the score transfer reaches a steady state inside each connected component.

    Self-loop Edge

    In PageRank algorithm, each self-loop edge is regarded as a valid outbound degree and a valid inbound degree, that is, self-loop edge of a node passes a part of the score to the node itself, and the execution of this algorithm on the graph with self-loop edge usually requires more iterations to stablize.

    Directed Edge

    The score of nodes are divided and passed along the direction of directed edges in PageRank algorithm.

    Command

    algo(page_rank).params(<>)

    Configuration Item Type Default Value Specification Description
    init_value float 0.2 >0 Initial score
    loop_num int 5 >=1 Number of iterations of the algorithm
    damping float 0.8 0~1 Damping coefficient, i.e. the probablity that users continue to stay on the current page for browsing
    weaken int 1 1 or 2 1: To calculate PageRank
    2: To calculate ArticleRank
    limit int -1 -1 or >=0 Number of results to return, -1 means to return all results
    order string / ASC or DESC, case insensitive To sort the retuned results, no sorting is applied if not set

    Example: Execute PageRank, and set the number of iterations as 3, damping coefficient as 0.7, default score as 1

    algo(page_rank)
      .params({ loop_num: 3, damping: 0.7, init_value: 1, limit: -1 })
    

    File Writeback

    .write({file: {<>}})

    Parameter Type Default Value Specification Description
    filename string / / Name of the file path to be written back. Columns of the file are: _idrank

    Property Writeback

    .write({db: {<>}})

    Parameter Type Default Value Specification Description
    property string / / Name of the property to be written back. Write the score back to: <property>

    Statistics Writeback

    (Not supported)

    Direct Return

    as <alias> return <alias>

    Alias Number Type Description Column Name
    0 []perNode Node and its score _uuid, rank

    Streaming Return

    .stream() as <alias> return <alias>

    Alias Number Type Description Column Name
    0 []perNode Node and its score _uuid, rank

    Real-time Statistics

    (Not supported)

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写