Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit
Search
v4.0
    v4.0

    HANP

    Overview

    HANP (Hop Attenuation & Node Preference) algorithm is an extension of the Label Propagation algorithm (LPA). It introduces a label score attenuation mechanism based on LPA, and considers the influence of neighbor node degree on neighbor label weight. The algorithm was proposed by Ian X.Y. Leung in 2009.

    Related material of the algorithm:

    Basic Concept

    Label Score

    Label score indicates the propagation ability of the label. The HANP algorithm assigns all labels initial score of 1, and each time a node adopts new label from its neighbor (i.e. the label hops once), the score of the new label being propagated to that node will have a hop attenuation, this is what HA (Hop Attenuation) means in the name of the algorithm. The hop attenuation is denoted as δ, which can prevent large cluster from dominating. In the graph below, given δ = 0.3, when the green node adopts label a, the score of label a on the green node would attenuate to 2 - 0.3 = 1.7.

    When the adopted new label comes from multiple neighbors, the score of the label is attenuated from the maximum score.

    Preference For Node Degree

    Preference for node degree means that node with greater degree has better ability to propagate labels; in other words, a node tends to adopt label from neighbor with greater degree, this is what NP (Node Preference) means in the name of the algorithm.

    In the graph above, the green node has two neighbors - the red node with degree of 6 (self-loop edge is counted twice), and the yellow node with the degree of 4, so the green node prefers to adopt label a.

    Of course, it is also possible to prioritize the propagation of label of node with lower degrees. Therefore, the algorithm considers the exponential power of the degree of the node where the label is located. When the power exponent is greater than 0, labels with high node degree will be propagated preferentially; when the power exponent is less than 0, labels with low node degree are preferentially propagated; when the power exponent is 0, the node degree does not affect the priority of label propagation.

    Label Weight

    Node j has label l, when it propagates to its neighbor node i, the weight of label l (i.e. w(l)) equals to the product of the score of l on j (i.e. s(j)), the exponential power function of the degree of j, and the weight of all edges between i and j; when i has multiple neighbor nodes with label l, the weight of each l needs to be summed:

    where d(j) is the degree of j, m is the power exponent, A(ij) is the total weight of all edges between i and j.

    For any node i in the graph, label l is on i's neighbor, then i's new label l' can be denoted as:

    Let s(l') be the score of the label on the neighbor nodes, δ is the hop attenuation, so the score of the new label l' on i is:

    Please note, δ = 0 if the selected label is equal to the current label, i.e. the score of the label does not change.

    The iteration process of the HANP algorithm is similar to LPA. All label scores are set to 1 at the beginning; during each iteration, it calculates for each node whether there is any label different from its current label or any label same with its current label but has higher score that can be adopted from its neighbors; and after all nodes are calculated, updates the nodes which can be updated. Iterating and looping by this rule until no node could adopt new label or update label score, or the number of iterations reaches the limit.

    Different from LPA, since the HANP algorithm applies label score, which can prevent label oscillations, no extra interrupt mechanism is needed.

    Special Case

    Lonely Node, Disconnected Graph

    For disconnected graph, there is no edge between its lonely nodes and connected components to propagate labels, connected components do not contain the same initial labels will not be divided into the same community.

    Self-loop Edge

    The HANP algorithm treats the self-loop edge in the same way with the Degree algorithm algo(degree), which counts each self-loop edge twice.

    Directed Edge

    For directed edges, the HANP algorithm ignores the direction of edges but calculates them as undirected edges.

    Results and Statistics

    Run HANP algorithm in the graph below, all node and edge weights are 1, the letter wrote in the node is the initial label of node, node 16 has no label, the algorithm iterates 10 rounds, the power exponent m is 0.1, hop attenuation δ is 0.2:

    Algorithm results: Keep maximum one label for each node, return _uuid, label_1 and score_1

    _uuid label_1 score_1
    1 A -0.200000
    2 F -0.200000
    3 F -0.200000
    4 F -0.200000
    5 F -0.200000
    6 A -0.200000
    7 A -0.200000
    8 A -0.200000
    9 A -0.200000
    10 J 1
    11

    Algorithm statistics: Number of labels label_count

    label_count
    3

    Command and Configuration

    • Command: algo(hanp)
    • Configurations for the parameter params():
    Name
    Type
    Default
    Specification Description
    loop_num int 5 >=1 Number of iterations
    node_label_property @<schema>.<property> / Numeric or string class node property, LTE needed Name of the node schema and property as label; node without this property will not participate in the calculation; random number is used as label if not set
    edge_weight_property @<schema>.<property> / Numeric edge property, LTE needed Name of the edge schema and property as edge weight; edge without this property will not participate in the calculation; weight is 1 if not set
    m float / Mandatory The power exponent of the degree of neighbor node, which indicates the preference of node degree; m = 0 means to ignore node degree, m > 0 means to prioritize neighbors with high degree, m < 0 means to prioritize neighbors with low degree
    delta float / (0,1); mandatory Hop attenuation of label score during propagation
    limit int -1 >= -1 Number of results to return; return all results if sets to -1 or not set

    Example: Run HANP, label locates on node property @default.label, edge weight locates on edge property @default.level, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      edge_weight_property: "@default.level", 
      m: 0.1, 
      delta: 0.2 
    }) as a return a
    

    Algorithm Execution

    Task Writeback

    1. File Writeback

    Configuration
    Data in Each Row
    filename _id,label_1,score_1

    Example: Run HANP, label locates on node property @default.label, edge weight locates on edge property @default.level, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, write the algorithm results back to file named hanp

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      edge_weight_property: "@default.level", 
      m: 0.1, 
      delta: 0.2 
    }).write({
      file:{
        filename: "hanp"
      }
    })
    

    2. Property Writeback

    Configuration
    Writeback Content
    Type
    Data Type
    property label_1,score_1 Node property Data type of label is string, data type of label score is float

    Example: Run HANP, label locates on node property @default.label, weight of each edge is 1, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, write the algorithm results back to node properties named tag_1 and score_1

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      m: 0.1, 
      delta: 0.2 
    }).write({
      db:{
        property: "tag"
      }
    })
    

    3. Statistics Writeback

    Name Data Type Description
    label_count int Number of labels

    The number of labels means the number of labels left after the algorithm is finished; the HANP algorithm only keeps one label the maximum for each node, the number of labels is the number of communities.

    Example: Run HANP, label locates on node property @default.label, weight of each edge is 1, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, write the algorithm statistics back to task information

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      m: 0.1, 
      delta: 0.2 
    }).write()
    

    Direct Return

    Alias Ordinal
    Type
    Description Column Name
    0 []perNode Node and its label, label score _uuid, label_1, score_1
    1 KV Number of labels label_count

    Example: Run HANP, label locates on node property @default.label, weight of each edge is 1, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, define algorithm results and statistics as aliases named results and count, and return the results and statistics

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      m: 0.1, 
      delta: 0.2 
    }) as results, count 
    return results, count
    

    Streaming Return

    Alias Ordinal
    Type
    Description Column Name
    0 []perNode Node and its label, label score _uuid, label_1, score_1

    Example: Run HANP, label locates on node property @default.label, weight of each edge is 1, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, and return the results ordered by the ascending label name and node UUID

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      m: 0.1, 
      delta: 0.2 
    }).stream() as res 
    return res order by res.label_1, res._uuid
    

    Real-time Statistics

    Alias Ordinal
    Type
    Description Column Name
    0 KV Number of labels label_count

    Example: Run HANP, label locates on node property @default.label, weight of each edge is 1, the power exponent is 0.1, hop attenuation is 0.2, iterate 10 rounds, and return the number of communities

    algo(hanp).params({ 
      loop_num: 10,
      node_label_property: "@default.label", 
      m: 0.1, 
      delta: 0.2 
    }).stats() as count 
    return count
    

    Consistency of Results

    Affected by factors such as the order of nodes, the random selection of labels with the same weights, the parallel calculation and so on, the community division results of the HANP algorithm may be different.

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写