Change Password

Please enter the password.
Please enter the password. Between 8-64 characters. Not identical to your email address. Contain at least 3 of uppercase, lowercase, numbers, and special characters (such as @*&#).
Please enter the password.
Submit

Change Nickname

Current Nickname:
Submit

Search
v4.0
    v4.0

    CELF

    Overview

    CELF (Cost Effective Lazy Forward) algorithm selects some seed nodes in a network as the propagation source to reach the effect of Influence Maximimzation (IM). CELF algorithm was proposed by Jure Leskovec et al. in 2007, it improves the traditional greedy algorithm based on the IC model by taking advantage of the submodularity. It only calculates the influence score of all nodes only at the initial stage and does not recalculate for all nodes afterwards, hence cost-effective. A typical application scenario of the algorithm is to prevent epidemic outbreaks by selecting a small group of people to monitor, so that any disease can be detected early in the outbreak.

    Related materials of the algorithm:

    Basic Concepts

    IC Model

    One common approach of studying a system is to abstract it into a mathematical model and study the model in replacement of studying the system itself, namely, simulation or modeling. For IM, one of the most frequently used models is Independent Cascade Model (IC Model).

    IC Model is a probabilistic model, its basic architecture is as follows:

    • When innitializing, non of nodes in the graph are activated, we choose a seed set and activate all nodes within the set.
    • In the 1st round of spreading, seed nodes try to activate all their neighbors on the outbound edges with a certain probability for only one time, successfully or not.
    • In following rounds of spreading, only nodes that were activated in previous round has the capability to activate, similarly, they try to activate neighbors on their outbound edges for only time for each round, successfully or not.
    • When the number of nodes capable of activating in the graph is 0, the spreading ends.

    When the spreading process finished, we are able to know the seed nodes' influence according to the amount of activated nodes in the graph. Ultipa's CELF algorithm adopts IC model to simulate the spreading process: given a probability p of activating (or spreading), for each time the nodes that are capable of activating try to active other nodes, the system selects a random number random within 0~1, the activation succeeds if random < p.

    Submodular

    Submodularity is an attribute of aggregation function. It describes a phenomenon of diminishing Marginal Gain. For aggregation function f(), there's set A ⊆ B ⊆ V and element e ∈ V-B, it suggests to be submodular if conditions below are satisfied:

    It means that under the effect of f(), with a decreasing number of elements in the set, element e's marginal gain is diminishing.

    Function IC() that evaluate the seeds' influence is submodular, CELF improves the traditional greedy algorithm by taking advantage of the sobmodularity of IC(), saving tons of repetitive calculations with an efficiency improvement by 700 times compared to greedy algorithm, but they can give comparable results.

    Submodular Function Maximization

    Influence Maximization problem can be regarded as a matter of sumodular function maximization which has following forms:

    Function f() is submodular, A is the input set, c(A) is a limit to A and it needs to be smaller than condition B. The limiting condition for this algorithm is a simple cardinal number, i.e, c(A) = |A| < k, and it requires to select subsets that include no more than a number k of elements to maximize the function result.

    Sumodular function maximization is an NP-hard problem, and it only gives a near-optimal result.

    Special Cases

    Lonely node, Disconnected Graph

    Lonely nodes are not connected to any other node, with 0 influence.

    For unconnected graphs, the influence of nodes spread within each connected part.

    Self-loop Edge

    A node is unable to activate other nodes via self-loop edges, so these edges do not affect the influence of nodes.

    Directed Edge

    A node's influence is relevant to the number of their outbound edges.

    Command and Configuration

    • Command: algo(celf)
    • Configurations for the parameter params():
    Name
    Type
    Default
    Specification
    Description
    seedSetSize int 1 >=1 The number of seed nodes
    monteCarloSimulations int 1000 >=1 The number of Monte-Carlo simulations
    propagationProbability float 0.1 (0,1) The probability of a node being activated

    Example: Run CELF in the graph to find 3 seeds, the possibility of nodes being activated by their activated neighbors is 0.5, execute 1000 Monte-Carlo simulations

    algo(celf).params({
      seedSetSize: 3,
      monteCarloSimulations: 1000,
      propagationProbability: 0.5 }) as seeds
    return seeds
    

    Algorithm Execution

    Task Writeback

    1. File Writeback

    Configuration
    Data in Each Row
    filename _id,spread

    Example: Run CELF in the graph to find 3 seeds, the possibility of nodes being activated by their activated neighbors is 0.5, execute 1000 Monte-Carlo simulations, write the algorithm results back to file named seeds

    algo(celf).params({
      seedSetSize: 3,
      monteCarloSimulations: 1000,
      propagationProbability: 0.5 
    }).write({
      file:{
        filename: "seeds"}})
    

    2. Property Writeback

    Not supported by this algorithm.

    3. Statistics Writeback

    This algorithm has no statistics.

    Direct Return

    Alias Ordinal Type
    Description
    Column Name
    0 []perNode Node and its marginal return _uuid, spread

    Example: Run CELF in the graph to find 3 seeds, the possibility of nodes being activated by their activated neighbors is 0.5, execute 1000 Monte-Carlo simulations, define algorithm results as alias named seeds and return the results

    algo(celf).params({
      seedSetSize: 3,
      monteCarloSimulations: 1000,
      propagationProbability: 0.5 }) as seeds
    return seeds
    

    Streaming Return

    Alias Ordinal Type
    Description
    Column Name
    0 []perNode Node and its marginal return _uuid, spread

    Example: Run CELF in the graph to find 3 seeds, the possibility of nodes being activated by their activated neighbors is 0.5, execute 1000 Monte-Carlo simulations, return seeds with their ID and createdTime properties as well as the score

    algo(celf).params({
      seedSetSize: 3,
      monteCarloSimulations: 1000,
      propagationProbability: 0.5 }).stream() as seeds
    find().nodes({_uuid == seeds._uuid}) as nodes
    return table(nodes._id, nodes.createdTime, seeds.spread)
    

    Real-time Statistics

    This algorithm has no statistics.

    Please complete the following information to download this book
    *
    公司名称不能为空
    *
    公司邮箱必须填写
    *
    你的名字必须填写
    *
    你的电话必须填写
    *
    你的电话必须填写