Overview
The CELF (Cost Effective Lazy Forward) algorithm is used to select some seed nodes in a network as propagation source to reach as many nodes as possible. This is known as Influence Maximization (IM), where 'influence' represents anything that can be spread across the network, such as contamination, information, disease, etc.
CELF was proposed by Jure Leskovec et al. in 2007, it improves the traditional Greedy algorithm based on the IC model by taking advantage of the submodularity. It only calculates the spread score for all nodes only at the initial stage and does not recalculate for all nodes afterwards, hence cost-effective.
Related materials of the algorithm:
- J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, N. Glance, Cost-effective Outbreak Detection in Networks (2007)
- D. Kempe, J. Kleinberg, E. Tardos, Maximizing the Spread of Influence through a Social Network (2003)
A typical application of the algorithm is to prevent epidemic outbreak by selecting a small group of people to monitor, so that any disease can be detected in an early stage.
Concepts
Spread Function - Independent Cascade
This algorithm adopts Independent Cascade (IC) model to simulate the influence spread process in the network. IC is a probabilistic model, it starts with a set of active seed nodes, and in step k
:
- For each node that becomes active in step
k-1
, it has a single chance to activate each inactive outgoing neighbor with a success probability. - The process runs until no more activations are possible.
The spread of the given seed set is measured by the number of active nodes in the graph when it ends. This process is repeated for a large number of time (Monte Carlo Simulations) and we calculate it by taking the average.
Submodularity
The spread function IC()
is called submodular as the marginal gain of a single node v
is diminishing as the seed set S
grows:
where the seed set |Sk+1| > |Sk|, S ∪ {v}
means to add node v
into the seed set.
Submodularity of the spread function is the key property exploited by CELF. CELF significantly improves the traditional Greedy algorithm that is used to solve the influence maximization problem, it runs a lot faster while achieving near optimal results.
Lazy Forward
When CELF begins, like Greedy, it calculates the spread for each node, puts them in a list sorted by the descending spread. As the seed set is empty now, the spread for each node can be viewed as its initial marginal gain.
In the first iteration, the top node is moved from the list to the seed set.
In the next iteration, only calculate the marginal gain for the current top node. After sorting, if that node remains at top, move it to the seed set; if not, repeat the process for the new top node.
Unlike Greedy, CELF avoids calculating marginal gain for all the rest nodes in each iteration, this is where the submodularity of the spread function is considered - the marginal gain of every node in this round is always lower than the previous round. So if the top node remains at top, we can put it into the seed set directly without calculating for other nodes.
The algorithm stops when the seed set reaches the set size.
Parameters
Name |
Type |
Spec |
Default |
Optional |
Description |
---|---|---|---|---|---|
project |
String | / | / | / | The projection on which the algorithm will run. This is required for writeback modes but not applicable to return modes. |
ids |
[]_id |
/ | / | Yes | Specifies nodes by their _id values for computation; computes for all nodes if it is unset. |
uuids |
[]_uuid |
/ | / | Yes | Specifies nodes by their _uuid values for computation; computes for all nodes if it is unset. |
return_id_uuid |
String | uuid , id , both |
uuid |
Yes | Includes _uuid , _id , or both values for nodes in the results. |
seedSetSize |
Integer | >0 | 1 |
Yes | The size of the seed set. |
monteCarloSimulations |
Integer | >0 | 1000 |
Yes | The number of Monte Carlo simulations |
propagationProbability |
Float | (0,1) | 0.1 |
Yes | The probability that each outgoing neighbor is successfully activated by a node with activation capability in certain round. |
Example Graph
To create this graph:
// Runs each row separately in order in an empty graphset
create().node_schema("account").edge_schema("follow")
create().node_property(@account, "createdOn", datetime)
insert().into(@account).nodes([{_id:"A", createdOn: "2016-12-3"}, {_id:"B", createdOn:"2022-1-30" }, {_id:"C", createdOn: "2019-11-8"}, {_id:"D", createdOn: "2019-1-16"}, {_id:"E", createdOn: "2016-3-4"}, {_id:"F", createdOn: "2019-11-10"}, {_id:"G", createdOn: "2019-7-26"}, {_id:"H", createdOn: "2016-7-11"}, {_id:"I", createdOn: "2018-12-13"},{_id:"J", createdOn: "2018-3-21"}])
insert().into(@follow).edges([{_from:"A", _to:"B"}, {_from:"A", _to:"G"}, {_from:"B", _to:"F"}, {_from:"C", _to:"J"}, {_from:"D", _to:"J"}, {_from:"E", _to:"A"}, {_from:"F", _to:"C"}, {_from:"F", _to:"G"}, {_from:"G", _to:"H"}, {_from:"H", _to:"C"}, {_from:"H", _to:"E"}, {_from:"H", _to:"J"}, {_from:"I", _to:"B"}, {_from:"C", _to:"B"}])
Running on HDC Projections
Creating HDC Projections
To project the entire graph to the HDC server hdc-server-1
as hdc_celf
:
hdc.graph.create("hdc_celf", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true,
update: "static",
query: "query",
default: true
}).to("hdc-server-1")
File Writeback
algo(celf).params({
project: "hdc_celf",
return_id_uuid: "id",
seedSetSize: 3,
monteCarloSimulations: 1000,
propagationProbability: 0.5
}).write({
file: {
filename: "seeds"
}
})
Results:
_id,spread
H,3.609
I,1.647
A,1.361
Full Return
exec{
algo(celf).params({
return_id_uuid: "id",
seedSetSize: 2,
monteCarloSimulations: 1000,
propagationProbability: 0.6
}) as seeds
return seeds
} on hdc_celf
Results:
_id | spread |
---|---|
H | 4.517 |
I | 1.689 |
Stream Return
exec{
algo(celf).params({
return_id_uuid: "id",
seedSetSize: 3,
monteCarloSimulations: 1000,
propagationProbability: 0.6
}).stream() as seeds
find().nodes({_id == seeds._id}) as nodes
return table(nodes._id, nodes.createdOn, seeds.spread)
} on hdc_celf
Results:
nodes._id | nodes.createdOn | seeds.spread |
---|---|---|
H | 2016-07-11 00:00:00 | 4.517 |
I | 2018-12-13 00:00:00 | 1.689 |
A | 2016-12-03 00:00:00 | 1.102 |