Overview
The Node2Vec graph embedding algorithm takes both the BFS and the DFS neighborhoods of a node into consideration: node sequences are generated through a biased second-order random walk and fed to the Skip-gram model, which was originally proposed in the Word2Vec algorithm, to train node embeddings. Node2Vec was proposed by A. Grover and J. Leskovec of Stanford University in 2016.
Related materials of the algorithm:
- A. Grover, J. Leskovec, node2vec: Scalable Feature Learning for Networks (2016)
- B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: Online Learning of Social Representations (2014)
Basic Concept
Node Similarity
Nodes in a network can be classified based on two kinds of similarity: homophily and structural equivalence. Homophily emphasizes the connections between nodes, while structural equivalence focuses on the local topology of nodes rather than their connectivity. Nodes with structural equivalence or similarity can be far apart in the network.

Homophily
Neighbors of a node often share similar features (properties) with the node, which is referred to as homophily. Classically, in a social network, a user (node) and their friends (neighbors) are expected to have similar interests, ages, and so on. Nodes with homophily are usually found close to each other in the network. In the graph above, nodes u, s1, s2, s3 and s4 are in the same community or cluster and exhibit homophily.
Structural Equivalence
In general, structural equivalence means that the neighborhood topology of two nodes is homogeneous or symmetrical. Two things should be noted when considering structural equivalence: first, complete structural equivalence is rare in real networks, so we tend to consider structural similarity instead; second, the larger the neighborhood considered, the lower the structural similarity of the two nodes. In the graph above, nodes u and v both act as hubs of their corresponding communities, thus they have a high degree of structural equivalence.
BFS and DFS
In graph theory, Breadth First Search (BFS for short) starts from a node and traverses adjacent nodes from near to far: it visits all neighbors of the current layer before moving on to the next layer. The K-Hop query in the graph is a typical BFS.
Compared with the horizontal (breadth-first) search of BFS, Depth First Search (DFS) is a vertical (depth-first) search: starting from the current node, it explores as deep as possible until it reaches the maximum search depth, and only then returns to the previous layer to continue. The search stops when a certain node or edge is found, or when the whole graph has been traversed.
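The two traversal orders can be contrasted with a short Python sketch; the toy adjacency list and function names below are illustrative only, not part of the product API:

```python
from collections import deque

# Toy adjacency list; node names are illustrative only.
graph = {
    "u": ["s1", "s2"],
    "s1": ["u", "s3"],
    "s2": ["u"],
    "s3": ["s1"],
}

def bfs(adj, start):
    """Visit nodes layer by layer, from near to far."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

def dfs(adj, start):
    """Explore one branch to its full depth before backtracking."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Reverse so neighbors are explored in their listed order.
        stack.extend(reversed(adj[node]))
    return order

print(bfs(graph, "u"))  # -> ['u', 's1', 's2', 's3'] (layer by layer)
print(dfs(graph, "u"))  # -> ['u', 's1', 's3', 's2'] (deep first)
```

Note how BFS visits both 1-hop neighbors s1 and s2 before reaching s3, while DFS follows s1 all the way down to s3 before backtracking to s2.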

Second-order Random Walk
In the classic random walk, by default each edge has the same probability of being picked (as detailed in the chapter Random Walk), and the next node is chosen regardless of previously visited nodes. The so-called Second-order Random Walk (or RW2 for short) is a random walk that takes the previously visited node into consideration when selecting the next node to visit.
Node2Vec Random Walk
Node2Vec introduces two parameters p and q to control whether the walk behaves more like BFS or DFS. Assuming the walk has just moved from node t to node v, and v has a neighbor node x, the weights of the adjacent edges of v are adjusted according to the shortest distance d between t and x, using the parameters p and q:
- When d = 0 (i.e., x equals t), the corresponding edge weight is multiplied by 1/p. x equals t means the walk returns to the previously visited node, thus p is also called the return parameter; the larger p is, the smaller the probability of returning.
- When d = 1 (i.e., x equals x1 in the graph below), the corresponding edge weight is not changed. Both x1 and t are 1-hop neighbors of v, so walking to x1 means the walk does not move far away.
- When d = 2 (i.e., x equals x2 in the graph below), the corresponding edge weight is multiplied by 1/q. Walking to x2 means the walk moves far away, thus q is also called the in-out parameter; q > 1 means the walk tends to stay at the same level, q < 1 means it tends to move far away.

The probability of the current node walking along each adjacent edge is then obtained after normalizing the influenced weight of each edge.
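The weighting and normalization rules above can be sketched in Python; the function name and the toy graph are illustrative, assuming unit base edge weights:

```python
# A minimal sketch of Node2Vec's biased edge weighting, assuming unit base
# edge weights; the function name and toy graph are illustrative only.
def node2vec_step_probs(adj, prev, cur, p, q):
    """Transition probabilities for the step after moving prev -> cur.

    d = 0: neighbor x is prev itself    -> weight * 1/p (return)
    d = 1: x is also a neighbor of prev -> weight * 1   (stay close)
    d = 2: x is not adjacent to prev    -> weight * 1/q (move away)
    """
    prev_nbrs = set(adj[prev])
    weights = {}
    for x in adj[cur]:
        if x == prev:
            weights[x] = 1.0 / p
        elif x in prev_nbrs:
            weights[x] = 1.0
        else:
            weights[x] = 1.0 / q
    # Normalize the adjusted weights into probabilities.
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

# The walk just moved t -> v; v's neighbors are t (d=0), x1 (d=1), x2 (d=2).
adj = {"t": ["v", "x1"], "v": ["t", "x1", "x2"], "x1": ["t", "v"], "x2": ["v"]}
print(node2vec_step_probs(adj, "t", "v", p=2.0, q=2.0))
# -> {'t': 0.25, 'x1': 0.5, 'x2': 0.25}
```

With p = q = 2, both returning to t and moving away to x2 are penalized by a factor of 1/2, so the walk most likely stays close by moving to x1.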
When the walk leans toward the BFS neighborhood, the final embedding result reflects more of the structural equivalence of nodes, because the walk explores the environment (topology) around the node. When the walk leans toward the DFS neighborhood, the final embedding result reflects more of the homophily of nodes, because the walk explores the environment within the node's community.
Node Embeddings
In 2014, B. Perozzi et al. proposed the DeepWalk graph embedding algorithm, which first applied deep learning techniques widely used in the NLP field to network analytics. DeepWalk generalizes language modeling to the process of exploring a graph through random walks, treating the walk sequences as a special kind of phrase. Similarly, Node2Vec uses the Skip-gram language model and SGD to train on walk sequences to obtain node embeddings in vector space, and optimizes the training process with techniques such as negative sampling.
To learn more about Skip-gram Model, please read Skip-gram Model and Skip-gram Model Optimization.
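Before training, each walk sequence is turned into (target, context) pairs by sliding a window over it, just as Word2Vec does with sentences. A minimal sketch, where `window_size` plays the same role as the parameter of that name below (the function name is illustrative, not the product API):

```python
# A minimal sketch of turning one walk sequence into Skip-gram training
# pairs; names are illustrative, not part of the product API.
def skipgram_pairs(walk, window_size):
    """Return (target, context) pairs, taking up to window_size
    nodes on each side of every target node in the walk."""
    pairs = []
    for i, target in enumerate(walk):
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((target, walk[j]))
    return pairs

print(skipgram_pairs(["u", "s1", "s3"], 1))
# -> [('u', 's1'), ('s1', 'u'), ('s1', 's3'), ('s3', 's1')]
```

The Skip-gram model is then trained to predict each context node from its target node, so nodes that co-occur in many walks end up with similar embeddings.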
Special Case
Lonely Node, Disconnected Graph
A lonely node has no adjacent nodes, hence it cannot walk to other nodes; a lonely node can only walk along its self-loop edge if there is one. The walk paths of non-lonely nodes will not contain any lonely node.
A node only walks within its own connected component.
Self-loop Edge
A node may walk along its self-loop edge.
Directed Edge
For directed edges, the Node2Vec algorithm ignores the direction of edges and treats them as undirected edges.
Command and Configuration
- Command:
algo(node2vec)
- Configurations for the parameter
params()
:
| Name | Type | Default | Specification | Description |
|---|---|---|---|---|
| ids / uuids | []_id / []_uuid | / | / | IDs or UUIDs of nodes to start the walks; all nodes are selected if not set |
| walk_length | int | 1 | >=1 | Depth of each walk, i.e., the number of nodes walked through |
| walk_num | int | 1 | >=1 | Number of walks |
| p | float | 1 | >0 | The return parameter; the larger the value, the smaller the probability of returning |
| q | float | 1 | >0 | The in-out parameter that controls the tendency to walk far away; >1 means the walk tends to stay at the same level, <1 means it tends to walk far away |
| edge_schema_property | []@&lt;schema&gt;?.&lt;property&gt; | / | Numeric edge property, LTE needed | Edge weight property/properties; the schema can be carried or not; nodes only walk along edges that have the specified properties, and the probability of passing through an edge is proportional to the edge weight; if an edge has multiple specified properties, its weight is the sum of those property values; the weight of all edges is 1 if not set |
| window_size | int | / | >=1 | Size of the sliding window: window_size nodes on the left and window_size nodes on the right of the target are sampled |
| dimension | int | / | >=1 | Dimension of the node embedding vectors |
| learning_rate | float | / | (0,1) | Initial learning rate; it decreases toward min_learning_rate as training iterations increase |
| min_learning_rate | float | / | (0, learning_rate) | Minimum learning rate |
| min_frequency | int | / | >=0 | Minimum number of occurrences of a node in the walk sequences for it to be included in the model; <=1 means all nodes are kept |
| sub_sample_alpha | float | / | / | Threshold for sub-sampling; <=0 means no sub-sampling |
| resolution | int | / | >=1 | Such as 10, 100 |
| neg_num | int | / | >=0 | Number of negative samples; 0~10 is suggested |
| loop_num | int | / | >=1 | Number of training iterations (epochs) |
| buffer_size | int | 1000 | / | Number of random walks to complete before training starts; <0 means to wait for all random walks to finish |
| limit | int | -1 | >=-1 | Number of results to return; all results are returned if set to -1 or not set |
Example: Run Node2Vec in the whole graph
```
algo(node2vec).params({
  walk_length: 3,
  walk_num: 2,
  p: 1,
  q: 1,
  window_size: 5,
  dimension: 5,
  learning_rate: 0.025,
  min_learning_rate: 0.00025,
  min_frequency: -1,
  sub_sample_alpha: -1,
  resolution: 2,
  neg_num: 0,
  loop_num: 2,
  limit: -1
}) as results
return results
```
Algorithm Execution
Task Writeback
1. File Writeback
| Configuration | Data in Each Row |
|---|---|
| filename | _id, embedding_result |
2. Property Writeback
Not supported by this algorithm.
3. Statistics Writeback
This algorithm has no statistics.
Direct Return
| Alias Ordinal | Type | Description | Column Name |
|---|---|---|---|
| 0 | []perNode | Array of UUIDs and embeddings of nodes | _uuid, embedding_result |
Streaming Return
| Alias Ordinal | Type | Description | Column Name |
|---|---|---|---|
| 0 | []perNode | Array of UUIDs and embeddings of nodes | _uuid, embedding_result |
Real-time Statistics
This algorithm has no statistics.