GraphSAGE (SAmple and aggreGatE) is a versatile inductive framework. Instead of training distinct embeddings for each node, it learns functions that generate embeddings by sampling and aggregating features from a node's local neighborhood. This enables efficient generation of node embeddings for new data. GraphSAGE was proposed by W.L. Hamilton et al. of Stanford University in 2017.
The GraphSAGE algorithm produces node embeddings using a trained GraphSAGE model. The training process is outlined in GraphSAGE Train.
Most conventional graph embedding methods learn node embeddings by utilizing information from all nodes throughout the iterations. When new nodes are introduced to the network, the model must be retrained on the entire dataset. These transductive frameworks do not naturally generalize to unseen nodes.
GraphSAGE, on the other hand, acts as an inductive framework. It trains a collection of aggregator functions rather than creating individual embeddings for each node. This allows embeddings for newly added nodes to be derived based on the features and structural details of existing nodes, eliminating the need to reiterate the entire training procedure. This inductive capacity is crucial for high-throughput, operational machine learning systems.
Assume that we have already trained the parameters of K aggregator functions (denoted as AGGREGATEk) and K weight matrices (denoted as Wk). Let's now delve into the process of generating GraphSAGE embeddings (i.e., the forward propagation).
In graph G = (V, E), for each target node whose embedding is to be generated, sample nodes from its 1st-layer through K-th-layer neighborhoods:
NOTE: The creators of GraphSAGE observed that the value of K need not be large; practical success can be achieved even with modest values, such as K = 2, provided that S1·S2 is below 500.

For the target node a in the above graph, consider the settings K = 2, S1 = 3, and S2 = 5. B2 is initialized as {a}.
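The layered sampling that produces B2, B1, and B0 can be sketched as follows. This is a minimal illustration, not any implementation's actual API; all names (`sample_batches`, `adj`, `sample_sizes`) are hypothetical.

```python
import random

# Illustrative sketch of layered neighbor sampling (names are assumptions).
def sample_batches(adj, targets, sample_sizes):
    """Build the node batches B_K, ..., B_0 for a set of target nodes.

    adj          : dict mapping a node ID to a list of its neighbors
    sample_sizes : [S_1, ..., S_K]; S_k neighbors are sampled when
                   expanding from B_k down to B_{k-1}
    """
    K = len(sample_sizes)
    batches = {K: set(targets)}          # B_K holds the target nodes
    for k in range(K, 0, -1):
        prev = set(batches[k])           # B_{k-1} starts from B_k ...
        for v in batches[k]:
            neighbors = adj.get(v, [])
            s = sample_sizes[k - 1]      # S_k
            if len(neighbors) > s:
                neighbors = random.sample(neighbors, s)
            prev.update(neighbors)       # ... plus each node's sampled neighbors
        batches[k - 1] = prev
    return batches
```

For the example settings (K = 2, S1 = 3, S2 = 5), `sample_batches(adj, ['a'], [3, 5])` starts from B2 = {a} and grows B1 and B0 by progressively sampling neighbors, so B2 ⊆ B1 ⊆ B0.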
For each node v ∈ B0, initialize its embedding vector as its feature vector:

h^0_v = X_v, ∀v ∈ B0

where each feature vector X_v is composed of several specified numeric property values of the node.
The final embeddings of the target nodes are computed through K iterations. In the k-th (k = 1,2,...,K) iteration, for each node v ∈ Bk:

1. Aggregate the embeddings of v's sampled neighbors N(v) from the previous iteration:

   h^k_N(v) = AGGREGATEk({h^(k-1)_u, ∀u ∈ N(v)})

2. Concatenate v's current embedding with the aggregated neighborhood embedding, multiply by the weight matrix Wk, and apply the nonlinear activation function σ:

   h^k_v = σ(Wk · CONCAT(h^(k-1)_v, h^k_N(v)))

3. Normalize the embedding:

   h^k_v = h^k_v / ‖h^k_v‖₂

After the K-th iteration, z_v = h^K_v is the final embedding of target node v.
The process of feature aggregation of our example can be illustrated as below:

| 1st Iteration | 2nd Iteration |
|---|---|
| ![]() | ![]() |
algo(graph_sage)

| Name | Type | Spec | Default | Optional | Description |
|---|---|---|---|---|---|
| model_task_id | int | / | / | No | Task ID of the GraphSAGE Train algorithm that trained the model |
| ids | []_id | / | / | Yes | ID of the nodes to generate embeddings; generate for all nodes if not set |
| node_property_names | []<property> | Numeric type, must LTE | Read from the model | Yes | Node properties to form the feature vectors |
| edge_property_name | <property> | Numeric type, must LTE | Read from the model | Yes | Edge property to use as edge weight; edges are unweighted if not set |
| sample_size | []int | / | Read from the model | Yes | Elements in the list are the number of nodes sampled at layer K to layer 1 respectively; the size of the list is the number of layers |

| Spec | Content | Write to | Data Type |
|---|---|---|---|
| property_name | Node embedding | Node property | string |
UQL

```
algo(graph_sage).params({
  model_task_id: 4785,
  ids: ['ULTIPA8000000000000001', 'ULTIPA8000000000000002']
}).write({
  db: { property_name: 'embedding_graphSage' }
})
```
Results: The embedding of each node is written to a new property named embedding_graphSage.