Overview
A distributed projection resides in the memory of the corresponding shard servers where the data is persistently stored. It can hold either full or partial data from a graphset and supports running graph algorithms, though it doesn't execute graph queries.
All distributed projections of a graphset are lost when the data in the graphset is migrated to different shards.
Managing Distributed Projections
Showing Distributed Projections
Retrieves information about all distributed projections of the current graphset:
show().project()
It returns a table _projectList with the following fields:
Field | Description
---|---
project_name | Name of the projection.
project_type | Type of the projection, which is pregel for all distributed projections.
graph_name | Name of the current graphset from which the data was loaded.
status | Current state of the projection: DONE, CREATING, FAILED, or UNKNOWN.
stats | Node and edge statistics per shard, including the address of the leader replica of the current graphset, edge_in_count, edge_out_count, and node_count.
config | Configurations for the distributed projection.
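As a small illustration of how the status field might be used, the sketch below filters the rows of _projectList for projections that are not yet usable. It assumes the table has already been fetched and converted to a list of Python dictionaries keyed by the field names above; that conversion and the helper name projections_not_ready are hypothetical and not part of any documented client API.

```python
def projections_not_ready(rows):
    """Map projection name -> status for every row whose status is not DONE.

    `rows` is assumed to be a list of dicts with at least the keys
    "project_name" and "status" (hypothetical in-memory form of _projectList).
    """
    return {
        row["project_name"]: row["status"]
        for row in rows
        if row["status"] != "DONE"
    }


# Illustrative rows only; real rows also carry project_type, graph_name, stats, config.
rows = [
    {"project_name": "distGraph", "status": "DONE"},
    {"project_name": "distGraph_1", "status": "CREATING"},
]
print(projections_not_ready(rows))  # {'distGraph_1': 'CREATING'}
```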
Creating a Distributed Projection
The create().project() statement creates an in-memory projection of the current graphset on its shard servers. The projection is created as a job; you can run show().job(<id?>) afterward to verify that the creation succeeded.
create().project(
"<projectName>",
{
nodes: {
"<schema1?>": ["<property1?>", "<property2?>", ...],
"<schema2?>": ["<property1?>", "<property2?>", ...],
...
},
edges: {
"<schema1?>": ["<property1?>", "<property2?>", ...],
"<schema2?>": ["<property1?>", "<property2?>", ...],
...
},
direction: "<edgeDirection?>",
load_id: <boolean?>
}
)
Method | Param | Description | Optional
---|---|---|---
project() | <projectName> | Name of the projection. Each distributed projection name within a database must be unique and cannot duplicate the name of any HDC projection of the same graphset. | No
project() | Config map: nodes | Specifies nodes to project based on schemas and properties. The _uuid is loaded by default, while _id is configurable with load_id. Set to "*": ["*"] to load all nodes. | Yes
project() | Config map: edges | Specifies edges to project based on schemas and properties. All system properties are loaded by default. Set to "*": ["*"] to load all edges. | Yes
project() | Config map: direction | Each edge is physically stored twice: as an incoming edge with its destination node and as an outgoing edge with its source node. You can project only incoming edges with in, only outgoing edges with out, or both with undirected (the default setting). Note that in or out restricts graph traversal during computation to the specified direction. | No
project() | Config map: load_id | Set to false to project nodes without _id values to save memory; it defaults to true. | Yes
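When projections are created from scripts, the statement text can be assembled from a configuration object. The following is a minimal sketch of that idea in Python. The helper build_create_project is hypothetical, it only produces a string in the shape shown in the syntax above, and it assumes that a single-line statement is accepted the same way as the multi-line form.

```python
import json

VALID_DIRECTIONS = ("in", "out", "undirected")


def build_create_project(name, nodes=None, edges=None,
                         direction="undirected", load_id=None):
    """Return a create().project() statement as a string (hypothetical helper).

    nodes / edges map schema names to property lists, e.g. {"*": ["*"]}.
    Optional entries that are None are left out of the config map.
    """
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {VALID_DIRECTIONS}")
    parts = []
    if nodes is not None:
        parts.append("nodes: " + json.dumps(nodes))
    if edges is not None:
        parts.append("edges: " + json.dumps(edges))
    parts.append('direction: "' + direction + '"')
    if load_id is not None:
        parts.append("load_id: " + ("true" if load_id else "false"))
    return ("create().project(" + json.dumps(name)
            + ", {" + ", ".join(parts) + "})")


print(build_create_project("distGraph", {"*": ["*"]}, {"*": ["*"]},
                           "undirected", True))
```

The helper rejects unknown direction values before any statement is sent, which is cheaper than waiting for a failed job.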
To project the entire current graphset to its shard servers as distGraph:
create().project("distGraph", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true
})
To project @account and @movie nodes with selected properties and incoming @rate edges in the current graphset to its shard servers as distGraph_1, while omitting nodes' _id values:
create().project("distGraph_1", {
nodes: {
"account": ["name", "gender"],
"movie": ["name", "year"]
},
edges: {"rate": ["*"]},
direction: "in",
load_id: false
})
Dropping a Distributed Projection
You can drop any distributed projection of the current graphset from the shard servers using the drop().project() statement.
The following example deletes the distributed projection named distGraph_1:
drop().project("distGraph_1")
Example Graph and Projection
To create the graph, execute each of the following UQL queries sequentially in an empty graphset:
create().node_schema("entity").edge_schema("link")
create().edge_property(@link, "weight", float)
insert().into(@entity).nodes([{_id:"A"},{_id:"B"},{_id:"C"},{_id:"D"}])
insert().into(@link).edges([{_from:"A", _to:"B", weight:1},{_from:"A", _to:"C", weight:1.5},{_from:"A", _to:"D", weight:0.5},{_from:"B", _to:"C", weight:2},{_from:"C", _to:"D", weight:0.5}])
To create a distributed projection distGraph of the entire graph:
create().project("distGraph", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: true
})
Executing Algorithms
Distributed projections run distributed algorithms, which support the File and DB writeback modes with the syntax algo().params().write(). In the params() method, you must include the project parameter to specify the name of the projection.
File Writeback
Runs the Degree Centrality algorithm on distGraph to compute the out-degree of all nodes and writes the results back to the file degree.txt:
algo(degree).params({
project: "distGraph",
return_id_uuid: "id",
direction: "out"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
_id,degree_centrality
C,1
A,3
B,1
D,0
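The values in this file can be cross-checked by hand from the five @link edges of the example graph. The sketch below does that count in plain Python; it is an independent recomputation for verification, not how the distributed algorithm is implemented, and the names NODES, EDGES, and out_degree are illustrative.

```python
from collections import Counter

# The example graph: (source, destination) pairs of the five @link edges.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]


def out_degree(nodes, edges):
    """Count outgoing edges per node; nodes without outgoing edges get 0."""
    counts = Counter(src for src, _dst in edges)
    return {node: counts.get(node, 0) for node in nodes}


print(out_degree(NODES, EDGES))  # {'A': 3, 'B': 1, 'C': 1, 'D': 0}
```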
DB Writeback
Runs the Degree Centrality algorithm on distGraph to compute the out-degree of all nodes and writes the results back to the node property degree:
algo(degree).params({
project: "distGraph",
return_id_uuid: "id",
direction: "out"
}).write({
db: {
property: "degree"
}
})
Graph Traversal Direction
If a distributed projection is created with the direction option set to in or out, graph traversal is restricted to incoming or outgoing edges, respectively. Algorithms that attempt to traverse in the missing direction throw errors or yield empty results.
To create a distributed projection distGraph_in_edges of the graph with nodes and incoming edges:
create().project("distGraph_in_edges", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "in",
load_id: true
})
When the Degree Centrality algorithm computes the out-degree of all nodes on distGraph_in_edges, every value is 0 because the projection contains no outgoing edges:
algo(degree).params({
project: "distGraph_in_edges",
return_id_uuid: "id",
direction: "out"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
_id,degree_centrality
C,0
A,0
D,0
B,0
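The all-zero result follows from the way each edge is stored twice, once with its source node and once with its destination node. The sketch below is a simplified in-memory model of that storage, written only to illustrate the effect of the direction option; the record layout and the function names project_edges and degree are assumptions made for this example.

```python
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "D")]


def project_edges(edges, direction):
    """Return the (owner, kind, neighbor) edge records a projection would keep.

    Each edge yields an "out" record owned by its source and an "in" record
    owned by its destination; `direction` selects which records are loaded.
    """
    records = []
    for src, dst in edges:
        if direction in ("out", "undirected"):
            records.append((src, "out", dst))
        if direction in ("in", "undirected"):
            records.append((dst, "in", src))
    return records


def degree(nodes, records, kind):
    """Count records of the given kind ("in" or "out") per owning node."""
    return {
        node: sum(1 for owner, k, _ in records if owner == node and k == kind)
        for node in nodes
    }


in_only = project_edges(EDGES, "in")
print(degree(NODES, in_only, "out"))  # {'A': 0, 'B': 0, 'C': 0, 'D': 0}
print(degree(NODES, in_only, "in"))   # {'A': 0, 'B': 1, 'C': 2, 'D': 2}
```

With only the incoming records loaded, there is nothing for an out-degree computation to count, while the in-degree is still fully available.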
Exclusion of Node IDs
If a distributed projection is created with the load_id option set to false, it does not contain the _id values of nodes. Algorithms that reference _id throw errors or yield empty results. In algorithm writeback files, _id values are replaced with _uuid values.
To create a distributed projection distGraph_no_id of the graph without nodes' _id values:
create().project("distGraph_no_id", {
nodes: {"*": ["*"]},
edges: {"*": ["*"]},
direction: "undirected",
load_id: false
})
The Degree Centrality algorithm computes the degree of all nodes on distGraph_no_id and writes the results back to the file degree.txt, in which nodes' _id values are replaced with _uuid values:
algo(degree).params({
project: "distGraph_no_id",
return_id_uuid: "id"
}).write({
file: {
filename: "degree.txt"
}
})
Result:
_uuid,degree_centrality
12033620403357220866,1
10016007770295238657,3
288232575174967298,0
3530824306881724417,1
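Because the key column of the writeback file is _id or _uuid depending on the projection, a consumer of the file should not hard-code the column name. The following sketch reads the header to find out which identifier is present. It assumes the file is plain comma-separated text with a header row, as in the outputs shown in this section, and that degrees are unweighted integers; read_degree_file is a hypothetical helper.

```python
import csv
import io


def read_degree_file(text):
    """Parse a degree writeback file into (key_column, {node_key: degree}).

    The first header column is taken as the node identifier column,
    which is "_id" or "_uuid" in the outputs shown above.
    """
    reader = csv.DictReader(io.StringIO(text))
    key_column = reader.fieldnames[0]
    degrees = {row[key_column]: int(row["degree_centrality"]) for row in reader}
    return key_column, degrees


sample = """_uuid,degree_centrality
12033620403357220866,1
10016007770295238657,3
288232575174967298,0
3530824306881724417,1
"""
key_column, degrees = read_degree_file(sample)
print(key_column)                       # _uuid
print(degrees["10016007770295238657"])  # 3
```

The identifiers are kept as strings so that large _uuid values and textual _id values are handled the same way.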
Exclusion of Properties
If a distributed projection is created without certain properties, algorithms that reference those properties throw errors or yield empty results.
To create a distributed projection distGraph_no_weight of the graph containing nodes and only system properties of edges:
create().project("distGraph_no_weight", {
nodes: {"*": ["*"]},
edges: {"link": []},
direction: "undirected",
load_id: true
})
Running the Degree Centrality algorithm weighted by the edge property @link.weight on distGraph_no_weight produces an error, because the weight property is missing from the projection:
algo(degree).params({
project: "distGraph_no_weight",
edge_property: "@link.weight"
}).write({
file: {
filename: "degree.txt"
}
})
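A mismatch like this can be caught before the algorithm is submitted by comparing the properties an algorithm needs against the edges map that was used to create the projection. The sketch below shows one way to do that check on the client side. It is a hypothetical pre-flight helper, not a feature of the database, and it assumes the caller still has the projection's edges configuration at hand.

```python
def missing_edge_properties(projection_edges, required):
    """Return the required edge properties that the projection does not load.

    projection_edges: the `edges` map used at creation, e.g. {"link": []}.
    required: (schema, property) pairs, e.g. [("link", "weight")].
    A "*" schema key or a "*" property entry is treated as "load everything".
    """
    missing = []
    for schema, prop in required:
        # Prefer the schema's own entry; fall back to the wildcard schema.
        props = projection_edges.get(schema, projection_edges.get("*"))
        if props is None or not (prop in props or "*" in props):
            missing.append("@" + schema + "." + prop)
    return missing


# distGraph_no_weight loads only system properties of @link edges.
print(missing_edge_properties({"link": []}, [("link", "weight")]))
# ['@link.weight']

# distGraph loads every property of every edge schema.
print(missing_edge_properties({"*": ["*"]}, [("link", "weight")]))
# []
```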