Based on retail business' graph model built in the previous section, this section explains how to conduct graph query and computing using UQL.
Introduction
While table is helpful to people's understanding of data, visualization of query results can optimize it. It is, indeed, comparably more economical to display query results for metadata such as node and edge, in fact, they do not show any more information other than their property information; however when operating query results for high-dimensional and multimodel data such as paths, subgraphs, it becomes obvious that tabel does not work the best.
Node query
Let's look into a simple matadata query first and try to understand UQL statement below:
find().nodes() as myFirstQuery
return myFirstQuery{*} limit 10
Its literal meaning: find nodes as myFirstQuery, return myFirstQuery, limit 10 nodes.
The description above is close enough, some more explanations here:
as
defines an alias to the execution results of previous UQL statement and it can be used when the following UQL statements need to refer to the previous results.{*}
following the node and edge's alias means all properties of the node and edge the alias represents
UQL can be interpreted as: find 10 nodes, return all their properties. Mananger's execution result is as in:
This UQL statement found 10 个@customer
nodes. Users can adjust the filter to find @merchant
nodes as well:
find().nodes({@merchant}) as mySecondQuery
return mySecondQuery{*} limit 10
- In UQL syntax,
@
followed by a schema name represents nodes and edges under certain schema.
UQL statement above can be interpreted as: find 10 @merchant
nodes, return all their properties. Mananger's execution result is as in:
While the two statements above found @customer
and @merchant
nodes repectively, their query results are not related at all, for in retail business, @customer
nodes transfer to @merchant
nodes via @transfer
edges:
If we want to build connections from @customer
to@merchant
, we will need to involve @transfer
edge as well.
Edge query
Try to interpret UQL below:
find().edges({_from == "60017791850"}) as payment
return payment{*} limit 10
- This is an edge query, it defines that the edge start
_from
a node whose ID is 60017791850.
The UQL can be interpreted as in: find 10 payment edges innitiated from Chen** (ID: 60017791850),return all property information, we can get results below after executing it in Manager:
In results above, edge's system property _to
are IDs of the merchants who receive Chen**'s payment. We can further find detailed information from these merchants by modifying this UQL statement:
find().edges({_from == "60017791850"}) as payment
find().nodes({_id == payment._to}) as merchant
return merchant{*} limit 10
- In UQL syntax, an alias followed by a full stop
.
as well as property name represents a certain property of nodes or edges the alias stand for.
The UQL statement above can be interpreted as in: find 10 payment edges starting from Chen** (ID: 60017791850), then find the merchants who received those payments, return all property information of the merchants, we can get results below after executing it in Manager:
The image above shows detailed informatino of 10 merchants who received 10 payments. We can notice that the table shows two merchants twice: the merchant whose UUID (another system property, compared with node's ID) is 117 and whose ID is 119, the reasons is that each of then received two payments from Chen**.
We would need to use another UQL command for better observation of the duplication.
Spread
Try to interpret UQL below:
spread().src({_id == "60017791850"}).depth(1) as transaction
return transaction{*} limit 10
Its literal meaning: spread, with a node whose ID is 60017791850 as the spreading center, with a depth of 1, return all property information, limit 10.
spread()
command can spread from the spreading center nodesrc()
and explore edges, and explore more edges after reaching new nodes in an outbound direction;depth()
limit the greatest distance starting fromsrc()
(the number of edges);- The command returns one-step paths in "node-edge-node" format, when the path's alias is followed by
{*}
, it represents all property information of nodes and edges within the path.
The UQL statements above can be interpreted as in: find 10 transacion paths from Chen** to merchants, return all property information of the nodes and edges within those paths, we can get results below after executing it in Manager:
As paths are returned here, we can view the 2D visualization of these paths and observe the transaction relations between Chen** and the two merchants (ID 117 and ID 119).
spread()
command returns 10 different transaction paths, instead of 10 merchants who received payments. If we need to find 10 merchants who receive payments from Chen**, we would need to use another UQL command.
K-Hop query
Try to interpret UQL below:
khop().src({_id == "60017791850"}).depth(1) as merchant
return merchant{*} limit 10
Its literal meaning: K-hop, starting from a node whose ID is 60017791850, with a depth of 1, return all property information, limit 10.
khop()
command can start from nodesrc()
in an outbound direction (passing one edge) to explore nodes, and it explores more nodes after reaching new nodes;- The command returns nodes.
The UQL statements above can be interpreted as in: find 10 merchants who received payment from Chen**, return all property information. We can get results below after executing it in Manager:
khop()
command found 10 different merchants as payment receivers, compared with 8 payment receiver merchants from 10 payments found by spread()
command, we can notice two new merchants (ID 110 and ID 118)
"Template" thinking
The spread()
command aforementioned returns multiple 1-step paths from the spreading process. In fact, a path can have numerous steps, and its format can be like "node-edge-node-edge...-node", with the number of edges (unduplicated) as the number of steps
In UQL syntax, template query can accurately describe each node and edge, and returns the whole paths. In template commands, n()
stands for a node, e()
stands for an edge, e()[]
represents a sector's format as in "node-edge-node-edge...-node".
Chains
Try to interpret UQL below:
n({_id == "60017791850"}).e().n() as transaction
return transaction{*} limit 10
Its literal meaning: a path starts from a node with ID 60017791850, passes a node, return all information, limit 10.
The UQL statements above can be interpreted as in: find 10 one-step payment paths from Chen** to Merchants, return all properties of all nodes and edges in these paths, we can get results below after executing it in Manager:
We can find that the result generated from the template is identical with that of spread()
, completely realising the function of spread()
.
Now consider a path that starts from Chen**, passes three @transfer
edges, and reaches a certain @merchant
node:
Try to interpret UQL below:
n({_id == "60017791850"}).e({tran_amount > 70000})[3].n() as transChain
return transChain{*} limit 10
- The 3 edges from
e()[3]
satisfy the condition where their transaction amounts are greater than 70000.
The UQL statements above can be interpreted as in: find 10 three-step transaction paths starting from Chen**, with each of their transaction amounts greater than 70000, return all properties of all nodes and edges in these paths. We can get results below after executing it in Manager:
When looking closely at the 10 three-step paths starting from Chen** in image above, it can be noticed that they separate from the second step: @transfer
edge, and reach customers Zheng**, Qian**, Chu** respectively; then they further spread upon the third step:@transfer
edge and reach 8 different merchants in the end.
In retail scenario, chain query can help realise product and merchant recommendation, target customer locating, etc. For instance, Chen**, Zheng**, Qian**, and Chu** purchased products from a same merchants, if given more conditions, it is likely that we select some from the 8 merchants that Zheng**, Qian**, and Chu** visited to recommend to Chen**.
Manager's 2D view does not re-render nodes or edges that repetatively appear in paths. For instance, all 10 paths above start from Chen**, with “60” as their first edge, “111” as the second node, however in 2D view, only one Chen**, one “60”, and one “111” merchant are rendered. If need to learn more about details of each path, we can switch to table view as well:
Circles
Try to interpret UQL below:
n({@customer} as start).e({tran_date > "2020-1-1 0:0:0"})[4].n(start) as transRing
return transRing{*} limit 10
Its literal meaning: start from a @customer
node, pass 4 edges, with edges' tran_date later than 2020-1-1 0:0:0, then returns to starting node, return all property information, limit 10.
- When a certain
n()
's alias is refered to by followingn()
, it represents thatn()
is the same node, and the path comes into circulating.
The UQL statements above can be interpreted as in: find 10 four-step transaction paths starting from @customer
, with each transfer happening after 2020-1-1 0:0:0, and transfering back to starting node in the end, return all property information of all nodes and edges in these paths. We can get results below after executing it in Manager:
After switching to table view:
All circular paths discussed in this retail scenario contain even number of edges, with nodes @customer
and@merchant
alternating with one another. This feature provides evidences for some similarity analyses: for instance, in the four-step circles, both Chu** and Ou** purchased products from merchants 110 and 108, so there could be some similarities between them. And thier similarities can be better understood if given more filters.
Shortest paths
Try to interpret UQL below:
n({_id == "60017791850"}).e()[:5].n(115) as transRange
return transRange{*} limit 1
:5
means the edge number ofe()[]
are not fixed, but not greater than 5;n()
stands for node whose UUID is the number filled in
The UQL statements above can be interpreted as in: find 1 transaction path starting from Chen** with less than 5 steps, reaching a node whose UUID is 115, return all property information of all nodes and edges in the path. We can get results below after executing it in Manager:
If we need to find the shortest path between two node, we can modify the UQL above into:
n({_id == "60017791850"}).e()[*:5].n(115) as transShortest
return transShortest{*} limit 1
*:5
represents the smallest number possible from the edge number ine()[]
, and it is not greater than 5.
The UQL statements above can be interpreted as in: find 1 shortest transaction path with less than 5 steps, starting from Chen**, reaching a node whose UUID is 115, return all property information of all nodes and edges in these paths. We can get results below after executing it in Manager:
Shortest path suggests the most direct connection betwwen two nodes. In general, the shorter the path is, the greater their connectiveness and value is. However it is due to this that some of real-world behaviors can purposefully hide themselves in exceedingly long (20~30 steps) data chains, attempting to avoid getting digged out by data analysis. These scenarios call for a high-performance DBMS with HTAP-capability, ultra-deep graph traversal capability, and outstanding responsiveness.
Common calculations
UQL can conduct a variety of calculations after finding nodes, edges, or paths and are made via functions and clauses. This section introduces some frequently used ones, please refer to Ultipa GQL-UQL for more.
Deduplicate
The first instance from Template Query section finds 10 one-step transaction paths from Chen** to merchants. Let's find the 8 non-deduplicated merchants from the 10 in total by modifying UQL into below:
n({_id == "60017791850"}).e().n(as payee) limit 10
with distinct(payee) as payeeDedup
return payeeDedup{*}
distinct()
function deduplicates a column of query results;- Functions need to be included in clauses, such as
with
; - When
limit 10
is beforedistinct()
, we find 10 results before deduplication; we deduplicate first then limit the result numbers, if otherwise.
We can get results below after executing it in Manager:
Count
For data columns with unknown lengths, we can get their lengths by using aggregation functions:
n({_id == "60017791850"}).e().n(as payee) limit 10
with count(distinct(payee)) as cardinality
return cardinality
count()
function count the number of a column of results.
We can get results below after executing it in Manager:
Order by
Let's return all edges with a descending order of their transaction amounts:
find().edges({_from == "60017791850"}) as payment limit 10
order by payment.tran_amount desc
return payment{*}
order by
clause adjusts the order of a column of query results, followed by its basis, withdesc
to set its sorting method.
We can get results below after executing it in Manager:
Group by
Let's group the 3-step paths based on their different 3rd nodes, and count the path number of each group:
n({_id == "60017791850"}).e({tran_amount > 70000})[2].n(as third).e({tran_amount > 70000}).n() limit 10
group by third
return table(third.cust_name, count(third))
- In order to define the alias of the 3rd nodes respectively, we need to extend
e()[3]
toe()[2].n().e()
; group by
clause divides a column of query results in groups, followed by its grouping method;table()
function collects multiple data columns into one table, displaying results more clearly.
We can get results below after executing it in Manager: