Data Stream
The concept of a data stream helps explain the rows of data produced as UQL statements execute iteratively: each row comes from a preceding statement and enters the next statement one at a time. A data stream is given an alias so that it can be referenced, so a data stream also takes one of 6 data structures: NODE, EDGE, PATH, ATTR, ARRAY, TABLE (see Query - Alias System for more information).
In this chapter the term alias is equivalent to the term data stream.
Homologous Alias
Aliases derived from the result of the same query are homologous. Homologous aliases always have the same number of rows, and the data in the same row is correlated.
The template query in the image below found 5 paths; the aliases path, tail and length are homologous. They all have 5 rows of data, and the tail and length in each row are the terminal node and the number of edges of the path in the same row:
If an alias is aggregated, deduplicated, or processed by a clause (except deduplication in RETURN), the corresponding rows of its homologous aliases are discarded or reordered at the same time. Homologous aliases are not affected if the deduplication is written in the RETURN clause. Panels (a) and (b) in the image below demonstrate this difference.
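The difference can also be modeled outside UQL. The following Python sketch is not UQL and not how the engine is implemented; it only mimics the row behavior described above. The row values are hypothetical, and keeping the first occurrence of each duplicate is an assumption made for the illustration.

```python
# Hypothetical rows: each tuple is one row shared by the homologous
# aliases (path, tail, length). The values are made up for illustration.
rows = [
    ("p1", "A", 1),
    ("p2", "B", 2),
    ("p3", "A", 2),
    ("p4", "C", 3),
    ("p5", "B", 1),
]


def dedup_before_return(rows, key_index):
    """Deduplicate on one alias; the whole row is kept or dropped with it.

    Assumption: the first occurrence of each value is the one kept.
    """
    seen = set()
    kept = []
    for row in rows:
        if row[key_index] not in seen:
            seen.add(row[key_index])
            kept.append(row)
    return kept


def dedup_in_return(rows, key_index):
    """Deduplicate one alias only at return time; other aliases keep all rows."""
    columns = [list(col) for col in zip(*rows)]
    # dict.fromkeys removes duplicates while preserving first-seen order.
    columns[key_index] = list(dict.fromkeys(columns[key_index]))
    return columns


whole_rows = dedup_before_return(rows, key_index=1)
path_col, tail_col, length_col = dedup_in_return(rows, key_index=1)

print(len(whole_rows))                 # rows left for path, tail and length
print(len(path_col), len(tail_col))    # path untouched, tail deduplicated
```

In the first call, deduplicating tail shortens path and length as well, because they share rows. In the second call, only the returned tail column is shortened.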
Heterologous Alias
Aliases coming from completely independent queries are heterologous, or non-homologous. Data rows of heterologous aliases usually have no correlations.
The aliases a and b generated by two UNCOLLECT clauses are heterologous; they have 3 and 2 rows respectively:
Heterologous aliases may enter the same statement:
- When entering a RETURN clause without being used in any function or numerical calculation, each alias keeps its original length
- When entering a WITH clause without being used in an aggregation, aliases from different sources are cross-joined (combined as a Cartesian product)
- In all other cases, each alias is truncated to the minimum number of rows.
Panels (a) and (b) in the image below demonstrate cases 2 and 3.
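The three cases can be modeled with a short Python sketch. It is not UQL; it only mirrors the row counts described in the list, using a 3-row alias and a 2-row alias. The values, the pairing order of the cross join, and the assumption that truncation keeps the leading rows are all illustrative choices.

```python
from itertools import product

# Two heterologous aliases with 3 and 2 rows (hypothetical values).
a = [1, 2, 3]
b = ["x", "y"]

# Case 1: returned as-is, each alias keeps its own length.
returned = {"a": list(a), "b": list(b)}

# Case 2: cross join, every row of a paired with every row of b.
cross_joined = list(product(a, b))

# Case 3: both aliases cut to the shorter length and paired row by row.
# zip stops at the end of the shorter input, which models the truncation.
truncated = list(zip(a, b))

print(len(returned["a"]), len(returned["b"]))
print(len(cross_joined))
print(len(truncated))
```

The cross join yields 3 × 2 = 6 rows, while truncation yields min(3, 2) = 2 rows.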
Subquery Triggered by Aliases
When an alias enters a chain statement, the number of times the chain statement is executed equals the number of rows of the alias, and each execution uses one row of data (the system may apply optimizations depending on the actual situation).
The KHop query in the image below found 4 neighbors n; the delete command takes n as input, is executed 4 times, and deletes 1 node each time:
If an alias enters a query command, each execution of the query command triggered by a data row is called a subquery. A subquery normally generates multiple rows of results, and the total number of results the query generates equals the sum of the results of all its subqueries.
The UQL in the image below found 2 nodes: a blue n and a red n. The subsequent node query uses n as input; the first subquery found 3 red nodes and the second subquery found 2 blue nodes, for 5 rows of data in total:
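The row-by-row triggering can be sketched in Python. This is a model of the counting only, not UQL and not the engine's execution strategy. The `matches` table and the `run_subquery` helper are hypothetical stand-ins for whatever the node query actually looks up.

```python
# Hypothetical lookup table standing in for the node query:
# the blue input node finds 3 red nodes, the red input node finds 2 blue nodes.
matches = {
    "blue_n": ["red_1", "red_2", "red_3"],
    "red_n": ["blue_1", "blue_2"],
}


def run_subquery(node):
    """One execution of the query command, triggered by one input row."""
    return matches[node]


n = ["blue_n", "red_n"]  # the alias entering the query command, 2 rows

results = []
rows_per_subquery = []
for row in n:  # one subquery per row of the alias
    found = run_subquery(row)
    rows_per_subquery.append(len(found))
    results.extend(found)

print(rows_per_subquery)  # rows produced by each subquery
print(len(results))       # total rows = sum over all subqueries
```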
Subquery Result
The aggregation, deduplication and clause operations (except CALL) of UQL operate on all the rows of an alias as a whole. To operate independently on the rows that each subquery generates for an alias, use the CALL clause; see CALL for more information. In addition, the parameter limit(<N>) and the prefix OPTIONAL introduced in the chapter Query also operate on the results of each subquery. Panels (a) and (b) in the image below demonstrate the difference between operating on subquery results and on whole-query results, using limit(<N>) and LIMIT:
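The same contrast can be modeled in Python. The sketch below is not UQL; it assumes, for illustration, that both forms keep the leading rows, and it uses hypothetical subquery results. The function names `limit_per_subquery` and `limit_whole` are invented for this example.

```python
# Hypothetical results of two subqueries, one inner list per subquery.
subquery_results = [
    ["red_1", "red_2", "red_3"],
    ["blue_1", "blue_2"],
]


def limit_per_subquery(results, n):
    """Models limit(<N>): keep at most n rows from each subquery."""
    return [row for sub in results for row in sub[:n]]


def limit_whole(results, n):
    """Models LIMIT: merge all subquery results first, then keep n rows."""
    merged = [row for sub in results for row in sub]
    return merged[:n]


print(limit_per_subquery(subquery_results, 2))
print(limit_whole(subquery_results, 2))
```

With n = 2, the per-subquery form can return up to 2 rows for each input row, while the whole-query form returns at most 2 rows overall.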