Indexing, or property indexing, is a technique used in Ultipa to accelerate the retrieval of nodes and edges with specific properties. By avoiding full graph scans, indexes enable the database to quickly locate relevant data. This is especially advantageous when working with large graphs.
Ultipa supports single indexes, each on one property, and composite indexes, which involve multiple properties from a schema.
To retrieve node indexes in the current graph:
SHOW NODE INDEX
To retrieve edge indexes in the current graph:
SHOW EDGE INDEX
The information about indexes is organized into the _nodeIndex or _edgeIndex table with the following fields:
| Field | Description |
|---|---|
| id | Index ID. |
| name | Index name. |
| properties | The properties involved in the index. |
| schema | The schema of the properties involved in the index. |
| status | Index status, which can be DONE or CREATING. |
You can create an index using the CREATE INDEX statement. Note that each property can have only one single index. Index creation runs as a job; you may run SHOW JOB <id?> afterward to verify that the creation succeeded.
System properties in Ultipa are inherently optimized for query performance and have built-in efficiencies. They do not support indexing.
Syntax
<create index statement> ::= "CREATE INDEX" <index name> "ON" < "NODE" | "EDGE" > <schema name> "(" <property index item> [ { "," <property index item> }... ] ")"
<property index item> ::= <property name> [ "(" <bytes> ")" ]
Details
- <index name> must be unique among nodes and among edges, but a node index and an edge index may share the same name.
- For a single index, specify one <property index item>; for a composite index, list multiple <property index item>.
- For properties of type string or text, you can specify the maximum number of bytes [1] (counted from the left) to be indexed for each value. If omitted, the default indexing length is 1024 bytes for string and 2048 bytes for text. Learn more about how this byte-length limitation affects queries.

[1] In standard English text, most encodings (such as ASCII or UTF-8) use 1 byte per character. For non-English characters, the byte size may vary; for example, one Chinese character typically occupies 3 bytes.
To create a single index named cBalance on the balance property of card nodes:
CREATE INDEX cBalance ON NODE card (balance)
To create a single index named name on the name property (string type) of card nodes, limiting the indexed length to 10 bytes:
CREATE INDEX name ON NODE card (name(10))
To create a composite index named transAmountNotes on the amount and notes properties of transfer edges, limiting the indexed length of notes (text type) to 10 bytes:
CREATE INDEX transAmountNotes ON EDGE transfer (amount, notes(10))
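The limits in statements such as name(10) and notes(10) count bytes, not characters, so the same limit covers fewer characters of non-English text. The Python sketch below illustrates the arithmetic only. It assumes values are encoded as UTF-8, which this page mentions as an example encoding but does not confirm as Ultipa's storage format, and the helper names `utf8_byte_length` and `indexed_prefix` are hypothetical, not part of any Ultipa API.

```python
def utf8_byte_length(value: str) -> int:
    """Return how many bytes the string occupies when encoded as UTF-8 (assumed encoding)."""
    return len(value.encode("utf-8"))


def indexed_prefix(value: str, max_bytes: int) -> bytes:
    """Return the first max_bytes bytes of the UTF-8 encoding, counted from the left."""
    return value.encode("utf-8")[:max_bytes]


print(utf8_byte_length("Kavi"))         # 4: one byte per ASCII character
print(utf8_byte_length("图数据库"))       # 12: three bytes per character here
print(indexed_prefix("Aventurine", 8))  # b'Aventuri'
```

Under this assumption, a 10-byte limit covers ten ASCII characters but only three of the Chinese characters above.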
You can drop an index using the DROP NODE INDEX or DROP EDGE INDEX statement. Dropping an index does not affect the actual property values stored in shards.
NOTE: A property with an index cannot be dropped until the index is deleted.
To drop the node index cBalance:
DROP NODE INDEX cBalance
To drop the edge index transAmountNotes:
DROP EDGE INDEX transAmountNotes
Indexes are automatically applied when the corresponding properties are used in the following types of queries. They are not effective in other types of queries.
1. Node retrieval using a single node pattern. For example,
CREATE INDEX user_age_index ON NODE user (age)
The user_age_index is effective in the following queries:
MATCH (n:user {age: 45}) RETURN n
MATCH (n) WHERE n.age > 45 RETURN n
In the second query, the node label is not specified, so user_age_index is only partially used during the search for user nodes.
2. Edge retrieval using a one-step path pattern. For example,
CREATE INDEX links_weight_index ON EDGE links (weight)
The links_weight_index is effective in the following query:
MATCH ()-[e:links WHERE e.weight = 2]->() RETURN e
The edge direction can be left (<-[]-), right (-[]->), or any (-[]-).
The following query does not use links_weight_index because it retrieves paths, not edges:
MATCH p = ()-[e:links WHERE e.weight = 2]->() RETURN p
3. Start node filtering in path patterns.
The user_age_index created above is effective in the following query, where the filter is on the start node of the path:
MATCH p = (n:user WHERE n.age > 45)-[]-()-[]-() RETURN p
It does not apply to the following query, where the filter is on the end node:
MATCH p = ()-[]-(n:user WHERE n.age > 45) RETURN p
The order of properties in a composite index matters: queries that filter on the leftmost properties of the index (i.e., the first property or the first few properties in the defined order) will benefit from the index.
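As a rough model of this leftmost rule, the Python sketch below takes the ordered property list of a composite index and the set of properties a query filters on, and returns the leading run of index properties the filter covers. It is a simplified illustration, not Ultipa's query planner. The function name `usable_index_prefix` and the (name, age) index are hypothetical.

```python
from typing import Iterable, List


def usable_index_prefix(index_properties: List[str], filtered: Iterable[str]) -> List[str]:
    """Return the leading index properties that the filter covers, in index order.

    An empty result means the filter skips the first index property,
    so under this simplified model the index is not used.
    """
    filtered_set = set(filtered)
    prefix: List[str] = []
    for prop in index_properties:
        if prop not in filtered_set:
            break  # the run of leftmost properties ends at the first gap
        prefix.append(prop)
    return prefix


index_on_name_age = ["name", "age"]  # hypothetical composite index (name, age)
print(usable_index_prefix(index_on_name_age, ["name", "age"]))           # ['name', 'age']
print(usable_index_prefix(index_on_name_age, ["name"]))                  # ['name']
print(usable_index_prefix(index_on_name_age, ["age"]))                   # []
print(usable_index_prefix(index_on_name_age, ["name", "age", "grade"]))  # ['name', 'age']
```

In the last call, grade is not part of the index, so in this model it would be filtered without index support.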
For example:
CREATE INDEX name_age ON NODE user (name(10),age)
- MATCH (u:user WHERE u.name = "Kavi" AND u.age > 20) uses the index.
- MATCH (u:user WHERE u.name = "Kavi") uses the index.
- MATCH (u:user WHERE u.age > 20) doesn't use the index.
- MATCH (u:user WHERE u.name = "Kavi" AND u.age > 20 AND u.grade = 7) uses the index for name and age; the additional filter on the grade property, which lacks an index, is applied without index support.

When using indexes with string or text properties, ensure that the byte length of the string used in the filter does not exceed the limit defined when the index was created.
For example, an index Username is created for the name property of the user nodes with an 8-byte limit:
CREATE INDEX Username ON NODE user (name(8))
The query below won't utilize the Username index because the specified string Aventurine (10 bytes) exceeds the 8-byte limit:
MATCH (n:user {name: "Aventurine"}) RETURN n
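To check ahead of time whether a filter string fits within an index's byte limit, a small helper like the following may be useful. It is a minimal sketch that assumes UTF-8 encoding (not confirmed by this page for Ultipa's storage), and `filter_fits_index` is a hypothetical name, not an Ultipa function.

```python
def filter_fits_index(filter_value: str, indexed_bytes: int) -> bool:
    """Return True if the filter string's UTF-8 byte length is within the index limit."""
    return len(filter_value.encode("utf-8")) <= indexed_bytes


print(filter_fits_index("Aventurine", 8))  # False: 10 bytes exceeds the 8-byte limit
print(filter_fits_index("Kavi", 8))        # True: 4 bytes fits
```

If the check fails, the options under this model are to recreate the index with a larger byte limit or to accept that the query runs without the index.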