UltipaDocs
Products
Solutions
Resources
Company
Start Free Trial
UltipaDocs
Start Free Trial
  • Introduction
  • GQL vs Other Languages
    • Overview
    • Node and Edge Patterns
    • Path Patterns
    • Quantified Paths
    • Questioned Paths
    • Shortest Paths
    • Cheapest Paths
    • K-Hop Traversal
    • Graph Patterns
    • Overview
    • Open Graphs
    • Closed Graphs
    • Graph Types
    • Constraints
    • Projections
    • Storage Maintenance
    • Node and Edge IDs
    • INSERT
    • INSERT OVERWRITE
    • UPSERT
    • MERGE
    • SET
    • REMOVE
    • DELETE
    • FOREACH
    • LOAD CSV
    • Query Composition
    • Result Table and Visualization
    • MATCH
    • OPTIONAL MATCH
    • FILTER
    • LET
    • FOR
    • ORDER BY
    • LIMIT
    • SKIP
    • CALL
    • RETURN
    • Composite Query
    • NEXT
    • All Functions
    • Element Functions
    • Path Functions
    • Aggregate Functions
    • Mathematical Functions
    • Trigonometric Functions
    • String Functions
    • List Functions
    • Datetime Functions
    • Spatial Functions
    • Null Functions
    • Utility Functions
    • Type Conversion Functions
    • Table Functions
    • Database Functions
  • Operators
  • Predicates
    • CASE
    • LET Value Expression
    • Value Query Expression
    • List Expressions
    • Current Values
    • Index
    • Full-text Index
    • Vector Index
  • Transactions
  • Triggers
  • Query Management
  • Execution Plan
  • Backup and Restore
    • Variables
    • Values and Types
    • Comments
    • Reserved Words
    • Naming Conventions
    • Syntactic Notation
  • GQL Conformance
  1. Docs
  2. /
  3. ISO GQL
  4. /
  5. Data Manipulation

LOAD CSV

Overview

LOAD CSV statement has two forms:

  • Dump CSV form reads a CSV file and returns each row as a map. Use this to preview a file.
  • Import from CSV form writes each CSV row straight into the current graph as a node or edge. Use this to load data. Large files are streamed in batches.

Both forms read a local filesystem path or a file:// URI. Remote schemes (http://, https://, s3://, ...) are not supported.

Syntax
<load csv statement> ::= 
  "LOAD CSV FROM" <single-quoted file source> { <dump csv> | <import from csv> }

<dump csv> ::=
  [ "AS" <row variable> ]
  [ "WITH HEADER" [ "DELIMITER" <character> [ "QUOTE" <character> ] ] ]

<import from csv> ::=
  "WITH HEADER" [ "DELIMITER" <character> [ "QUOTE" <character> ] ]
  "INTO" <label name>
  [ "EDGE FROM" <label name> "(" <column> ")" "TO" <label name> "(" <column> ")" ]
  [ "MAPPING" "(" <property mapping> { "," ... } ")" ]

<property mapping> ::= <property> ":" <column> [ "AS" <type> ]

Details

  • <row variable> defaults to row when AS is omitted.
  • With WITH HEADER, the first row is consumed as column names and each subsequent row is keyed by those names. Without WITH HEADER, each row is keyed positionally: col0, col1, ... .
  • DELIMITER accepts a single character (e.g. ',', ';', '\t'). QUOTE accepts the double quote (") as the field-quote character.
  • Path resolution: absolute paths (/data/users.csv) and file:/// URIs are used as-is. Relative paths resolve against the server process's working directory, the directory the database was launched from, not the .gdb data folder. Prefer absolute paths in production.

Example CSVs

user.csv
_id,name,balance,risk_score,is_mule
USR_000000,David Brown,16625.7,0.86,true
USR_000001,James Johnson,62870.63,0.76,true
USR_000002,Jane Brown,17065.87,0.85,true
transfers.csv
_from,_to,amount,is_suspicious,timestamp
USR_000000,USR_000002,3384.45,false,2024-01-27T19:19:00Z
USR_000001,USR_000000,4353.22,true,2024-01-25T06:18:00Z
USR_000000,USR_000001,2482.13,false,2024-01-30T15:27:00Z
USR_000001,USR_000002,4050.61,false,2024-01-28T01:02:00

Dump CSV to Inspect Rows

Read a CSV file with no header. Each row is returned as a map keyed positionally (col0, col1, ...) under a single column named row (the default variable):

GQL
LOAD CSV FROM 'data/users.csv'

Result:

JSON
[
  {
    "row": {
      "col2": "balance",
      "col3": "risk_score",
      "col4": "is_mule",
      "col0": "_id",
      "col1": "name"
    }
  },
  {
    "row": {
      "col0": "USR_000000",
      "col1": "David Brown",
      "col2": "16625.7",
      "col3": "0.86",
      "col4": "true"
    }
  },
  {
    "row": {
      "col0": "USR_000001",
      "col1": "James Johnson",
      "col2": "62870.63",
      "col3": "0.76",
      "col4": "true"
    }
  },
  {
    "row": {
      "col3": "0.85",
      "col4": "true",
      "col0": "USR_000002",
      "col1": "Jane Brown",
      "col2": "17065.87"
    }
  }
]

Read a CSV file with a header so the map is keyed by column name:

GQL
LOAD CSV FROM 'data/users.csv' WITH HEADER

Result:

JSON
[
  {
    "row": {
      "_id": "USR_000000",
      "name": "David Brown",
      "balance": "16625.7",
      "risk_score": "0.86",
      "is_mule": "true"
    }
  },
  {
    "row": {
      "_id": "USR_000001",
      "name": "James Johnson",
      "balance": "62870.63",
      "risk_score": "0.76",
      "is_mule": "true"
    }
  },
  {
    "row": {
      "_id": "USR_000002",
      "name": "Jane Brown",
      "balance": "17065.87",
      "risk_score": "0.85",
      "is_mule": "true"
    }
  }
]

Bind the per-row map to a different column name with AS:

GQL
LOAD CSV FROM '/data/users.csv' AS u

Read a tab-separated file:

GQL
LOAD CSV FROM 'data/users.tsv' AS u WITH HEADER DELIMITER '\t'

Specify the field-quote character with QUOTE. The double quote (") is the supported quoting character:

GQL
LOAD CSV FROM 'data/users.csv' AS u WITH HEADER DELIMITER ',' QUOTE '"'

The result set has one column (row / u / whatever name AS specified). Each row in that column is a MAP — access fields client-side after fetching, e.g. row['name'] in your driver.

The rows are returned to the client only, they cannot be passed to a following RETURN, INSERT, or MATCH in the same query.

Import from CSV

Use INTO <label> to write each CSV row as a node (or as an edge with EDGE FROM ... TO ...) in a single statement, no chained INSERT needed. Returns a one-row summary with the count of imported (and, for edges, skipped) records.

Node import

Without MAPPING, every CSV column becomes a property of the same name on the new node. WITH HEADER is required (the import needs column names to map by). All values default to STRING.

GQL
LOAD CSV FROM 'data/users.csv' WITH HEADER INTO User

Add a MAPPING (...) clause to rename columns, select a subset, and apply type coercion. Include _id in the mapping if you plan to reference these nodes from a later edge import.

GQL
LOAD CSV FROM 'data/users.csv' WITH HEADER
INTO Person MAPPING (
  _id: '_id',
  name: 'name',
  balance: 'balance' AS FLOAT,
  riskScore: 'risk_score' AS FLOAT,
  isMule: 'is_mule' AS BOOL
)

Edge import

For edges, name the endpoint labels and the CSV columns that supply each endpoint's _id. Rows whose endpoint _id is missing in the graph are skipped and reported in the summary instead of erroring the whole import. Requires the graph to have EDGE_ID enabled (default for new graphs).

GQL
LOAD CSV FROM 'data/transfers.csv' WITH HEADER
INTO TRANSFERS EDGE FROM Person('_from') TO Person('_to')
MAPPING (
  isSuspicious: 'is_suspicious' AS BOOL,
  timestamp: 'timestamp' AS TIMESTAMP
)  

Supported Type Coercions

TypeAliasesAccepted format
STRINGTEXT, CHAR, VARCHARcell preserved verbatim
INT / INTEGERBIGINT, SMALLINT, INT8/16/32/64, UINT, UINT8/16/32/64base-10, whitespace trimmed
FLOAT / DOUBLEREAL, FLOAT32, FLOAT64IEEE 754, whitespace trimmed
DECIMALNUMERICarbitrary-precision, stored as the original digit string
BOOL / BOOLEANtrue/false/t/f/yes/no/y/n/1/0 (case-insensitive)
DATEYYYY-MM-DD
TIMEHH:MM[:SS[.fff]]
DATETIMELOCAL_DATETIMERFC 3339 or YYYY-MM-DD HH:MM:SS (no zone)
TIMESTAMPRFC 3339 (assumes UTC if no offset)
ZONED_DATETIMERFC 3339 with offset (2024-01-15T09:00:00+09:00)
DURATIONINTERVALISO 8601 (PT3H30M, P1Y2M, -PT15M, fractional units OK)
BYTESBLOB, BINARY, VARBINARYhex (0xDEADBEEF or deadbeef) or base64
POINT1.5 2.5, 1.5,2.5, or POINT(1.5 2.5) — longitude first
POINT3D1.0 2.0 3.0, 1.0,2.0,3.0, or POINT(1.0 2.0 3.0)

Notes:

  • Empty CSV cells produce a NULL (the property is omitted from the row) rather than an empty-string value.
  • Coercion failures error out with the row number and the offending column name.

Limitations

  • Only local filesystem paths and file:// URIs are supported. HTTP(S) and object-storage sources are not yet wired in.
  • Every cell is read as a STRING in the dump form; downstream client code is responsible for any type conversion. The inline-import form does its own coercion via AS <TYPE> in the MAPPING clause.