
Configuration Reference

This page is the single reference for the YAML configuration consumed by gqldb-importer. Every supported source uses the same top-level shape; source-specific fields live under a block named after the source.

Generate a starter configuration with ./gqldb-importer -sample <type> (or -sample all for one of each).

Top-Level Structure

YAML
mode: <source-type>    # csv, json, jsonl, sql, neo4j, bigQuery, kafka, hive, salesforce, rdf, graphml
server:                # GQLDB connection and target graph
settings:              # Batching, threading, parsing, logging
<source-type>:         # Optional: source-specific block (sql, neo4j, kafka, ...); omitted for file sources
nodes:                 # Where to read nodes from (file sources put this at top level)
edges:                 # Where to read edges from (file sources put this at top level)

For file sources (csv, json, jsonl), nodes / edges sit at the top level. For single-file graph sources (rdf, graphml), there are no nodes / edges blocks; the entire file is imported via the source-specific block. For database / query / streaming sources (sql, neo4j, bigQuery, hive, salesforce, kafka), nodes / edges are nested inside the source-specific block.
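To make the top-level shape concrete, here is a minimal sketch of a complete configuration for a CSV import, assembled only from fields documented on this page. The host and port, graph name, and file paths are placeholder assumptions, not values taken from a real deployment.

```yaml
# Sketch of a complete CSV import configuration (all values are illustrative).
mode: csv                      # must match the source type

server:
  host:
    - "localhost:60061"        # hypothetical host:port of the GQLDB server
  username: "${DB_USERNAME}"   # resolved from the environment
  password: "${DB_PASSWORD}"
  graph: "my_graph"

settings:
  batch_size: 1000             # documented default
  threads: 4                   # documented default

# File source: nodes / edges sit at the top level, not inside a source block.
nodes:
  - file: "./data/people.csv"
    labels: ["Person"]
    properties:
      age: int32

edges:
  - file: "./data/knows.csv"
    label: "KNOWS"
    from_column: "from_id"
    to_column: "to_id"
```

For a database or streaming source, the same nodes / edges entries would instead be indented under the source-specific block (for example, sql: or kafka:).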

mode

Must match the source the configuration is for. The importer rejects a mismatch between mode and the source-specific block name.

| Value | Source |
| --- | --- |
| csv | CSV files |
| json | JSON files (array of objects) |
| jsonl | JSON-Lines files |
| sql | Relational databases (MySQL, PostgreSQL, SQL Server, Oracle, Snowflake) |
| neo4j | Neo4j |
| bigQuery | Google BigQuery |
| kafka | Kafka topics |
| hive | Apache Hive |
| salesforce | Salesforce (SOQL) |
| rdf | RDF (N-Triples / Turtle / RDF/XML) |
| graphml | GraphML |

server

Connection to the target GQLDB cluster and the destination graph.

| Field | Type | Description |
| --- | --- | --- |
| host | list of strings | One or more host:port entries. Multiple entries enable client-side failover. |
| username | string | GQLDB user. Supports env vars: "${DB_USERNAME}". |
| password | string | GQLDB password. Supports env vars: "${DB_PASSWORD}". |
| graph | string | Target graph name. |
| graph_type | string | open or closed. Used when the importer auto-creates the graph. |
| edge_id | bool | If the importer auto-creates the graph, controls the EDGE_ID feature on it. true (default) creates the graph with EDGE_ID enabled; false creates it with WITH EDGE_ID DISABLED. Matches the GQLDB default of EDGE_ID-enabled for new graphs. See Node and Edge IDs. |
| timeout | integer | Per-RPC timeout in seconds. |
| tls.enabled | bool | Enable TLS to the GQLDB server. |
| tls.cert_file | string | Client certificate path. |
| tls.key_file | string | Client key path. |
| tls.ca_file | string | CA certificate path. |
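The fields above could be combined as in the following sketch, which lists two hosts for client-side failover and enables TLS. It assumes that the dotted tls.* names in the table correspond to keys nested under a tls: map; the hostnames, port, timeout value, and certificate paths are hypothetical.

```yaml
server:
  host:                              # two entries enable client-side failover
    - "gqldb-1.example.com:60061"    # hypothetical hosts and port
    - "gqldb-2.example.com:60061"
  username: "${DB_USERNAME}"         # injected from the environment
  password: "${DB_PASSWORD}"
  graph: "my_graph"
  graph_type: open                   # only used if the importer auto-creates the graph
  edge_id: true                      # documented default
  timeout: 30                        # per-RPC timeout in seconds (illustrative value)
  tls:                               # assumption: tls.* fields nest under this key
    enabled: true
    cert_file: "./certs/client.crt"  # hypothetical paths
    key_file: "./certs/client.key"
    ca_file: "./certs/ca.crt"
```

Keeping credentials in environment variables, as shown, avoids committing them with the configuration file.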

settings

Common runtime settings. Source-specific parsing options (e.g., the CSV separator) also live here; the Applies to column shows which source each one affects.

| Field | Type | Default | Applies to | Description |
| --- | --- | --- | --- | --- |
| batch_size | integer | 1000 | All | Records per batched RPC. |
| threads | integer | 4 | All | Worker thread count. |
| import_mode | string | overwrite | All | insert (fail on dup _id), overwrite (replace), upsert (update or insert). |
| skip_invalid_nodes | bool | - | All | Skip nodes that fail validation; do not abort. |
| stop_on_error | bool | - | All | Abort the import on the first error. |
| create_node_if_not_exist | bool | - | All | When inserting an edge, auto-create missing endpoints. |
| estimated_nodes | integer | - | All | Hint for the bulk-import pipeline. |
| estimated_edges | integer | - | All | Hint for the bulk-import pipeline. |
| timezone | string | - | All | Timezone for parsing temporal values. Accepts UTC offsets ("+0800", "-0500", "+08:00") or IANA names ("Asia/Shanghai"). |
| timestamp_unit | string | auto | All | s (seconds) or ms (milliseconds). |
| log_level | string | info | All | debug, info, warn, error. |
| log_path | string | - | All | Path to the main log file. |
| error_log_path | string | - | All | Path to the error-only log file. |
| log_append | bool | - | All | Append to log files instead of truncating. |
| separator | string | , | CSV | Field separator. |
| quote | string | " | CSV | Quote character. |
| comment | string | - | CSV | Comment line prefix. |
| fit_to_header | bool | false | CSV | When true, ignore extra columns past the header. |
| lazy_quotes | bool | true | CSV | Allow lazy / unescaped quotes inside fields. |
| trim_space | bool | true | CSV | Trim leading / trailing whitespace from each field. |
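As an illustration of how these settings might be combined, the sketch below configures an upsert-style import of semicolon-separated CSV files. Every key comes from the table above; the values, including the log paths, are example choices rather than recommendations.

```yaml
settings:
  # General runtime behaviour (applies to all sources)
  batch_size: 1000
  threads: 4
  import_mode: upsert               # update existing records, insert new ones
  skip_invalid_nodes: true          # keep going when a node fails validation
  create_node_if_not_exist: true    # auto-create missing edge endpoints
  timezone: "Asia/Shanghai"         # IANA name; a UTC offset such as "+0800" also works
  timestamp_unit: ms
  log_level: info
  log_path: "./logs/import.log"               # hypothetical paths
  error_log_path: "./logs/import-errors.log"
  log_append: true

  # CSV-only parsing options
  separator: ";"
  quote: "\""
  fit_to_header: true               # ignore extra columns past the header
  trim_space: true
```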

nodes and edges

The structure depends on the source category.

Per-entry fields (all sources)

| Field | Required | Description |
| --- | --- | --- |
| labels (nodes) / label (edges) | yes | Target label(s). Nodes accept multiple. |
| id_column | optional | Column / field carrying the entity's _id. Default: _id. Valid on nodes always; valid on edges only when the target graph has EDGE_ID enabled (i.e., server.edge_id: true or an already-enabled existing graph). Supplying id_column on an edge entry against an EDGE_ID-disabled graph is rejected. |
| from_column | (edges) | Column / field carrying the source node's _id. |
| to_column | (edges) | Column / field carrying the target node's _id. |
| properties | optional | Either the short form (a map of name: type) or the list form (a list of objects with name, type, and optionally prefix, new_name). See Property Mapping. |

File sources add file: (the input path) and optionally head: (whether the file has a header row). Database / streaming sources add query: or topic:, plus schema: (the logical type name, used as the target label), as documented per source.

Example of assigning custom edge _ids from the source — edge_id must be enabled on the target graph:

YAML
server:
  graph: "my_graph"
  edge_id: true             # required for id_column on edges

edges:
  - file: "./data/knows.csv"
    label: "KNOWS"
    id_column: "txn_id"     # source column carrying the edge _id
    from_column: "from_id"
    to_column: "to_id"

Property Mapping

Short form — name to type:

YAML
properties:
  age: int32
  salary: double
  active: bool

List form — full control, supports renaming, ID prefixing, and explicit _id marker:

YAML
properties:
  - name: cust_no       # source column / field name
    type: _id           # mark this property as the node's _id
    prefix: "CUST_"     # prepend a prefix to the value (e.g., "123" -> "CUST_123")
  - name: full_name
    type: string
    new_name: name      # rename in target graph
  - name: age
    type: int32

Type values: string, bool, int32, int64, uint32, uint64, float, double, timestamp, plus _id (special — marks the ID column when using the list form).
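To show the list form in context, the following sketch places it inside a complete node entry for a file source. The file path, label, and column names (cust_no, signup_ts) are hypothetical; only the keys and type names are taken from this page.

```yaml
nodes:
  - file: "./data/customers.csv"    # hypothetical input file
    labels: ["Customer"]
    properties:
      - name: cust_no               # source column used as the node's _id
        type: _id
        prefix: "CUST_"             # "123" is stored as "CUST_123"
      - name: signup_ts             # source column holding a temporal value
        type: timestamp
        new_name: signed_up_at      # property name in the target graph
```

How the timestamp column is parsed would depend on the timezone and timestamp_unit values under settings.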

Per-Source Reference

Common fields above are not repeated below; this section documents only what changes per source.

csv

Top-level nodes / edges entries; each carries a file: path.

YAML
nodes:
  - file: "./data/people.csv"
    labels: ["Person"]
    head: true           # default true; file has header row
    properties:
      age: int32

edges:
  - file: "./data/knows.csv"
    label: "KNOWS"
    from_column: "from_id"
    to_column: "to_id"

CSV parsing options (separator, quote, comment, fit_to_header, lazy_quotes, trim_space) live under settings.

json

Top-level nodes / edges, one file: per entry. The JSON file is an array of objects keyed by column names.

YAML
nodes:
  - file: "./data/people.json"
    labels: ["Person"]
    properties:
      age: int32

edges:
  - file: "./data/knows.json"
    label: "KNOWS"
    from_column: "from_id"
    to_column: "to_id"

jsonl

Identical shape to json. Each line of the input file is one JSON object.
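Since the shape is the same as for json, a jsonl configuration would presumably differ only in mode and in the input files. The sketch below assumes hypothetical .jsonl paths, and the sample input line in the comment is illustrative.

```yaml
mode: jsonl

nodes:
  - file: "./data/people.jsonl"     # one JSON object per line, for example:
                                    # {"_id": "p1", "name": "Alice", "age": 30}
    labels: ["Person"]
    properties:
      age: int32

edges:
  - file: "./data/knows.jsonl"
    label: "KNOWS"
    from_column: "from_id"
    to_column: "to_id"
```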

sql

Connects to a relational source and runs one query per node/edge entry.

YAML
sql:
  driver: mysql          # mysql, postgres, sqlserver, oracle, snowflake
  host: "localhost"
  port: 3306
  database: "my_database"
  username: "db_user"
  password: "db_password"
  # dsn: ""              # alternative: full connection string

  nodes:
    - schema: "Person"
      query: "SELECT id AS _id, name, age FROM users"
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "FOLLOWS"
      query: "SELECT follower_id, following_id, created_at FROM follows"
      from_column: "follower_id"
      to_column: "following_id"
      properties:
        created_at: timestamp

schema is the target label. Either supply host/port/database/username/password or dsn (a complete driver-specific connection string).
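For comparison, here is a sketch of the same kind of block pointed at PostgreSQL instead of MySQL. Only driver and port change (5432 is the standard PostgreSQL port); the database name, credentials, and query are placeholders.

```yaml
mode: sql                      # mode must match the source-specific block name

sql:
  driver: postgres
  host: "localhost"
  port: 5432                   # standard PostgreSQL port
  database: "my_database"
  username: "db_user"
  password: "db_password"

  nodes:
    - schema: "Person"         # target label
      query: "SELECT id AS _id, name, age FROM users"
      id_column: "_id"
      properties:
        age: int32
```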

neo4j

Queries the Neo4j source with Cypher.

YAML
neo4j:
  uri: "neo4j://localhost:7687"
  username: "neo4j"
  password: "password"
  database: "neo4j"

  nodes:
    - schema: "Person"
      query: "MATCH (n:Person) RETURN n.id AS _id, n.name AS name, n.age AS age"
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "KNOWS"
      query: "MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN a.id AS from_id, b.id AS to_id, r.since AS since"
      from_column: "from_id"
      to_column: "to_id"
      properties:
        since: int32

bigQuery

Uses a GCP service-account JSON for authentication.

YAML
bigQuery:
  projectId: "my-gcp-project"
  certFile: "./service-account.json"

  nodes:
    - schema: "Person"
      query: "SELECT id AS _id, name, age FROM my_dataset.users"
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "FOLLOWS"
      query: "SELECT follower_id, following_id FROM my_dataset.follows"
      from_column: "follower_id"
      to_column: "following_id"

kafka

Reads one Kafka topic per node/edge entry; each message is a JSON object.

YAML
kafka:
  brokers:
    - "localhost:9092"

  nodes:
    - schema: "Person"
      topic: "users"
      offset: oldest         # oldest, newest
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "FOLLOWS"
      topic: "follows"
      offset: oldest
      from_column: "follower_id"
      to_column: "following_id"

hive

Connects via HiveServer2.

YAML
hive:
  host: "localhost"
  port: 10000
  auth: "NONE"               # NONE, NOSASL, KERBEROS
  database: "default"
  username: ""
  password: ""

  nodes:
    - schema: "Person"
      query: "SELECT id AS _id, name, age FROM users"
      id_column: "_id"
      properties:
        age: int32

  edges:
    - schema: "FOLLOWS"
      query: "SELECT follower_id, following_id FROM follows"
      from_column: "follower_id"
      to_column: "following_id"

salesforce

Authenticates with username + password + security token. Queries are SOQL.

YAML
salesforce:
  url: "https://your-instance.salesforce.com"
  username: "user@example.com"
  password: "sf_password"
  token: "security_token"

  nodes:
    - schema: "Account"
      query: "SELECT Id, Name, Industry FROM Account LIMIT 1000"
      id_column: "Id"

  edges:
    - schema: "CONTACT_OF"
      query: "SELECT Id, AccountId, Name FROM Contact LIMIT 1000"
      from_column: "Id"
      to_column: "AccountId"

rdf

Single file; no nodes / edges blocks. Triples become nodes and edges based on the RDF graph.

YAML
rdf:
  file: "./data/ontology.nt"
  format: ntriples           # ntriples, turtle, rdfxml
  defaultSchema: "RDFNode"   # label for unlabeled subjects

graphml

Single file; no nodes / edges blocks. Labels come from the configured attribute.

YAML
graphml:
  file: "./data/graph.graphml"
  schemaAttr: "type"         # GraphML attribute name carrying the label
  defaultSchema: "Node"      # label when the attribute is missing

CLI Overrides

A subset of server and settings fields can be overridden at the command line, which is useful for credential injection in CI or quick environment swaps. See Flags.

| Flag | Overrides |
| --- | --- |
| -host | server.host |
| -username | server.username |
| -password | server.password |
| -graph | server.graph |
| -level | settings.log_level |