This manual covers the usage of Ultipa Importer (Go version), a lightweight command-line tool for fast import of multiple metadata files from local storage into the Ultipa graph database with one command.


Prerequisites

- node files and edge files
- a configuration file (yml)
- a command-line terminal compatible with your operating system:
  - Linux or MacOS: bash, zsh, tcsh
  - Windows: PowerShell
- a version of Ultipa Importer compatible with your operating system
Background Knowledge - System Properties

System properties of node:

- `_id`: ID of the node, a string of maximum 128 bytes
- `_uuid`: ID of the node, a uint64

System properties of edge:

- `_uuid`: ID of the edge, a uint64
- `_from`: the `_id` of the start-node (FROM) of the edge
- `_to`: the `_id` of the end-node (TO) of the edge
- `_from_uuid`: the `_uuid` of the start-node (FROM) of the edge
- `_to_uuid`: the `_uuid` of the end-node (TO) of the edge
Failures induced by system properties:

- The ID of the data already exists in the current graphset when the import mode is `insert`
- No ID of FROM or TO is provided when importing edge data
- The ID of FROM or TO exists neither in the current graphset nor in the node files imported in the same command, and `createNodeIfNotExist` is not set to 'true' to let the system create such FROM or TO
- Node ID, FROM or TO is provided in both string and uint64 types, but the mapping between the two types of ID is inconsistent with that in the graphset or the node files imported in the same command

When only one type of node ID, FROM or TO is provided, either string or uint64, the other type of ID will be automatically mapped or generated by the system.
Data File

- Each file: nodes or edges that belong to a specific schema
- Each row (except headers): a node or an edge
- Each column: a property
- File format: `csv` (file extension does not matter)
- File delimiter: `,`, `\t`, `|` or `;`
- File header (column name) format: `<property_name>` or `<property_name>:<property_type>`; headerless files are allowed
- Valid `<property_name>` (`name`): 2 ~ 64 characters, must not start with a tilde '~' or contain a backquote '`'
- Valid `<property_type>` (`type`):
  - For system properties: `_id`, `_uuid`, `_from`, `_to`, `_from_uuid`, `_to_uuid`
  - For custom properties: `string`, `text`, `float`, `double`, `int32`, `uint32`, `int64`, `uint64`, `datetime`, `timestamp`, `point`, `list`
  - For columns to be ignored: `_ignore`

The declared property type must match the data in the column. Misparsing data, such as reading a column of integers as strings, will be prompted as an error.
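As a sketch of the header format, a headed node file for a hypothetical @user schema (the file name, property names and values below are illustrative, not from the manual) might look like:

```csv
user_id:_id,name,score:float,registered:datetime
u001,Alice,87.5,2020-01-15
u002,Bob,62.0,2021-07-03
```

Here the column `user_id` is declared as the system property `_id`, `score` and `registered` carry explicit types, and `name` falls back to the default string type.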
YML: server

```yml
server:
  host: "192.168.35.151:60024" # for a cluster, separate multiple server nodes with comma ','
  username: employee533
  password: joaedSSGsdf
  crt: "" # The directory of the SSL certificate when the server is in SSL mode
  graphset: test_graph # The graphset name, or use graphset 'default' by default
```
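For a cluster deployment, the comment on `host` implies a comma-separated list of server nodes; a minimal sketch, assuming a hypothetical three-node cluster (all addresses below are placeholders):

```yml
server:
  # three cluster nodes, separated with comma ','
  host: "10.0.0.11:60024,10.0.0.12:60024,10.0.0.13:60024"
  username: employee533
  password: joaedSSGsdf
  crt: ""
  graphset: test_graph
```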
YML: nodeConfig | edgeConfig

- Headerless: edge file 'review.csv' of schema @review, columns are FROM, TO, rating, comment and tags:

```csv
A2CMX45JPSCTUJ,B0002CZSJO,5,The Best Cable,"[possitive,rebuy]"
A3EIML9QZO5NZZ,B0000AQRSU,5,awesome,"[possitive]"
A3C9F3SZWLWDZF,B000165DSM,2,worse than the one i bought last time,"[negative,rebuy]"
A1C60KQ8VJZBS5,B0002CZV82,4,Makes changing strings a breeze,"[possitive]"
```

```yml
edgeConfig:
  - schema: review # Schema of the current data file, mandatory
    file: ./review.csv # The directory of the data file
    properties: # set `properties` when the file is headerless
      # declare name and type for each column; must correspond to the sequence of columns
      - name: _from
        type: _from # declare system property
      - name: _to
        type: _to
      - name: rating # declare custom property
        type: int32 # declare property type; must be consistent with that in the graphset if the property already exists
      - name: comment # treated as string when `type` is not set
      - name: tags
        type: string[]
```
- With header: node file 'reviewer.txt' of schema @reviewer; some headers need revising:
  - 'level' is mistakenly marked as 'string' but should be 'uint32'
  - 'address' is actually 'location' of 'point' type

```csv
reviewerID,username,address,level:string,birthday:datetime
A00625243BI8W1SSZNLMD,jespi59jr,POINT(32.5 -117.6),12,1984-05-31
A10044ECXDUVKS,Dean J Copely,POINT(39.4 105.9),10,1987-11-02
A102MU6ZC9H1N6,Teresa Halbert,POINT(42.9 2.3),5,2001-08-14
A109JTUZXO61UY,Mike C,POINT(112.6 103.8),9,1998-02-19
```

```yml
nodeConfig:
  - schema: reviewer
    file: ./reviewer.txt
    types: # set `types` when the file has headers
      # declare or revise types for columns if necessary, regardless of sequence
      - name: level
        type: uint32
      - name: address
        new_name: location # modify the property name in the header
        type: point
      - name: reviewerID
        type: _id # declare system property
      # use string for 'username' by default
      # use 'datetime' from the header for 'birthday'
```

A yml file can include both `nodeConfig` and `edgeConfig`. The `nodeConfig` can contain multiple sets of `schema` and its parameters at the same level, and the same applies to `edgeConfig`.
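To illustrate, a minimal sketch of one yml file combining both sections, with two schemas under `nodeConfig` (the @product schema and its file are hypothetical additions, not from the manual):

```yml
nodeConfig:
  - schema: reviewer
    file: ./reviewer.txt
  - schema: product          # hypothetical second node schema, same level as 'reviewer'
    file: ./product.csv
edgeConfig:
  - schema: review
    file: ./review.csv
```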
Other parameters:

Parameter | Specification | Default Value | Description |
---|---|---|---|
skip | int | 0 | The number of rows to be skipped (not imported) from the first record. |
limit | int | (no limit) | The total number of rows to import from the current data file. |

Parameters `skip` and `limit` are on the same level as `schema`, `file`, etc.; they are usually set when re-importing a data file after an error occurred.
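As a sketch of such a re-import, supposing the log reported a skipped batch starting at row 1500 with 1000 rows (the numbers below are illustrative):

```yml
edgeConfig:
  - schema: review
    file: ./review.csv
    skip: 1500   # rows before the failed batch were already imported
    limit: 1000  # re-import only the rows of the failed batch
```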
YML: settings

```yml
settings:
  separator: "," # The delimiter of data columns of all the files to be imported; supports `,`, `\t`, `|` and `;`, or takes `,` by default
  importMode: overwrite # The mode of the import operation; supports `insert`, `upsert` and `overwrite`, or takes `insert` by default
  yes: true # Whether to auto-create the graphset, schemas and properties that do not exist; does not auto-create by default
  threads: 32 # The maximum number of threads (no less than 2), or takes the number of CPUs that run the Importer by default; 32 threads recommended
  batchSize: 1000 # The number of rows in each batch, valid from 500 to 10000; an integer around 100000/number_of_properties is recommended, or takes 10000 by default
```
Other parameters:
Parameter | Specification | Default Value | Description |
---|---|---|---|
logPath | <log_path> | ./log/ | The path of the log file, e.g. '/data/import/log/' |
MaxPacketSize | int | 41943040 (40M) | The maximum bytes of each packet the Go SDK processes |
timezone | string | (local timezone) | The timezone of timestamp values, e.g. +08:00, Asia/Shanghai, etc. |
createNodeIfNotExist | bool | false | true: create nodes for the non-existing _from, _to, _from_uuid or _to_uuid of edges; false: leave them non-existing and their related edges un-imported |
stopWhenError | bool | false | (When an error occurs) true: terminate the import operation immediately; false: skip the erroneous data batch and continue with the next batch |
fitToHeader | bool | false | (When the header length and the number of data columns are inconsistent) true: omit or auto-fill columns based on the header; false: stop and throw an error |
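A hedged sketch of a `settings` block combining the parameters from the table above (the values are illustrative, not tuned recommendations):

```yml
settings:
  separator: ","
  importMode: insert
  logPath: ./log/              # write logs under ./log/
  timezone: "Asia/Shanghai"    # parse timestamp values in this timezone
  createNodeIfNotExist: true   # auto-create missing FROM/TO nodes of edges
  stopWhenError: false         # skip a failed batch and continue
  fitToHeader: true            # reconcile header/column count mismatches
```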
Command Line

- Show help

```shell
./ultipa-importer --help
```

- Download the configuration sample file

```shell
./ultipa-importer --sample
```

- Execute the import operation; the config file in.yml is in the current directory

```shell
./ultipa-importer --config ./in.yml
```
All parameters:
Command | Description |
---|---|
--help | show help information |
--config <FILE_PATH_NAME> | define the configuration file and execute the import operation |
--sample | true: generate a sample config file; false: do not generate a sample config file |
--host <IP:PORT> | overwrite the parameter host in the config file |
--graph <GRAPH_NAME> | overwrite the parameter graphset in the config file |
--username <USERNAME> | overwrite the parameter username in the config file |
--password <PASSWORD> | overwrite the parameter password in the config file |
--maxPacketSize <MAXPACKETSIZE> | overwrite the parameter MaxPacketSize in the config file |
--logAppend | true: append multiple error info into one log file; false: generate a log file for each error |
--progressLog <boolean> | (for Ultipa Manager) true: generate a progress log; false: do not generate a progress log |
--version | true: show the Ultipa Importer version; false: do not show it |
Errors
Before Importing
Definition: errors triggered when checking configurations in the yml file, creating the graphset or creating schemas.
Triggers:
- the yml file content does not conform to the yml format
- parameter config errors, such as a property name or data type mismatching the UQL specification
- failure when creating the graphset and/or schemas
During Importing
Definition: errors triggered when importing data files to the remote server.
Types:
- server returned an error
- network error
- data format error (inconsistency between the declared file header and the data columns; see `fitToHeader` for a solution)
- duplicated identifier (data ID already exists under `insert` mode)

When an error occurs while importing a data batch, the server records the error type and the skipped data rows (represented by the starting row position and the total number of rows) in the log file, for easy re-import using parameters `skip` and `limit`.
FAQ
Q: I got the error 'rpc error: code = ResourceExhausted desc = Received message larger than max (31324123 vs. 4194304)'. What does it mean and how do I solve it?
A: This message means that when importing a data batch, the packet size (31324123 bytes) exceeded the limit of 4194304 bytes. Possible reasons are too many properties imported at a time, excessive property volume (long texts stored in text type), or too large a batchSize, as a result of which the data volume of a batch exceeds the default server config max_rpc_msgsize (4M) and/or the MaxPacketSize of the Go SDK (40M).
Solution A: reduce the `batchSize` in the config file.
Solution B: raise the `MaxPacketSize` setting in the config file, and/or `max_rpc_msgsize` in the server config (the latter requires a server reboot).
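Following Solutions A and B, a sketch of adjusted settings (the values below are illustrative, not tuned recommendations):

```yml
settings:
  batchSize: 500               # Solution A: the smallest valid batch size
  MaxPacketSize: 83886080      # Solution B: raise the Go SDK limit, here to 80M
  # for Solution B, also raise max_rpc_msgsize in the server config,
  # then reboot the server
```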
Q: How do I set the timezone for time values?
A: Please follow the format examples below:
- [YY]YY-MM-DD HH:MM:SS
- [YY]YY-MM-DD HH:MM:SSZ
- [YY]YY-MM-DDTHH:MM:SSZ
- [YY]YY-MM-DDTHH:MM:SS[+/-]0x00
- [YY]YYMMDDHH:MM:SS[+/-]0x00

The year can be 4-digit or 2-digit (a 2-digit year is parsed as 19xx if the year is ≥ 70, or as 20xx if the year is < 70); the month and day can be 2-digit or 1-digit; the dash (-) can be replaced with a slash (/); `[+/-]0x00` stands for a timezone offset such as `+0700` or `-0300`, and Z stands for the UTC+0 timezone.
Q: Can headers or field names of _id, _uuid, _from, _to, _from_uuid, _to_uuid be declared as string or uint64?
A: Any column representing the above 6 types of system properties, no matter what its header or field name is, should be declared as the corresponding system property, not as string or uint64. If there are columns with headers or field names like _id, _uuid, _from, _to, _from_uuid or _to_uuid that do not actually represent system properties, they should be declared as _ignore, hence not imported, or be renamed into valid property names through the parameter `new_name`.
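For instance, consider a hypothetical node file whose column is literally headed `_uuid` but stores an ordinary legacy identifier (the schema and file name below are illustrative); it could be handled either way:

```yml
nodeConfig:
  - schema: product            # hypothetical schema and file
    file: ./product.csv
    types:
      - name: _uuid
        new_name: legacy_uuid  # keep the column under a valid custom property name
        type: uint64
      # alternatively, drop the column entirely:
      # - name: _uuid
      #   type: _ignore
```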
Change Log (V4.3.5 to V4.3.6)
- Added `mode` to declare the source type
- Added `bigQuery` to configure BigQuery project information
- Added `sql` to acquire data from a BigQuery project (in place of `file`)
Change Log (V4.2 to V4.3)
- Added `new_name` for modifying the property name in the header (under `types`)
- Modified the logic of setting property types: yml > headers > default string
- Supports the list and point types
- Column delimiter only supports comma ','
- Parses null when there is no data between two delimiters