This manual covers the usage of Ultipa Transporter (Go).
Function Preview
Ultipa Transporter is a lightweight command-line tool for fast import and export of metadata to and from the Ultipa Graph database. Batch import/export is supported in remote mode.
A single command line can import or export multiple node files and edge files.
- Local Operation (to be declared in `local` in the yml file)
Local operation is executed against graphsets in a local Ultipa database. The directory of the local database server needs to be declared via the parameter `path` (default: './data').
Local import is normally used to initialize the Ultipa server; in this case the server must be stopped and no import may have been executed before.
We strongly recommend that local import be performed under the supervision of a certified Ultipa Graph Database engineer.
- Remote Operation (to be declared in `server` in the yml file)
Remote operation is executed against graphsets in a remote Ultipa database. The IP and port of the remote database server should be declared via the parameter `host`, along with `username` and `password` if required.
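As a minimal sketch of the remote declaration described above, the `server` part of a yml file might look like the following. The address, credentials and graphset name are placeholder values chosen for illustration, not defaults of the tool.

```yaml
# Remote operation: connection details are declared under `server`.
# All values below are hypothetical placeholders.
server:
  host: "192.168.1.10:60061"   # <ip>:<port> of the remote Ultipa server
  username: my_user            # only needed if the server requires login
  password: my_password        # only needed if the server requires login
  graphset: my_graph           # graphset to operate on
```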
Node/Edge File
Files carrying node/edge information (either exported or to be imported) are stored in the local directory. A file contains either nodes of a specific schema or edges of a specific schema; each row (except the header) in the file represents a node or an edge, and each column represents a property of the node/edge. Files to be imported can be in formats such as csv, tsv and txt, and the supported delimiters are `,`, `\t`, `|` and `;`. Files to be exported are in csv format with `,` as the delimiter.
The data contained in the files (either node or edge) should be declared via `nodeConfig` and `edgeConfig`. The format (and delimiter, if any) of all files in one operation must be consistent.
Import Declaration
Information that is consistent for all files (to be declared in `settings` in the yml file): the delimiter `separator`, the number of threads `threads` and so on; see the `settings` section later in this manual for details. A csv or tsv file to import may or may not have a header; if it does, each column name should be `<property_name>` or `<property_name>:<property_type>`.
Information to be declared for each file (in `nodeConfig` or `edgeConfig` in the yml file): the file path `file`, the schema of the data `- schema`, the start position of the import `skip`, the number of rows to import `limit`, and the property list `properties` or `types`:
(for csv or tsv file)
- when the file is headerless, for each column in turn, use `properties` to:
  - declare property name and type using `- name` and `type`;
  - declare property name and set its type to 'string' using `- name` only;
  - when fewer `- name` entries are declared than expected, columns will be omitted from the left;
- when the file contains a header, for some columns, use `types` to:
  - declare or modify property type using `- name` and `type`;
  - declare or modify property type to 'string' using `- name` only;
  - for columns whose `- name` does not appear, the property type will either be 'string' if the header is `<property_name>`, or be the `<property_type>` from the header if the header is `<property_name>:<property_type>`.
Valid `type` values are listed below:
- `_id`, `_uuid`, `_from`, `_to`, `_from_uuid`, `_to_uuid` (for declaration of Ultipa system properties);
- `string`, `float`, `double`, `int32`, `uint32`, `int64`, `uint64`, `datetime`, `timestamp`;
- `_ignore` (for ignoring a column)
The declared data type of each property must match the data in its column. A parsing mismatch, reading a column of integers as strings for example, will be reported as an error.
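The two declaration styles can be sketched as follows. The example assumes two hypothetical comma-separated files: a headerless `person.csv` whose four columns are an identifier, a name, a comment to be dropped and an age, and a `city.csv` whose header row is `code:_id,name,population`. Schema names, file names and property names are made up for illustration.

```yaml
nodeConfig:
  # Headerless file: declare every column in order under `properties`.
  - schema: person
    file: ./person.csv
    properties:
      - name: id
        type: _id         # system property: node identifier
      - name: name        # no `type` given, so the property is a string
      - name: comment
        type: _ignore     # this column is not imported
      - name: age
        type: uint32
  # File with header: under `types`, list only the columns whose type
  # needs to be declared or changed.
  - schema: city
    file: ./city.csv
    types:
      - name: population
        type: uint64      # the header gives no type, so it would otherwise be a string
```

In the second entry, `code` keeps the `_id` type from its header and `name` falls back to 'string', so neither needs to be listed.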
Export Declaration
Information that is consistent for all files (to be declared in `settings` in the yml file): the directory of the csv files `outPath`, and whether to include a header `writeHeader` (if so, column names take the form `<property_name>:<property_type>`). Batch size is determined automatically and need not be set.
Information to be declared for each file (in `nodeConfig` or `edgeConfig` in the yml file): the schema of the data `- schema`, and the property list `properties` (not including Ultipa system properties). Ultipa system properties are always exported and need no declaration.
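A minimal sketch of an export declaration, with the `server` part left out for brevity. The schema names `customer` and `purchase` and their properties are hypothetical; only the parameter names come from this manual.

```yaml
nodeConfig:
  - schema: customer          # hypothetical node schema
    properties:               # custom properties to export; system properties are exported anyway
      - name: name
      - name: level
edgeConfig:
  - schema: purchase          # hypothetical edge schema
    properties:
      - name: amount
settings:
  outPath: ./export/          # directory that receives the exported csv files
  writeHeader: true           # header columns take the form <property_name>:<property_type>
```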
Metadata Knowledge
Unique Identifier
A node has two unique identifiers: `_id` (32bit-string) and `_uuid` (uint64-integer). An edge therefore identifies its starting node by either `_from` or `_from_uuid`, and its ending node by either `_to` or `_to_uuid`. The edge itself, on the other hand, has only one identifier, `_uuid`. These six properties are Ultipa system properties.
Import Mode
When the import mode `importMode` is set to `upsert` or `overwrite`, a node/edge that already exists (meaning that its identifier is provided in the file and found in the graphset) will update or overwrite its corresponding record in the graphset; otherwise (that is, either there is no identifier in the file, or the identifier is not found in the graphset) the node/edge will be inserted into the graphset as a new record.
When `importMode` is set to `insert`, only new nodes/edges (either there is no identifier in the file, or the identifier is not found in the graphset) will be inserted; any identifier that already exists in the graphset will trigger an error.
If `_id` or `_uuid` is absent from the file, it will be automatically generated by the system upon insertion.
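As an illustration, the import mode is chosen in the `settings` part of the yml file. The fragment below is only a sketch; the comments restate the behaviour described above rather than add to it.

```yaml
settings:
  # insert    : only new nodes/edges are inserted; an identifier that
  #             already exists in the graphset triggers an error
  # upsert    : existing records are updated, new ones are inserted
  # overwrite : existing records are overwritten, new ones are inserted
  importMode: upsert
```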
Special Requirements on Edges
An edge file must contain the starting node and ending node of each edge.
To import an edge successfully, both of its end nodes (starting node and ending node) must already exist; otherwise the import will fail. To avoid this, set `createNodeIfNotExist` to 'true' and let the system create the non-existing end nodes so that the edge can be imported.
How does Transporter judge whether an end node exists when importing edge files? During remote operation, Transporter searches for the end nodes in the node files being imported by the same command line as well as in the graphset.
What happens if only the system properties `_from` and `_to` are provided as unique identifiers of the end nodes, or only `_from_uuid` and `_to_uuid` are provided? During remote operation, Transporter will automatically project the other two system properties.
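The edge requirements above can be sketched as follows, assuming a hypothetical headerless file `follow.csv` whose two columns hold the `_id` of the starting node and of the ending node. The schema and file names are made up for illustration.

```yaml
edgeConfig:
  - schema: follow              # hypothetical edge schema
    file: ./follow.csv          # hypothetical headerless file: <from _id>,<to _id>
    properties:
      - name: _from
        type: _from             # starting node, identified by its _id
      - name: _to
        type: _to               # ending node, identified by its _id
settings:
  createNodeIfNotExist: true    # create end nodes that do not exist yet instead of failing
```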
Error
Error Before Importing
Error before importing is triggered when checking the configuration in the yml file, creating the graphset or creating the schema. Possible causes:
- the content of the yml file does not conform to the yml format;
- parameter configuration error, such as a property name or data type that does not comply with the UQL specification;
- failure when creating the graphset and/or schema.
Error During Importing
Error during importing is triggered when importing data records into the server. Error types:
- server returned error;
- network error;
- data format error (inconsistency between declared file header and data columns);
- duplicated identifier (node or edge already exists under `insert` mode).
By default, when an error occurs while importing a data batch, the whole batch is skipped and the operation continues with the next batch; the parameter `stopWhenError` can be used to stop the import operation completely once an error occurs, without importing any later batches. The error type and the skipped data rows (represented by the start row position and the total number of rows) are both recorded in the log file, which makes it easy to re-import them using the parameters `skip` and `limit`.
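A re-import of one skipped batch might be sketched as below. The numbers are invented: they assume the log reported a skipped batch of 5000 rows with 20000 rows preceding it. Whether the logged start position equals the number of rows to skip, or is off by one, is an assumption to check against your own log file.

```yaml
nodeConfig:
  - schema: student
    file: ./student.txt
    skip: 20000          # rows before the skipped batch (made-up value taken from the log)
    limit: 5000          # number of rows in the skipped batch (made-up value taken from the log)
settings:
  stopWhenError: true    # stop at the first error instead of skipping the batch again
```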
The 3rd type of error, which is data format related, occurs when:
- in the case of a headerless file, the number of `- name` entries declared under `properties` differs from the number of columns in the file;
- in the case of a file with header, some data columns have no column name, or some column names at the tail have no data.
To avoid the 3rd type of error, set `fitToHeader` to 'true' so that Transporter imports data columns according to the properties declared: columns without names are ignored, and names with no data are taken as properties and filled with an empty value or 0.
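For instance, assume a hypothetical file `account.csv` with the header `id:_id,name,email` in which some rows carry an extra, unnamed fourth field. A configuration along these lines would let the import proceed; the schema, file and property names are illustrative only.

```yaml
nodeConfig:
  - schema: account        # hypothetical schema
    file: ./account.csv    # hypothetical file with header id:_id,name,email
    types:
      - name: email        # no `type` given, so the property is a string
settings:
  fitToHeader: true        # unnamed extra columns are ignored; named columns
                           # without data are filled with an empty value or 0
```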
Command and Parameters
Ultipa Transporter consists of two tools, `ultipa-importer` and `ultipa-exporter`, and both accept the parameters below:
Parameter | Description |
---|---|
--help | Help |
--config <config_file_path> | Path of the configuration file |
Example of Import:
/opt/ultipa-transporter/ultipa-importer --config ./in.yml
Example of Export:
/opt/ultipa-transporter/ultipa-exporter --config ./out.yml
Example of Help:
/opt/ultipa-transporter/ultipa-exporter --help
Configuration File
Both import and export operations need a configuration file in yml format. There are four parts in a yml config file:
- server
- nodeConfig
- edgeConfig
- settings
server
Parameter | Specification | Description |
---|---|---|
host | <ip>:<port> | The IP and port of the remote server |
username | string | The username to log in to the remote server if required |
password | string | The password to log in to the remote server if required |
crt | <file_path> | The absolute path of the SSL certificate to communicate with the remote server, required when both servers are in SSL mode |
graphset | string | The graphset name; 'default' is used when this parameter is omitted; a non-existing graphset name will lead to failure unless `yes` under settings is set to 'true', in which case Transporter auto-creates the graphset |
nodeConfig | edgeConfig
Parameter | Specification | Operation | Description |
---|---|---|---|
- schema | string | Import/Export | The schema of nodes/edges, required; a non-existing schema name will lead to failure unless, in the case of an import operation, `yes` under settings is set to 'true', in which case Transporter auto-creates the schema; in the case of an export operation, use '*' to declare all properties of all schemas |
file | <file_path> | Import | The absolute path and name of the node/edge file to import, such as '/opt/ultipa-server/import/amz/node.csv' |
skip | int | Import | The number of records, counted from the beginning of the data file, that will be skipped and not imported; no records are skipped when this parameter is omitted |
limit | int | Import | The number of records to import; records are imported until the end of the file when this parameter is omitted |
properties or types | | Import/Export | Introduces the column declarations that follow; use properties when importing a headerless file or when exporting files, and use types when importing a file with header; a - schema cannot have properties and types simultaneously |
- name | string | Import/Export | The name of a property; a non-existing property name will lead to failure unless, in the case of an import operation, `yes` under settings is set to 'true', in which case Transporter auto-creates the property |
type | string | Import | The data type of a particular column; valid types are: string, int32, int64, uint32, uint64, float, double, datetime and timestamp for custom properties; _id, _uuid, _from, _to, _from_uuid and _to_uuid for system properties; _ignore for omitting a column; 'string' is used when this parameter is omitted; properties that already exist in the graphset need to be set with a data type consistent with the record in the graphset |
Note: Parameters with a dash '-' ('- schema' and '- name') and their sub-parameters carried can appear multiple times.
settings
Parameter | Specification | Operation | Description |
---|---|---|---|
logPath | <path> | Import | The path of the log file during import operation, such as '/opt/ultipa-server/log/', or write to './log/' when not using this parameter |
separator | string | Import | The delimiter that separates data fields in the file during an import operation; supports ',', '\t', '\|' and ';'; ',' is used when this parameter is omitted |
threads | int | Import | The maximum number of threads (an integer no less than 2) during an import operation; the number of CPUs of the machine running Transporter is used when this parameter is omitted; 32 threads are recommended |
batchSize | int | Import | The number of records in each batch during an import operation, valid from 500 to 10000; an integer of 100000/number_of_properties is recommended; 10000 is used when this parameter is omitted |
importMode | string | Import | The mode of an import operation; supports 'insert', 'upsert' and 'overwrite'; 'insert' is used when this parameter is omitted |
createNodeIfNotExist | bool | Import | Whether to create nodes for the non-existing _from, _to, _from_uuid or _to_uuid of edges, or leave them non-existing and their related edges un-imported |
stopWhenError | bool | Import | Whether to terminate the import operation once an error occurs, or to skip the error data batch and continue with the next batch when not using this parameter |
yes | bool | Import | Whether to auto-create graphset, schema and properties that do not exist; defaults to 'false' when this parameter is omitted |
fitToHeader | bool | Import | Whether to omit or fill up data columns according to the header in the data file or header configured in the yml file; the inconsistency between data columns and property header will trigger an error when not using this parameter |
writeHeader | bool | Export | Whether to write a header into the csv (or tsv) file during an export operation, with column names in the form of <property>:<type>; the header is written when this parameter is omitted |
outPath | <path> | Export | The path of the exported files, such as '/opt/ultipa-server/import/amz/', or write to './export/' when not using this parameter |
Note: exported files (node and edge) are automatically named; e.g., a node file is named in the format <schema>.node.<file_type>, such as 'default.node.csv'.
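As a worked example of the recommendations in the table above, consider an import whose files carry 20 properties (a made-up figure). The recommended batch size is 100000 / 20 = 5000, which lies within the valid range of 500 to 10000.

```yaml
settings:
  threads: 32        # the recommended thread count
  batchSize: 5000    # 100000 / 20 properties = 5000, within the valid range 500 to 10000
```

With 5 properties the same formula gives 20000, which is above the valid range, so the maximum of 10000 would be the closest valid choice.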
YML Samples
Remote Import
Example: Given the following 3 files, import them into a remote server.
Node file-A (txt): @student, in which the property type of 'age' is mistakenly written as 'string'; please revise it to 'uint32' during import:
stuNo:_id,name:string,age:string,gender:string
20215865,Alice,24,f
20215925,Jack,25,m
20215973,John,28,m
20215990,Grace,25,f
Node file-B (txt): @course; please create an empty column for property 'professor':
crsNo,title,credit
CS202104,Computer Principle and Application,4
SH202127,Art of File and Television,2.5
MS202104,Calculus,3
Edge file (csv): @enroll, in which the 1st column is stuNo and the 2nd column is crsNo; please ignore the 3rd column when importing:
20215865,SH202127,84.5
20215925,MS202104,77.5
20215973,CS202104,86
20215990,SH202127,64.5
Command line:
./ultipa-importer --config ./in.yml
in.yml:
server:
  host: "192.168.35.151:60024"
  username: employee533
  password: joaedSSGsdf
  crt: ""
  graphset: test_graph
nodeConfig:
  - schema: student
    file: ./student.txt
    types:
      - name: age
        type: uint32
  - schema: course
    file: ./course.txt
    types:
      - name: title
      - name: professor
      - name: crsNo
        type: _id
      - name: credit
        type: float
edgeConfig:
  - schema: enroll
    file: ./enroll.csv
    properties:
      - name: _from
        type: _from
      - name: _to
        type: _to
settings:
  separator: ","
  yes: true
  fitToHeader: true
Remote Export
Example: export from a remote database all node properties of @student, node properties _id and 'title' of @course, and all edge properties of all schemas.
Command line:
./ultipa-exporter --config ./out.yml
out.yml:
server:
  host: "192.168.35.151:60024"
  username: employee533
  password: joaedSSGsdf
  crt: ""
  graphset: test_graph
nodeConfig:
  - schema: student
  - schema: course
    properties:
      - name: _id
      - name: title
edgeConfig:
  - schema: "*"
settings:
  writeHeader: false
  outPath: ./export/temp
FAQ
Q: What system environment supports running Transporter (Go version)?
A: Transporter (Go version) runs on macOS, Windows and Linux. Make sure to place the node/edge files to be imported, the yml file and the importer/exporter in the same directory, and run the importer/exporter from a command-line tool.