v4.0

Transporter Instructions

Function Preview

Ultipa Transporter is a lightweight command-line tool for fast import and export of metadata to and from the Ultipa Graph database. Batch import and export are supported in both local and remote modes.

A single command line executes one of four operations: local import, local export, remote import, or remote export. Any operation can import or export multiple node files and edge files.

  • Local Operation (to be declared in local in the yml file)

Local operation is executed against graphsets in a local Ultipa database. The directory of the local database server is declared via the parameter path (default: './data').

Local import is normally used for initializing the Ultipa server; in that case the Ultipa server must be stopped and no import may have been executed before.

We strongly recommend that local import be carried out under the supervision of a certified Ultipa Graph Database engineer.

  • Remote Operation (to be declared in server in the yml file)

Remote operation is executed against graphsets in a remote Ultipa database. The IP and port of the remote database server are declared via the parameter host, together with username and password if required.

Node/Edge File

Files carrying node/edge information (either exported or to be imported) are stored in a local directory. A file contains either nodes of one schema or edges of one schema; each row (except the header) represents a node or an edge, and each column represents a property of that node/edge. Files are in csv, tsv, or json format (the formats available depend on the Transporter version), and the supported delimiters are ',', '\t', '|' and ';'.

The data contained in the files (node or edge) is declared via nodeConfig and edgeConfig. The format (and the delimiter, if any) must be consistent across all files in one operation.

Import Declaration

Import declaration can be done based on file or file folder:

File Based

Information that is consistent for all files (declared under settings in the yml file): the delimiter (separator), the number of threads (threads), and so on; see settings below for details. A csv or tsv file to import may or may not have a header; when it does, each column name should be <property_name> or <property_name>:<property_type>.

Information to be declared for each file (under nodeConfig and edgeConfig in the yml file): the file path (file), the file format (fileType), the schema of the data (- schema), the start position of the import (skip), the number of rows to import (limit), and the property list (properties or types):

(for csv or tsv file)

  • when the file is headerless, for each column in turn, use properties to:
    • declare the property name and type using - name and type;
    • declare the property name only using - name, in which case the type is 'string';
    • when there are fewer - name entries than expected, columns will be omitted from the left;
  • when the file contains a header, for some columns, use types to:
    • declare or modify the property type using - name and type;
    • set the property type to 'string' using - name only;
    • for columns whose - name doesn't appear, the property type will be 'string' if the header is <property_name>, or the <property_type> from the header if the header is <property_name>:<property_type>.
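
The header rules above can be illustrated with a short Python sketch. It is not Transporter's own code: the function names parse_header_cell and resolve_types are hypothetical, and the sketch only mirrors the stated rules (a header cell is <property_name> or <property_name>:<property_type>, an untyped column is 'string', and an entry under types overrides the header).

```python
def parse_header_cell(cell):
    """Split a header cell into (name, type); untyped cells default to 'string'."""
    name, sep, ptype = cell.partition(":")
    return name, (ptype if sep else "string")


def resolve_types(header_cells, overrides):
    """Apply `types`-style overrides to a parsed header (hypothetical helper).

    overrides maps a property name to a type, or to None when only
    `- name` was given, which means 'string'.
    """
    resolved = []
    for cell in header_cells:
        name, ptype = parse_header_cell(cell)
        if name in overrides:
            ptype = overrides[name] or "string"
        resolved.append((name, ptype))
    return resolved


# Header in the <property_name>:<property_type> form, with 'age' overridden
header = "stuNo:_id,name:string,age:string,gender:string".split(",")
print(resolve_types(header, {"age": "uint32"}))
```

Columns that are not mentioned in the overrides keep whatever the header says, which is why only the columns that need a different type have to be listed under types.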

Valid types are listed below:

  • _id, _uuid, _from, _to, _from_uuid, _to_uuid (for declaration of Ultipa system properties);
  • string, float, double, int, int32, uint32, int64, uint64, datetime, timestamp;
  • _ignore (for ignoring a column)

The declared data type of each property should match the data in its column. A misparse, such as reading a column of integers as strings, will not be reported as an error.

Folder Based

When the data files to be imported are stored in one or several folders and all files satisfy:

  • naming convention of <schema>.node.csv or <schema>.edge.csv, and
  • file header contains column names in the format of <property_name>:<property_type>

then use one or more - dir entries to declare the folders; the information that is consistent for all files is declared as in the file-based case. All files in the declared folders that satisfy the two conditions above will be imported; the remaining files will not be imported.
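
As a rough illustration of the naming convention, the following Python sketch picks out the file names that match <schema>.node.csv or <schema>.edge.csv. The pattern and the function name classify are assumptions made for this example, not part of Transporter, and the second condition (typed column names in the header) is not checked here.

```python
import re

# Naming convention from the text: <schema>.node.csv or <schema>.edge.csv
FILE_PATTERN = re.compile(r"^(?P<schema>.+)\.(?P<kind>node|edge)\.csv$")


def classify(filenames):
    """Return (schema, kind, filename) for names that match the convention."""
    matched = []
    for fname in filenames:
        m = FILE_PATTERN.match(fname)
        if m:
            matched.append((m.group("schema"), m.group("kind"), fname))
    return matched


# Only the first two names follow the convention
files = ["student.node.csv", "enroll.edge.csv", "notes.txt", "course.csv"]
print(classify(files))
```

Running a check like this over a folder before the import is a cheap way to see which files would be left out.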

Export Declaration

Information that is consistent for all files (declared under settings in the yml file): the output directory (outPath) and the log directory (logPath). An exported csv file (delimiter ',') or tsv file (delimiter '\t') can either be headerless or include a header, with <property_name>:<property_type> as the column names. The batch size is determined automatically and need not be set.

Information to be declared for each file (under nodeConfig and edgeConfig in the yml file): the file format (fileType), the schema of the data (- schema), and the property list (properties), not including Ultipa system properties. Ultipa system properties are always exported and need no declaration.

Metadata Knowledge

Unique Identifier

A node has two unique identifiers: _id (32bit-string) and _uuid (uint64-integer). An edge therefore identifies its starting node by either _from or _from_uuid, and its ending node by either _to or _to_uuid. An edge itself has only one identifier, _uuid. These six properties are Ultipa system properties.

Import Mode

When importMode is set to upsert or overwrite, a node/edge that already exists (its identifier is provided in the file and is found in the graphset) will update or overwrite its corresponding record in the graphset; otherwise (either there is no identifier in the file, or the identifier is not found in the graphset) the node/edge is inserted into the graphset as a new record.

When importMode is set to insert, only new nodes/edges (no identifier in the file, or the identifier is not found in the graphset) are inserted; any identifier that already exists in the graphset triggers an error.

An _id or _uuid that is absent from the file is generated automatically by the system upon insertion.
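
The three import modes can be summarized as a small decision function. This Python sketch only restates the rules in this section; the function import_action and its return labels are hypothetical and do not correspond to any Transporter API.

```python
def import_action(import_mode, id_in_file, exists_in_graphset):
    """Return what happens to one record: 'insert', 'update', 'overwrite' or 'error'."""
    if import_mode not in ("insert", "upsert", "overwrite"):
        raise ValueError(f"unknown importMode: {import_mode}")
    # A record "already exists" only if its identifier is in the file
    # and that identifier is found in the graphset.
    already_exists = id_in_file and exists_in_graphset
    if not already_exists:
        return "insert"  # new record; a missing _id/_uuid is generated on insertion
    if import_mode == "insert":
        return "error"   # identifier already present in the graphset
    return "update" if import_mode == "upsert" else "overwrite"


print(import_action("upsert", True, True))
print(import_action("insert", True, True))
print(import_action("overwrite", False, False))
```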

Special Requirements on Edges

An edge file must contain starting node and ending node of edges.

To import an edge successfully, both of its end nodes (the starting node and the ending node) should already exist; otherwise the import of that edge will fail. To avoid this, set createNodeIfNotExist to 'true' and let the system create the missing nodes so that the edge can be imported.

How does Transporter judge whether an ending node exists when importing edge files? In a remote operation, Transporter will search the node file being imported in the same command line as well as the graphset for the ending nodes; in a local operation, Transporter will only search the node file, unless increment is set to 'true' to trigger the Transporter to search in the graphset as well.
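
The lookup rule described above can be sketched in Python as follows. The function node_is_known and its arguments are hypothetical; the sets stand in for "the node file imported in the same command line" and "the graphset", and the sketch makes no claim about how Transporter performs the lookup internally.

```python
def node_is_known(node_id, node_file_ids, graphset_ids, remote, increment=False):
    """Decide whether an edge's start or end node counts as existing."""
    # The node file imported in the same command line is always searched
    if node_id in node_file_ids:
        return True
    # The graphset is searched in a remote operation,
    # or in a local operation when increment is true
    if remote or increment:
        return node_id in graphset_ids
    return False


node_file = {"20215865", "20215925"}
graphset = {"CS202104"}
print(node_is_known("CS202104", node_file, graphset, remote=True))
print(node_is_known("CS202104", node_file, graphset, remote=False))
print(node_is_known("CS202104", node_file, graphset, remote=False, increment=True))
```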

When only system properties _from and _to are provided as unique identifiers of ending nodes, or only _from_uuid and _to_uuid are provided, in a remote operation, Transporter will automatically project the other two system properties; in a local operation, the projection will be done by default, but for the sake of acceleration the parameter localIdCheck can be set to 'false' to omit the projection from _from and _to to _from_uuid and _to_uuid, in which case the _from_uuid and _to_uuid must be included in the edge file. (Please use parameter localIdCheck with caution, and check for more details about it in the parameter table listed below).

Import Error

There are 5 types of error that may occur during an import operation:

  1. server returned error;
  2. network error;
  3. parameter config error;
  4. data format error (inconsistency between declared file header and data columns);
  5. duplicated identifier (node or edge already exists under insert mode).

By default, when an error occurs while importing a data batch, the whole batch is skipped and the operation continues with the next batch; the parameter stopWhenError makes the import operation stop completely once an error occurs, without importing any later batches. The error type and the skipped data rows (given as the start row position and the total number of rows) are both recorded in the log file, which makes it easy to re-import them using the parameters skip and limit.

The 4th type of error, the data format error, occurs when:

  • in the case of a headerless file, the number of - name entries declared under properties is different from the number of columns in the file;
  • in the case of a file with a header, some data columns have no column name, or some column names at the tail have no data.

To avoid the 4th type of error, set fitToHeader to 'true' so that Transporter imports data columns according to the declared properties: columns without names are ignored, and names with no data are taken as properties and filled with an empty value or 0.
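
The effect of fitToHeader on a single row can be pictured with the Python sketch below. It is an illustration only: fit_row and its fill argument are hypothetical, and the sketch uses one fill value for every padded field, whereas the text above says the fill is an empty value or 0.

```python
def fit_row(row, n_properties, fill=""):
    """Trim or pad one data row so that it has exactly n_properties fields."""
    if len(row) >= n_properties:
        # Extra columns that have no declared name are ignored
        return row[:n_properties]
    # Declared names that have no data are filled with the fill value
    return row + [fill] * (n_properties - len(row))


# Three data columns, two declared properties: the third column is dropped
print(fit_row(["20215865", "SH202127", "84.5"], 2))
# Three data columns, four declared properties: one field is padded
print(fit_row(["CS202104", "Computer Principle and Application", "4"], 4))
```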

Command and Parameters

Ultipa Transporter has two tools, ultipa-importer and ultipa-exporter, and both accept the parameters below:

| Parameter | Description |
|---|---|
| --help | Help |
| --config <config_file_path> | Path of the configuration file |

Example of Import:

/opt/ultipa-transporter/ultipa-importer --config ./in.yml

Example of Export:

/opt/ultipa-transporter/ultipa-exporter --config ./out.yml

Example of Help:

/opt/ultipa-transporter/ultipa-exporter --help

Configuration File

Both import and export operations need a configuration file in yml format. There are four parts in a yml config file:

  • server / local
  • nodeConfig
  • edgeConfig
  • settings

server

| Parameter | Specification | Description |
|---|---|---|
| host | <ip>:<port> | The IP and port of the remote server |
| username | string | The username to log in to the remote server, if required |
| password | string | The password to log in to the remote server, if required |
| crt | <file_path> | The absolute path of the SSL certificate used to communicate with the remote server; required when both servers are in SSL mode |
| graphset | string | The graphset name; 'default' is used when this parameter is omitted. A non-existing graphset name leads to failure unless yes under settings is set to true, in which case Transporter auto-creates the graphset |

local

| Parameter | Specification | Description |
|---|---|---|
| path | <db_path> | The path of the local database; './data' is used when this parameter is omitted |
| graphset | string | The graphset name; 'default' is used when this parameter is omitted. A non-existing graphset name leads to failure unless yes under settings is set to true, in which case Transporter auto-creates the graphset |

nodeConfig | edgeConfig

| Parameter | Specification | Operation | Description |
|---|---|---|---|
| - dir | <path> | Import | The path of a folder that contains files to import; all other parameters in this table are invalid when this parameter is set |
| - schema | string | Import/Export | The schema of nodes/edges; 'default' is used when this parameter is omitted. A non-existing schema name leads to failure unless, in an import operation, yes under settings is set to true so that Transporter auto-creates the schema. In an export operation, use '*' to declare all properties of all schemas |
| file | <file_path> | Import | The absolute path and name of the node/edge file to import, such as '/opt/ultipa-server/import/amz/node.csv' |
| fileType | string | Import/Export | The file type; csv, tsv and json are supported |
| skip | int | Import | The number of records to skip (not import) from the beginning of the data file; no data is skipped when this parameter is omitted |
| limit | int | Import | The number of records to import; the import runs to the end of the file when this parameter is omitted |
| properties / types | | Import/Export | Indicates that file columns are declared next: use properties when importing a headerless file or when exporting files, and use types when importing a file with a header; a - schema cannot have properties and types at the same time |
| - name | string | Import/Export | The name of a property. A non-existing property name leads to failure unless, in an import operation, yes under settings is set to true so that Transporter auto-creates the property |
| type | string | Import | The data type of a column. Valid types are string, int, int32, int64, uint32, uint64, float, double, datetime and timestamp for custom properties; _id, _uuid, _from, _to, _from_uuid and _to_uuid for system properties; and _ignore for omitting a column. 'string' is used when this parameter is omitted. Properties that already exist in the graphset must be given a data type consistent with the record in the graphset |

Note: Parameters with a dash '-' ('- dir', '- schema' and '- name') and their sub-parameters carried can appear multiple times.

settings

| Parameter | Specification | Operation | Description |
|---|---|---|---|
| name | string | Import/Export | The name of the log file; a timestamp is used when this parameter is omitted |
| logPath | <path> | Import/Export | The path of the log file, such as '/opt/ultipa-server/log/'; './log/' is used when this parameter is omitted |
| separator | string | Import | The delimiter that separates data fields in the csv (or tsv) file during an import operation; supports ',' (comma), '\t' (tab), the vertical bar, and ';' (semicolon); ',' is used when this parameter is omitted |
| threads | int | Import | The maximum number of threads (an integer no less than 2) during an import operation; 2 is used when this parameter is omitted; 5 ~ 8 threads are recommended |
| batchSize | int | Import | The number of records in each batch during an import operation, valid from 500 to 10000; an integer of 100000/number_of_properties is recommended; 10000 is used when this parameter is omitted |
| importMode | string | Import | The mode of an import operation; supports 'insert', 'upsert' and 'overwrite'; 'insert' is used when this parameter is omitted |
| createNodeIfNotExist | bool | Import | Whether to create nodes for the non-existing _from, _to, _from_uuid or _to_uuid of edges, or to leave them non-existing and their related edges un-imported |
| increment | bool | Import | Whether to check the existence of _from, _to, _from_uuid or _to_uuid of edges in the target graphset during a local import operation, or to check only within the node file imported at the same time; this parameter must be set to true when createNodeIfNotExist is true in a local import operation |
| localIdCheck | bool | Import | Whether, during a local import operation, to a) convert the _id in a node file to its corresponding _uuid and verify it against the _uuid if provided in the node file; b) convert the _from and _to in an edge file to their corresponding _from_uuid and _to_uuid and verify them against the _from_uuid and _to_uuid if provided in the edge file. The conversion and verification are executed by default, and are omitted when this parameter is set to 'false' to speed up the import. Please note that any inconsistency between _id and _uuid, or a missing _from_uuid or _to_uuid in the edge file, requires this parameter to be 'true' |
| stopWhenError | bool | Import | Whether to terminate the import operation once an error occurs; when this parameter is omitted, the batch with the error is skipped and the import continues with the next batch |
| yes | bool | Import | Whether to auto-create the graphset, schemas and properties that do not exist |
| fitToHeader | bool | Import | Whether to omit or fill up data columns according to the header in the data file or the header configured in the yml file; when this parameter is omitted, an inconsistency between data columns and the property header triggers an error |
| writeHeader | bool | Export | Whether to write a header into the csv (or tsv) file during an export operation; the header is written when this parameter is omitted, with column names in the form of <property>:<type> |
| outPath | <path> | Export | The path of the exported files, such as '/opt/ultipa-server/import/amz/'; './export/' is used when this parameter is omitted |
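
The batchSize recommendation (100000/number_of_properties, valid from 500 to 10000) can be turned into a quick calculation. The helper below is a hypothetical Python sketch; clamping the result into the valid range is an assumption of this example and is not stated in the documentation.

```python
def recommended_batch_size(number_of_properties):
    """100000 / number_of_properties, clamped to the valid range 500..10000."""
    if number_of_properties < 1:
        raise ValueError("number_of_properties must be at least 1")
    # Integer division, then clamp (the clamp is an assumption of this sketch)
    return max(500, min(10000, 100000 // number_of_properties))


for n in (4, 20, 500):
    print(n, recommended_batch_size(n))
```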

Note: exported files (node and edge) are named automatically; e.g., a node file is named in the format <schema>.node.<file_type>, such as 'default.node.csv'.

YML Samples

Remote Import

Example: Given the following 3 files, import them into a remote server.

Node file-A (txt): @student, in which the property type of 'age' is mistakenly written as 'string'; please revise it to 'uint32' during import:

stuNo:_id,name:string,age:string,gender:string
20215865,Alice,24,f
20215925,Jack,25,m
20215973,John,28,m
20215990,Grace,25,f

Node file-B (txt): @course; please create an empty column for property 'professor':

crsNo,title,credit
CS202104,Computer Principle and Application,4
SH202127,Art of File and Television,2.5
MS202104,Calculus,3

Edge file (csv): @enroll, in which the 1st column is stuNo and the 2nd column is crsNo; please ignore the 3rd column when importing:

20215865,SH202127,84.5
20215925,MS202104,77.5
20215973,CS202104,86
20215990,SH202127,64.5

Command line:

./ultipa-importer --config ./in.yml

in.yml:

server:
  host: "192.168.35.151:60024"
  username: employee533
  password: joaedSSGsdf
  crt: ""
  graphset: test_graph
  
nodeConfig:
  - schema: student
    file: ./student.txt
    fileType: txt
    types:
      - name: age
        type: uint32
  - schema: course
    file: ./course.txt
    fileType: txt
    types:
      - name: title
      - name: professor
      - name: crsNo
        type: _id
      - name: credit
        type: float

edgeConfig:
  - schema: enroll
    file: ./enroll.csv
    fileType: csv
    properties:
      - name: _from
        type: _from
      - name: _to
        type: _to

settings:
  separator: ","
  yes: true
  fitToHeader: true

Local Export

Example: export from a local database all node properties of @student, node properties _id and 'title' of @course, and all edge properties of all schemas.

Command line:

./ultipa-exporter --config ./out.yml

out.yml:

local:
  path: ./ultipa_graph_DB_test
  graphset: test_graph
  
nodeConfig:
  - schema: student
  - schema: course
    properties:
      - name: _id
      - name: title

edgeConfig:
  - schema: "*"

settings:
  writeHeader: false
  outPath: ./export/temp

Import Folders

in.yml:

local:
  path: ./ultipa_graph_DB_test
  graphset: test_graph
  
nodeConfig:
  - dir: "./import_nodes_2/"

edgeConfig:
  - dir: "./import_edges_1/"
  - dir: "./import_edges_2/"
  - dir: "./import_edges_3/"

settings:
  separator: ","
  yes: true
  importMode: upsert
  createNodeIfNotExist: true
  increment: true

FAQ

Q: Will the result of a local import be synchronized to other instances in the cluster?

A: No. A local operation requires the Ultipa Server to be stopped, and data is imported only through disk I/O; nothing is propagated over the network, so the other instances in the cluster will not be synchronized.

Local mode is recommended for importing data for the first time, to speed up the process. For incremental import into a cluster, use remote mode to guarantee that all instances are synchronized.

Q: How to remote import with Docker?

A: Follow below steps:

  1. Start the container: use the command docker run with the parameter -itd, map a host directory to a container directory via the parameter -v, and declare the container name via the parameter --name:
docker run -itd \
-v /tmp/transporter_test/data:/opt/ultipa-transporter/data \
--name transporter4.0test <Transporter_image>
  2. Enter the container: use the command docker exec with the parameter -it:
docker exec -it transporter4.0test bash
  3. Run the import command:
./ultipa-importer --config ./data/in.yml

Q: When remote importing edges whose start nodes and end nodes already exist in the graphset, should I set increment to 'true' to make the program search for nodes in the graphset?

A: No. increment is valid only in a local operation. When importing edges in a remote operation, the nodes in the graphset are always checked, so there is no need to set increment.
