    v2.x

    Transporter Instructions

    Function Preview

    Ultipa Transporter is a lightweight command-line tool for fast import and export of metadata to and from the Ultipa Graph database. Batch import and export are supported in both local and remote modes.

    Local Operation

    A local operation is executed against graphsets in a local Ultipa database. The directory of the local database server must be declared via parameter --db_path (default './data'). Local import is normally used to initialize the Ultipa server, in which case the Ultipa server should be stopped and no import should have been executed before.

    We strongly recommend that local import operations be carried out under the supervision of a certified Ultipa Graph Database engineer.

    Remote Operation

    A remote operation is executed against graphsets in a remote Ultipa database. The IP and port of the remote database server should be declared via parameter --host, along with --username and --password if required.

    Node/Edge File

    Files carrying node/edge information (either exported or to be imported) are stored in a local directory. A file contains either nodes or edges; each row (except the header) represents a node or an edge, and each column represents a property of the node/edge. Files are encoded in ANSI, UTF-8, or UTF-8 with BOM.

    Import Operation

    A node file and/or an edge file can be imported in one import operation. The paths of the data files should be declared via --node_file and --edge_file; the files may or may not contain a header.

    The following settings of the import can optionally be declared:

    • delimiter via --separator;
    • concurrency via -j;
    • batch size via --batch_size;
    • starting row position via --skip and the total number of rows to import via --limit;
    • log file directories via --node_log_file and --edge_log_file.

    Property Name

    The property each column represents can be declared by two means:

    1. use the column names in the file header. Ultipa system properties (whose default names are _id, _from_id, _to_id, _o, _from_o and _to_o) that appear under other column names should be declared via --id and the other five parameters;
    2. use parameters --node_headers and --edge_headers to declare properties when the file has no header.

    About _id and _o

    A node id of int64 type can be declared as _id, and nodes will be imported in insert/upsert/overwrite mode based on the uniqueness of the value of _id; _o will be generated automatically if absent. Otherwise, when the node id is a string, or an integer beyond the int64 range, declare the node id as _o and let _id be generated automatically if absent.

    If node ids of both types exist in the node file, declare both; only _id will be checked for uniqueness.

    About _from_id, _to_id, _from_o and _to_o

    The start-node id and end-node id of an edge should be declared, depending on their type, either as _from_id and _to_id, or as _from_o and _to_o, in which case the system converts them to the corresponding _from_id and _to_id. The start node and end node should already exist in the graphset (or in the node file imported at the same time); otherwise use parameter --create_node_if_not_exist to create the missing nodes so that the edge can be imported successfully.
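
    As a rough sketch of the case just described, the snippet below assembles a remote import command for a headless edge file whose endpoints are string ids, so they are declared as _from_o and _to_o, with --create_node_if_not_exist covering endpoints that are not yet in the graphset. The host, credentials, graphset name and file path are placeholders, and the command is only printed, not executed.

```shell
# Placeholder file path; adjust before use.
edge_file="/opt/ultipa-server/amz/amz_edges.csv"

# Inside double quotes, backslash-newline joins the lines into one command string.
import_cmd="./ultipa-transporter --host 127.0.0.1:60061 -u root -p root \
--graph test \
--edge_headers _from_o _to_o name \
--create_node_if_not_exist \
--edge_file $edge_file"

# Print the assembled command for review instead of running it.
echo "$import_cmd"
```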

    Property Type

    The data types of Ultipa system properties need not be declared, but those of customized properties should be declared via parameters --node_properties_type and --edge_properties_type; otherwise the type defaults to 'string'.
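
    Since an unlisted type name is easy to mistype, a small pre-flight check can be helpful. The sketch below validates name=type pairs against the types listed for --node_properties_type and --edge_properties_type in the Import Parameters table; the function name check_property_types is hypothetical and not part of Transporter.

```shell
# Check "name=type" pairs against the documented valid types.
# check_property_types is an illustrative helper, not a Transporter feature.
check_property_types() {
    for pair in "$@"; do
        ptype=${pair#*=}    # text after the first '='
        case "$ptype" in
            string|int32|int64|uint32|uint64|float|double) ;;
            *) echo "unsupported type in '$pair'" >&2; return 1 ;;
        esac
    done
    return 0
}

check_property_types age=int32 score=double && echo "ok"
check_property_types age=int || echo "rejected"
```

    Pairs that pass can then be handed to --node_properties_type or --edge_properties_type unchanged.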

    Import Error

    There are 4 types of error that may occur during an import operation:

    1. server returned error;
    2. network error;
    3. parameter config error;
    4. data format error.

    By default, when an error occurs while importing a data batch, the whole batch is skipped and the operation continues from the next batch; parameter --stop_when_error makes the import operation stop completely once an error occurs, without importing any later batches. The error type and the skipped data rows (represented by the start row position and the total number of rows) are both recorded in the log file, so the skipped rows can easily be re-imported using parameters --skip and --limit.
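
    A minimal sketch of such a re-import follows. It assumes the start row position and row count read from the log can be passed unchanged to --skip and --limit; the numbers, host, credentials and file path are placeholders, and the command is printed for review rather than executed.

```shell
# Values as they would be read from the log file (placeholders here).
start_row=20000
row_count=10000

# Inside double quotes, backslash-newline joins the lines into one command string.
retry_cmd="./ultipa-transporter --host 127.0.0.1:60061 -u root -p root \
--graph test \
--edge_file /opt/ultipa-server/amz/amz_edges.csv \
--skip $start_row --limit $row_count"

# Print the assembled command for review instead of running it.
echo "$retry_cmd"
```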

    The 4th type of error, the data format error, occurs when the property list (either provided in the file header or declared with parameters --node_headers and --edge_headers) has a length different from the number of columns, which means some columns have no property names or some properties have no data columns. Parameter --fit_to_header ignores the discrepancy and imports data columns according to the property list: columns without property names are ignored, and properties without data columns are filled with an empty value or 0. The 4th type of error is not triggered when --fit_to_header is used.
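
    To see ahead of time whether a file would hit this discrepancy, the column counts can be checked with standard tools. The sketch below is independent of Transporter: it builds a throwaway sample file and counts the data rows whose number of fields differs from the header's. It splits naively on ',' and does not handle quoted fields that contain the delimiter.

```shell
# Throwaway sample file, created only for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
_o,name,age
n1,Alice,30
n2,Bob
n3,Carol,41,extra
EOF

# Compare each data row's field count with that of the header (row 1).
mismatched=$(awk -F',' 'NR == 1 { n = NF; next } NF != n { c++ } END { print c + 0 }' "$sample")
echo "rows whose column count differs from the header: $mismatched"
rm -f "$sample"
```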

    Export Operation

    An export operation generates either a node file or an edge file, with the file header included. The data to export (node or edge) should be declared via --export_type, and the customized properties via --export_properties. The file path can be declared via --export_file and the delimiter via --separator. Ultipa system properties are always exported and need no declaration.

    Command and Parameters

    A command line executes one of four operations: import or export, each either remote or local.

    The command line is composed of the path of the Transporter executable file followed by its parameters:

    <transporter_directory> <param1> <param2> <param3> ... <paramN>
    

    For ease of reading, the command lines in the upcoming examples are split across multiple lines with a backslash \:

    <transporter_directory> <param1> \
    <param2> \
    <param3> <param4> <param5> \
    ...
    

    Global Parameters

    Header: Ultipa Transporter

    Parameter               Description
    --help                  Get help
    --export                Export from the database to a file; when this parameter is not used, import from a file to the database
    --graph [arg]           The name of the graphset in the target database; by default the graphset name is 'default'; a nonexistent graphset name leads to auto-creation of the graphset in a local operation, but to failure in a remote operation
    -j [arg]                The amount of concurrency during the import operation, expressed as an integer no less than 2; by default dual-core is used
    --cpu [arg]             Alias of -j [arg]
    --batch_size [arg]      The number of records in each batch during the import operation, valid from 500 to 10000; by default the number is 10000; an integer of <100000/NumberOfProperties> is recommended
    --separator [arg]       The delimiter that separates data fields in the file; by default the delimiter is ','; valid delimiters are ',', '\t', '|' and ';'
    --node_log_file [arg]   The path and file name of the log file that records the node import operation, such as '/opt/ultipa-server/log/node.log'; by default the log is created under the current path as './transport.node.log'
    --edge_log_file [arg]   The path and file name of the log file that records the edge import operation, such as '/opt/ultipa-server/log/edge.log'; by default the log is created under the current path as './transport.edge.log'
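
    The --batch_size recommendation can be turned into a small calculation. The sketch below applies the <100000/NumberOfProperties> rule and then clamps the result to the valid range of 500 to 10000; the clamping step and the function name suggest_batch_size are assumptions made for illustration, and the argument is assumed to be a positive integer.

```shell
# Suggest a --batch_size from the number of properties per record.
# suggest_batch_size is an illustrative helper, not a Transporter feature.
suggest_batch_size() {
    num_props=$1
    size=$((100000 / num_props))
    if [ "$size" -gt 10000 ]; then size=10000; fi   # upper bound of the valid range
    if [ "$size" -lt 500 ]; then size=500; fi       # lower bound of the valid range
    echo "$size"
}

suggest_batch_size 4     # few properties: capped at the maximum
suggest_batch_size 25    # 100000 / 25
```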

    Remote Parameters

    Header: Remote option

    Parameter             Description
    --host [ip]:[port]    The IP and port of the remote server
    -u [arg]              The username to log in to the remote server, if required
    --username [arg]      Alias of -u [arg]
    -p [arg]              The password to log in to the remote server, if required
    --password [arg]      Alias of -p [arg]
    --crt [arg]           The path and file name of the SSL certificate on the local server, required for SSL communication with the remote server; both servers must be in SSL mode when this parameter is used

    Local Parameters

    Header: Local option

    Parameter          Description
    --db_path [arg]    The path of the target database on the local server; by default the path is './data'

    Import Parameters

    Header: Import option

    Parameter                                             Description
    --node_headers [p1] [p2] [p3]...                      The names of the properties that the columns in the node file represent; valid when the node file has no header; properties that do not exist in the target graphset will be created
    --edge_headers [p1] [p2] [p3]...                      The names of the properties that the columns in the edge file represent; valid when the edge file has no header; properties that do not exist in the target graphset will be created
    --node_properties_type [p1]=[type1] [p2]=[type2]...   The data types of the customized properties in the node file; by default the type is string; valid types are string, int32, int64, uint32, uint64, float, double; the data type must stay consistent if the property already exists in the target graphset
    --edge_properties_type [p1]=[type1] [p2]=[type2]...   The data types of the customized properties in the edge file; by default the type is string; valid types are string, int32, int64, uint32, uint64, float, double; the data type must stay consistent if the property already exists in the target graphset
    --id [arg]                                            The name of the column that represents _id of the node in the node file; by default the name is _id
    --o [arg]                                             The name of the column that represents _o of the node in the node file; by default the name is _o
    --from_id [arg]                                       The name of the column that represents _from_id of the edge in the edge file; by default the name is _from_id
    --to_id [arg]                                         The name of the column that represents _to_id of the edge in the edge file; by default the name is _to_id
    --from_o [arg]                                        The name of the column that represents _from_o of the edge in the edge file; by default the name is _from_o
    --to_o [arg]                                          The name of the column that represents _to_o of the edge in the edge file; by default the name is _to_o
    --upsert                                              (not available for v3.1) Execute insert+update; when this parameter is not used, only insert is executed, namely the insertion of records with an existing _id or _o will fail
    --overwrite                                           (not available for v3.1) Execute insert+overwrite; when this parameter is not used, only insert is executed, namely the insertion of records with an existing _id or _o will fail
    --create_node_if_not_exist                            Create the nodes referenced in the edge file that do not exist in the target graphset
    --increment                                           Load the ids of nodes from the graphset and check for the existence of the start/end nodes of edges, to guarantee a successful import of edges; used when the start/end nodes of the edges already exist in the target graphset. If the start/end nodes of an edge are neither imported simultaneously nor imported previously, parameter --create_node_if_not_exist should be used, otherwise the import of edges will fail
    --cache                                               Cache the _from_id and _to_id of an edge locally when it is imported during a remote import operation, to speed up the import of the edges after it; strongly recommended when the _from_id and/or _to_id values in an edge file are highly shared
    --skip [arg]                                          The number of records to skip, counted from the beginning of the data file; expressed as a non-negative integer; by default the value is 0
    --limit [arg]                                         The number of records to import, expressed as a positive integer; by default the value is -1, namely to import all remaining records from the current position
    --stop_when_error                                     Terminate the ongoing import operation when the first error occurs, so that no subsequent data batch is imported; when this parameter is not used, the import operation continues from the next data batch
    --fit_to_header                                       Ignore the data columns that have no property names and fill empty values for the properties that have no corresponding data columns; when this parameter is not used, an inconsistency between data columns and property header triggers errors, and parameter --stop_when_error can be used to terminate the import operation
    --node_file [arg]                                     The path and file name of the node file to import, on the local server
    --edge_file [arg]                                     The path and file name of the edge file to import, on the local server

    Export Parameters

    Header: Export option

    Parameter                               Description
    --export_type [arg]                     The data to be exported, either 'node' or 'edge'
    --export_properties [p1] [p2] [p3]...   The names of customized properties to be exported
    --export_file [arg]                     The path and file name of the export file on the local server; by default the file is created under the current path as './export/export.csv'

    Examples

    Local Import

    Example: Import a node file and an edge file, neither of which has a header. Some of the start nodes and end nodes of the edges are included in the node file, while others already exist in the graphset. All node ids are strings; the data delimiter is ',':

    ./ultipa-transporter \
    --db_path ./data \
    --graph test \
    -j 6 --batch_size 10000 \
    --increment \
    --node_headers _o name age \
    --node_properties_type age=int32 \
    --edge_headers _from_o _to_o name rank \
    --edge_properties_type rank=int32 \
    --node_file "/opt/ultipa-server/amz/amz_nodes.csv" \
    --edge_file "/opt/ultipa-server/amz/amz_edges.csv"
    

    Remote Import

    Example: Import an edge file with a header. Start-node and end-node ids are of int64 type; the column names are 'start_id', 'end_id', 'name' and 'rank'; the data delimiter is '\t':

    ./ultipa-transporter \
    --host 127.0.0.1:60061 -u root -p root \
    --graph test \
    -j 6 --batch_size 10000 \
    --separator '\t' \
    --cache \
    --edge_log_file "/opt/ultipa-server/log/edge.log" \
    --from_id start_id \
    --to_id end_id \
    --edge_properties_type rank=int32 \
    --edge_file "/opt/ultipa-server/amz/amz_edges.txt"
    

    Local Export

    Example: Export 'name' and 'age' of nodes into csv format:

    ./ultipa-transporter --export \
    --db_path ./data \
    --graph test \
    --export_type node \
    --export_properties name age \
    --export_file "/opt/ultipa-server/export/out_nodes.csv" 
    

    Remote Export

    Example: Export 'name' and 'rank' of edges into txt format with '\t' as the delimiter, over an SSL connection to the server:

    ./ultipa-transporter --export \
    --host 192.168.3.171:60061 -u mid -p mid --crt "/opt/ultipa-server/sslCirtificate.crt" \
    --graph test \
    --separator '\t' \
    --export_type edge \
    --export_properties name rank \
    --export_file "/opt/ultipa-server/export/out_edges.txt"
    

    FAQ

    Q: Will the result of a local import be synchronized to other instances in the cluster?

    A: No. A local operation requires the Ultipa Server to be stopped, and data are imported purely through disk I/O; nothing is propagated over the network, so the other instances in the cluster will not be synchronized.

    Local mode is recommended for the first-time import, to speed up the process. For incremental import into a cluster, use remote mode to guarantee that all instances are synchronized.

    Q: How do I run a remote import with Docker?

    A: Use the docker run command with parameter -it, declare the host directory and the container directory via parameter -v, then give the image name, followed by the Transporter parameters. E.g., if the host directory is /root/pro/ultipa-transporter/csv_data/gridsum, the container directory is /data, and the image is ultipa-transporter:v2.0.centos, use the command line below to remote import edges:

    docker run -it -v /root/pro/ultipa-transporter/csv_data/gridsum:/data ultipa-transporter:v2.0.centos \
    --host 192.168.3.188:60061 --username root --password root \
    --graph test \
    -j 6 --batch_size 10000 \
    --edge_properties_type type=string \
    --edge_file "/data/edges.csv"
    

    Q: When remote importing edges whose start nodes and end nodes already exist in the graphset, should I use --increment to make the program search for nodes in the graphset?

    A: No. --increment is valid only in a local operation. When importing edges in a remote operation, the nodes in the graphset are always checked, so --increment is not needed.
