Pull the Docker image, or load it from a local image file, then confirm it is available:

```shell
docker pull [docker url]/tag:version
docker load -i <docker image file>
docker images
```

Run the container, then enter it to start the server:

```shell
docker run -itd --net=host \
  -v <absolute_path_to_data_folder>:/opt/ultipa-server/data \
  -v <absolute_path_to_config_folder>:/opt/ultipa-server/config \
  -v <absolute_path_to_log_folder>:/opt/ultipa-server/log \
  -v <absolute_path_to_mlog_folder>:/opt/ultipa-server/mlog \
  -v <absolute_path_to_algo_folder>:/opt/ultipa-server/algo \
  -e TZ=Europe/Paris \
  --name ultipa-server \
  --ulimit nofile=90000:90000 \
  --privileged \
  <image ID>
docker exec -it ultipa-server bash
```
Docker reports a 'No config files found' error during the first start; at the same time, it automatically creates an example file named example.config. Copy and rename example.config to server.config, and modify the file by following Server Configuration (see below) before restarting.
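For example, inside the container (assuming the example file is created in the mounted config directory from the volume mapping above):

```shell
cp /opt/ultipa-server/config/example.config /opt/ultipa-server/config/server.config
```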
Notes on the `docker run` flags:

- `-e TZ=Europe/Paris`: sets the container's time zone.
- `--ulimit nofile=90000:90000 --privileged`: raises the open-file limit and runs the container in privileged mode.
- `--cap-add=SYS_PTRACE --security-opt seccomp=unconfined`: enables process tracing and disables seccomp filtering, e.g. for debugging.

Common startup errors:

| Error | Description |
|---|---|
| Machine binding failed!!! | Failed to bind the machine's MAC address. Make sure the license is copied to the right location and that the new machine's MAC address has been added to the license |
| License expired! Server is stopping... | The license has expired; a new license is required |
| Rpc BuildAndStart failed! | The server's public port is already occupied |
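If 'Rpc BuildAndStart failed!' appears, you can check which process is occupying the public port before starting the server. A quick check, assuming the default port 60061, might be:

```shell
ss -lntp | grep 60061
```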
Run `ultipa_doctor_linux_v4.0 gen_config` to generate the cluster configuration file, then copy the file to the corresponding machine, as detailed in the documentation Ultipa Doctor 4.0. You may also configure the cluster manually.

Run `ultipa_doctor_linux_v4.0 health_check` to verify the deployment, as detailed in the documentation Ultipa Doctor 4.0. You may also use `curl http://<private_host>:<private_port>/raft_admin`; the cluster is successfully deployed if the Leader is elected normally.

To import the example dataset, run `cd example && sh import_example.sh`.

The configuration items in `config/server.config` are described below.
section:[Server]
| Item | Default | Description |
|---|---|---|
| host | 0.0.0.0:60061 | Server listening address |
| server_id | 1 | server_id must be unique within a cluster; a single server should use the default value |
| dbpath | data | Database file storage path |
| resource_path | resource | Storage path for resources such as the full-text dictionary |
| worker_num | 10 | Worker count; the resulting number of service threads is worker_num * 2 |
| slow_query | 5 | Slow-query threshold, 5 seconds by default |
| authorized | true | Whether to enable authentication mode |
| use_ssl | false | Whether to enable SSL |
| ssl_key_path | ./resource/cert/ultipa.key | Valid only if SSL is enabled |
| ssl_crt_path | ./resource/cert/ultipa.crt | Valid only if SSL is enabled |
| license | ./resource/cert/ultipa.lic | License file path |
| mem_threshold_percent | 80 | Memory usage limit, 80% by default. Memory protection is triggered once usage exceeds the limit, and requests are terminated. |
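Based on the items above, a minimal [Server] section might look like the following sketch (the key=value layout is an assumption; values are examples only):

```shell
# Illustrative: write a minimal [Server] section (values are examples only)
cat > config/server.config <<'EOF'
[Server]
host=0.0.0.0:60061
server_id=1
dbpath=data
worker_num=10
slow_query=5
authorized=true
use_ssl=false
license=./resource/cert/ultipa.lic
mem_threshold_percent=80
EOF
```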
section:[Log]
| Item | Default | Description |
|---|---|---|
| level | 3 | Log level. 1: fatal, 2: error, 3: info, 4: debug |
| stdout | true | Whether to also write logs to standard output |
| color | true | Whether to display colors in the terminal |
| file_retain_counts | 5 | Number of log files to retain |
| log_file_size | 200 | Maximum size of a single log file in MB, 200 MB by default |
section:[Supervisor]
| Item | Default | Description |
|---|---|---|
| stat_period | 3 | Statistics period; QPS is counted over a 3-second window by default |
section:[BaseDB]
| Item | Default | Description |
|---|---|---|
| db_buffer_size | 512 | Size of the DB buffer in MB; the default of 512 MB is recommended |
| lru_cache | false | Whether to use the LRU cache |
| lru_cache_size | 5000000 | Size of the LRU cache |
section:[Engine]
| Item | Default | Description |
|---|---|---|
| template_max_depth | 10 | Maximum template query depth; the default of 10 is recommended |
| super_node_size | 100000 | Super-node threshold; with the default of 100000, any node with 100000 or more neighbors is regarded as a super node |
| ab_timeout | 15 | AB query timeout, 15 seconds by default; it can also be set via the SDK request interface |
| default_max_depth | 30 | Maximum query depth; the default of 30 is recommended |
| load_direction_mode | 1 | 1: bidirectional load; 2: single-direction load (a-->b); 3: single-direction load (b<--a). Use the default; no need to change |
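The remaining sections follow the same pattern; an illustrative sketch appending them with the default values listed above (again, the key=value syntax is an assumption):

```shell
# Illustrative: append the [Log], [Supervisor], [BaseDB] and [Engine] sections
cat >> config/server.config <<'EOF'
[Log]
level=3
stdout=true
color=true
file_retain_counts=5
log_file_size=200

[Supervisor]
stat_period=3

[BaseDB]
db_buffer_size=512
lru_cache=false
lru_cache_size=5000000

[Engine]
template_max_depth=10
super_node_size=100000
ab_timeout=15
default_max_depth=30
load_direction_mode=1
EOF
```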
section:[HTAP]
| Item | Default | Description |
|---|---|---|
| private_addr | 127.0.0.1:60161 | Private address (IP and port) for communication between cluster nodes; the port must be different from the server's public port |
| conf | 127.0.0.1:60161\|127.0.0.1:60061:1, 127.0.0.1:60162\|127.0.0.1:60062:3, 127.0.0.1:60163\|127.0.0.1:60063:1 | Cluster configuration. In each entry such as 127.0.0.1:60161\|127.0.0.1:60061:1, the first ip:port is the private_addr and the second ip:port:role is the public_addr plus the node's role. The conf of all machines must be identical |
| data_path | data/htap | Cluster data storage path |
| election_timeout | 5000 | Election timeout in ms; the allowed range is 5000 ms to 30000 ms or more. When heartbeats are delayed under load, the election timeout can be increased appropriately |
| election_heartbeat_factor | 10 | Ratio of the election timeout to the heartbeat time |
| snapshot_interval | 3600 | Interval between log-compaction snapshots, 3600 seconds by default |
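For a three-node cluster, node 1's [HTAP] section might look like the following sketch (addresses match the default conf above; the key=value syntax is an assumption):

```shell
# Illustrative: [HTAP] section for node 1 of a 3-node cluster
cat >> config/server.config <<'EOF'
[HTAP]
private_addr=127.0.0.1:60161
conf=127.0.0.1:60161|127.0.0.1:60061:1,127.0.0.1:60162|127.0.0.1:60062:3,127.0.0.1:60163|127.0.0.1:60063:1
data_path=data/htap
election_timeout=5000
election_heartbeat_factor=10
snapshot_interval=3600
EOF
```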
Raft normally uses an odd number of nodes, such as 3, 5, or 7. That is because Raft can only serve when more than half of the nodes are online, where 'more than half' means N/2+1 rather than N/2. For example, a 3-node cluster requires >=2 nodes online, and a 5-node cluster requires >=3 nodes online.
For clusters with an even number of nodes, Raft requires both nodes of a 2-node cluster to be online, and >=3 nodes online for a 4-node cluster.
Although an even number of nodes consumes more hardware resources than an odd number while offering worse availability, in some strong-consistency scenarios an even number of nodes has an advantage in data disaster tolerance, providing a stronger guarantee of data safety.
In the Ultipa system, each GraphSet has its own Raft group, and each Raft group is completely independent, with its own Leader and Followers.
Note: A newly created GraphSet cannot be used until its Raft cluster has elected a Leader. The delay is determined by the configured election timeout.
| Role (value in conf) | Not readable as Follower | Readable as Follower | Algorithm node | Backup node |
|---|---|---|---|---|
| 0 | Y | N | N | N |
| 1 | N | Y | N | N |
| 2 | N | N | Y | N |
| 3 | N | Y | Y | N |
| 4 | N | N | N | Y |
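For example, in the default conf above, the entry 127.0.0.1:60162|127.0.0.1:60062:3 assigns role 3 to the second node, making it a Follower that is both readable and able to run algorithms.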
Notes:
To avoid simultaneous elections initiated by several nodes, Raft randomizes the election timeout:
Actual Election Timeout = random(election_timeout, 2*election_timeout)
Heartbeat Time = max(election_timeout/election_heartbeat_factor, 10)
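For example, with the defaults election_timeout=5000 and election_heartbeat_factor=10, the actual election timeout falls within random(5000, 10000) ms and the heartbeat time is max(5000/10, 10) = 500 ms.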
Cluster log: htap-yyyymmdd-HHMMSS.log
Server log: server-yyyymmdd-HHMMSS.log
Log entries that begin with [HTAP] in the server log relate to cluster synchronization. To view them: `grep -F "[HTAP]" server-yyyymmdd-HHMMSS.log` (the -F flag makes grep match the brackets literally instead of treating them as a character class).
ultipa_cluster_cli is the cluster administration tool.
Please use it with extra caution!
The initial data of the node to be added must be the same as that of the other nodes. Start the newly joined node with an empty data/htap directory, otherwise new data may not be synchronized.
Add a node:

ultipa_cluster_cli add --peer="$peer_being_added" --conf="$current_conf"

For example:
./ultipa_cluster_cli add \
--peer="127.0.0.1:60163|127.0.0.1:60063:1" \
--conf="127.0.0.1:60161|127.0.0.1:60061:1,\
127.0.0.1:60162|127.0.0.1:60062:1"
Remove a node:

ultipa_cluster_cli remove --peer="$peer_being_removed" --conf="$current_conf"

For example:
./ultipa_cluster_cli remove \
--peer="127.0.0.1:60163|127.0.0.1:60063:1" \
--conf="127.0.0.1:60161|127.0.0.1:60061:1,\
127.0.0.1:60162|127.0.0.1:60062:1,\
127.0.0.1:60163|127.0.0.1:60063:1"
Change the cluster configuration:

ultipa_cluster_cli change_conf --conf="$current_conf" --new_conf="$new_conf"

Trigger a snapshot on the specified peer:

ultipa_cluster_cli snapshot_leader --peer="$target_peer"

Transfer leadership to the specified peer:

ultipa_cluster_cli change_leader --peer="$target_leader" --conf="$current_conf"
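As with the other subcommands, change_leader takes the full current conf; an illustrative leadership transfer in the three-node cluster used above might be:

```shell
./ultipa_cluster_cli change_leader \
  --peer="127.0.0.1:60162|127.0.0.1:60062:1" \
  --conf="127.0.0.1:60161|127.0.0.1:60061:1,\
127.0.0.1:60162|127.0.0.1:60062:1,\
127.0.0.1:60163|127.0.0.1:60063:1"
```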
When a node has data inconsistency, it must be reset. Strictly follow the steps below to resynchronize the data. Note: the following operations must be performed on the node that needs to be resynchronized!
./ultipa-server stop

Monitoring page: http://private_host:private_port/raft_admin
Common Errors:
1. If the following error occurs when restarting ultipa-server, the server may have data inconsistency caused by an abnormal exit during the log-apply process. To restore consistency, use ./ultipa-server -d -safe to safe-start the server and repair the data automatically.
[HTAP] IMPORTANT! xxx pre_applied_index xxx != applied_index xxx, there might be data inconsistencies.
[HTAP] Run ./ultipa-server -d -safe to safe-start the server.
If ./ultipa-server -d -safe fails to start, please strictly follow the resynchronization steps above.
[HTAP] safe-start failed! Please Resync this peer according to the document!
2. The error 'reject term_unmatched AppendEntries' occurs when restarting.
This indicates that the node's cluster status is incorrect, possibly caused by deletion of the htap directory. The cluster records the node's last prev_log_term; once the local data is deleted, local_prev_log_term becomes 0, so data synchronization is rejected because local_prev_log_term < prev_log_term.
When the node's cluster status is wrong, use ultipa_cluster_cli to resynchronize it.
3. Modifying the configuration file and restarting to change the node's listening IP or port does not take effect. This is because the cluster has already recorded the node's information, so you need to use ultipa_cluster_cli change_conf to update the cluster information after the restart. For example:
./ultipa_cluster_cli change_conf \
--conf="127.0.0.1:60161|127.0.0.1:60061:1,\
127.0.0.1:60162|127.0.0.1:60062:1,\
127.0.0.1:60163|127.0.0.1:60063:1" \
--new_conf="127.0.0.1:60261|127.0.0.1:60061:1,\
127.0.0.1:60262|127.0.0.1:60062:1,\
127.0.0.1:60263|127.0.0.1:60063:1"
4. Modifying the configuration file and restarting to change a node's role does not take effect. This is because the cluster has already cached this configuration, so you need to use ultipa_cluster_cli change_conf to update the cluster information after the restart. For example:
./ultipa_cluster_cli change_conf \
--conf="127.0.0.1:60161|127.0.0.1:60061:2,\
127.0.0.1:60162|127.0.0.1:60062:1,\
127.0.0.1:60163|127.0.0.1:60063:1" \
--new_conf="127.0.0.1:60161|127.0.0.1:60061:1,\
127.0.0.1:60162|127.0.0.1:60062:1,\
127.0.0.1:60163|127.0.0.1:60063:1"
5. If you delete the htap directory without deleting the GraphSet data directories, problems will occur when you use the ultipa_cluster_cli tool to remove and re-add the node after the service restarts. Because the htap directory was deleted, data is synchronized from scratch; since the data directory was not deleted, duplicate execution of logs causes data inconsistency! Therefore, remember to delete all GraphSet data directories (including global) whenever you delete the htap directory!
6. When neither the htap directory nor the data directory is deleted, data is synchronized from the last position when a node is removed and re-added using the ultipa_cluster_cli tool, because the cluster still remembers the position of the last log synchronization.