Change Password

Input error
Input error
Input error
Submit

Change Nickname

Current Nickname:
Submit

ULTIPA GRAPH Benchmark Report (2022)

1.    Executive Summary

Ultipa has recently released v4.0 of its flagship graph database product, on top of its v3.0’s already world-leading performance, real-time-ness, ease-of-use.

  • Ultipa Graph Database is 10x to 10000x faster than any other graph database system in terms of query response time, data-loading speed, graph traversal capabilities.
  • Ultipa Graph is the world’s only 4th-generation graph system, leveraging its patent-pending high-density parallel computing, ultra-deep graph traversal and dynamic graph pruning technologies.
  • Ultipa is currently powering some of the world’s largest banks’ sophisticated and sea-volume data analytics and real-time decision making (anti-fraud) systems. Previously, no other systems have the capability to address customers’ challenges with speed and celerity.

This benchmark Focuses on examining the following characteristics of graph system:

  • Data Loading
  • Graph Traversal
    • K-Hop, Shortest Paths (All Paths), etc.
  • Graph Algorithms
    • PageRank, LPA, Louvain, Similarity, etc.
  • Comparison with the following systems:
    • Neo4j, Tigergraph, JanusGraph and ArangoDB

2.    Testing Bed

2.1.          Hardware Platform

The benchmark testing-bed cluster is composed of 3 cloud-based server instances with the following configurations:

Server

Configuration

CPU

Intel Xeon 16-core (32-vCPU)

Memory

256GB

Disk

1TB HDD (Cloud-based)

Network

5Gbps

2.2.          Software

Software

Description

OS

Linux

Graph Database

Ultipa Graph v4.0

Neo4j v4.0.7 Enterprise Edition

Tigergraph v3.1

JanusGraph v0.6.1

ArangoDB v3.7

Note: Benchmark results across multiple graph databases are show under 3.2.3.

2.3 Datasets

Dataset

Description

Twitter-2010

http://an.kaist.ac.kr/traces/WWW2010.html

Dataset

Twitter_rv.tar.gz

Vertices: 41.6M

Edges: 14.7B (1470M)

Data Modeling

Extend the dataset to allow vertices and edges to have attributes, for instance, while running PageRank/LPA/Louvain graph algorithms, results can be written back to the vertices as attributes, which can be updated from time to time. On the other hand, edge attributes can be created to

 

3.    Functional Testing

3.1.         Summary of Functional Testing

Testing Items

Testing Standards

Ultipa Results

Installation

The total time to have the graph database system deployed (the installation phase)

~30 min

Extensibility

Support of distributed architecture, data partitioning, horizontal and vertical scalability.

HTAP Distributed Architecture, scalable both horizontally and vertically

Graph Update

Graph modeling update can happen without suspense or shutdown of services, including real-time updates to vertices and edges.

Online update to vertices/edges, changes are reflected instantly to queries or algorithms’ results.

Data Loading

Support of batch or stream type of data loading; support of delimited text (i.e., CSV) or JSON format ingestion; support of stop-n-resume.

Support

Query Language

Natively supports graph query language.

Powerful UQL (Ultipa Query Language), easy-to-learn and easy-to-use. Can be tech and business personnel oriented at the same time.

High-concurrency Query

Ability to execute sophisticated graph queries in a highly concurrent fashion.

Support

Influence Algorithm

Support of LPA, PageRank graph algorithms

Support

Community Detection Algorithms

Support of WCC/SCC, LPA and Louvain algorithms

Support

Graph Interaction

Support of meta-data interaction, modification, display, highlight, expansion, etc.

Support

Management & Monitoring

Support of system run-time monitoring and management, such as CPU, RAM, Disk, networks, etc.

Support

Log Management

Detailed logging mechanism.

Support

Graphsets/Graphs

Support of multiple graphsets, sharing of nodes/edges across multiple graphs.

Support

Privilege Management

User privilege, access-control mechanism.

Support

Backup & Restore

Online backup and restoration support.

Support

High Availability

Does the system support HA setup?

Support

Disaster Recovery

Does the system support multi-city disaster recovery?

Support

 

3.2.        Performance Testing

3.2.1. Summary of Ultipa’s Performance Testing

Testing Items

Testing Standard

Ultipa Testing Results

Data Loading

Ingesting 100% of the data of testing dataset, measure the total time

520 seconds

Storage Size

Loaded data size versus raw data size

1.3

1-Hop Query

Given a seed file with multiple vertices, check each vertex’s total number of 1-hop neighbors, log average execution time.

0.00062 second

(Average Time)

2-Hop Query

Given a seed file with multiple vertices, check each vertex’s total number of 2-hop neighbors, log average execution time.

0.027 second

(Average Time)

 

3-Hop Query

Given a seed file with multiple vertices, check each vertex’s total number of 3-hop neighbors, log average execution time.

0.520 second

(Average Time)

6-Hop

Given a seed file with multiple vertices, check each vertex’s total number of 6-hop neighbors, log average execution time.

1.408 second

(Average Time)

23-Hop

Given a seed file with multiple vertices, check each vertex’s total number of 23-hop neighbors, log average execution time.

1.295 second

(Average Time)

Shortest-Path

Given any random pair of vertices, count their total number of shortest paths, mark the calculation time.

0.18 second

(Average Time)

Topology-Change & Query Results Change

Change the topology of the dataset, examine query results change (i.e., K-hop results of a particularly affected vertex).

In Real-time.

Jaccard Similarity

Calculate and return Top-10 vertices based on Jaccard Similarity against 10 nodes, log average execution time.

4.99s

(Average Time)

Page Rank

PageRank algorithm. Each algorithm is run multiple times, log the average time.

23s

(Average Time)

LPA

Label Propagation Algorithm. Each algorithm is run multiple times, log the average time.

80s

 

Louvain

Louvain community detection algorithm. Each algorithm is run multiple times, log the average time.

210s

 

3.2.2. Itemized Performance Testing

3.2.2.1. Data Loading

Testing Purpose: Loading the entire dataset into graph database and start providing services. This test can show how fast a graph system ingest large-volume of data.

Ultipa Testing Results:

Testing Item

Description

# of Vertices

# of Edges

Loading Time

Data Loading

Mark the total time for the entire dataset is loaded and system up-n-running.

41652330

1468365182

520 second

 

Testing Results by All Systems:

Dataset

Twitter-2010

Graph System

Ultipa

TigerGraph

Neo4j

JanusGraph

ArangoDB

Loading Time(second)

520

1550

3120

20800

32200

Time Normalized

1

3

6

40

62

Storage Size

30GB

12GB

55GB

56GB

128GB

Size Normalized

2.5

1

4.58

4.67

10.67

3.2.2.2 K-Hop Neighbor Querying

Testing Purpose:K-hop query is a fundamental type of query with any graph system. It allows user to quickly identify the scope of impact of an entity (the subject vertex). K-hop must be implemented using BFS (Breadth-First-Search) method, and there are a few caveats:

  • There are usually 2 ways of defining K-Hop and the results are different. One way is all neighbors from the 1st hop all the way to the Kth hop, the other way is the neighbors exactly Kth-Hop away from the starting vertex.
  • If a vertex appears on the Nth Hop of a specific vertex, it will NOT appear on any other hop. While executing the algorithm, it’s pivotal to conduct de-duplication of vertices across different hops, otherwise, the results would be wrong.
  • # of a vertex’s immediate neighbors are NOT the same as the # of edges, because a pair of connecting vertices may have 2 edges connecting them bi-directionally.

Testing Results:

Note: The below testing results pertaining to Ultipa system log the # of vertices on the Nth hop. For K-hop query, this is slightly more time consuming than other systems calculating only the total number of vertices from 1st all the way up to the Kth hop.

Testing Item

Item Description

Vertex ID

# of Neighbors at Nth Hop

Execution Time (in ms)

Average Execution Time (in ms)

1-Hop

For each vertex, query for the # of vertices that are exactly 1-hop away, log execution time.

20727483

973

0.509ms

0.62ms

50329304

4746

0.815

26199460

19954

0.868

1177521

4272

0.777

27960125

7

0.484

30440025

3386

0.733

15833920

2181

0.505

15015183

2279

0.492

33153097

116

0.491

21250581

37

0.536

2-Hop

For each vertex, query for the # of vertices that are 2-hop away, log execution time.

20727483

2874132

5.823ms

27.33ms

50329304

2714697

5.606

26199460

7818213

69.953

1177521

19318826

70.281

27960125

533108

1.983

30440025

11294131

21.967

15833920

7858255

22.746

15015183

5933114

27.222

33153097

12181518

25.157

21250581

11063971

22.537

3-Hop

For each vertex, query for the # of vertices that are 3-hop away, log execution time.

20727483

27206363

400.011ms

520ms

50329304

29939223

407.992

26199460

31324330

526.919

1177521

17139727

741.438

27960125

20280156

101.114

30440025

23120607

748.711

15833920

26263805

649.454

15015183

27727527

475.516

33153097

20512136

507.208

21250581

20804932

641.811

6-Hop

For each vertex, query for the # of vertices that are 6-hop away, log execution time.

20727483

10028

1213.226ms

1408.28ms

50329304

9052

1510.373

26199460

3022

1372.880

1177521

3101

1220.169

27960125

25838

1576.181

30440025

5437

1659.473

15833920

5618

1210.724

15015183

6197

1320.371

33153097

6033

1229.563

21250581

6738

1769.825

23-Hop

For each vertex, query for the # of vertices that are 23-hop away, log execution time.

15738828

3

1203.212ms

1295.19ms

9358362

1

1279.990

9358352

1

1529.716

17571568

1

1167.854

Ultipa Query Language:

khop().src({_id == Vertex_ID}).depth(#_of_hop).boost() as n return count(n)

 

Testing Results by All Systems:

Dataset

Twitter-2010

Graph System

Ultipa

TigerGraph

Neo4j

JanusGraph

ArangoDB

1-Hop (second)

0.00062

0.024

0.2

0.39

1.667

2-Hop (second)

0.055

0.46

18.3

27.7

28.9

3-Hop (second)

0.52

6.6

290

4300

3888

6-Hop (second)

1.82

62.5

N/A

N/A

N/A

23-Hop (second)

1.92

N/A

N/A

N/A

N/A

3.2.2.3 Shortest Paths

Testing Purpose: Finding shortest paths is a fundamental type of graph query. Like K-hop query, it is also implanted using BFS method, namely breadth-first search. The caveat with Shortest Path finding is that there are usually multiple paths instead of one, some graph systems can NOT return the correct number of paths due to faulty implementation. It’s often more challenging to query for shortest paths on large dataset like Twitter, therefore it’s meaningful to examine how swiftly a graph system can run this query.

Testing Results: Find all shortest paths between any pair of nodes and log the execution time and total number of paths.

Testing Item

Item Description

Starting Vertex

End Vertex

# of Shortest Paths

Execution Time

Average Execution Time

Find All Shortest Paths between Random Pairs of Nodes

For any pair of nodes, query for all the shortest paths between them, log the execution time and total # of paths.

50329304

21613682

25389

0.152s

0.18s

49449489

15645246

2080710

2.484s

39687658

47978366

46

0.053s

15489748

38548456

8

0.017s

17524616

29884615

5

2.22s

33033471

37029346

273

0.018

37048837

17555248

1114

0.01s

26895497

37893465

2

0.004s

26468497

37164965

6

0.002s

54151557

37190965

30837

0.05s

Ultipa Query Language

ab().src({_id == Starting_Node_ID}).dest({_id == Ending_Node_ID}).depth(6).shortest().boost() as paths return count(paths)

Testing Results by All Systems:

Dataset

Twitter-2010

Graph Database

Ultipa

TigerGraph

Neo4j

JanusGraph

ArangoDB

All Shortest Paths

 (in seconds)

0.18

9.2

Note: Some ≥6-hop paths can’t return in 10-min.

Can’t return any ≥3-Hop paths.

Can’t return any ≥3-Hop paths.

Can’t return # of paths correctly (only 1 path found)

 

3.2.2.4 PageRank

Testing Purpose: Iterative algorithm like PageRank can be used to validate

Testing Results: It is imperative that PageRank algorithm should be conducted in an iterative fashion that traverses all nodes and edges. Some graph databases, by default, do this wrongly. Neo4j is one such example, it allows running the algorithm on limited number of vertices instead of on ALL the vertices. On the other hand, returning results in an ordered way is useful in determining the top-ranked pages (nodes or entities), unfortunately, many graph systems do NOT support result-ranking natively.

Testing Item

Item Description

Ultipa Testing Results

Top-10 Highest Ranked Vertices

Execution Time (in seconds)

Impact Analysis

Run PageRank algorithm for 10 iterations, with damping-0.8, return the top-10 highest ranked vertices and their rankings, log the total execution time.

813286,14276.1

14224719,12240

31567254,10288.6

15131310,9860.32

16049683,8546.38

7040932,6775.97

14075928,6672.9

12687952,5345.58

5380672,5021.32

26784273,2886.91

23s

Ultipa Query Language:

algo(page_rank).params({order:’desc’,init_value:0.2,loop_num:10,damping:0.8,limit:10}).write({file:{filename:”my_pagerank_result”}})

Note: The chart below shows how Ultipa compares with Tigergraph and other systems. Please note that ONLY Ultipa returns the PageRank results in an ordered way, which is more time consuming. The systems that do not carry a result couldn’t finish the algorithm.

3.2.2.5 Community Detection

Testing Purpose: LPA and Louvain are the two commonly used community detection algorithms. It’s useful to verify how fast and swiftly can a graph database system finish both algorithms on a relatively large dataset such as Twitter with over 1.5 billion nodes and edges.

Testing Results:

Testing Item

Item Description

# of Communities

Execution Time       (in seconds)

LPA

Launch the LPA algorithm, set iterations=10, print out # of communities and total execution time.

116602

80s

Louvain

Launch the Louvain Community Detection algorithm, set iterations=10, modularity=0.0001, print out # of communities and total execution time.

1207

210s

Ultipa Query Language:

algo(lpa).params({loop_num:10}).write()

algo(louvain).params({phase_1_loop_num:10,min_modulatry_increase:0.01}).write

 

Testing Results by All Systems:

Dataset

Twitter(2010): 41.6M, 1470M, 24.6GB

Graph DB

Ultipa

TigerGraph

Neo4j

JanusGraph

ArangoDB

PageRank

(Seconds)

23

258

 

600

N/A

Unfinished

N/A

Unfinished

LPA

(Seconds)

80

900

N/A

Unfinished

N/A

Unfinished

N/A

Unfinished

Louvain

(Seconds)

402

N/A

Not-Tested

N/A

Can’t finish projection in 30-min

N/A

Not-Tested

N/A

Not-Tested

Not: Due to resource or feature limitations, the following tests were only conducted against Ultipa Graph v4.0 but not on other graph database systems.

 

3.2.2.6 Update Dataset and Compare Query Results

Testing Purpose: To validate if the graph database system has the ability to update data contents and allow for accurate data querying in real-time. It is not uncommon that some graph systems use caching mechanism to pre-calculate and store-aside results which do NOT change accordingly after the data contents are change. In this testing item, we create a new edge connecting the provided starting vertex with one of its 3rd-hop neighboring vertex, and compare the 1-to-3-to-6 Hop results before and after the edge is inserted.

Testing Results:

Testing Item

Item Description

New-Edge Starting Vertex ID

New-Edge Ending Vertex ID

Query Depth

K-Hop Before Edge Adding

K-Hop After Edge Adding

Update the dataset, do queries against it twice to compare results

Add a new edge between the starting vertex to one of its 3rd-hop neighboring vertex, and check its 1-hop,3-hop, 6-hop results before and after the edge-adding event

20727483

28843543

1

973

974

3

27206363

27210397

6

10028

10027

50329304

21378173

1

4746

4747

3

29939223

29939314

6

9052

9052

26199460

32278263

1

19954

19955

3

31324330

31324333

6

3022

3022

1177521

6676222

1

4272

4273

3

17139727

17139725

6

3101

3101

27960125

48271231

1

7

8

3

20280156

20283107

6

25838

25836

30440025

38232241

1

3386

3387

3

23120607

23121930

6

5437

5431

 

3.2.2.7 Similarity Algorithm

Testing Item

Testing Item Description

Provided ID of Starting Vertex

Execution Time

Average Time in Seconds

Similarity

Given a vertex, query for Top-10 vertices that are most similar using Jaccard Similarity.

20727483

1.586s

4.99s

50329304

2.003s

26199460

10.573s

1177521

10.52s

27960125

0.359s

30440025

2.865s

15833920

5.042s

15015183

5.283s

33153097

5.201s

21250581

4.525s

Ultipa Query Language:

find().nodes({_id == Provided_ID}) as n with n

algo(jaccard).params({ids:[n],top_limit:10}) as r return r

 

 

Want to read more?