
High Density Parallel Computing

High-density parallel graph computing means fully exploiting every CPU, core, and thread concurrently, which not every graph system achieves. Take Neo4j, an established graph database vendor, as an example: its typical top-of-the-range enterprise deployment uses three 8-core-CPU instances, of which one participates in graph computing at any time while the other two perform hot backup. The system's maximum concurrency is 4 threads, meaning each graph operation can reach at most 400% CPU utilization. Another example is graph computing with Python NetworkX, which, being single-threaded, achieves only 100% CPU utilization.

In the Ultipa Graph System, by contrast, concurrency scales linearly with the underlying hardware: a 32-vCPU deployment yields 3200% concurrency, a 64-vCPU deployment yields 6400%, and so on. This linear scalability and high-density parallel computing bring two immediate benefits:

  • An exponential increase in overall system performance and throughput: 6400% vs. 100%;
  • Higher resource utilization: the parallel capacity of modern CPUs is fully exploited.
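As a loose illustration of how concurrency scales with worker count, the sketch below spreads a CPU-bound job across a process pool: with W workers the job can drive up to W × 100% CPU utilization. The workload and function names are our own invention for illustration and have nothing to do with Ultipa's internal API.

```python
# Illustrative only: spreading CPU-bound work across W processes can
# drive up to W x 100% CPU utilization (4 workers -> ~400%, 32 -> ~3200%).
from concurrent.futures import ProcessPoolExecutor

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def count_primes_range(lo: int, hi: int) -> int:
    """Serial, CPU-bound work: count primes in [lo, hi)."""
    return sum(1 for n in range(lo, hi) if is_prime(n))

def parallel_prime_count(limit: int, workers: int) -> int:
    """Split [0, limit) across `workers` processes and sum the partial counts."""
    step = (limit + workers - 1) // workers
    chunks = [(i, min(i + step, limit)) for i in range(0, limit, step)]
    los, his = zip(*chunks)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes_range, los, his))

if __name__ == "__main__":
    print(parallel_prime_count(200_000, 4))
```

The same principle carries over to graph workloads: once the work (here, a number range; in a graph system, vertices or frontier partitions) can be split into independent chunks, utilization grows linearly with the number of workers.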

The feature that enables this high-density parallel computing in the Ultipa Graph System is introduced in another article, The Evolution of Graph Data Structure; without it, an exponential improvement in system performance would be impossible.

CPU parallelism relies on underlying concurrent data structures and yields the capability of deep graph traversal with dynamic pruning. Combined, these two capabilities empower many business scenarios, such as sequential path calculation, in-depth correlation analysis, and K-hop influence evaluation, all of which are basic functions widely used in risk control, anti-fraud, precision marketing, collaborative filtering, NLP customer-service bots, supply chain networks, and more.
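To make the K-hop idea concrete, here is a minimal serial sketch (the name `k_hop_neighbors` is our own; a production engine would expand the BFS frontier in parallel rather than one node at a time):

```python
from collections import deque

def k_hop_neighbors(adj, source, k):
    """All nodes reachable from `source` in at most k hops (BFS with depth cutoff)."""
    seen = {source}
    frontier = deque([(source, 0)])
    result = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # pruning: never expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                result.add(nbr)
                frontier.append((nbr, depth + 1))
    return result

# A small path graph 1-2-3-4
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

Each BFS layer here is a set of independent vertex expansions, which is exactly where a concurrent data structure lets a graph engine fan the work out across threads.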

Many graph operations, such as whole-graph analysis, rely on graph algorithms whose typical trait is to traverse the whole graph repeatedly and recursively, at a huge computational cost. However, the original graph algorithms published by academia are mostly serial (the Louvain community detection algorithm, for instance) and are far from ready for industrial use unless modified and upgraded into a parallel form.
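To see why serial community detection struggles at scale, consider this deliberately serial label-propagation sketch: a much simpler stand-in for Louvain, not Louvain itself, with names of our own choosing. Each node update reads labels written by the previous update, so the loop cannot be naively parallelized without rethinking the data structure.

```python
# Deliberately serial label propagation: a simplified stand-in for serial
# community detection algorithms such as the original Louvain implementation.
def label_propagation(adj, max_iters=20):
    labels = {v: v for v in adj}
    for _ in range(max_iters):
        changed = False
        for v in adj:  # strictly sequential: step i reads labels written at step i-1
            if not adj[v]:
                continue
            counts = {}
            for nbr in adj[v]:
                counts[labels[nbr]] = counts.get(labels[nbr], 0) + 1
            # adopt the most frequent neighbor label; ties go to the smallest label
            best = max(sorted(counts), key=counts.get)
            if best != labels[v]:
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels

# Two disconnected triangles converge to one label per triangle.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(two_triangles)
```

The sequential dependency in the inner loop is precisely what parallel reworkings of such algorithms must break up, for example by partitioning vertices and synchronizing label exchanges between rounds.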

A comparison shows the difference: Louvain community detection on a graph with a million pieces of metadata takes 10 hours with Python NetworkX and 3 hours with Huawei's graph system, while the Ultipa Graph system completes it in pure real time, within 1 second. This performance gap of tens of thousands of times comes down to whether high-density parallel native graph computing is used.
