Knowledge graph is a term popularized by Google as the search giant built and improved the world's largest search infrastructure. Google's rationale is simple: every search is driven by an intention, and the keyword being searched is only the surface of that intention. The results returned and orchestrated by the search engine can therefore be elevated with the help of a knowledge graph, instead of relying on a mere PageRank of the matching URLs. (Of course, ranking can be further complicated by AdWords and the like; in the case of Baidu, the ranking system can be scrambled so much by factors such as which advertiser pays more for the top spots that user experience suffers as a consequence.)
A GP-LP-Investment-Competition Knowledge Graph Web GUI
Building a general-purpose knowledge graph takes a lot of time, effort and skill: you have to structure the knowledge system, visualize it, and compute within the graph. When the graph grows very large, deep graph search becomes a daunting challenge. Note that PageRank is considered highly distributed yet very shallow (1-to-2-hop-deep) graph computing.
Given these challenges, most KG players opt to build knowledge graphs in vertical domains. The above diagram shows an investment-and-competition KG that correlates 5 types of nodes: GP investors, LP investors, projects (invested companies), company executives, and institutions/universities/firms (where executives or investors have worked or studied). Such a relatively simple knowledge graph can be very effective in helping analysts, investors or entrepreneurs make social connections or investment decisions. Without a KG system, you would have to either track all those relationships in many complex Excel spreadsheets or use a bespoke system that is both costly to build and hard to maintain.
An Interactive Knowledge Graph – For Industry Investment Research
The above screenshots illustrate how easily users can interact with the KG: you can edit, expand, categorize, filter, collapse, hide or show any component on the graph. This gives users a panoramic view while they get to know the entity in focus. One unique feature of a graph system is that you can work on it recursively, for instance by expanding from a node for multiple hops to understand its radius of influence (or vice versa).
If this were broadened into a general-purpose KG scenario and the entity were a person, it might have categorized relationships such as friends, kin, enemies, achievements, places he has been to, other people's comments about him, things he invented, organizations he belongs to, and books he has written (which you could click to follow the URL and read directly).
And if you want to know whether there are any "butterfly effect" connections between two people, or between any two events or things, you can find them just as easily. For deeper exploration, Ultipa Graph offers a dedicated feature called "AB Path" that finds all paths between two entities in the graph in real time.
The AB-Path feature lets you find connections between any pair of entities in real time, no matter how deep the graph engine has to search. Depending on your requirements, the results can range from shortest path(s), to paths of a specific depth, to an automatically formed network connecting any number of entities. For example, to form a network connecting entities (nodes) A, B, C, D and E through the shortest paths between every pair of them, the minimum number of pairwise searches is C(5,2) = 10; with 1,000 nodes, a brute-force approach would need C(1000,2) = 499,500 pairwise path searches (1000 * 999 / 2).
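As a rough illustration of the pairwise searches that network formation implies, here is a minimal Python sketch (not Ultipa's implementation or API). It assumes an unweighted, undirected graph held in a plain adjacency dict and, for brevity, keeps only one shortest path per pair rather than all of them; `shortest_path` and `form_network` are hypothetical names.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """Unweighted BFS shortest path from src to dst; returns a node list or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to reconstruct the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


def form_network(adj, targets):
    """Union of one shortest path per pair of target entities."""
    pairs = list(combinations(targets, 2))  # C(n, 2) independent pair searches
    edges = set()
    for a, b in pairs:
        path = shortest_path(adj, a, b)
        if path:
            for u, v in zip(path, path[1:]):
                edges.add(tuple(sorted((u, v))))  # store undirected edges once
    return pairs, edges
```

Each pair search is independent of the others, so the C(n,2) searches can in principle run in parallel.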
There are many other recursive graph-search use cases that demand real-time performance:
- Finding paths that follow certain filtering rules or edge weights, or that must contain certain nodes or edge types.
- Finding communities, patterns or other specific structures within a graph.
- Finding all or specific neighbors of a certain node under some filtering rules.
- Finding nodes or edges with similar behavior within a certain period of time.
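The neighbor-search case can be sketched as a breadth-first search with a hop limit and an edge-type filter. This is a minimal Python illustration under assumed inputs (a list of `(src, dst, edge_type)` tuples); the function name and data layout are hypothetical, not part of any Ultipa interface.

```python
from collections import deque


def k_hop_neighbors(edges, start, max_hops, allowed_types=None):
    """Return {node: hop_distance} for nodes reachable within max_hops.

    edges: iterable of (src, dst, edge_type) tuples, treated as directed.
    allowed_types: optional set of edge types to follow; None follows all.
    """
    adj = {}
    for src, dst, etype in edges:
        if allowed_types is None or etype in allowed_types:
            adj.setdefault(src, []).append(dst)

    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[start]  # the start node is not its own neighbor
    return dist
```

Because BFS visits nodes in order of distance, each reported hop count is the shortest one under the given filter.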
These scenarios are meant to prompt readers to think: a knowledge graph has to be boosted by a real-time graph computing engine or graph database that can traverse the graph deeply. Without real-time computing, a knowledge graph is merely a large data set that claims OLAP capabilities, yet, ironically, nothing can be done online.
In our humble opinion, the name OLAP (Online Analytical Processing) is quite misleading; it really should be called Offline Analytics, because very few operations on top of it are real-time. HTAP (Hybrid Transactional/Analytical Processing) is the way to go. Ultipa Graph is dedicated to bringing many OLAP-type operations to real time (truly online) by accelerating them by several orders of magnitude, effectively turning OLAP into OLTP. (Let's put the ACID/CAP debate aside for now; it warrants an expanded and separate discussion.)
Real-time Network (Subgraph) Formation – Search Depth: 6
Knowledge graphs are a powerful tool for managing enterprise data (metadata, to be precise). They facilitate decision making through strong visualization and interaction, giving users a heightened white-box experience: if you put all your network or service components into the graph, you can potentially manage them entirely and visually, which eases the lives of business professionals and system administrators like never before.
A Typical AML Mapping – Four Customers Forming A Ring
Money laundering is an international problem. It is one of the most popular ways for organized crime rings to "white-wash" dirty money by circulating it through the vast flow of currency, and governments everywhere are seeking ways to identify these "dirty money" flows.
A graph is generally considered the most natural way to express money flows, both mathematically and graph-theoretically. This is especially true when money flows through multiple hops: criminals orchestrate their schemes in a highly graph-like way, so the best tools law enforcement around the globe can use are powerful graph systems. Ultipa Graph is built to support exactly this kind of AML scenario.
A typical money-laundering pattern looks like this:
- Symptom: Money starts from multiple accounts (sender accounts) and is transferred in steps (hops, if you view the problem as a graph) to another set of accounts (recipient accounts). There are also naïve forms of money laundering in which money leaves an originating account and, after passing through multiple intermediate accounts, returns to the originating account or closely affiliated accounts; this makes the entire transaction life-cycle a ring (a loop, or AML ring).
- Characteristics: The transferred amount accounts for a high percentage of the account balance; the accounts involved may share IP addresses, device IDs or other commonalities (essentially forming a ring or rings).
Because there may be millions of accounts, most of them legitimate, filtering those out as quickly as possible is key to making sure that only suspicious, AML-related accounts are examined. This noise-filtering process is based on many factors, such as recent activity, tags, relationships with known problematic or suspicious accounts, and activity-pattern matching. As soon as the sender accounts are identified, the search within the graph can proceed both downstream and upstream (two-way graph search/traversal), and the goal of the traversal is to find the accounts that ultimately consume (receive) all the money.
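A minimal Python sketch of the two-way traversal follows, assuming transfers are given as `(from_account, to_account)` pairs and that the flagged sender accounts are already known. It follows money downstream to the accounts that send nothing onward (treated here as the receiving "sinks"), then walks upstream from those sinks. `reachable` and `trace_funds` are hypothetical helpers; a real system would also filter on amounts, timing and the other factors listed above.

```python
from collections import deque


def reachable(adj, seeds, max_depth):
    """Breadth-first reachability from a set of seed accounts, up to max_depth hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        acct, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in adj.get(acct, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def trace_funds(transfers, senders, max_depth=10):
    """Follow transfers downstream from flagged senders, then upstream from the sinks.

    transfers: iterable of (from_account, to_account) pairs.
    A sink is a reached account that sends nothing onward in this data set.
    """
    out_adj, in_adj = {}, {}
    for src, dst in transfers:
        out_adj.setdefault(src, []).append(dst)
        in_adj.setdefault(dst, []).append(src)

    downstream = reachable(out_adj, senders, max_depth)
    sinks = {a for a in downstream if a not in out_adj}
    # Walking upstream from the sinks reveals every account that fed them,
    # which may surface senders that were not flagged initially.
    upstream = reachable(in_adj, sinks, max_depth)
    return sinks, upstream
```

The upstream pass is what makes the search two-way: starting from a single flagged sender, it can expose other, unflagged accounts feeding the same recipients.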
Real-time AML Pattern – Deep Ring/Circle Finding w/ Ultipa Graph
The above diagram shows an account that used many layers of intermediate accounts to transfer money away, but eventually collected most of the money back (minus fees and other costs). The resulting subgraph is a multi-hop ring (circle); it may not be illegal, but it is worth investigating.
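Ring detection of this kind can be sketched as a depth-bounded depth-first search that reports every simple cycle returning to the originating account. This Python sketch is an illustration under simplifying assumptions, not Ultipa's algorithm: it ignores amounts and fees, treats each transfer as a directed edge, and uses a hypothetical `find_rings` function.

```python
def find_rings(transfers, origin, max_hops):
    """Enumerate simple cycles that start and end at `origin`, up to max_hops edges.

    transfers: iterable of (from_account, to_account) pairs.
    Each intermediate account appears at most once per ring.
    """
    if max_hops < 1:
        return []
    adj = {}
    for src, dst in transfers:
        adj.setdefault(src, []).append(dst)

    rings = []

    def dfs(node, path):
        # len(path) - 1 edges have been used so far; closing the loop adds one more.
        for nxt in adj.get(node, ()):
            if nxt == origin:
                rings.append(path + [origin])
            elif nxt not in path and len(path) < max_hops:
                dfs(nxt, path + [nxt])

    dfs(origin, [origin])
    return rings
```

The depth bound matters: with `max_hops` set one hop too low, the longer ring simply disappears from the results, which is the same effect the next paragraph describes for shallow searches.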
Real-time AML Pattern – Deep Traversal w/ Ultipa Graph
The caveat is that the traversal may be very deep, sometimes more than 10 hops, which is practically impossible to do in real time with conventional graph systems such as Spark (GraphX), Neo4j or JanusGraph. (Note in the diagram immediately above that the data flows do NOT converge until 10 hops deep; if you can only search 5 or 8 hops deep, you will NOT see any problem.) Moreover, this must be done in real time, with extremely high concurrency and much higher throughput. After all, speed is everything in business: every second spent failing to identify money-laundering accounts means money lost.
Ultipa is designed and built to handle AML in pure real time with its patented, proprietary, highly parallel computing engine, so you no longer have to settle for traditional, naïve AML solutions that cannot perform deep data correlation. Ultipa can search extremely deep (deeper than the depth suggested by regulations), with superior concurrency and a lower TCO. It is built for next-generation AML IT infrastructure and solution platforms.
To understand the value of a graph-powered recommendation solution, you have to know the status and problems of existing (traditional) recommendation systems, most of which share the following characteristics:
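To see why depth matters so much, consider a back-of-the-envelope estimate. Assume, purely for illustration, that every account sends money to about d distinct accounts (real transaction graphs are far more skewed). A naive search that expands every branch then touches up to N(k) accounts within k hops:

```latex
N(k) = \sum_{i=1}^{k} d^{i} = \frac{d^{k+1} - d}{d - 1}, \qquad d > 1

% Illustrative values, assuming d = 5:
N(5)  = \frac{5^{6} - 5}{4}  = 3\,905
N(10) = \frac{5^{11} - 5}{4} = 12\,207\,030
\frac{N(10)}{N(5)} = \frac{5^{10} - 1}{5^{5} - 1} = 5^{5} + 1 = 3\,126
```

Under this assumption, going from 5 to 10 hops multiplies the worst-case work by more than three thousand, which is why deep traversal in real time is such a demanding requirement.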
- Traditional systems tend to use multiple, varied models that are not unified.
- They require pre-calculation work (and are therefore not really real-time oriented).
- Recommendations can take hours, if not days, to refresh.
- Lots of redundant data is generated (wasting storage, and storage costs are high!).
- Consolidation work is necessary on the client side (app or browser), which makes client-side coding challenging.
Real-time Smart Recommendation w/ Ultipa Graph & Knowledge Graph Frameworks
The above diagram illustrates and visualizes (in a knowledge graph) how the graph-powered recommendation system works:
- User A browses (favorites, adds-to-cart or purchases) Product A;
- Product A is also browsed (or purchased) by Users B, C, D...
- Users B and C also browse other products, such as Product B.
- Users C and D also browse Product D and other products.
- By aggregating and ranking these data, you get candidate products B and D.
- Next, look at other factors, such as how closely Product B or D is related to Product A in the graph. This can be expanded into a more sophisticated framework that closely mimics human decision-making and reasoning. In this demo, we use a very simple decision rule: Product D and Product A share a certain attribute while their types differ (so that we don't recommend a refrigerator right after you have purchased a refrigerator, which is both dumb and annoying to the customer). In this case, Product A is a camera and Product D is a camera accessory kit, which may be exactly what the customer is looking for right after buying the camera.
In Ultipa Graph, implementing CF (Collaborative Filtering) is both easy and fast. It does NOT require any data training, nor does it contain any black-box components; the whole process is white-box and highly explainable. This is because CF is inherently graph-oriented, as the logical steps below show.
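The simple decision rule in the last step can be sketched in a few lines of Python. The catalog schema (a `type` string plus a set of attribute `tags` per product) and the `rerank` function are hypothetical; the sketch drops candidates of the same type as the anchor product and ranks the rest by shared attributes, then by co-occurrence count.

```python
def rerank(candidates, anchor, products):
    """Re-rank CF candidates against the product the user just acted on.

    candidates: {product_id: co-occurrence count} from the aggregation step.
    anchor: product_id of the item the user browsed or bought (Product A).
    products: {product_id: {"type": str, "tags": set}}, a hypothetical catalog schema.
    """
    a = products[anchor]
    scored = []
    for pid, count in candidates.items():
        p = products[pid]
        if p["type"] == a["type"]:
            continue  # never recommend another item of the same type (no second fridge)
        shared = len(p["tags"] & a["tags"])
        scored.append((shared, count, pid))
    # Prefer more shared attributes, then higher co-occurrence, then id for stability.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [pid for _, _, pid in scored]
```

If B and D tie on co-occurrence, as in the diagram (each is reached through two users), the shared attribute breaks the tie in favor of Product D.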
Template-based Graph Query for Collaborative Filtering
The CF steps, with the thinking reversed onto the graph (user → product → user → product):
- Starting from a user, find all products (viewed, added to cart, or purchased).
- Find all users who have acted on these products.
- Find all other products that the users from Step 2 have acted on.
These 3 steps can be done with one amazingly simple uQL statement.
Of course, real-world CF usually has a lot of bells and whistles – you can fine-tune the above template-based graph query to narrow down your list of merchandise recommendations.
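The three CF steps can be sketched as a three-hop traversal over a bipartite user-product graph. This is a plain Python illustration, not uQL and not Ultipa's API; `collaborative_filter` is a hypothetical name, and each peer simply casts one vote per product, whereas a real deployment would weight by action type, recency and so on.

```python
from collections import Counter


def collaborative_filter(actions, user):
    """Three-hop CF traversal: user -> products -> users -> products.

    actions: iterable of (user_id, product_id) pairs (view, cart or purchase).
    Returns a Counter of candidate products, excluding ones the user already touched.
    """
    by_user, by_product = {}, {}
    for u, p in actions:
        by_user.setdefault(u, set()).add(p)
        by_product.setdefault(p, set()).add(u)

    # Step 1: products the user has acted on.
    seen = by_user.get(user, set())
    # Step 2: other users who acted on those products.
    peers = set()
    for p in seen:
        peers |= by_product[p]
    peers.discard(user)

    # Step 3: products those peers acted on, one vote per peer.
    scores = Counter()
    for peer in peers:
        for p in by_user[peer] - seen:
            scores[p] += 1
    return scores
```

Fed with the interactions from the recommendation diagram above (User A on Product A; Users B, C and D on Product A; Users B and C on Product B; Users C and D on Product D), it returns Products B and D with two votes each.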
On Ultipa Graph, there are a few ways to empower this process, and they are categorized as:
- CRBR, short for Community Recognition Based Recommendation
- GEBR, short for Graph Embedding Based Recommendation
CRBR is comparable to CF, but is much more agile, efficient, and faster in terms of overall recommendation system productivity and quality. The core concept is outlined here:
- Run Louvain community detection across all merchandise (products).
- Group customers by their varied behaviors (e.g., visit, add-to-cart, or transaction).
- Locate the most popular Louvain groups, but exclude those associated with "super merchandise" such as paper towels or bottled water during an epidemic like COVID-19.
- The above 3 steps are comparable to the work done on a training dataset. In a graph, this can be seen as a subgraph within the holistic graph, while the other part of the graph is the actual real-time, self-refreshing subgraph that contains new user and merchandise activities. Now, whenever a user generates any new behavior, it is time to analyze it and push recommendations:
- A user with lower activity can be assigned a higher ranking score. (This seemingly counter-intuitive logic is actually simple: within the same community, a less-active user's action is more meaningful than a hyper-active user's, because the latter's activities are too broad and therefore less informative for other less-active users.)
- The above logic can be further fine-tuned with item and user properties, such as the item's category, environment, time and location information, and the user's gender, age, preferences and other relevant behaviors.
The diagrams below show how steps 1-6 were performed in an à la carte fashion:
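The low-activity rule in the fifth step could be implemented in many ways. One simple option, assumed here and not taken from Ultipa, is an inverse-activity weight of 1 / log2(1 + n), where n is the number of distinct items a peer has interacted with. The Python sketch below assumes the Louvain community of the target user is already known; `community_scores` is a hypothetical name.

```python
import math
from collections import Counter


def community_scores(actions, community, target_user):
    """Score items for target_user using peers in the same community.

    actions: iterable of (user_id, item_id) interactions.
    community: set of user_ids in target_user's Louvain community (assumed given).
    Each peer's vote is weighted by 1 / log2(1 + activity), so a less-active
    peer counts for more than a hyper-active one.
    """
    items_by_user = {}
    for u, i in actions:
        items_by_user.setdefault(u, set()).add(i)

    owned = items_by_user.get(target_user, set())
    scores = Counter()
    for peer in community - {target_user}:
        items = items_by_user.get(peer, set())
        if not items:
            continue  # inactive peers carry no signal
        weight = 1.0 / math.log2(1 + len(items))
        for item in items - owned:
            scores[item] += weight
    return scores.most_common()
```

A peer with a single item contributes a weight of 1.0, while a peer with seven items contributes 1/3, so the quieter peer's item ranks first.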
Highly Parallel and Accelerated Graph Tasks (Near Real-Time Louvain!)
Pay attention to the Louvain task's execution time. On a reasonably large graph with multiple millions of nodes and edges, the process could otherwise take hours to complete using Python and other graph systems, but it runs in real time on Ultipa Graph, a performance improvement of 1,000 times or more! We achieve this through algorithm and data-structure parallelization. On top of that, the powerful yet intuitive Ultipa Manager gives users visual cues, an important step toward white-box, explainable AI, given that graph operations are naturally explainable and well-defined. (See the diagram below for the Louvain visualization; note that we extracted 5,000 nodes from the full graph dataset to form connectivity networks in real time. This process is traditionally very complicated, time-consuming and compute-intensive; on Ultipa Graph, it is easy and lightning-fast.)
Of course, all the complexity and the burden of computing optimization are handled by the superior Ultipa Graph engine.
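Louvain works by greedily moving nodes between communities (and then aggregating communities) so as to increase modularity. The Python sketch below computes only that objective for a given partition of an undirected, unweighted graph; it is not the Louvain algorithm itself and says nothing about how Ultipa parallelizes it. `modularity` is a hypothetical helper.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: {node: community_id}.
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where L_c = edges inside c, d_c = total degree of c's nodes, m = total edges.
    """
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        # Each edge adds one to the degree of both endpoints.
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge, splitting them into two communities gives Q = 5/14 ≈ 0.357, whereas putting everything into one community gives Q = 0.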
Louvain-based Community Detection/Recognition & Visualization
The core concept of CRBR is to first identify all the communities encompassing all items and users. Based on that, plus additional information that can be leveraged to fine-tune the recommendation scheme, users can be further categorized and identified. Real-time or rolling recommendation is then as simple as running a template-based path search, so that a new activity by a user, or a set of activities by a group of users, can be collected in real time and recommended to other users. Of course, which logic is superior is open to debate. The key points here are:
- It’s super easy to adjust the templated queries on the graph with Ultipa.
- It’s fast, many times faster than, say, Spark-based CF infrastructure.
- It’s always explainable. No more black-box AI, please.
CRBR isn't the only way Ultipa Graph can handle recommendation. GEBR, for instance, utilizes and empowers deep learning on the graph and can achieve an equally high recall/precision ratio, but there is a caveat: the gray-box (or black-box) part of the operations, namely random walks, which node2vec and struc2vec rely on to generate the node sequences that are then fed into a word2vec-style model. This is still a new area of graph deep learning, and we are actively developing the graph-centric foundations to support such needs. In a separate article titled XAI and Graph Learning, authors Ricky Sun and Victor Wang outlined how graph computing can bring efficiency and clarity to AI. This is an ongoing path that we are enthusiastically pursuing.
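For a sense of what the random-walk step looks like, here is a minimal Python sketch of uniform random walks in the style of DeepWalk. Note the swap: node2vec itself uses biased, second-order walks controlled by its p and q parameters, which this sketch does not implement. The walks would then be fed to a word2vec-style skip-gram model to learn embeddings; `random_walks` is a hypothetical name.

```python
import random


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks (DeepWalk-style) over an adjacency dict.

    adj: {node: [neighbor, ...]}.
    Each walk is a list of nodes; the walks play the role of "sentences"
    for a word2vec-style skip-gram model in embedding pipelines.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj.get(walk[-1])
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

Seeding the generator makes runs reproducible, which helps with debugging, although the sampling itself remains the less transparent part of the pipeline described above.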
GEBR: Recommendation based on less-active users in the community (illustrated)
Graph-based Recommendation Solution has the following advantages:
- Real-time recommendation made possible.
- This also implies that real-time data refreshing is possible.
- Working with a knowledge graph, such as a Merchandise Knowledge Graph, makes recommendations very human-like and intelligent, instead of relying purely on aggregated statistical results!
- Recommendation Graph = Real-time Merchandise Graph + Customer 360-degree Graph, offering a unified, all-in-one recommendation solution.
If you do NOT yet have a recommendation system built on top of Hadoop/Spark frameworks, with a bunch of collaborative filtering mechanisms that rely on lots of user behavior data and lengthy periods of data training, adopting a graph-powered solution makes perfect sense. Welcome to the rapidly evolving world of IT: graph is on its way to becoming mainstream.