Fraud detection is a common scenario in many industries, such as banking and financial services, telco carriers, energy grids, and Internet services. This use case introduces real-time call fraud detection in a large-scale call network: every call taking place is monitored, and its potential level of fraudulence is calculated on the graph in real time, so that follow-up actions can be taken to flag the caller and to protect the recipient of the call by alerting them.
Call Fraud Detection in Telco Carrier Network
Two input sources from the telco network are directly related to call records, and their data is used for graph computing:
- Call Event: caller ID, callee ID, and call time.
- Call Detail Record (CDR): caller ID, callee ID, call time, and call duration (in seconds).
Additional information (data sources) may include the following:
- Blacklists of callers and lists of known spammers (these may be taken from Internet sources).
- Attributes to be calculated for each call event or CDR, and so on.
The following logic is implemented and executed in real time by Ultipa Graph:
- White-listed call identification:
  - Stable call group
  - In-group connections
  - High % of call-back
  - 3-hop friend relationship (A-B-C-D-A, A to D is 3-hop)
- Black-listed call identification:
  - No (empty) stable call group
  - High % of rejected calls
  - Friend relationship is longer than 3-hop
  - No call-back phone call
  - Most calls are short-duration (hang-up < 10 seconds)
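Some of the rules above can be sketched in plain Python to make the calculations concrete. This is a minimal illustration, not Ultipa Graph code: the tuple layout `(caller_id, callee_id, call_time, duration_seconds)` follows the CDR fields listed earlier, while the function names `caller_features` and `hop_distance` are hypothetical. Rejected calls and stable call groups are left out because the CDR fields described here do not carry that information.

```python
from collections import defaultdict, deque

SHORT_CALL_SECONDS = 10  # from the rule list: hang-up < 10 seconds
MAX_FRIEND_HOPS = 3      # from the rule list: 3-hop friend relationship


def caller_features(cdrs, caller):
    """Compute call-back and short-call ratios for one caller.

    cdrs: list of (caller_id, callee_id, call_time, duration_seconds).
    """
    outgoing = [r for r in cdrs if r[0] == caller]
    callees = {r[1] for r in outgoing}
    # Callees who have called this caller at least once.
    called_back = {r[0] for r in cdrs if r[1] == caller and r[0] in callees}
    short_calls = sum(1 for r in outgoing if r[3] < SHORT_CALL_SECONDS)
    return {
        "callback_ratio": len(called_back) / len(callees) if callees else 0.0,
        "short_call_ratio": short_calls / len(outgoing) if outgoing else 0.0,
    }


def hop_distance(cdrs, source, target, max_hops=MAX_FRIEND_HOPS):
    """Shortest hop count between two numbers in the undirected call graph.

    Returns None when the target is not reachable within max_hops.
    """
    neighbors = defaultdict(set)
    for a, b, _time, _duration in cdrs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        if hops < max_hops:  # do not expand beyond the hop limit
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

A caller with a high `callback_ratio` and a callee within 3 hops leans toward the white-list side; a caller with no call-backs, mostly short calls, and no relationship within 3 hops leans toward the black-list side. Any concrete thresholds would have to be tuned on real data.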
The rules above can be implemented entirely with graph-embedded features. However, combining them with machine learning algorithms and models may further improve the overall efficiency of the anti-fraud system.
The above scenario is greatly simplified to illustrate a viable path by which a telco carrier can use Ultipa Graph to detect fraudulent phone calls and notify users in real time, so as to avoid potential financial losses.
Supply Chain Management
A Typical Commodity Supply Chain Network (Graph)
The problem is relatively easy to describe: in a large supply chain network, a supplier in South Africa is having trouble supplying certain parts for certain types of cell phones. How would that affect the phone manufacturers in Shenzhen, China?
This is a typical deep network analytics (entity-link analysis) problem. With the real-time capable Ultipa Graph, the following questions can be addressed:
- How long does it take for the ripple effect to reach the manufacturers in Shenzhen?
- To what scale will the business in Shenzhen be affected?
- Are there any alternative suppliers that can provide the parts?
- If the phone manufacturers can't provide the phones, which other manufacturers can fill in?
- Other questions may arise out of the network-wide analysis.
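The first three questions can be illustrated with a small sketch. It assumes the supply chain is stored as a directed graph whose edges point from a supplier to the party it supplies, each with a lead time in days. The edge weights, the node names, and the functions `ripple_arrival_times` and `alternative_suppliers` are hypothetical and chosen only for illustration; a shortest-path search (Dijkstra's algorithm) stands in for whatever traversal the graph engine actually runs.

```python
import heapq


def ripple_arrival_times(supplies, source):
    """Earliest time (in days) at which a disruption at `source` reaches each
    downstream node.

    supplies: dict mapping node -> list of (downstream_node, lead_time_days).
    Nodes missing from the result are not affected at all.
    """
    arrival = {source: 0}
    heap = [(0, source)]
    while heap:
        days, node = heapq.heappop(heap)
        if days > arrival.get(node, float("inf")):
            continue  # stale heap entry
        for downstream, lead_time in supplies.get(node, []):
            candidate = days + lead_time
            if candidate < arrival.get(downstream, float("inf")):
                arrival[downstream] = candidate
                heapq.heappush(heap, (candidate, downstream))
    return arrival


def alternative_suppliers(part_suppliers, part, failed_supplier):
    """Other suppliers of the same part, excluding the failed one.

    part_suppliers: dict mapping part name -> iterable of supplier names.
    """
    return sorted(s for s in part_suppliers.get(part, ()) if s != failed_supplier)
```

The keys of the returned dictionary give the scale of the impact, and the values give the time it takes for the ripple effect to reach each affected party.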
The same scenario can be expanded to analyze:
- How many other nodes or network segments a down node in a large network will affect, so that the network administrator can be alerted to take remedies or precautions against such glitches.
- Inventory management: most inventory information is managed as a tree (data structure), which is a special type of graph. The scope of effect of a change to one component can be easily calculated in the graph.
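For the inventory tree, the scope of effect of a component change can be read as the chain of assemblies that contain that component. The sketch below assumes the tree is stored as a child-to-parent mapping; the mapping and the function name `affected_assemblies` are hypothetical.

```python
def affected_assemblies(parent_of, component):
    """Walk up the inventory tree from a changed component to the root.

    parent_of: dict mapping each item to the assembly that contains it.
    Returns the containing assemblies, nearest first. Assumes a proper
    tree (no cycles).
    """
    affected = []
    node = parent_of.get(component)
    while node is not None:
        affected.append(node)
        node = parent_of.get(node)
    return affected
```

For example, with `{"screw": "hinge", "hinge": "case", "case": "phone"}`, a change to `"screw"` affects the hinge, the case, and the phone, in that order.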
Log analysis has traditionally been a typical task for Hadoop or Spark since the invention of Hadoop in 2006 (and Spark in 2012). It may sound odd, even like overkill, to use a graph database to handle such a formidable volume of data in memory. However, log data tends to be time-series centric: if you define a series of activities as a pattern and match that pattern across a certain period of data, all the matching data can be collected for further analysis. This is a typical graph-analytics effort.
One such example is this:
- First, a user logs in to a session on a certain device (device ID).
- Next, this user inserts an external/removable disk.
- Third, this user copies some files over.
The above three steps form a "suspicious" pattern according to company policy or a high-risk behavior profile. From the IT admin's perspective, such activities, recorded in log files and modeled in the graph, form a sub-graph containing the following nodes and edges:
- Computer (device-id)
- User (user-id)
- User Activity (log-in w/ session-id and timestamp)
- User Activity (disk-insertion)
- User Activity (file-copying)
- Disk Insertion follows Log-in
- File Copying follows Disk Insertion.
Finding this pattern across all user sessions and activities in a certain period of time is straightforward:
- Find a ring in which the user performs a disk insertion followed by file copying. The minimum match consists of 3 nodes forming a triangle (if you visualize it).
- This is a typical template-based pattern-matching graph query, assuming that the graph dataset is constantly updated with real-time data ingestion of monitored user activities from the log system.
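The matching logic can be sketched outside the graph engine as an ordered search over each session's activities. This is only an illustration of the pattern, not UltipaQuery syntax: the event tuple layout, the activity labels, and the function name `find_suspicious_sessions` are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical activity labels for the three-step pattern described above.
PATTERN = ("login", "disk_insertion", "file_copy")


def find_suspicious_sessions(events, pattern=PATTERN):
    """Return the (user_id, device_id, session_id) keys whose activities
    contain the pattern steps in chronological order.

    events: iterable of (user_id, device_id, session_id, activity, timestamp).
    Other activities may occur between the steps.
    """
    by_session = defaultdict(list)
    for user, device, session, activity, timestamp in events:
        by_session[(user, device, session)].append((timestamp, activity))

    matches = []
    for key, activities in by_session.items():
        step = 0
        for _timestamp, activity in sorted(activities):
            if activity == pattern[step]:
                step += 1
                if step == len(pattern):
                    matches.append(key)
                    break
    return matches
```

In a graph database, the same template would be expressed as a path or sub-graph query over the User, Computer, and User Activity nodes, evaluated as new log events are ingested.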
Please refer to other Ultipa Graph documents for insights, namely:
- UltipaQuery Language Handbook v2.x
- Ultipa Manager Handbook v2.x
- The Evolution of Graph Data Structures (03/2020)
- The Specific Evaluation of Graph Query Languages (03/202)
Real-time Fraud Detection in Insurance (Left Flow)
On-demand Personalized Insurance Recommendation (Right Flow)
The above diagram illustrates two scenarios:
- Real-time Fraud Detection (Payout Validation)
- Real-time On-demand Personalized Insurance
During the insurance payout process, a key validation step is checking whether the case may be fraudulent, so fraud detection is a must-have. Traditionally, this has been a lengthy offline process in which an agent spends considerable time (hours, if not days or weeks) looking into various documents and supporting materials. With the advent of digital transformation, much of this content is being structured and loaded into siloed systems (mostly databases), and these systems can feed data into Ultipa Graph for network analytics:
- Calculating in the graph the case holder's connections with known blacklisted entities. Factors to consider include, but are not limited to, the distance to such entities (a shorter distance means a higher risk of fraud, while a long distance or no connection at all means low to no risk) and whether the case holder forms a ring with, or shares something in common with, known suspicious or blacklisted accounts.
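The distance factor can be sketched as a breadth-first search from the case holder to the nearest blacklisted entity. The sketch assumes an undirected graph in which both people and shared attributes (phone numbers, addresses, accounts) are nodes, so that "sharing something in common" shows up as a 2-hop path. The node naming and the function `nearest_blacklisted` are hypothetical.

```python
from collections import defaultdict, deque


def nearest_blacklisted(edges, case_holder, blacklist):
    """Breadth-first search for the closest blacklisted entity.

    edges: iterable of (node_a, node_b) undirected links.
    Returns (entity, hops), or None if no blacklisted entity is connected.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = {case_holder}
    queue = deque([(case_holder, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in blacklist and node != case_holder:
            return node, hops
        for nxt in sorted(neighbors[node] - seen):
            seen.add(nxt)
            queue.append((nxt, hops + 1))
    return None
```

A small hop count signals a higher risk, and a `None` result corresponds to the "no connection at all" case. How the hop count maps to a risk score is a policy decision not covered here.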
Another scenario with great monetization potential is personalized insurance, in which a user submits a request for on-demand insurance. The process is very similar to an online fintech loan application, except that an insurance case may be more complicated because it involves EMR (Electronic Medical Record) and EHR (Electronic Health Record) data. This is also a network-analytics case, and a graph database is naturally suited to this kind of search across connected data entities:
- Given a person's data, such as basic information, EMR, EHR, family contacts, income, and other factors, everything is modeled into a dynamic graph to calculate the risks and benefits of offering assorted medical insurance products (e.g., Major Disease Insurance Coverage, which covers specific types of diseases). By assigning each relationship a weight that reflects its contribution to the final recommendation, it's possible to deliver the results in real time.
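One simple way to picture the weighting is a weighted sum per insurance product. This is a sketch under heavy assumptions: the factor names, the weights, the normalization of factor values to the range 0 to 1, and the function `score_products` are all hypothetical, and a real system would derive the factors from relationships in the graph.

```python
def score_products(person_factors, product_weights):
    """Rank insurance products by a weighted sum of a person's factors.

    person_factors: dict mapping factor name -> value (assumed in [0, 1]).
    product_weights: dict mapping product -> {factor name: weight}.
    Returns a list of (product, score), highest score first.
    """
    scores = {
        product: sum(
            weight * person_factors.get(factor, 0.0)
            for factor, weight in weights.items()
        )
        for product, weights in product_weights.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each score is a sum over a handful of weighted relationships, it can be recomputed whenever the underlying graph changes, which is what makes a real-time recommendation feasible.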