A Long Background
History of Graph Theory
Graph databases have traditionally been considered a sub-type of NoSQL database, in contrast to SQL-centric (relational) databases, which have been the dominant type of database since the late 1980s and remain very popular in enterprise IT environments, large and small. Their core concept is based on graph theory, which is best known for the Seven Bridges of Königsberg problem that the world-renowned mathematician Leonhard Euler solved in 1735-1736. Of course, graph databases did not start taking shape until roughly the last 10 years, nearly 40 years after the invention of the Internet, and even longer after the appearance of the modern computer.
Many graph algorithms have been invented along the way, from the famous Dijkstra's algorithm (1956, the shortest-path problem in a graph), to PageRank, invented by the co-founders of Google in the late 1990s, to Louvain modularity (detecting communities in a graph).
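To make the shortest-path idea concrete, here is a minimal sketch of Dijkstra's algorithm in Python. The toy graph, the node names, and the `dijkstra` function are illustrative assumptions, not taken from any particular graph product; edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs,
    where every weight is assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical toy graph, used only for illustration.
toy_graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}
distances = dijkstra(toy_graph, "A")
print(distances)
```

In this toy graph the direct edge from A to B costs 4, but the detour through C costs only 3, which is the kind of indirect relationship a graph algorithm is meant to surface.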
In a sense, many of today's greatest Internet companies are built on top of graph technologies. To name a few:
- Google: PageRank is a large-scale web-page (or URL if you will) ranking algorithm.
- Facebook: The core feature of Facebook is its Social Graph, and it is probably the last thing Facebook will ever open-source. It is all about friends-of-friends-of-friends. If you have heard of the six degrees of separation theory: Facebook has built a huge network of friends, and for any two people in it, the number of hops between them rarely exceeds 5 or 6.
- Twitter: Twitter is the American (or worldwide) counterpart of China's Weibo (and you can equally say that Weibo is the Chinese counterpart of Twitter). It open-sourced FlockDB in 2010 but later abandoned it on GitHub. The reason is simple, though many open-source aficionados find it difficult to digest: graph is the backbone of Twitter's core business, and open-sourcing it simply makes no business sense!
- LinkedIn: LinkedIn is a professional social network. One of its core social features is recommending professionals who are 2 or 3 hops away from you, which is only made possible by powering the recommendation with a graph engine (or database).
- Goldman Sachs: If you recall the last worldwide financial crisis in 2007-2008, Lehman Brothers went bankrupt, and the initial trigger was Goldman Sachs withdrawing from deals with Lehman Brothers. The reason for the withdrawal was that Goldman employed a powerful graph DB system, SecDB, which was able to calculate and predict the imminent bursting of the bubble.
- PayPal, eBay, and many other BFSI or e-commerce players: Graph computing is not uncommon among these tech-driven, new-era Internet companies. The core competency of graph is that it helps reveal correlations or connections that are impossible or too slow to find with regular relational databases or traditional big-data technologies, which were not designed to handle deep connections.
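The friends-of-friends and "2 or 3 hops away" ideas in the list above can be sketched as a breadth-first traversal. This is only an illustration under assumed data: the `friends` adjacency map, the names in it, and the `within_k_hops` helper are hypothetical, and real social-graph engines are far more elaborate.

```python
from collections import deque


def within_k_hops(adjacency, start, max_hops):
    """Map every node reachable within max_hops of start to its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the start node is not its own connection
    return hops


# Hypothetical friendship graph, used only for illustration.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}
reachable = within_k_hops(friends, "ann", 3)
# Recommend people who are exactly 2 or 3 hops away (not yet direct friends).
suggestions = sorted(p for p, h in reachable.items() if h in (2, 3))
print(suggestions)
```

In a relational database, each extra hop typically means another self-join on a friendship table, which is why deep traversals of this kind are usually the example given for where graph systems help.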
The modern concept of graph was arguably (re)invented by the inventor of the World Wide Web, Tim Berners-Lee, who coined the concept of the Semantic Web. He proposed that the Web can be seen as a gigantic graph: every URL, each pointing to a web page with content and links embedded in it, is an entity within the graph, and all the entities cross-reference each other, thereby forming the web in World Wide Web (WWW). The Semantic Web concept dates back to the 1990s, but the first industrial-grade graph systems did not materialize until many years later. Initially, academic researchers created the RDF specification (first adopted as a W3C Recommendation in 1999, revised in 2004, and updated to v1.1 in 2014), which was originally a data model for metadata and was mostly used in academic fields for knowledge management. The standard query language for RDF is SPARQL. One thing about RDF is that it is heavy, verbose, and difficult to maintain; in short, developers don't like it. To draw a comparison, would you prefer XML or JSON? Probably JSON, since it is simple, lightweight, and fast, period.
The same holds true for the invention of LPG (Labeled Property Graph), which did not come into existence until about 20 years after the invention of the Semantic Web. It was popularized by Neo4j, a tech firm founded in Sweden that released the first LPG graph database around 2010. A few other players have also invested in this area, to name a few: TitanDB (defunct since 2016), Apache TinkerPop, JanusGraph, Amazon Neptune, Baidu HugeGraph, Google's GraphD, Dgraph ...
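As a rough illustration of the difference between the two models, the sketch below stores the same small fact set first as RDF-style triples and then as a labeled property graph, using plain Python data structures. The `ex:` prefixes, the node names, the `since` property, and both helper functions are assumptions made up for this example; they do not reflect the storage format or API of any real RDF store or LPG database.

```python
# RDF style: every fact is a (subject, predicate, object) triple.
triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
]

# LPG style: nodes and edges carry labels plus key-value properties.
nodes = {
    "alice": {"labels": ["Person"], "properties": {"name": "Alice"}},
    "bob": {"labels": ["Person"], "properties": {"name": "Bob"}},
}
edges = [
    {"from": "alice", "to": "bob", "label": "KNOWS",
     "properties": {"since": 2015}},
]


def known_names_rdf(triples, person):
    """Names of everyone `person` knows, resolved through two triple scans."""
    known = {o for s, p, o in triples if s == person and p == "ex:knows"}
    return [o for s, p, o in triples if s in known and p == "ex:name"]


def known_names_lpg(nodes, edges, person):
    """Same question on the LPG: follow KNOWS edges, read the name property."""
    return [
        nodes[e["to"]]["properties"]["name"]
        for e in edges
        if e["from"] == person and e["label"] == "KNOWS"
    ]


print(known_names_rdf(triples, "ex:alice"))
print(known_names_lpg(nodes, edges, "alice"))
```

Note that the LPG edge can hold its own property (`since`) directly, whereas the plain triple list above would need additional triples to describe the relationship itself.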
The Evolution of Data → Big Data → Fast Data → Deep Data
A graph database (or, more generally, a graph system, graph platform, graph solution, or graph engine; these all refer to the same thing: a system that is graph-centric, works around a graph, and computes within a graph) is considered the crown jewel of NoSQL, especially in an era of data connections, where the goal is to maximize the value of large volumes of data through deep and fast mining.
Graph is the ideal solution for that goal. If you also factor in timing, say a real-time requirement, graph becomes the only viable solution: you cannot otherwise achieve real-time, deep data correlation with other types of NoSQL or relational databases. Key-value stores, column databases, Hadoop or Spark, and document databases are simply ill-equipped to handle data correlation problems. It is this challenge that gave birth to the graph database market and drives its speedy growth.