Not All Graph Databases Are Created Equal: Why You Need a Native Graph


Building a dependable database management system is no easy task. You need to understand the design trade-offs involved in constructing a database management system, and how those trade-offs affect the end-user problems you are aiming to solve.

Each database management system chooses differently from a very broad set of design choices. Given that, not all databases are created equal. When it comes to determining which database is appropriate for our needs, we need to understand the requirements of our applications and weigh them against the trade-offs the database designer has chosen.

Graph databases are increasingly popular. In fact, according to DB-Engines, graph databases have been the fastest-growing database category since 2013. This growth is fueled in part by organizations realizing the value of understanding connections in their data. For companies looking to use a graph database to build behavior and decision-making applications based on real-time evaluation of connected data, several key attributes matter, including integrity, performance, efficiency and scalability.

If not all databases are created equal, which graph database is best for your solution? Fortunately, we have experience to draw upon that can guide us toward a pragmatic technology investment. The primary lesson of that experience concerns whether the database management system has a native or a non-native design.

As the name suggests, native graph databases are those specifically built to handle graph workloads across the entire computing stack. The alternative, non-native, comes in two types: (1) those that layer a graph API on top of an existing database management system that is native to some other data model, and (2) those that claim multi-model semantics, where one system can purportedly support several data models.

There is a considerable difference between the architecture of native graph storage and querying and that of non-native systems. Predictably, native graph databases tend to perform queries faster, scale better (retaining their query speed as datasets grow in size), and run more efficiently (even on less hardware).

Why Native?

A native graph database is distinguished by an exclusive focus on serving graph workloads across its entire stack. That stack - from query language through to the database management engine and file system considerations, and from clustering to backup and monitoring - epitomizes graph thinking throughout.

The native graph database ensures that end-user application developers can work with the graph productively and humanely. It also needs to ensure that your precious data is safe and that the system as a whole is dependable. To achieve all of this, it must optimize every layer of its stack for graphs – no responsibility is abdicated to non-graph native software. As such, components of the native graph database are continuously “graph-affined” as hardware trends emerge and evolve, because each component in the architecture must make sure that graph workloads run efficiently and safely on that hardware.

Native Graph Storage

Graph storage refers to the underlying structure of connected data persisted (often, but not always) on disk. When the storage system is built specifically for graph data, it’s known as native graph storage.

Native graph databases are designed to use the file system in a way that understands and is sympathetic to graphs, which makes them both highly performant and safe for graph workloads. For example, a traversal across a relationship in such a database has constant cost irrespective of the size of the graph, and that constant cost is minimal because of mechanical sympathy between the software and hardware.
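To make the constant-cost claim concrete, here is a minimal in-memory sketch of index-free adjacency. It is an illustration only, not a description of any product's storage engine; the Node, Relationship, connect and traverse names are hypothetical. Each node holds direct references to its own relationships, so one hop touches only that node's records and never consults a global index.

```python
class Node:
    """A graph node that holds direct references to its relationships."""

    def __init__(self, name):
        self.name = name
        self.outgoing = []  # relationships that start at this node
        self.incoming = []  # relationships that end at this node


class Relationship:
    """A typed, directed relationship with direct references to both ends."""

    def __init__(self, rel_type, start, end):
        self.rel_type = rel_type
        self.start = start
        self.end = end


def connect(start, rel_type, end):
    """Create a relationship and register it on both of its end nodes."""
    rel = Relationship(rel_type, start, end)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def traverse(start, rel_type, depth):
    """Return the names of nodes reached after exactly `depth` hops.

    Each hop only follows references held by the current nodes, so its
    cost depends on their degree, not on the total size of the graph.
    """
    frontier = [start]
    for _ in range(depth):
        frontier = [
            rel.end
            for node in frontier
            for rel in node.outgoing
            if rel.rel_type == rel_type
        ]
    return [node.name for node in frontier]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob)
connect(bob, "KNOWS", carol)
print(traverse(alice, "KNOWS", 2))  # ['Carol']
```

Because every relationship is registered on both of its end nodes, following it backwards (for example, reading carol.incoming) costs the same as following it forwards.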

Conversely, graph storage is non-native when it is optimized for any other storage model. To present columnar, relational, document, or key-value data as a graph, the database management system has to perform costly translations to and from the primary model of the database. While implementers can try to amortize these translations through radical denormalization, this non-native approach typically leads to high latency when querying graphs. It also has very well-understood safety risks when persisting graph data - risks which radical denormalization exacerbates.

The disconnect between graph data and non-graph storage is problematic for both performance and scalability. Our research and development experience indicates that the only way to ensure data safety is to update the graph via ACID transactions. Maintaining relationships between records demands stronger guarantees than weaker-than-ACID consistency models can provide.

Native graph databases include transactional mechanisms to ensure that data safety remains impervious to network blips, server failures, and even contention from competing transactions or scaling decisions. Non-native graph architectures, especially the variants that are built on eventually consistent stores, can (and will eventually) corrupt graph data.

Furthermore, native storage allows implementations to take advantage of the evolving hardware architectures of tomorrow. As memory and disk technologies evolve, a native graph database implementation evolves to support ever more ambitious graph workloads. In coming years we fully expect to see the emergence of native storage models for novel disk storage platforms and memory architectures like non-volatile RAM.

Native Graph Query Processing

Native graph querying is a critical consideration of graph technology. It refers to how a graph database describes, plans, optimizes and executes queries. With a native graph system, every architecture layer – from the user’s expression in the Cypher graph query language to the files on disk – is optimized for storing and retrieving graph data.

Through radical denormalization, non-native graph databases can be designed to try to avoid mechanical penalties. A non-native store may be optimized for three levels of traversal depth by duplicating and co-locating data or by creating an increasingly arcane set of indexes for each query. Beyond that depth, traversal performance degrades drastically, whereas the native approach provides consistently high traversal performance at any depth. The upshot is that queries initially seem performant, but then there is a mechanical cliff edge which causes latency to rapidly increase for reasons that will seem completely innocuous to the end user.
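The following sketch suggests why depth is expensive when relationships live in an indexed table rather than in the records themselves. It is a simplified model under stated assumptions: EDGES is a hypothetical edge table kept sorted by source id, standing in for a relational or key-value index, and targets_of and traverse are illustrative names. Every node expanded at every level costs one index search, and the cost of each search grows with the size of the whole table rather than with the degree of the node.

```python
from bisect import bisect_left

# Hypothetical edge table, sorted by source id as an index would keep it.
EDGES = sorted([
    ("a", "b"), ("a", "c"),
    ("b", "d"), ("b", "e"),
    ("c", "f"), ("c", "g"),
    ("d", "h"), ("e", "h"),
])


def targets_of(source):
    """One index lookup: binary-search for the first row with this source,
    then read the adjacent rows that share it."""
    i = bisect_left(EDGES, (source,))
    targets = []
    while i < len(EDGES) and EDGES[i][0] == source:
        targets.append(EDGES[i][1])
        i += 1
    return targets


def traverse(start, depth):
    """Expand `depth` levels and count how many index lookups were needed."""
    frontier, lookups = [start], 0
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            next_frontier.extend(targets_of(node))
            lookups += 1  # each expansion is a search over the whole table
        frontier = next_frontier
    return frontier, lookups


for depth in (1, 2, 3):
    reached, lookups = traverse("a", depth)
    print(depth, reached, lookups)
# 1 ['b', 'c'] 1
# 2 ['d', 'e', 'f', 'g'] 3
# 3 ['h', 'h'] 7
```

In this toy graph the lookup count goes from 1 to 3 to 7 across three levels. With index-free adjacency the same expansions still happen, but each one follows a stored reference instead of searching an index.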

We at Neo4j have seen this first hand. Our early implementation of a graph database (back in the early 2000s) was non-native, with a graph API fixed atop a relational database. When our queries involved around three levels of depth or more, they degraded substantially in performance. Worse, reversing the direction of a traversal is also extremely difficult with non-native graph processing in a relational database. To be able to reverse traversal direction, you must either create a costly reverse-lookup index for each use case, or perform a brute-force search through the original index. Neither workaround is performant or maintainable over time.
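The two workarounds for reversing direction can be sketched as follows. This is an illustrative model only: forward_index is a hypothetical key-value layout keyed by source node, and the function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical non-native layout: relationships indexed by source node only.
forward_index = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
    "dave": ["alice"],
}


def outgoing(node_id):
    """Forward direction: a single keyed lookup."""
    return forward_index.get(node_id, [])


def incoming_brute_force(node_id):
    """Workaround 1: scan every entry of the original index.
    The cost grows with the total number of stored relationships."""
    return sorted(
        source
        for source, targets in forward_index.items()
        if node_id in targets
    )


def build_reverse_index(index):
    """Workaround 2: maintain a second, target-keyed index.
    It duplicates every relationship and must be updated on every write."""
    reverse = defaultdict(list)
    for source, targets in index.items():
        for target in targets:
            reverse[target].append(source)
    return reverse


reverse_index = build_reverse_index(forward_index)


def incoming_indexed(node_id):
    """Reverse direction through the extra index: one keyed lookup."""
    return sorted(reverse_index.get(node_id, []))


print(incoming_brute_force("carol"))  # ['alice', 'bob']
print(incoming_indexed("carol"))      # ['alice', 'bob']
```

Both functions return the same answer, but the first pays for a full scan on every query, while the second pays in storage and in the work of keeping two indexes consistent with each other.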

Key Advantages of Native Graph Architecture

A native graph architecture provides many other advantages that make it generally superior to non-native graph architectures.

  1. Minutes-to-Milliseconds Performance – Native graph databases handle connected data queries far faster than non-native graph databases. Even on modest hardware, native graph databases can easily handle millions of traversals per second between nodes in a graph on a single machine, and many thousands of transactional writes per second.
  2. Data Integrity for Graphs – Native graph databases support ACID transactions, which means that once a transaction (which may involve multiple servers) is complete, its data is consistent and durable. Transactions also run concurrently, coordinated by the transaction infrastructure. Even deadlocking transactions are automatically detected and rolled back. If there is a fault, no partially written records will exist.
  3. Efficiency – Native graph databases can deliver constant-time traversals through index-free adjacency, without complex schema design and query optimizations. The intuitive property-graph model eliminates the need to create any additional, and often complex, application logic to process connections.
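The "no partially written records" guarantee in the data integrity point above can be illustrated with a small sketch. It shows atomicity only (not isolation or durability) and is not how any particular database implements transactions; TinyGraph, Transaction and relate are hypothetical names. Creating a relationship requires two writes, one on each end node, and an undo log ensures that either both writes survive or neither does.

```python
from contextlib import contextmanager


class TinyGraph:
    """Toy in-memory graph: a relationship is recorded on both end nodes."""

    def __init__(self):
        self.outgoing = {}
        self.incoming = {}

    def add_node(self, node_id):
        self.outgoing[node_id] = []
        self.incoming[node_id] = []

    @contextmanager
    def transaction(self):
        undo_log = []
        try:
            yield Transaction(self, undo_log)
        except Exception:
            # Roll back in reverse order so no half-written state remains.
            for undo in reversed(undo_log):
                undo()
            raise


class Transaction:
    def __init__(self, graph, undo_log):
        self._graph = graph
        self._undo_log = undo_log

    def relate(self, start, end):
        # Write 1: record the relationship on the start node.
        self._graph.outgoing[start].append(end)
        self._undo_log.append(self._graph.outgoing[start].pop)
        # Write 2: record it on the end node (KeyError if `end` is missing).
        self._graph.incoming[end].append(start)
        self._undo_log.append(self._graph.incoming[end].pop)


graph = TinyGraph()
graph.add_node("alice")
try:
    with graph.transaction() as tx:
        tx.relate("alice", "ghost")  # "ghost" was never created
except KeyError:
    pass
print(graph.outgoing["alice"])  # [] (the first write was rolled back)
```

Without the rollback, the failed second write would leave a relationship that exists on one node but not the other, which is the kind of corruption the article attributes to stores without ACID guarantees.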

Why This Matters

The common misconception about non-native graph technology is that it’s “good enough,” particularly if that non-native technology is already installed for its native use case. But this is short-sighted: data is growing, and today’s datasets are more variably structured, interconnected and interrelated than ever before.

The value is in the connections of the data - a non-native approach limits that value. A native graph database will serve you better over the long term and won’t require extraordinary hardware investments.

The choice is not always clear, but we believe that enterprises hoping to get the most out of the connections in their data will find the integrity, performance, efficiency and scaling advantages of a native graph database critical.


