In a new book titled Next Generation Databases: NoSQL, NewSQL, and Big Data, Guy Harrison shares what every data professional needs to know about the future of databases in a world of NoSQL and big data.
The first revolution in database technology was driven by the emergence of the electronic computer, and the second by the emergence of the relational database, writes Harrison, who leads the team at Dell that develops the Toad, Spotlight, and Shareplex product families. "This book is about a third revolution in database technology."
The following article is an excerpt adapted by Harrison from the book:
It would be hard to deny the seismic shift that has occurred in the database landscape. Hadoop, Spark, MongoDB, Cassandra, and many other non-relational systems today form an important and growing part of the enterprise data architecture of many, if not most, Fortune 500 companies. It is, of course, possible to argue that all these new technologies are a mistake and that the relational model and the transactional SQL relational database represent a better solution and that eventually the market will “come to its senses” and return to the relational fold. But this seems unlikely.
Nevertheless, a critic of non-relational systems might fairly claim that the latest breed of databases suffers from the following weaknesses:
A return of the navigational model. Many of the new breed of databases have reinstated the situation that existed in pre-relational systems, in which logical and physical representations of data are tightly coupled in an undesirable way.
Inconsistent to a fault. The inability in most non-relational systems to perform a multi-object transaction, and the possibility of inconsistency and unpredictability in even single-object transactions, can lead to a variety of undesirable outcomes. Phantom reads, lost updates, and nondeterministic behaviors can all occur in systems in which the consistency model is relaxed.
Unsuited to business intelligence. Systems like HBase, Cassandra, and MongoDB provide more capabilities to the programmer than to the business owner. The absence of a complete SQL layer that can access these systems isolates them from the broader enterprise.
Too many compromises. There is a wide variety of specialized database solutions, and in some cases these specialized solutions will be an exact fit for an application’s requirements. But in too many cases the application will have to choose between two or more NQR (not quite right) database architectures.
A vision for a converged database
I’ve become convinced that we can “have it all” within a single database offering. For instance, there is no architectural reason why a database system should not be able to offer a tunable consistency model that includes at one end strict multi-record ACID transactions and at the other end an eventual consistency style model. In a similar fashion, I believe we could combine the features of a relational model and the document store, initially by following the existing trend toward allowing JSON data types within relational tables.
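As a rough illustration of the JSON-within-relational-tables idea, the sketch below keeps fixed relational columns alongside a free-form JSON column and filters on both in one SQL statement. It uses SQLite through Python's standard library only because that is easy to run. The table, the column names, and the `json_get` helper are all assumptions made for this example, not anything described in the book.

```python
import json
import sqlite3


def json_get(document, key):
    """Return one top-level field from a JSON document stored as text."""
    return json.loads(document).get(key)


conn = sqlite3.connect(":memory:")
# Expose the helper to SQL. json_get is a name invented for this sketch.
conn.create_function("json_get", 2, json_get)

# Fixed relational columns plus one application-extensible JSON column.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, attrs TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, name, attrs) VALUES (?, ?, ?)",
    [
        (1, "Ada", json.dumps({"tier": "gold", "newsletter": True})),
        (2, "Grace", json.dumps({"tier": "silver"})),
    ],
)

# One statement filters on a relational column and on a document field.
gold_customers = conn.execute(
    "SELECT name FROM customers WHERE id < 10 AND json_get(attrs, 'tier') = 'gold'"
).fetchall()
print(gold_customers)
```

New attributes can be added to `attrs` without a schema change, while `id` and `name` keep their relational constraints.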
An ideal database architecture would support multiple data models, languages, processing paradigms and storage formats within the one system. Application requirements that dictate a specific database feature should be resolved as configuration options or pluggable features within a single database management system, not as choices between disparate database architectures.
Specifically, an ideal database architecture would:
- Support a tunable consistency model which allows for strict RDBMS-style ACID transactions, Dynamo-style eventual consistency, or any point between.
- Provide support for an extensible but relational compatible schema by allowing data to be represented broadly by a relational model, but also allowing for application-extensible schemas, possibly by supporting embedded JSON data types.
- Support multiple languages and APIs. SQL appears destined to remain the primary database access language, but should be supplemented by graph languages such as Cypher, document-style queries based on REST, and the ability to express processing in MapReduce or other Directed Acyclic Graph algorithms.
- Provide an underlying pluggable data storage model that allows the physical storage of data to be row-oriented or columnar as appropriate, and to be organized on disk as B-trees, Log Structured Merge trees, or other optimal storage structures.
- Support a range of distributed availability and consistency characteristics. In particular, the application should be able to determine the level of availability and consistency that is supported in the event of a network partition and be able to fine tune the replication of data across a potentially globally distributed system.
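One common way to picture a tunable consistency model is the Dynamo-style quorum: with N replicas, a write waits for W acknowledgements and a read consults R replicas, and a read is guaranteed to see the latest write whenever R + W > N. The toy store below is a minimal sketch under that assumption. The class and method names are hypothetical, replication to the remaining replicas is omitted, and reads deliberately pick the replicas least likely to have seen the write so that the worst case is visible.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data: key -> (version, value)."""
    data: dict = field(default_factory=dict)


class TunableStore:
    """Toy replicated store; n, w and r are the consistency knobs."""

    def __init__(self, n, w, r):
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("w and r must be between 1 and n")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def put(self, key, value):
        # Acknowledge once the first w replicas hold the write. The others
        # stay stale here; a real system would update them asynchronously.
        self.version += 1
        for replica in self.replicas[: self.w]:
            replica.data[key] = (self.version, value)

    def get(self, key):
        # Worst case: read from the last r replicas, then return the value
        # carrying the highest version among the answers.
        answers = [rep.data.get(key, (0, None)) for rep in self.replicas[-self.r:]]
        return max(answers, key=lambda answer: answer[0])[1]


# R + W > N: read and write sets must overlap, so the read is current.
strong = TunableStore(n=3, w=2, r=2)
strong.put("balance", 100)
strong_read = strong.get("balance")

# R + W <= N: faster, but a read can miss the latest write.
eventual = TunableStore(n=3, w=1, r=1)
eventual.put("balance", 100)
eventual_read = eventual.get("balance")

print(strong_read, eventual_read)
```

The same store serves both ends of the spectrum; only the configuration changes, which is the point of treating consistency as a tunable option rather than an architectural choice.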
Disruptive Database Technologies
So far, I’ve described a future in which the recent divergence of database technologies is followed by a period of convergence toward some sort of “unified model” of databases.
Extrapolating existing technologies is a useful pastime, and is often the only predictive technique available. However, history teaches us that technologies don’t always continue upon an existing trajectory. Disruptive technologies emerge which create discontinuities that cannot be extrapolated and cannot always be fully anticipated.
It’s possible that a disruptive new database technology is imminent, but it’s just as likely that the big changes in database technology that have occurred within the last decade represent as much change as we can easily accept.
That having been said, there are a few computing technology trends which extend beyond database architecture and which may impinge heavily on the databases of the future. Three that I’m watching particularly carefully are:
Universal Memory
Since the dawn of digital databases, there has been a strong conflict between the economics of speed and the economics of storage. The media that offer the greatest economies for storing large amounts of data (magnetic disk, tape) offer the slowest access times and therefore the worst economics for throughput and latency. Conversely, the media that offer the lowest latencies and the highest throughput (memory, SSD) are the most expensive per unit of storage.
However, should a technology arise that simultaneously provides acceptable economics for both mass storage and latency, then we might see an almost immediate shift in database architectures. Such a universal memory would provide access speeds equivalent to RAM together with the durability, persistence and storage economics of disk.
Most technologists believe that it will be some years before such a disruptive storage technology arises. Given the heavy and continuing investment, though, it seems likely that we will eventually create a persistent, fast, and economical storage medium that can meet the needs of all database workloads. When this happens, many of the database architectures we see today will have lost a key part of their rationale. For instance, the difference between Spark and Hadoop would become minimal if persistent storage (i.e., disk) were as fast as memory.
Blockchain
Blockchain is the distributed ledger that underlies the Bitcoin cryptocurrency.
Blockchains arguably represent a new sort of shared distributed database. Similar to systems based on the Dynamo model, the data in the Blockchain is distributed redundantly across a large number of hosts. However, the Blockchain represents a complete paradigm shift in how permissions are managed within the database. In an existing database system, the database owner has absolute control over the data held in the database. In a Blockchain system, by contrast, ownership is maintained by the creator of the data.
Consider a database that maintains a social network such as Facebook: Although the application is programmed to allow only you to modify your own posts or personal details, the reality is that the Facebook company has total control over your online data. They can, if they wish, remove, censor, or even modify your posts. In a Blockchain-based database, you would retain total ownership of your posts and it would be impossible for any other entity to modify them.
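The property that makes silent modification hard can be sketched with a simple hash chain: each block stores the hash of the previous block, so changing an old record invalidates every later link. The code below is a minimal illustration of that tamper evidence only. It leaves out the digital signatures, consensus, and proof of work that a real system such as Bitcoin relies on, and the field and function names are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, author, payload, prev_hash):
    """Hash the block's contents together with the previous block's hash."""
    record = json.dumps([index, author, payload, prev_hash]).encode("utf-8")
    return hashlib.sha256(record).hexdigest()


def append_block(chain, author, payload):
    """Add a new block that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "author": author,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, author, payload, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check that each block links to the one before."""
    prev_hash = GENESIS_HASH
    for block in chain:
        expected = block_hash(
            block["index"], block["author"], block["payload"], block["prev_hash"]
        )
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "alice", "first post")
append_block(ledger, "bob", "a reply")
valid_before = is_valid(ledger)

# An operator quietly rewrites an old post without recomputing the chain.
ledger[0]["payload"] = "edited by the operator"
valid_after = is_valid(ledger)

print(valid_before, valid_after)
```

Because many hosts hold copies of the chain and can each rerun a check like `is_valid`, an edit made by any single party is detectable by the others.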
Quantum Computing
The idea of using quantum effects to create a new type of computer was popularized by physicist Richard Feynman back in the 1980s. The essential concept is to use subatomic particle behavior as the building blocks of computing.
Quantum computers promise to provide a mechanism for leapfrogging the limitations of silicon-based technology and raise the possibility of completely revolutionizing cryptography. The promise that quantum computers could break existing private/public key encryption schemes seems increasingly likely, while quantum key transmission already provides a tamper-proof mechanism for transmitting certificates over distances of up to a few hundred kilometers.
If quantum computing realizes its theoretical potential it would have enormous impact on all areas of computing – databases included. There are also some database-specific quantum computing proposals:
- Quantum transactions: Inspired by the concept of superposition, it’s proposed that data in a database could be kept in a “quantum” state, effectively representing multiple possible outcomes.
- Quantum search: A quantum computer could potentially provide an acceleration of search performance over a traditional database. A quantum computer could more rapidly execute a full table scan and find matching rows for a complex non-indexed search term. The improvement is unlikely to be decisive when traditional disk access is the limiting factor, but for in-memory databases it’s possible that quantum database search may become a practical innovation.
- Quantum Query Language: The fundamental unit of processing in a classical (i.e., non-quantum) computer is the bit, which represents one of two binary states. In a quantum computer, the fundamental unit of processing is the qubit, which represents the superposition of all possible states of a bit. To persistently store the information from a quantum computer would require a truly quantum-enabled database capable of executing logical operations using qubit logic rather than Boolean bit-logic. Operations on such a database would require a new language which could represent quantum operations instead of the relational predicates of SQL. Such a language has been proposed: Quantum Query Language (QQL).
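The excerpt does not name an algorithm for quantum search, but the usual reference point is Grover's algorithm, which finds a match among N unindexed items in roughly the square root of N steps instead of the N/2 lookups a classical scan needs on average. The sketch below simulates its amplitude arithmetic on an ordinary computer, so it shows the iteration count and the resulting probabilities rather than any real speedup. The function name and the choice of 1,024 "rows" are assumptions made for illustration.

```python
import math


def grover_search(n_items, marked):
    """Classically simulate Grover's algorithm over n_items candidate rows."""
    # Start in a uniform superposition: every row is equally likely.
    amps = [1 / math.sqrt(n_items)] * n_items
    # The standard iteration count is about (pi / 4) * sqrt(N).
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the amplitude of the matching row.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]
    # Measurement probabilities are the squared amplitudes.
    return iterations, [a * a for a in amps]


iterations, probs = grover_search(1024, marked=421)
best_row = max(range(len(probs)), key=probs.__getitem__)
print(iterations, best_row, round(probs[421], 4))
```

Each simulated iteration here still touches every amplitude, which is exactly the work a quantum computer would avoid. The sketch only shows why a few dozen oracle calls can stand in for a scan of a thousand rows.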
This article has been excerpted from Next Generation Databases: NoSQL, NewSQL, and Big Data, 1st Edition; Guy Harrison; Copyright 2015, Apress Media, LLC. Adapted with permission from Apress Media, LLC.
To order the book, go to www.amazon.com/Next-Generation-Databases-NoSQL-NewSQL-ebook/dp/B015PQPALM.