Most commentators agree that “NoSQL” is a poor description for the new breed of non-relational databases that have arisen over the past 5 years. NoSQL describes what these databases remove rather than what they add and implies that the SQL language is a problem—and the only problem—to be solved. In reality, the non-relational movement was driven more by the limitations of fixed schemas and adherence to strict consistency in mainstream relational systems than to any perceived problems with SQL.
Shortly after the explosion of non-relational databases, around 2009, it became apparent that rather than being part of the problem, SQL would instead continue to be part of the solution. If the new wave of database systems excluded the vast population of SQL-literate professionals, then their uptake in the business world would be impeded. Furthermore, a whole generation of business intelligence tools use SQL as the common way of translating user information requests into database queries.
Nowhere was the drive toward SQL adoption more clear than in the case of Hadoop. Even in some of the earliest Hadoop deployments—such as Facebook—Hadoop sat side-by-side with relational data warehouses. The data warehouse supported real-time BI queries, while the Hadoop system offered access to fine-grained, unsanitized, aggregated raw data. Facebook needed to open up Hadoop to its existing population of analysts, which resulted in the birth of Hive—essentially an implementation of SQL on top of Hadoop.
Since then, a bewildering array of SQL-on-Hadoop technologies have emerged. There are pure Hadoop SQL solutions such as Hive and Impala, and hybrid solutions such as Presto. Almost every commercial relational database has some form of SQL interface to Hadoop as well.
Operational non-relational systems—key-value stores and document databases—have been slower to adopt SQL query capabilities. Cassandra implemented Cassandra query language (CQL), a simple subset of SQL, but this was designed primarily to provide an abstraction layer for developers, rather than a query language for end users.
JSON-oriented document databases such as Couchbase and MongoDB do not enforce a schema—an application is free to store whatever it wants within the database’s documents. However, JSON documents are self-describing, which provides N1QL statements with a basis for evaluating SQL column expressions. It’s typical in JSON to embed child documents that would be represented as master-detail table relationships in a relational database. The N1QL syntax allows these subdocuments to be un-nested and retrieved without complex application coding.
Most significantly, the N1QL language supports join operations between documents. Until now, such joins have required application coding in Java, Python, or other languages. The N1QL syntax also supports a full range of operations for creating, modifying, and deleting documents, as well as syntax for creating indexes and other server side structures.
N1QL contains significant deviations from the ANSI SQL standard, but implements a surprisingly broad set of SQL functionality. Although it currently only works with Couchbase, there’s no technical reason why it couldn’t be extended to other document databases such as MongoDB. And, almost every relational and non-relational database is implementing robust support for JSON. At a minimum, N1QL illustrates that the core features of SQL still can apply, even for systems that have, until now, described themselves as “NoSQL” databases.
A bewildering array of SQL-on-Hadoop technologies have emerged, but operational non-relational systems—key-value stores and document databases—have been slower to adopt SQL query capabilities.