There has been a lot of interest lately in NoSQL databases and, of course, many of us have strong backgrounds and experience in traditional relational "SQL" databases. For application developers this raises questions concerning the best way to go.
One recurring truth that eventually surfaces with all new software technologies is that "one size does not fit all." In other words, you need to use the right tool for the job, as each has its own strengths and weaknesses. In fact, a danger of many new architectural approaches is one of "over-adoption" - using a given tool to address a wide array of situations when originally they were designed for the specific problem domain in which they excel.
Therefore, the right answer to the question of whether to use a "SQL or NoSQL?" database is: "it depends." The best solution for your application may be a traditional SQL database, a NoSQL database, or possibly a mix of both. Each technology has its own areas of use, and the best recommendation is to investigate specific products to meet your specific needs. It's also important to consider your existing investment in what you have that is functional and proven, working out ways to preserve that investment while extending into new ways of doing things for improved application performance and capabilities.
Everyone is familiar with traditional SQL databases, so let's examine some of their perceived shortcomings, characteristics that are helping to spur interest in NoSQL technologies. Traditional RDBMS products are perceived as difficult to scale, with the most common solution being to "buy a bigger box." Unfortunately, this option is expensive and often yields only modest performance improvements. Relational databases are heavily structured, meaning that developers cannot create new data types (tables) easily "on the fly." Despite decades of use, it can be difficult to configure traditional databases for a high-availability scenario. Lastly, the SQL language can itself be overcomplicated, due to the stream of enhancements to the language and in some cases lack of understanding by application developers.
The truth is that traditional SQL databases are proven and reliable, and are especially fitted to complex application models. Scaling a relational database can be achieved through innovative Database sharding techniques (see the related article on Database sharding). The structure of a SQL database, while rigorous, ensures that data types and relationships are enforced, something that can avoid a huge number of difficult to find defects in an application. As for high availability, it is very feasible to implement with advanced replication models now available. And, finally, the SQL language can be overused and complex, but this is largely the choice of the developer or adopted standards within an application group.
So if most of the perceived weaknesses for traditional SQL databases can be overcome, why would you consider a NoSQL database?
At first glance, NoSQL databases can be very appealing. They offer speed that often exceeds a traditional database, an extremely simple API that can be understood in minutes, flexibility of structure, and often built-in Database sharding and reliability features. Due to these features, setup and development can proceed at a fast pace, with little investment of time and a short learning curve. However, as with any technology, it is important to understand the cost of these benefits, and where they can be best applied in a given application. Let's examine some of the qualities of NoSQL databases and then look at some ways to effectively use them.
The term NoSQL applies to many different products offering very different features and characteristics. While it is outside the scope of this article to review them all, there are two basic categories that should be examined: key/value stores and document databases (actually an extension of the key/value Store as you will see).
The NoSQL concept evolved from key/value stores, supporting the storage of a "key" (unique identifier) and a "value" (any arbitrary data type or sequence of bytes). These databases utilize an "eventually consistent" model across a number of servers, such that an application can write or read from the store without worrying about data location or built-in redundancy. Many of these products have database sharding (partitioning) and "eventually consistent" replication as a built-in capability. The API is simple and elegant, with two fundamental operations: "put" (a value) and "get" (a value). In its most basic form, a key/value store is a "structure-less" database, allowing developers to store any arbitrary simple or complex data type as required "on the fly." This means that if you need another field or type of information to be stored, you just do it in your application code. Because of the "eventual consistency" model, writes can be very fast when compared to a traditional database (especially if you are using an in-memory NoSQL database). The basic key/value store has been extended in some offerings to incorporate structures within a given value, such as nested set and list structures, supporting simple one-to-many relationships built into the engine itself.
The key/value store is a very simple concept, it can be very fast, and is extremely flexible (i.e., its "free form"). However, such a structure is very difficult to search, and your application code must understand the meaning of the keys to make it useful. Further, to search it may be required to iterate through the values to find what you are looking for, a cumbersome and potentially inefficient process.
So with an elegant API, the ability to store anything in any format you like, the promise of high-performance, scalability and eventually consistent replication, why wouldn't you use a NoSQL database for every need? There are several reasons, these being the most important:
- - It's true that NoSQL databases can scale and perform lots of writes with high transaction volumes using the eventually consistent model. But what isn't generally understood is that if you require durable writes, the NoSQL products that support it can run as much as 90% slower when you invoke this option. This yields single server speeds that are comparable to the traditional SQL database durability we have all come to expect.
- - A critical element in all application data is relationships, a capability inherent in the very fabric of SQL table structures. In fact, relationships are the core defining element of application data itself, as unrelated "data" would not be useful at all (imagine an order with no customer, it just wouldn't work). With NoSQL relationships are still required, but the developer is responsible for defining, creating and maintaining data relationships within the application code. The typical method used to accomplish this is by storing and maintaining de-normalized lists as part of the "value" in the store. For simple application structures this can work well, but with complex models it can be difficult to achieve. Further, the integrity of related data is not enforced in any way, therefore this could be a source of application defects and failures.
- - There is no doubt that the "type free" structure of NoSQL makes for very fast application prototyping and development. However, a "structure-less" engine requires a lot of discipline among a team of application developers to ensure that the data makes sense. Otherwise the application could be very difficult to manage and understand, particularly as new developers are added to the team.
- - Using new advances in database sharding and high-performance replication techniques, it is possible to achieve excellent linear scalability and high-availability with traditional SQL databases, yielding the same type of scalability and reliability normally associated with NoSQL databases. With these techniques you can preserve your existing investment where it makes sense.
Therefore, the best places to use NoSQL technology is where the data model is simple, where flexibility is more important than strict control over defined data structures, where high performance is a must, strict data consistency is not required, and where it is easy to map complex values to known keys. One good example is a flexible user object with many free-form profile attributes or a growing set of configuration parameters. If you have an application requiring many lists of things, and those lists are confined to definable entities, this can also be a good fit (especially for the key/value stores that support nested structures within the value).
On the flip side, if you need consistent data, durable writes, transactions and you have complex data structures and relationships that require type safety and search-ability, a traditional database is still be your best bet. Application areas that are a strong fit for traditional SQL databases can include any complex business application, especially accounting, inventory or other complex specialized business requirements where transaction integrity is vital.
The upshot is that some data is best suited to free-form types and lists using NoSQL technology, with excellent performance using this model, while other types of data demand a fully structured relational approach, especially financial-oriented applications where consistency and durability are a must.
In the best of all possible worlds, as NoSQL technology matures, it would be great to have seamless access to both types of environments. Then developers would be truly free to take full advantage of the best data store for a given application requirement. It is easy to see that you could store NoSQL keys in a traditional relational structure, and utilize the best of both technologies for differing requirements within the same application. This capability is evolving rapidly, and many are predicting products that will support a merge of the two environments over time. This compatible approach is very appealing, as it will help to preserve the inherent simplicity and performance of NoSQL databases (rather than having them suffer the same "bloat" that relational database have undergone over the past few decades), while capitalizing on the proven strengths of traditional relational databases where they can be best utilized. The end result will be faster application performance, increased flexibility, and accelerated development of quality applications.