A Primer on the Basics of NoSQL Databases

A relatively new concept in the world of database systems is the NoSQL DBMS. What is NoSQL? Well, I bet you guessed that it doesn’t use SQL, right? I mean, it is sort of right there in the name. But NoSQL does not exactly mean no SQL, at least not anymore. The movement (and its name) originally gained popularity when the primary providers did not use SQL. But these days there isn’t much rigor in defining what a NoSQL database system is, or what it must be able to do. And many in the field now read NoSQL as “Not Only SQL.”

At a high level, NoSQL implies non-relational, distributed, flexible, and scalable. Many NoSQL offerings are also open source. NoSQL grew out of the perceived need for “modern” database systems to support web initiatives. Additionally, some common attributes of NoSQL DBMSs include the lack of a fixed schema, data clustering, replication support, and an “eventually consistent” capability (instead of the typical ACID transaction capability).
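
To make “eventually consistent” a bit more concrete, here is a minimal sketch in Python. It assumes two replicas that accept writes independently and reconcile later using a last-write-wins rule. The Replica class, its methods, and the logical timestamps are all hypothetical names invented for illustration. Real NoSQL systems use a variety of more sophisticated reconciliation schemes.

```python
# Hypothetical sketch of eventual consistency with last-write-wins merging.
# Nothing here is taken from a real product; names are illustrative only.

class Replica:
    def __init__(self):
        # key -> (logical_timestamp, value)
        self.data = {}

    def write(self, key, value, timestamp):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other):
        # Pull every entry from the other replica, applying the same rule.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


replica_a = Replica()
replica_b = Replica()

# Each replica accepts a write without coordinating with the other.
replica_a.write("cart:42", ["book"], timestamp=1)
replica_b.write("cart:42", ["book", "pen"], timestamp=2)

# Before the replicas exchange data, replica_a still serves the older value.
stale_read = replica_a.read("cart:42")

# After exchanging data in both directions, the replicas agree.
replica_a.merge_from(replica_b)
replica_b.merge_from(replica_a)
converged_a = replica_a.read("cart:42")
converged_b = replica_b.read("cart:42")
```

The point of the sketch is the window between the write and the merge: a reader can see stale data for a while, which is the trade-off being made against ACID-style guarantees.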

NoSQL Does Not Mean That SQL is Not Used

So NoSQL really does not mean that SQL is not used. And that is a good thing, because SQL is the lingua franca of database access. I believe that adding SQL support to the NoSQL database offerings will help to boost their popularity.

The next question usually asked is “If they are not relational, what are they?” And the answer is that there is not a single data model followed by the NoSQL providers. Instead, there are four popular types of NoSQL database offerings: document stores, column stores, key/value stores, and graph databases.

A document store manages and stores data at the document level. A document is essentially an object and is commonly stored as XML, JSON, BSON, etc. Document databases are typically designed for high performance, high availability, and easy scalability. You might consider using a document store for web storefront applications, real-time analytical processing, or to front a blog or content management system. Document stores are not very well-suited for complex transaction processing as typified by traditional relational applications, though. MongoDB is the most popular document database, but others include Couchbase, RavenDB, and MarkLogic.
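
As a rough illustration of what “data at the document level” looks like, here is a small Python sketch. It is an in-memory toy, not the API of MongoDB or any other product. The products list and the insert and find helpers are hypothetical names made up for this example.

```python
import json

# Hypothetical in-memory "collection". A real document store would persist
# and index these documents; this list exists only for illustration.
products = []


def insert(document):
    # Round-trip through JSON so that only JSON-compatible data is stored,
    # mimicking documents kept as JSON.
    products.append(json.loads(json.dumps(document)))


def find(**criteria):
    # Return every document whose fields match all of the given values.
    return [
        doc for doc in products
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents in the same collection do not need to share the same fields.
insert({"sku": "A100", "name": "Desk lamp", "price": 29.5,
        "tags": ["lighting", "office"]})
insert({"sku": "B200", "name": "Notebook", "price": 4.0, "pages": 120})

lamp_matches = find(sku="A100")
notebook_matches = find(pages=120)
```

Notice that the second document carries a pages field that the first one lacks, and nothing had to be declared up front. That flexibility is the attraction, and it is also why complex multi-document transactions are harder to reason about.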

A columnar DBMS turns the traditional notion of a relational database on its side, storing data as sections of columns rather than as rows. By changing the focus from the row to the column, column databases can achieve performance gains when a large amount of data is aggregated for a single column, because only the values for that column need to be read. Data warehousing and CRM applications can benefit from column stores. Examples of column-oriented NoSQL databases include Cassandra and HBase (which runs on top of Hadoop and is included in Hadoop distributions such as Cloudera’s).
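
The row-versus-column idea can be sketched in a few lines of Python. This is only a conceptual model using plain lists and dictionaries, with made-up order data. It says nothing about how Cassandra, HBase, or any other product lays out data on disk.

```python
# Made-up sample data in a row-oriented layout: one record per order.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 42.25},
]

# The same data in a column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating from rows touches every field of every record.
total_from_rows = sum(row["amount"] for row in rows)

# Aggregating from columns reads only the one list that is needed.
total_from_columns = sum(columns["amount"])
```

Both totals are the same, of course. The difference is how much data has to be scanned to get there, which starts to matter when the table has hundreds of columns and billions of rows.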

The key/value database system is useful when all access to the database is done using a primary key. There typically is no fixed data model or schema. Each key is associated with an arbitrary “lump” of data. A key/value database is useful for shopping cart data or storing user profiles. It is not useful when there are complex relationships between data elements or when data needs to be queried by something other than the primary key. Examples of key/value stores include Riak, Berkeley DB, and Aerospike.
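
Here is a minimal Python sketch of the key/value access pattern, using the shopping cart case mentioned above. The KeyValueStore class and the key naming scheme are hypothetical and do not reflect the interface of Riak, Berkeley DB, or Aerospike.

```python
import json


class KeyValueStore:
    """Hypothetical toy store: every operation goes through the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is stored as-is; the store never looks inside it.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application decides how to encode the "lump" of data. Here it is JSON
# encoded as bytes, which is an assumption made for this example.
cart_bytes = json.dumps({"items": ["A100", "B200"]}).encode("utf-8")
store.put("cart:user-17", cart_bytes)

# Reading it back requires knowing the exact key.
cart = json.loads(store.get("cart:user-17"))
```

There is deliberately no way to ask this store for “all carts containing item A100” without scanning and decoding every value, which is exactly the limitation described above.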

Finally, we have the graph database, which uses graph structures with nodes, edges, and properties to represent and store data. In a native graph database, each element holds direct pointers to its adjacent elements, so traversing relationships requires no index lookups. Social networks, routing and dispatch systems, and location-aware systems are the prime use cases for graph databases. Some examples include Neo4j, GraphBase, and Meronymy.
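
To show what following direct pointers from node to node means, here is a small Python sketch of a social network traversal. The Node class, the FOLLOWS label, and the reachable function are hypothetical names for this example. It is a toy model, not how Neo4j or any other product is implemented.

```python
from collections import deque


class Node:
    """Hypothetical node that holds direct references to its neighbors."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.edges = []  # list of (label, Node) pairs

    def connect(self, label, other):
        self.edges.append((label, other))


def reachable(start, label, max_hops):
    """Breadth-first traversal along edges with the given label."""
    seen = {start.name}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        # Neighbors are reached by following references held on the node
        # itself; no separate index is consulted.
        for edge_label, neighbor in node.edges:
            if edge_label == label and neighbor.name not in seen:
                seen.add(neighbor.name)
                found.append(neighbor.name)
                queue.append((neighbor, hops + 1))
    return found


alice = Node("alice", city="Austin")
bob = Node("bob")
carol = Node("carol")
dave = Node("dave")

alice.connect("FOLLOWS", bob)
bob.connect("FOLLOWS", carol)
carol.connect("FOLLOWS", dave)

within_two_hops = reachable(alice, "FOLLOWS", max_hops=2)
```

In a relational design, each hop of a “friends of friends” query like this one would usually be another join. Here each hop is just a walk along the references stored on the node.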

NoSQL database systems are becoming popular for big data implementations and for the types of applications mentioned above for each type of NoSQL offering. The other term that is bandied about within the NoSQL community is polyglot persistence. Don’t let the multiple syllables frighten you away. All it really means is using different database systems for different applications and use cases, based upon how well each database supports the needs of the application. Which kind of makes sense, doesn’t it?

The Problem with the Hype Around NoSQL and Big Data

One of the problems with the hype surrounding NoSQL and big data is the confusion that the hubbub creates. These NoSQL database systems are not going to replace relational DBMSs wholesale. Relational systems like DB2, Oracle, and SQL Server run the financial and business applications of most major corporations these days, and that is not going to change no matter how successful the NoSQL offerings become.

Instead, what we will be seeing is additional non-relational, NoSQL systems being implemented for certain systems where and when it makes sense. And, I think we will also see the major relational vendors augmenting their relational DBMSs with additional engines for supporting document, column, key/value, and graph capabilities. In other words: polyglot persistence using a single DBMS, but with multiple engines “under the covers” to store and manage the data according to the needs of the applications.

So NoSQL is on its way to your shop … now if they can only do something about the name NoSQL … I mean, why would you want your product to be defined by what it doesn’t do, instead of what it can do?