The Rise of NoSQL Databases
Businesses are struggling to cope with and leverage an explosion of complex and connected data. This need is driving many companies to adopt scalable, high-performance NoSQL databases - a new breed of database solutions - in order to expand and enhance their data management strategies. Traditional "relational" databases will not be able to keep pace with "big data" demands as they were not designed to manage the types of relationships that are so essential in today's applications.
Not every company can design its own custom NoSQL database, and so a few categories of open source and commercially available NoSQL databases have emerged. But NoSQL databases built to manage large web properties are not necessarily designed for the majority of enterprise applications. The enterprise has seen the same explosion in data complexity and volume as the web world, yet few of the NoSQL databases available today can meet the demands of the enterprise.
What are the Enterprise Requirements for a NoSQL Database?
As enterprises introduce new interactive applications - from banks offering self-service applications to retailers suggesting additional products based on a customer's business network - they expect their database to perform much as it did before, even though their data is much more complex and connected.
From meetings with hundreds of developers, architects and CIOs at Fortune 500 companies, a few essentials have emerged as enterprise requirements for a NoSQL database. Not surprisingly, many are characteristics that have been proven for years by traditional enterprise-strength databases:
- Ability to Handle Today's Complex and Connected Data
The biggest difference between a relational database and a NoSQL database is the ability to store not only huge volumes of data, but also data types that are complex and connected. In other words, data such as audio, video, social network feeds, Web logs, email, documents, and other text-centric information are very difficult, if not impossible, to squeeze into the confines of a traditional relational database. A NoSQL database should enable high-performance queries on the complex, connected data inherent in today's applications. Users should be able to ask questions such as "Who are all my contacts in Europe?" and "Which of my contacts ordered from this catalog?"
- Simplify the Development of Applications Using Complex and Connected Data
A NoSQL database should be able to easily represent the complex and connected data that makes up today's enterprise applications. Unlike the rigid schemas of traditional databases, a flexible schema that allows for multiple data types enables developers to easily change applications without disrupting live systems. More collaborative development practices such as Agile have replaced waterfall processes, and databases must be flexible and adaptable to keep the lights on amid constantly changing infrastructures.
- Support for End-to-End Transactions
Surprisingly few of the NoSQL databases commercially available today are able to conduct "all or nothing" transactions the way traditional databases do. Although this is a must-have for relational databases, not all NoSQL databases can do this. Enterprise developers want to be able to group operations and have all of them succeed or none at all. An example of this would be taking $100 out of one bank account: the database should confirm that the $100 has been deposited into another account before committing the transfer to the database log. Twitter will probably survive if a single Tweet is lost, but an enterprise application such as online banking cannot afford such a mistake.
A NoSQL database for the enterprise should support ACID transactions including XA-compliant distributed two-phase commits. The connections between data should be stored on disk, in a structure designed for high-performance retrieval of connected data sets, all while enforcing strict transaction management. This design delivers significantly better performance for connected data than that offered by relational database technologies.
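The "all or nothing" behavior in the bank example can be sketched in a few lines. This is a minimal illustration, not any particular NoSQL product's API: it uses Python's built-in `sqlite3` module (a relational engine) only because it makes the semantics easy to run locally. The `accounts` table, the account names and the helper functions are hypothetical.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 150), ("bob", 20)]
    )
    conn.commit()
    return conn


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if the block raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("unknown source account or insufficient funds")
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
        return True
    except ValueError:
        return False
```

If the deposit cannot be made, the withdrawal that preceded it is rolled back as well, so the $100 never disappears. The sketch covers a single node only; XA-style distributed two-phase commit is out of its scope.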
- Enterprise-grade Durability so that Data is Never Lost
A NoSQL database for the enterprise needs to have enterprise-grade durability that ensures any transaction committed to the database will not be lost. In database systems, durability is the ACID property that ensures committed transactions will be there, no matter what. In other words, if you book an airline ticket and the system goes down, that seat should still be booked after the system is recovered. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any software or hardware failures. Some NoSQL databases tout single machine durability, but how can a business-critical application put all its eggs in one basket? Relational databases have employed replication for years to guarantee enterprise-strength durability. NoSQL databases should also be able to ensure durability.
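The role of a transaction log in durability can be shown with a small sketch. It assumes a hypothetical `DurableStore` class that appends every write to a log file and forces it to disk before acknowledging it, then replays the log on startup. It is a single-machine illustration only: replication, backups and handling of partially written records are deliberately left out.

```python
import json
import os
import tempfile


class DurableStore:
    """Toy key-value store that recovers its state from an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild in-memory state from every record written before a restart."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        """Log the write and force it to disk before applying it in memory."""
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data[key] = value


def demo():
    """Book a seat, drop the store without a clean shutdown, then recover."""
    path = os.path.join(tempfile.mkdtemp(), "bookings.log")
    DurableStore(path).put("seat-12A", "booked")
    return DurableStore(path).data
```

Calling `demo()` returns the booking from a freshly constructed store, which mirrors the airline example: the seat is still booked after the system comes back. Because the log lives on one disk, this is exactly the "single machine durability" the text warns about; an enterprise deployment would also replicate the log to other machines.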
- Java Still Reigns for Enterprise Development
In order to be serious about enterprise development, a NoSQL database must support Java. Java remains the most prevalent programming language in today's enterprises. Developers need a Java-friendly way to handle complex, connected data using the transactional guarantees necessary for critical business applications. While hooks to other languages such as Ruby, Python, Groovy and others are convenient, a NoSQL database must first and foremost support Java to be a serious contender in the enterprise arena.
Emerging Categories of NoSQL Databases
There are four emerging categories of NoSQL databases available today: Key-Value stores, Column Family databases, Document databases and Graph databases. Each was designed to accommodate the huge volumes of data stored today as well as the new data types that are not easily stored within the confines of a traditional relational database. The type of NoSQL database you choose should be based on the type of data you need to store, its size and complexity.
- Key-Value Stores are the Simplest of NoSQL Databases
A Key-Value data model is simple: it stores data in key and value pairs where every key maps to a value. It can scale across many machines, but it cannot support richer data types. A Key-Value store is ideal for applications that require massive amounts of simple data, like sensor data, or for rapidly changing data such as stock quotes. Key-Value stores support massive sets of very primitive data (hence the term "store" and not "database"). They are ideal for capturing time-series data, like every vital statistic from your morning run, and everyone else's morning run, over the last decade. Amazon's Dynamo was built as a Key-Value store.
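To make the "every key maps to a value" model and its spread across machines concrete, here is a minimal sketch. The `ShardedKeyValueStore` class is hypothetical, each in-memory dictionary stands in for one machine, and keys are placed by simple hash-modulo partitioning. That placement scheme is an assumption chosen for brevity, not a description of how Dynamo or any other product distributes data.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store whose keys are spread across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for one machine in a cluster.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        """Pick a shard from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)
```

A call such as `store.put("AAPL", 189.5)` followed by `store.get("AAPL")` is the entire interface. Nothing in it lets you query by value or follow a relationship between two keys, which is the trade-off the paragraph above describes.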
- Column Family Databases Store Large Amounts of Data, But Not Rich Data
A Column Family database can handle semi-structured data, because in theory every row can have its own schema. It has few mandatory attributes and few optional attributes. It's a powerful way to capture semi-structured data, but often sacrifices consistency for availability. Column Family databases can accommodate huge amounts of data, with basic organization to help sift through the information. Writes are faster than reads, so one natural niche is real-time data analysis. Logging real-time events is a perfect use case, as is any situation where you need random, real-time read/write access to your Big Data. Google's Bigtable was built as a Column Family database. Apache Cassandra, originally developed at Facebook and able to store billions of columns per row, is another example. However, it is unable to support unstructured data types or end-to-end transactions.
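The idea that every row can carry its own set of columns can be sketched as a map of row keys to column maps. The `ColumnFamily` class and the event-log rows below are hypothetical and purely in-memory; they illustrate the data model, not the storage engine of Bigtable or Cassandra.

```python
class ColumnFamily:
    """Toy column family: each row key maps to its own set of columns."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def insert(self, row_key, columns):
        """Add or overwrite columns on one row without touching other rows."""
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """Return the column names present on this particular row."""
        return sorted(self.rows.get(row_key, {}))


events = ColumnFamily()
events.insert("evt-1", {"type": "login", "user": "ann"})
events.insert("evt-2", {"type": "purchase", "user": "raj", "amount": 40})
```

Here `evt-2` has an `amount` column that `evt-1` lacks, and no schema change was needed to add it. Appending a new event is a single cheap insert, which fits the logging use case mentioned above.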
- Document Databases Store Multiple Data Types, But Lack Transaction Support
A document database contains a collection of key-value pairs stored in documents. While it is good at storing documents, it was not designed with enterprise-strength transactions and durability in mind. Document databases are the most flexible of the key-value style stores, perfect for storing a large collection of unrelated, discrete documents. MongoDB and CouchDB are examples of document databases.
- Graph Databases Show the Connections Between Data
A graph database uses nodes, relationships between nodes and key-value properties instead of tables to represent information. This model is typically substantially faster for associative data sets and uses a schema-less, bottom-up model that is ideal for capturing ad-hoc and rapidly changing data. Much of today's complex and connected data can easily be stored in a graph database where there is great value in the relationships among data sets.
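The three building blocks named above (nodes, relationships and key-value properties) can be sketched as follows. The `PropertyGraph` class, the `KNOWS` relationship type and the sample people are hypothetical, and the structure is held in memory only; it is not the API of any specific graph database. The query answers the earlier question "Who are all my contacts in Europe?".

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties
        self.outgoing = defaultdict(list)  # node id -> [(type, end id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end, **properties):
        """Connect two existing nodes with a typed, directed relationship."""
        for node_id in (start, end):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node: {node_id}")
        self.outgoing[start].append((rel_type, end, properties))

    def neighbors(self, start, rel_type):
        """Follow relationships of one type directly from the start node."""
        return [
            end
            for kind, end, _ in self.outgoing.get(start, [])
            if kind == rel_type
        ]


def contacts_in(graph, person, region):
    """Return the person's direct contacts whose 'region' property matches."""
    return [
        contact
        for contact in graph.neighbors(person, "KNOWS")
        if graph.nodes[contact].get("region") == region
    ]
```

Because each node keeps its own list of outgoing relationships, finding a person's contacts touches only that person's connections rather than scanning every record, which is the intuition behind the performance claim for associative data.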
In the Enterprise, There is Value in Relationships
A graph database models real-world connections better than other NoSQL databases. It can support today's complex and connected data types, and scale to billions of nodes and relationships. It is ideally suited for any application where knowledge is obtained by relationships. For example, you may want to know which of your customers on the East Coast have made a purchase in the last six months and will be attending an upcoming conference. The ability to cross-reference these data points gives you much more context about an individual customer than just a single record. Take it a step further and you can find out more about an individual customer - whether you have worked in a similar industry or play soccer on the weekends - all of which you can reference when you meet in person at the show.
NoSQL graph databases can easily perform these queries without impacting performance or being as cost-prohibitive as traditional databases. They were designed to quickly and easily compare how individual records relate to one another.
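The East Coast conference question can be expressed as a check across three kinds of relationships per customer. The sketch below is an assumption-laden illustration: the relationship types (`LOCATED_IN`, `PURCHASED`, `ATTENDING`), the customer names, the `DataConf` event and the 182-day reading of "six months" are all invented for the example.

```python
from datetime import date, timedelta

# Hypothetical relationships: (start node, type, end node, properties).
RELATIONSHIPS = [
    ("ann", "LOCATED_IN", "East Coast", {}),
    ("ann", "PURCHASED", "order-1", {"on": date(2024, 3, 10)}),
    ("ann", "ATTENDING", "DataConf", {}),
    ("raj", "LOCATED_IN", "East Coast", {}),
    ("raj", "PURCHASED", "order-2", {"on": date(2023, 1, 5)}),
    ("raj", "ATTENDING", "DataConf", {}),
    ("li", "LOCATED_IN", "West Coast", {}),
    ("li", "PURCHASED", "order-3", {"on": date(2024, 5, 1)}),
    ("li", "ATTENDING", "DataConf", {}),
]


def build_index(relationships):
    """Group relationships by (start node, type) for direct lookup."""
    index = {}
    for start, rel_type, end, props in relationships:
        index.setdefault((start, rel_type), []).append((end, props))
    return index


def east_coast_buyers_attending(index, customers, conference, today):
    """Customers on the East Coast with a recent purchase who will attend."""
    cutoff = today - timedelta(days=182)  # roughly six months
    matches = []
    for customer in customers:
        located = index.get((customer, "LOCATED_IN"), [])
        purchases = index.get((customer, "PURCHASED"), [])
        events = index.get((customer, "ATTENDING"), [])
        if (
            any(end == "East Coast" for end, _ in located)
            and any(props["on"] >= cutoff for _, props in purchases)
            and any(end == conference for end, _ in events)
        ):
            matches.append(customer)
    return matches
```

With `today` set to `date(2024, 6, 1)`, only `ann` satisfies all three conditions: `raj` has no recent purchase and `li` is on the West Coast. Each condition is answered by following that customer's own relationships, so adding a fourth criterion means following one more relationship type rather than joining another table.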
NoSQL for the Enterprise
NoSQL has emerged to manage new data types, huge volumes of data and the relationships among the complex and connected data inherent in modern applications. The type of NoSQL database you choose depends on what type of data you need to store and how you want to access it. Each of the NoSQL databases serves a specific purpose.
NoSQL databases often coexist with traditional relational databases. That's why the term "NoSQL" has evolved to mean "Not Only SQL". Enterprises are too big and too complex for a one-size-fits-all solution.
When evaluating a NoSQL database, it is critical to demand enterprise-readiness. An enterprise delivering modern applications needs a NoSQL database that can manage today's complex and connected data while still delivering the enterprise strength, transactions and durability that IT departments have relied on for years.
About the author:
Emil Eifrem is CEO of Neo Technology and co-founder of the Neo4j project. Before founding Neo, he was the CTO of Windh AB, where he headed the development of highly complex information architectures for enterprise content management systems. Committed to sustainable open source, he guides Neo along a balanced path between free availability and commercial reliability. Eifrem is a frequent conference speaker and author on NoSQL databases.