VoltDB Pushes the Boundaries on In-memory Databases

Michael Stonebraker is widely recognized as one of the pioneers of the relational database. While at Berkeley, he co-founded the INGRES project, which implemented the relational principles Edgar Codd had published in his seminal papers. INGRES became the basis for the commercial Ingres RDBMS, which during the 1980s was one of Oracle's most significant competitors.

Ingres lost the RDBMS battle and was eventually acquired by CA. In the meantime, however, Stonebraker and his colleagues created the Postgres database, which, along with MySQL, became one of the leading open source databases. Postgres's BSD license was friendlier to commercial exploitation than MySQL's GPL, and as a result Postgres became the technology underlying a number of significant commercial database products, such as Aster, Greenplum and EnterpriseDB.

In 2005, Stonebraker and colleagues published a paper critiquing the traditional relational DBMS as a "one-size-fits-all" solution for database management. They argued that for every significant application type, a customized database design could deliver at least ten times the performance of today's one-size-fits-all relational database.

Stonebraker and his team followed up with concrete designs for database systems optimized for data warehousing (C-Store) and online transaction processing, or OLTP (H-Store). The C-Store design heavily influenced important data warehousing databases, most significantly Vertica.

While C-Store set out to reimagine the storage model for the data warehouse, H-Store was intended to push the boundaries of OLTP processing. The Stonebraker group describes H-Store as a "complete rewrite" of the OLTP DBMS.

Disk IO remains the biggest bottleneck for database systems: while Moore's Law has driven exponential growth in memory and CPU capacity, IO performance has improved only slightly. To avoid this bottleneck, H-Store uses a memory-based model. Rather than guaranteeing persistence by writing to disk, H-Store guarantees it by replicating data across multiple machines. In-memory data can still be backed up to disk or tape, of course, but normal operations require no disk IO. If you need more memory than a single machine can support, you add more machines to the H-Store cluster.
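To make the durability argument concrete, here is a minimal Python sketch of persistence through replication, assuming a write counts as committed once a minimum number of live machines hold it in RAM. The `Node` and `ReplicatedStore` classes and the `min_copies` threshold are hypothetical illustrations, not H-Store or VoltDB APIs, and the sketch leaves out the disk or tape backups.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical machine holding a full in-memory copy of the data."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class ReplicatedStore:
    """Treats a write as durable once enough live machines hold it in RAM."""

    def __init__(self, replicas, min_copies):
        self.replicas = replicas
        self.min_copies = min_copies  # hypothetical durability threshold

    def write(self, key, value):
        live = [n for n in self.replicas if n.alive]
        # Refuse the write up front rather than commit it with too few copies.
        if len(live) < self.min_copies:
            raise RuntimeError("too few live replicas to guarantee durability")
        for node in live:
            node.data[key] = value  # memory only: no disk IO on the commit path
        return len(live)

    def read(self, key):
        # After a machine fails, any surviving replica can serve the data.
        for node in self.replicas:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("a"), Node("b"), Node("c")]
store = ReplicatedStore(nodes, min_copies=2)
store.write("acct:42", 100)
nodes[0].alive = False  # simulate losing one machine
print(store.read("acct:42"))  # 100, served by a surviving replica
```

Losing one machine loses no committed data, because a surviving replica already holds it, yet no write ever waits on a disk.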

H-Store employs a hierarchical data model. Although hierarchical organization is less flexible (and, arguably, less "correct") than the relational model, it allows for highly optimized partitioning and shared-nothing clustering, which in turn lets H-Store scale out across large numbers of machines: a necessity as well as a virtue given the memory-based storage model.
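One way to see why a hierarchical schema partitions so cleanly: if every table hangs off a single root table (here, customers) and each child row carries its root's key, then routing every row by that key places a customer's entire subtree on one partition. The Python sketch below assumes exactly that; the tables, the `partition_for` helper and the simple modulo routing (standing in for whatever hash function a real system would use) are all hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(customer_id: int) -> int:
    # Every row is routed by the key of its root ancestor (the customer).
    return customer_id % NUM_PARTITIONS


# Hypothetical tree schema: customers -> orders -> order_lines.
# Each child row carries the root key, so a whole subtree lands on one partition.
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 6, "name": "Bo"}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 6},
]
order_lines = [
    {"order_id": 10, "customer_id": 1, "sku": "X1"},
    {"order_id": 11, "customer_id": 6, "sku": "Y2"},
]

# partitions[p][table] -> rows stored on partition p
partitions = defaultdict(lambda: defaultdict(list))
for table, rows in (("customers", customers), ("orders", orders), ("order_lines", order_lines)):
    for row in rows:
        partitions[partition_for(row["customer_id"])][table].append(row)


def customer_subtree(customer_id):
    # A query rooted at one customer touches exactly one partition: no cross-node joins.
    p = partitions[partition_for(customer_id)]
    return {
        table: [r for r in p[table] if r["customer_id"] == customer_id]
        for table in ("customers", "orders", "order_lines")
    }


print([line["sku"] for line in customer_subtree(6)["order_lines"]])  # ['Y2']
```

Because no query rooted at a single customer ever needs data from another partition, each partition can live on its own machine with nothing shared.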

H-Store drastically simplifies the concurrency model used by relational databases, avoiding much of the overhead and contention that model creates. Each H-Store instance is single-threaded, which largely eliminates the need for locking and latching; multiple instances can still be deployed on a single machine to take advantage of multiple CPUs.
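A minimal Python sketch of the single-threaded execution model, assuming one `PartitionSite` (a hypothetical name) per partition: the site's data is touched only by its own worker thread, so transactions run one after another without locks or latches, and the only synchronized structure is the request queue. Python's global interpreter lock means this illustrates the serial model, not multi-CPU speedup; in the design described above, you would run one such instance per CPU.

```python
import queue
import threading


class PartitionSite:
    """A hypothetical single-threaded execution site owning a private slice of data."""

    def __init__(self):
        self.data = {}              # touched only by this site's thread: no locks or latches
        self.inbox = queue.Queue()  # the request queue is the only synchronized structure
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            txn, reply = self.inbox.get()
            if txn is None:
                break
            # Each transaction runs to completion before the next one starts.
            reply.put(txn(self.data))

    def submit(self, txn):
        reply = queue.Queue(maxsize=1)
        self.inbox.put((txn, reply))
        return reply.get()

    def stop(self):
        self.inbox.put((None, None))
        self.worker.join()


def increment(data):
    data["hits"] = data.get("hits", 0) + 1


def client(site, n):
    for _ in range(n):
        site.submit(increment)


site = PartitionSite()
clients = [threading.Thread(target=client, args=(site, 1000)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()
print(site.submit(lambda data: data["hits"]))  # 4000
site.stop()
```

Four client threads issue 4,000 unsynchronized read-modify-write increments, yet the count is exact because the site serializes them.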

In H-Store, each transaction is encapsulated in a single stored procedure call rather than issued as a series of separate SQL statements. This minimizes transaction duration (there is no user think-time or network round trip in the middle of a transaction) and further reduces locking issues.
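The following Python sketch illustrates the stored-procedure model under stated assumptions: the client sends only a procedure name and its parameters in a single call, the engine runs the whole procedure to completion, and any exception rolls back the procedure's writes using an undo log. `Engine`, `Txn`, `ProcedureAbort` and `transfer` are hypothetical names, and the undo log is an assumed rollback mechanism; VoltDB's actual procedures are written in Java against its own API.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class ProcedureAbort(Exception):
    """Raised inside a procedure to abort and roll back its transaction."""


class Txn:
    """Hypothetical handle a procedure uses to read and write the data."""

    def __init__(self, data, undo):
        self._data = data
        self._undo = undo

    def get(self, key, default=0):
        return self._data.get(key, default)

    def put(self, key, value):
        # Record the pre-transaction value the first time a key is written.
        self._undo.setdefault(key, self._data.get(key, _MISSING))
        self._data[key] = value


class Engine:
    def __init__(self):
        self.data = {}
        self.procedures = {}

    def register(self, fn):
        self.procedures[fn.__name__] = fn
        return fn

    def call(self, name, *params):
        # One request carries the whole transaction: no round trips mid-transaction.
        undo = {}
        try:
            return self.procedures[name](Txn(self.data, undo), *params)
        except Exception:
            # Roll back: restore every key this transaction wrote.
            for key, old in undo.items():
                if old is _MISSING:
                    del self.data[key]
                else:
                    self.data[key] = old
            raise


engine = Engine()


@engine.register
def transfer(txn, src, dst, amount):
    txn.put(dst, txn.get(dst) + amount)  # credit first, so an abort must undo it
    if txn.get(src) < amount:
        raise ProcedureAbort("insufficient funds")
    txn.put(src, txn.get(src) - amount)
    return txn.get(src)


engine.data.update({"alice": 100, "bob": 0})
print(engine.call("transfer", "alice", "bob", 30))  # 70
try:
    engine.call("transfer", "alice", "bob", 500)
except ProcedureAbort:
    pass
print(engine.data)  # {'alice': 70, 'bob': 30}
```

The failed transfer had already credited bob before aborting; the undo log restores his balance, so the procedure call is all-or-nothing.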

The H-Store proposal recommended that the SQL language be deprecated in favor of a stored-procedure-only programming model based on more object-oriented programming paradigms.

The VoltDB company was formed to commercialize the H-Store design. The VoltDB database follows the H-Store design, but rather than abandoning SQL it implements a fairly limited subset of the language, with transactions written as Java stored procedures.

Adding SQL to the H-Store model puts VoltDB in the gray zone between pure NoSQL databases like Cassandra and fully relational databases like Oracle. However, VoltDB and H-Store are definitely not minor variations on the RDBMS theme; they are truly floor-to-ceiling rewrites. For applications that need to push the envelope on transaction processing, VoltDB is worth considering.