Apache Cassandra and Apache Ignite: Better Together

Apache Cassandra is a popular open source, distributed NoSQL database that combines a key-value store with a column-oriented data model. It is used by companies such as Netflix, eBay, and Expedia for strategic parts of their business. When combined with Apache Ignite, Apache Cassandra becomes even more powerful and can be used for today’s most demanding web and cloud applications.

Some of the features that make Apache Cassandra appealing include a fully distributed, peer-to-peer architecture. Apache Cassandra has no single point of failure, so it is well suited for high-availability applications. It supports multi-datacenter replication, which allows organizations, for example, to store data in the cloud across multiple Amazon Web Services (AWS) availability zones for greater resiliency. Cassandra also offers massive and linear scalability. Any number of nodes can be added to (or removed from) any Cassandra cluster in any datacenter, enabling users to reliably store ever-growing amounts of structured and unstructured data.

In addition, Apache Cassandra combines distributed systems technologies from the Amazon Dynamo key-value store with Google’s BigTable column-based data model, making it possible to model complex data structures that would be difficult to model in traditional relational databases.

Cassandra also provides tunable consistency, so users can configure replication to balance speed and reliability.
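
The trade-off behind tunable consistency can be illustrated with a little arithmetic. The sketch below is only an illustration, and the function names are hypothetical rather than part of any Cassandra API. It assumes the commonly cited rule that a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3 or 3 of 5."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3  # assumed replication factor for this illustration

# Writing and reading at quorum: slower, but reads overlap the latest write.
print(is_strongly_consistent(rf, quorum(rf), quorum(rf)))  # True

# Writing and reading a single replica: faster, but only eventually consistent.
print(is_strongly_consistent(rf, 1, 1))  # False
```

Lower replica counts favor speed, while higher counts favor reliability, which is the balance the configuration lets users choose.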

And, because it is supported by an open source community, Cassandra users benefit from a large and active network that continues to refine the solution and provides community-based support through a number of websites.

Common uses for Apache Cassandra include storing and analyzing sequentially captured measurements from sensors and application logs, and storing key-value data with high availability. This can be very beneficial for use cases such as web-scale applications or IT monitoring, which require counts over high-velocity data.

As powerful as Apache Cassandra is, it does have some limitations. It is disk-based, which ultimately limits the speed of some operations because data needs to be written to and read from disks. As a result, in today’s world of real-time performance demands, organizations may want to increase the performance of their Cassandra deployments beyond the software’s native capabilities.

That’s where the Apache Ignite in-memory computing platform comes in. In-memory computing is at least 1,000x faster than disk-based storage. Until recently, however, the high cost of RAM made implementing an in-memory computing solution cost-prohibitive for many use cases. Organizations that needed the speed of in-memory computing had to cobble together multiple products. Now that the cost of memory has dropped (roughly 30% per year since the 1960s), it can be cost-effective to utilize terabytes of RAM in an in-memory computing cluster for a given use case. And with the introduction of Apache Ignite, every organization has access to a high-performance, integrated, and distributed in-memory computing platform that is offered as a free, downloadable open source solution. Apache Ignite is capable of real-time processing of very large datasets, and companies implementing it have reported processing transactions much faster than with disk-based alternatives.

Benefits of Using Apache Ignite with Apache Cassandra

Apache Ignite is deployed as an in-memory computing layer between an organization’s existing data and application layers. An Apache Ignite cluster can be deployed and inserted between Apache Cassandra and an existing application layer, adding the speed and scalability benefits of in-memory computing without compromising the benefits of Apache Cassandra.

Speed and Flexibility

Beyond the speed benefits of storing data in RAM, Apache Ignite integrates with Cassandra in several flexible ways. Simply sliding Apache Ignite between the application and Cassandra, and caching the active portion of the Cassandra data in memory in Apache Ignite, is a quick and easy way to achieve a speed boost. A greater speed boost comes from holding the Cassandra data completely in memory. This architecture greatly improves query speed because queries no longer have to read data from disk, and it gives the application more flexibility in how it uses the key values. Also, in contrast to Cassandra’s on-disk indexes, Apache Ignite indexes reside in memory, allowing for ultra-fast SQL queries.
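
The idea of caching only the active portion of the data can be sketched as a bounded read-through cache. This is a minimal illustration in plain Python and does not use the Ignite or Cassandra APIs. The class name, the loader function, and the dictionary standing in for a Cassandra table are all hypothetical, and least-recently-used eviction is an assumed policy chosen for simplicity.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """Keeps the most recently used entries in memory; loads misses from a store."""

    def __init__(self, loader: Callable[[Hashable], Any], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._loader = loader
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            # Served from memory; mark the entry as recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: fall through to the slower backing store.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._entries.popitem(last=False)
        return value


# Hypothetical stand-in for a disk-based table.
slow_store = {"user:1": "Ada", "user:2": "Grace", "user:3": "Edsger"}
store_reads = []


def load_from_store(key):
    store_reads.append(key)  # record every trip to the backing store
    return slow_store[key]


cache = ReadThroughCache(load_from_store, capacity=2)
cache.get("user:1")  # miss, loaded from the store
cache.get("user:1")  # hit, served from memory
print(cache.hits, cache.misses, store_reads)  # 1 1 ['user:1']
```

Holding the data completely in memory corresponds to a capacity large enough that nothing is ever evicted, so every key goes to the store at most once.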

High Availability

Like Apache Cassandra, Apache Ignite is a peer-to-peer computing system designed to be always available. If a node goes down, applications continue to read from and write to any of the defined backup nodes. Apache Ignite also automatically redistributes data as a cluster grows. Further, Apache Ignite offers sophisticated clustering support, such as detecting and remediating split-brain conditions, enabling the combined Cassandra/Ignite system to be more available than a standalone Cassandra system.

Horizontal and Vertical Scalability

Like Apache Cassandra, Apache Ignite is horizontally scalable, so capacity can be added just by adding nodes to the Ignite cluster. As new nodes are added, more memory is available for caching data from Apache Cassandra. In addition, the combined system is more efficient because it can use all the memory on a node, not just the JVM heap. Objects can be defined to live on or off heap and use all the memory on the machines. This allows the Apache Ignite environment to scale vertically as well, simply by increasing the amount of memory on each node.
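
One common way a distributed cache spreads keys over nodes so that adding a node moves only a small share of the data is rendezvous (highest-random-weight) hashing. The sketch below is a simplified illustration of that technique and not Ignite's actual implementation. The node names, key names, and helper functions are hypothetical.

```python
import hashlib
from typing import Sequence


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(key: str, nodes: Sequence[str]) -> str:
    """The node with the highest weight for this key owns it."""
    if not nodes:
        raise ValueError("at least one node is required")
    return max(nodes, key=lambda node: score(node, key))


keys = [f"key-{i}" for i in range(1000)]
three_nodes = ["node-a", "node-b", "node-c"]
four_nodes = three_nodes + ["node-d"]  # the cluster grows by one node

before = {key: owner(key, three_nodes) for key in keys}
after = {key: owner(key, four_nodes) for key in keys}
moved = [key for key in keys if before[key] != after[key]]

# Every key that changed owner went to the new node; the rest stayed put.
print(all(after[key] == "node-d" for key in moved))  # True
print(f"{len(moved)} of {len(keys)} keys moved to the new node")
```

Because existing nodes keep most of their keys, the added node's memory becomes usable without reshuffling the entire data set.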

ANSI SQL-99 and ACID Transaction Guarantees

Apache Ignite is powered by an ANSI SQL-99 compliant engine and offers ACID transaction guarantees for distributed transactions. It includes an In-Memory SQL Grid, which provides in-memory database capabilities, and it also offers ODBC and JDBC APIs. When Apache Cassandra and Apache Ignite are combined, any type of OLAP or complex SQL query can be written against the Cassandra data currently held in memory in Apache Ignite. While Apache Cassandra offers eventual consistency, Apache Ignite can be operated in multiple modes, from eventual consistency to real-time full ACID compliance.

No Data Remodeling

Adding Apache Ignite does not require the data in an existing Cassandra database to be modified. Apache Ignite can read from Cassandra and other NoSQL databases just as well as it reads from relational databases. There is also no need to modify the schema, which will migrate directly into Apache Ignite as is.

Apache Ignite requires no “rip-and-replace,” so it is a convenient solution for organizations that have a relational database, are considering a move to Apache Cassandra, and are concerned about having to redo their data model to match Cassandra’s requirements. Instead of remodeling the data for a move directly to Apache Cassandra, an organization can use Apache Ignite on the relational database, change the application to interface with Apache Ignite, and then migrate the relational database to Apache Cassandra. Because the application goes through Apache Ignite, it will see no difference between the original relational database and Apache Cassandra.

Apache Ignite works equally well with NoSQL, RDBMS, and Apache® Hadoop® data stores, so Apache Ignite can be used to speed them up and scale them out as well. Apache Ignite can also be used with Apache® Spark™, and the Ignite file system can be used to pin resilient distributed datasets (RDDs) into memory, using data from Apache Cassandra, Apache Hadoop, or a relational database, to make Spark faster and to share state between Spark jobs.

A Mature Codebase

While Apache Ignite is fairly new to the Apache Software Foundation (ASF), it has a very mature codebase. It originated as a private project in 2007 and was donated to the ASF in 2014. Ignite graduated to a top-level project in about a year, making it the second-fastest Apache project to graduate (after Apache Spark). Apache Ignite has an active worldwide community and includes over one million lines of code with a robust feature set.

Integrating the Solutions

Architecturally, integrating Apache Ignite with Apache Cassandra is straightforward. Apache Cassandra users typically have some type of application that reads from and writes to the Cassandra cluster (possibly with Apache® Kafka™ or other clients). Apache Ignite slides between Apache Cassandra and the application and integrates using the Cassandra connector in Apache Ignite. The application then no longer reads from and writes to Apache Cassandra directly. Instead, it reads from and writes to Apache Ignite, so it is accessing data in memory instead of on disk. Apache Ignite handles the reads from and writes to Apache Cassandra.
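
The data flow described above can be sketched as a thin in-memory layer that the application talks to, with the layer itself handling all traffic to the backing store. This is an illustrative sketch in plain Python under stated assumptions. It does not use the Ignite Cassandra connector, and the class names and the in-process store standing in for a Cassandra cluster are hypothetical.

```python
class BackingStore:
    """Hypothetical stand-in for a Cassandra table; counts every access."""

    def __init__(self) -> None:
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.rows.get(key)

    def write(self, key, value) -> None:
        self.writes += 1
        self.rows[key] = value


class CacheLayer:
    """The only component the application talks to."""

    def __init__(self, store: BackingStore) -> None:
        self._store = store
        self._memory = {}

    def put(self, key, value) -> None:
        # Write-through: persist to the store, then keep a copy in memory.
        self._store.write(key, value)
        self._memory[key] = value

    def get(self, key):
        if key in self._memory:
            return self._memory[key]  # served from memory, no store access
        value = self._store.read(key)  # read-through on a miss
        if value is not None:
            self._memory[key] = value
        return value


store = BackingStore()
store.rows["sensor:1"] = 20.5  # data that already exists in the store

layer = CacheLayer(store)
layer.put("sensor:2", 21.0)  # one write reaches the store
layer.get("sensor:1")        # first read loads from the store
layer.get("sensor:1")        # repeat read is served from memory
layer.get("sensor:2")        # written value is already in memory

print(store.reads, store.writes)  # 1 1
```

Writing through to the store on every update is one possible design; it keeps the persistent copy current at the cost of write latency, whereas buffering writes would trade durability for speed.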

What’s Ahead

Organizations that are using Apache Cassandra (or considering it) but are concerned about meeting the performance demands of the extreme OLTP and OLAP workloads of today’s web-scale applications should consider taking advantage of the Apache Ignite in-memory computing platform. Combining the two solutions allows applications to access data in memory instead of on disk, an approach that is at least 1,000 times faster than disk-based approaches. Adding Apache Ignite to Apache Cassandra maintains Cassandra’s high availability and horizontal scalability while also providing several additional benefits, including more flexible ANSI SQL-99 compliant query capabilities, vertical scalability, and more robust consistency with ACID transaction guarantees. All this is achieved without the need to remodel the data.