Lessons Learned from Instaclustr’s Colossal Migration to Apache Kafka and Apache Cassandra

Apache Kafka and Apache Cassandra offer clear benefits over proprietary data structures, including high data throughput, scalability, durability, availability, and consistency. It’s clear why an organization might migrate to one (or, more likely, both) of these technologies. Yet what more can be learned about Apache Kafka and Apache Cassandra migrations by examining them at a large scale?

Ben Slater, CPO, Instaclustr by NetApp, joined DBTA’s webinar, The World's Largest Apache Kafka and Apache Cassandra Migration: Lessons Learned and Best Practices, to share Instaclustr’s firsthand experience managing a migration project of colossal scale, delivering insights in areas such as planning, security, performance optimization, and monitoring.

Slater offered context on the size of Instaclustr’s migration. On the Cassandra side, the company had to move 58 clusters, or a little over 1,000 nodes, spanning 17 different node sizes across two cloud providers (AWS and GCP) and six provider regions. The Apache Kafka side had similar conditions, except with 154 clusters and 21 different node sizes. The migration consisted of “pretty much every angle of complexity you can get,” according to Slater.

Instaclustr’s migration began in February 2023 and was completed in October of the same year, a fairly rapid transition considering the scope of the project.

Examining how Instaclustr managed this migration, Slater emphasized that “it’s really important, when entering into a large project, to have a good overall framework for how you’re managing that project.”

Beginning with a high-level methodology that “gives us enough of a framework of where we need to go” is preferable to starting with an in-depth, detailed strategy, Slater explained. Securing buy-in for the change and making sure everyone is on the same page gives the migration a sturdy foundation before the planning becomes overly complex.

A migration of this scale also necessitates staffing key roles to ensure that the transition is seamless, noted Slater. Instaclustr employed a variety of positions, including an overall program manager, Cassandra and Kafka migration project managers and technical leads, as well as a key customer product manager. As a result, “the team worked directly with our customer counterparts and established communication mechanisms that were vital to the project,” according to Slater.

Slater broke down the overall migration process into five steps:

  1. Assess: Understand your existing environment as well as technical and business constraints.
  2. Determine approach: Review pros and cons of available migration approaches to determine the best fit.
  3. Per-cluster design: Review each cluster configuration in detail and document specific migration plans for each.
  4. Test: Test migration by transitioning non-production clusters.
  5. Execute: Migrate clusters in a joint effort with client teams.
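
With more than 200 clusters moving through the same five steps, some form of per-cluster tracking is implied. The following is only a minimal Python sketch of how that might look, not a description of Instaclustr’s tooling: the names `Phase`, `ClusterMigration`, `advance`, and `phase_summary` are hypothetical, and the only thing taken from the webinar is the order of the steps.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    """The five steps listed above, in order."""
    ASSESS = 1
    DETERMINE_APPROACH = 2
    PER_CLUSTER_DESIGN = 3
    TEST = 4
    EXECUTE = 5


@dataclass
class ClusterMigration:
    """Hypothetical tracking record for a single cluster."""
    name: str
    technology: str  # e.g. "cassandra" or "kafka"
    phase: Phase = Phase.ASSESS
    done: bool = False

    def advance(self) -> None:
        """Complete the current phase and move to the next, never skipping one."""
        if self.done:
            raise ValueError(f"{self.name} has already been migrated")
        if self.phase is Phase.EXECUTE:
            self.done = True  # Execute was the last step
        else:
            self.phase = Phase(self.phase + 1)


def phase_summary(clusters):
    """Count how many unfinished clusters currently sit in each phase."""
    return Counter(c.phase.name for c in clusters if not c.done)


if __name__ == "__main__":
    # Illustrative cluster names only; not taken from the webinar.
    clusters = [
        ClusterMigration("orders", "cassandra"),
        ClusterMigration("events", "kafka"),
    ]
    clusters[0].advance()  # "orders" moves from Assess to Determine approach
    print(phase_summary(clusters))
```

Modeling the steps as an ordered enum means a cluster cannot reach Execute without first passing through Test, which mirrors the sequence Slater described.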

Maintaining security and compliance during a migration is a delicate challenge, as failure to adhere to regulatory standards and security best practices can have serious consequences. Before executing production migrations, Instaclustr implemented several enhancements to maintain security standards and close gaps where risk might surface.

Slater guided webinar viewers through a variety of migration approaches for Apache Kafka and Apache Cassandra, including a shared cluster approach, a method using ZooKeeper instances to alleviate complexity, a zero-downtime strategy, and more.

For the full discussion of the world’s largest Apache Kafka and Apache Cassandra migration, with examples, best practices, and detailed migration advice, you can view an archived version of the webinar here.