The proliferation of connected devices has been decades in the making, and the data emanating from these devices has grown at unprecedented rates. A 2016 article suggested that a Tesla car generates two-to-five terabytes of data every week. And according to a recent MIT News report, the Large Hadron Collider (LHC) facility is soon expected to produce one petabit of data per second. While these two examples could be dismissed as specialized outliers, the data deluge is a reality even for common business applications.
According to Gartner, the number of Internet of Things (IoT) devices is doubling every 5 years. And by 2029, Gartner projects that more than 15 billion IoT devices will be connected to the enterprise infrastructure. To gain a competitive advantage, organizations must succeed in taming this increasingly unwieldy data and mining insights to improve the customer experience in new and dynamic ways.
Enterprise Challenges for Managing Data
The volume, velocity, and veracity of this data deluge have put immense pressure on underlying data platforms and on organizations’ abilities to manage them effectively. And the pandemic has only exacerbated the problem. According to a 2021 survey, nearly half of digital architects are under high or extremely high pressure to deliver digital projects, but 61% blame legacy technology for making it difficult to complete modernization efforts.
As a result, databases of all types—SQL, NoSQL, or NewSQL—whether on-prem, cloud, hybrid, or edge, are struggling to navigate this new reality. As databases become increasingly distributed, additional operational complexities are introduced. Without automation, the service levels (e.g., reliability, availability, and security) of business-critical data services suffer, inevitably leading to customer dissatisfaction and revenue loss.
Though the need for database infrastructure automation is acknowledged by modern enterprises, the complexity and cost have created hurdles to adoption. As a result, innovative companies with adequate resources jumped on the opportunity to step in and solve this problem for their customers.
Kubernetes Emerges to Augment Database-as-a-Service
Cloud vendors such as AWS, GCP, and Azure seized this opportunity, undertook automation at huge scale, and offered fully managed database-as-a-service (DBaaS) to customers. In some cases, existing open-source database stacks were used as the underlying data platform, bundled with cloud-provider-specific automated management functions. Not to be left behind, database vendors (e.g., Couchbase, MongoDB, DataStax, Redis Labs, and others) joined the DBaaS fray and offered this service in the cloud, in addition to their on-prem offerings.
The core automation and orchestration framework for some of these DBaaS implementations was based on Kubernetes and Kubernetes Operators, which are extensions to Kubernetes for managing stateful applications such as databases. A select set of these vendors made their database operators for Kubernetes available to the general community. Some customers took advantage of this and chose to run “databases on Kubernetes” on their own, to fit their specialized workload profiles and CI/CD pipelines instead of subscribing to generic DBaaS solutions.
Today, the Kubernetes Operator pattern has emerged as a standard automation platform for databases, with huge potential and popularity.
Kubernetes Operator Fuels Enterprise Automation and Scalability
In the early 2010s, we faced a similar problem with spikes in user web traffic. Earlier approaches, rooted in applying well-known centralized architectures from the client-server era, did not scale. Big, centralized applications could not provide the flexibility required for on-demand scaling to meet sudden spikes in user traffic.
Fortunately, the advent of microservices running inside lightweight containers—that were easy to scale horizontally—helped alleviate the situation. The shift in architecture was aided by the emergence of Docker containers and container orchestration tools such as Kubernetes. Introduced publicly in 2014, Kubernetes, which grew out of Google’s internal Borg system, quickly emerged as a top candidate for container and microservices automation. Today, Kubernetes is an industry standard for modern software management pipelines and a key cloud-native tool promoted by the Cloud Native Computing Foundation. Echoing the idea of automated software installation packages, Kubernetes not only abstracts away specific infrastructure implementations but also automates environment creation and deployment procedures.
The predominant use cases of Kubernetes-based automation were still restricted to “stateless” applications—the web layer, the business logic layer, and other compute-centric modules. The ephemeral nature of containers was better suited to such stateless application layers. Today, most organizations that use Kubernetes trust it to run at least 50% of their overall workloads, based on findings from the 2021 DOK Survey.
However, “stateful” applications such as databases were not easily managed through vanilla Kubernetes frameworks until the advent of Operator extensions to Kubernetes.
Kubernetes Evolves to Support Stateful Applications
The transient nature of containers makes it difficult to manage stateful applications. Databases also rely on underlying storage for persistent data: when containers move between nodes, the storage needs to follow them and be mounted appropriately and quickly. Earlier versions of Kubernetes had no well-defined semantics to resolve these management challenges.
An important milestone in the evolution of Kubernetes was support for StatefulSets (the API object for stateful applications) and persistent volumes (for managing storage). These two features made it possible to consider automating stateful applications such as databases.
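To make this concrete, here is a minimal sketch of what a StatefulSet for a small database cluster looks like, expressed as a Python dict mirroring the Kubernetes manifest structure (the names `db`, `db-data`, and the image are illustrative placeholders, not any specific vendor's configuration):

```python
# Sketch of a Kubernetes StatefulSet for a three-node database, as a Python
# dict mirroring the YAML manifest. StatefulSets give each pod a stable
# identity (db-0, db-1, db-2), and volumeClaimTemplates stamp out one
# PersistentVolumeClaim per pod, so storage follows a pod when it is
# rescheduled onto another node.
stateful_set = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "db"},
    "spec": {
        "serviceName": "db",          # headless service for stable DNS names
        "replicas": 3,
        "selector": {"matchLabels": {"app": "db"}},
        "template": {
            "metadata": {"labels": {"app": "db"}},
            "spec": {
                "containers": [{
                    "name": "db",
                    "image": "example-db:latest",   # placeholder image
                    "volumeMounts": [{"name": "db-data",
                                      "mountPath": "/var/lib/db"}],
                }],
            },
        },
        # One PersistentVolumeClaim is created per replica from this template.
        "volumeClaimTemplates": [{
            "metadata": {"name": "db-data"},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": "10Gi"}},
            },
        }],
    },
}

# The stable, ordinal pod names a StatefulSet derives from this manifest:
pod_names = [f'{stateful_set["metadata"]["name"]}-{i}'
             for i in range(stateful_set["spec"]["replicas"])]
print(pod_names)  # → ['db-0', 'db-1', 'db-2']
```

The stable pod identity plus per-pod storage is exactly what databases need: `db-1` keeps its data volume even if its container is recreated on a different node.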
Human operators and DBAs were dealing with scaling, backups, patching, and other routine database maintenance tasks outside of the normal developer workflow. These labor-intensive operational tasks did not scale easily, and moreover, they introduced unforced errors.
Some of these operational tasks were specific to the database platform (i.e., different tasks were better suited to different databases). This required a way to encode such logic as extensions to Kubernetes, so that users could access and automate these management functions using declarative methods.
The Operator extensions to Kubernetes happened to be an excellent framework to do exactly this—custom automation tasks for databases. This allowed database vendors to automate tasks for life cycle management of database fleets, including provisioning, monitoring, upgrading/patching, backup/restore, rebalancing and more.
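A vendor's Operator typically registers a CustomResourceDefinition for a database-cluster kind, and users then declare only the desired state. The following sketch shows what such a custom resource might look like, again as a Python dict mirroring the manifest; the API group `example.com`, the kind `DatabaseCluster`, and all field names are hypothetical, not any particular vendor's schema:

```python
# Hypothetical custom resource for a database cluster. The user declares
# *what* they want (version, size, backups, storage); the Operator's job is
# to figure out *how* to get there. All names here are illustrative.
database_cluster = {
    "apiVersion": "example.com/v1",          # hypothetical API group
    "kind": "DatabaseCluster",               # hypothetical custom kind
    "metadata": {"name": "orders-db"},
    "spec": {
        "version": "7.1.0",                  # desired database version
        "replicas": 3,                       # desired cluster size
        "backup": {"schedule": "0 2 * * *"}, # nightly backup, cron syntax
        "storage": {"size": "50Gi"},
    },
}
```

Everything operational—provisioning, upgrades, backups—is driven from this one declarative record, which can be version-controlled alongside application code.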
Employing elements of control theory, operators work as Kubernetes plugins/extensions and use custom resources to define and control the state of organizational services. Eventually, building database environments with declarative custom resource definitions became the norm. Operators could read database service definitions, create the corresponding resources, monitor them, and dynamically correct any drift from the desired state. Database environments became standardized, described in a declarative language, and fit into modern deployment pipelines such as Infrastructure as Code. The earlier high-ops nature of database fleet management became part of the lightweight GitOps/DevOps model. The ability to detect unusual events in the database cluster and react to them in real time is often referred to as self-detect and self-heal, or, in other words, an autonomous operator.
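The control-theory idea at the heart of an Operator is a reconcile loop: compare the desired state from the custom resource with the observed state of the cluster, and compute the actions that close the gap. Real Operators do this with Kubernetes watches and API calls; the simplified stand-in below uses plain dicts and strings to show just the logic:

```python
# Minimal sketch of an Operator's reconcile loop. "desired" comes from the
# custom resource spec; "observed" is what the cluster actually looks like.
# Real Operators issue Kubernetes API calls instead of returning strings.
def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions needed to drive observed state to desired state."""
    actions = []
    if observed.get("version") != desired["version"]:
        actions.append(f'upgrade to {desired["version"]}')
    diff = desired["replicas"] - observed.get("replicas", 0)
    if diff > 0:
        actions.append(f"scale up by {diff}")
    elif diff < 0:
        actions.append(f"scale down by {-diff}")
    return actions  # an empty list means no drift: nothing to do

desired = {"version": "7.1.0", "replicas": 3}
observed = {"version": "7.0.2", "replicas": 2}   # drift: old version, a node down
print(reconcile(desired, observed))
# → ['upgrade to 7.1.0', 'scale up by 1']
print(reconcile(desired, {"version": "7.1.0", "replicas": 3}))
# → []
```

Running this loop continuously—on a timer and on cluster events—is what gives an Operator its self-detect and self-heal character: drift is noticed and corrected without a human in the loop.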
Couchbase Autonomous Operator was one of the first NoSQL products to utilize this framework for database operations automation. Many other database vendors and community-driven database operators have also emerged (check out the growing community here). Even communities and interest groups have sprouted and flourished around this technology. One such example is the DOK (Data on Kubernetes) community. As this ecosystem burgeons, the future looks bright for this technology and associated communities.
The rise of DevOps, DBaaS, Kubernetes, and Operators has created a compelling end-to-end platform for distributed applications. Developers need not worry about how their code is deployed, or how different components communicate with each other. Instead, developers can concentrate on the data and the logic to get insight and decision-making abilities for the business. Finally, the same consistent tool/framework can be used for managing all layers of the application stack, including the mission-critical database layer.
The new frontiers to cross for this technology include elegant management of hybrid and multi-cloud database clusters, solving multi-tenancy effectively and securely, adding first class support for AI/ML based data analysis to the ecosystem and finally, reducing the latency overhead so that the experience for an end-user is instant and frictionless.
Freeing an organization's important resources from mundane, labor-intensive tasks through automation creates space and time for innovation and further progress. More broadly, the opportunities that Kubernetes adoption provides can differentiate enterprises in an increasingly crowded business landscape.