MongoDB, meet Kubernetes

Kubernetes—together with Docker—has transformed the way in which distributed systems can be packaged and deployed.

In the old days, deploying a multi-node distributed system was a major endeavor. Typically, specialized hardware had to be acquired, and dedicated network configuration was needed to allow the nodes to communicate. A laborious process of software installation and configuration then followed to establish the distributed system. The whole effort would typically take weeks or months.

The advent of virtualization reduced some of this complexity. Instead of the acquisition of dedicated hardware, one could simply spin up a new virtual machine with the appropriate configuration. However, all of the complexity of distributed software installation remained.

Docker provides a more convenient model for virtualized environments but doesn’t itself solve any of the issues relating to the interactions between containers. However, by providing a standard model for deploying small virtualized components (containers), Docker provides the building blocks for a better way of defining and deploying distributed systems.

Kubernetes has emerged as the most popular framework for orchestrating container-based applications. Orchestration involves the creation, configuration, scaling, and management of multiple containerized micro-services to deliver an integrated distributed application. Using Kubernetes, complex application topologies can be deployed and migrated with ease.

Database servers traditionally stood somewhat outside the scope of virtualized and containerized environments. The traditional monolithic database servers required dedicated access to high-speed persistent storage, which was difficult to provide in virtualized environments. And because legacy database servers tended to run on single large hosts, they did not easily fit within the scale-out architecture of containerization.

However, modern databases are increasingly distributed systems composed of multiple processes running across multiple nodes. Cassandra, CockroachDB, and MongoDB are all examples of databases that are rarely deployed as single-instance database servers.

With MongoDB, even the simplest deployment is typically a replica set—consisting of a single primary node and multiple secondary nodes. More advanced MongoDB deployments comprise multiple “sharded” replica sets together with router (mongos) processes and a special replica set containing configuration information. The simplest MongoDB sharded deployment would consist of at least 10 processes running on as many nodes. 
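
One way to arrive at that figure is simple arithmetic. The sketch below assumes two shards, three-member replica sets throughout, and a single mongos router; the function and parameter names are illustrative only, and other layouts will give different totals.

```python
def sharded_process_count(shards, members_per_shard=3, config_members=3, routers=1):
    """Count the MongoDB processes in a sharded cluster.

    Assumes every shard is a replica set of the same size. The defaults
    (three-member replica sets, one mongos router) are illustrative
    assumptions, not requirements.
    """
    shard_processes = shards * members_per_shard  # mongod processes holding data
    return shard_processes + config_members + routers


if __name__ == "__main__":
    # Two shards: 2 * 3 data-bearing nodes + 3 config servers + 1 mongos
    print(sharded_process_count(shards=2))  # prints 10
```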

Setting up a distributed MongoDB cluster by hand is a complex and error-prone process. However, MongoDB provides a Kubernetes operator, which allows such a deployment to be established within a Kubernetes cluster with very little effort. The “operator” is a controller program that runs within the Kubernetes cluster and contains the MongoDB-specific logic for establishing MongoDB cluster topologies. One need only supply the operator with a configuration file, and the operator will do the rest: creating and configuring MongoDB nodes, setting up best-practice security, and handling the connectivity between nodes.
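
To give a feel for what such a configuration file might contain, the sketch below builds a minimal replica-set resource in Python and prints it as JSON, which kubectl accepts alongside YAML. The apiVersion, kind, and field names are assumed to follow the Community operator's MongoDBCommunity resource and should be checked against the documentation for the operator version in use. The resource name, namespace, and MongoDB version are placeholders, and a real resource would typically also define users and storage.

```python
import json


def replica_set_manifest(name, members=3, version="6.0.5", namespace="mongodb"):
    """Build a custom resource describing a MongoDB replica set.

    Field names are assumed to match the Community operator's
    MongoDBCommunity resource; verify them against the operator's docs.
    """
    if members < 1:
        raise ValueError("a replica set needs at least one member")
    return {
        "apiVersion": "mongodbcommunity.mongodb.com/v1",  # assumed CRD group/version
        "kind": "MongoDBCommunity",
        "metadata": {"name": name, "namespace": namespace},  # placeholder values
        "spec": {
            "type": "ReplicaSet",
            "members": members,
            "version": version,  # placeholder MongoDB version
            "security": {"authentication": {"modes": ["SCRAM"]}},
        },
    }


if __name__ == "__main__":
    # Save the output to a file and submit it with: kubectl apply -f <file>
    print(json.dumps(replica_set_manifest("example-mongodb"), indent=2))
```

The point of the sketch is that the file only declares the desired topology (how many members, which version); the operator is responsible for turning that declaration into running, connected MongoDB nodes.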

Using the Kubernetes operator, one can launch a complex sharded MongoDB cluster in a matter of minutes. And since Kubernetes is available on all cloud platforms, on-premises, and even on a desktop, one can configure test and production environments with ease. Furthermore, a Kubernetes-based MongoDB cluster can scale on demand, within the constraints of the physical Kubernetes platform. Kubernetes also monitors the health of containers and will restart any failed nodes.

MongoDB, Inc. provides a Kubernetes operator for the Community edition of MongoDB, and another operator for the Enterprise edition. The Enterprise operator includes support for enterprise features such as backup and advanced security.

For those who are using the MongoDB Atlas cloud service, MongoDB provides a Kubernetes operator for Atlas as well. This operator allows access to Atlas cloud-based services from within the Kubernetes environment. Although Atlas itself does not run within the Kubernetes cluster, the operator allows applications within a Kubernetes environment to provision Atlas database services.

Percona also offers a Kubernetes operator to support its distribution of MongoDB. The Percona distribution of MongoDB is an open source distribution that includes many features of the MongoDB Enterprise edition. Percona provides professional services based around support of this distribution. The Percona Kubernetes operator supports all of the features of the MongoDB, Inc. operators together with some additional functionality. The Percona operator is definitely worth investigating if you are looking to run MongoDB within a Kubernetes environment.