Schema Design in MongoDB

Document databases such as MongoDB quickly won strong support from application developers for several reasons. One of the clearest is their support for flexible schemas.

In the RDBMS world, the schema (the definition of the tables and columns that make up a database) is relatively fixed. Applications cannot simply add columns to tables or, in most cases, create tables on the fly.

In contrast, MongoDB allows developers to create collections implicitly and to add attributes to the documents in those collections on demand. This works very well with agile development and DevOps: when the definition of the database structure is stored in code, a change to the database's schema can be implemented simply by committing new code to the version control system.

However, it's a misconception that schema design is less important in MongoDB than in the RDBMS world. The performance limits of a MongoDB application are largely determined by the document model that the application implements. The amount of work an application must do to retrieve or process information depends primarily on how that information is distributed across documents. In addition, the size of documents determines how many of them MongoDB can cache in memory. These and many other trade-offs determine how much physical work the database has to do to satisfy a request.
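The effect of document size on caching can be illustrated with a back-of-envelope calculation. This is only a rough sketch: the 4 GiB cache and the document sizes are assumed figures, and the estimate ignores indexes, compression, and storage engine overhead.

```python
def docs_in_cache(cache_bytes: int, avg_doc_bytes: int) -> int:
    """Rough upper bound on how many documents fit in a cache of this size."""
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    return cache_bytes // avg_doc_bytes


GIB = 1024 ** 3
cache = 4 * GIB  # assumed cache size, for illustration only

lean = docs_in_cache(cache, 2 * 1024)    # hypothetical 2 KiB documents
bulky = docs_in_cache(cache, 64 * 1024)  # hypothetical 64 KiB documents

print(lean, bulky, lean // bulky)
```

Under these assumptions, the smaller documents leave room for 32 times as many cached documents, which is why a design that bloats every document can push the working set out of memory.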

Therefore, the design of a schema in MongoDB is just as important as it is in an RDBMS. Indeed, schema design can be even more complicated in MongoDB. At least in SQL databases, we have the "third normal form" representing the starting point for a well-designed first cut data model. In MongoDB, we have more choices, but as a consequence, we have more potential pitfalls.

There is a wide variety of MongoDB schema design patterns, but they are all variations on two approaches: embedding everything in a single document, or linking collections using pointers to data in other collections. Most applications will employ a mix of linking and embedding.
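The two approaches can be sketched with plain Python dictionaries standing in for documents. Everything here is hypothetical: the `Collection` class is a toy in-memory stand-in that counts lookups (it is not the MongoDB driver), and the order and customer fields are invented for illustration.

```python
class Collection:
    """Toy in-memory stand-in for a collection that counts lookups."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.reads = 0

    def find_one(self, _id):
        self.reads += 1
        return self._docs.get(_id)


# Embedding: the customer details live inside the order document.
orders_embedded = Collection([
    {"_id": 1, "customer": {"name": "Ada", "city": "London"}, "total": 42},
])

# Linking: the order holds a pointer (customer_id) into another collection.
customers = Collection([{"_id": 10, "name": "Ada", "city": "London"}])
orders_linked = Collection([{"_id": 1, "customer_id": 10, "total": 42}])


def load_embedded(order_id):
    """One lookup returns the order together with its customer."""
    return orders_embedded.find_one(order_id)


def load_linked(order_id):
    """Two lookups: the order, then the customer it points to."""
    order = orders_linked.find_one(order_id)
    customer = customers.find_one(order["customer_id"])
    return {"_id": order["_id"], "customer": customer, "total": order["total"]}


embedded_view = load_embedded(1)
linked_view = load_linked(1)
```

In this sketch the embedded design answers the query with a single lookup, while the linked design needs two. The trade-off runs the other way on writes: with linking, a customer's details are stored once, whereas with embedding they are repeated in every order for that customer.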

Creating the Perfect Data Model

It takes judgment, experience, and experimentation to create the perfect data model, and unfortunately there are all too few tools available to help the MongoDB data modeler.

Hackolade, a third-party data modeling tool, has been growing in popularity over the last few years. It started as a MongoDB-specific tool to help visualize and implement MongoDB data models. In recent years, it has added support for other NoSQL databases, for relational databases that implement JSON support, and even for API definitions. Hackolade is particularly popular in larger companies, where enterprise data teams seek to understand all of the enterprise's data assets across many disparate databases.

Flexible schemas create an additional challenge for application designers. In a fixed-schema database, all of the data in the database must conform to the current version of the schema. In MongoDB, however, a collection may include documents that are structured differently depending on when they were created. As code evolves the schema, there may be "legacy" documents that still follow the older design.

There's no one solution to these versioning challenges. In some cases, a bulk migration to a new schema might be justified, even if it causes downtime or diminished performance. In other cases, the schema may be adjusted dynamically at the code layer, with older documents converted as they are encountered. Alternatively, the code can simply understand how to deal with older schema versions. Each of these approaches has merits, but it's essential that the application architect understands which approach is going to be used.
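As one illustration of adjusting the schema at the code layer, the sketch below upgrades legacy documents as they are read. The `schema_version` field, the split of `name` into `first_name` and `last_name`, and the helper names are all assumptions made for this example, not a prescribed MongoDB mechanism.

```python
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical change: v1 stored one 'name' string, v2 splits it in two."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    upgraded.update(
        {"first_name": first, "last_name": last, "schema_version": 2}
    )
    return upgraded


# Maps a schema version to the function that upgrades it by one step.
UPGRADES = {1: upgrade_v1_to_v2}


def normalize(doc):
    """Return the document in the current schema, upgrading step by step."""
    # Assumption: documents written before versioning carry no version field.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


legacy = {"_id": 1, "name": "Ada Lovelace"}
current = {"_id": 2, "first_name": "Alan", "last_name": "Turing",
           "schema_version": 2}

print(normalize(legacy))
print(normalize(current))
```

The application code after `normalize` only ever sees the current shape. Whether the upgraded document is also written back to the database, turning this into a gradual migration, is a separate design decision.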

MongoDB's flexible schema has a lot of advantages and is clearly popular with modern application developers. However, a flexible schema does not mean no schema, and those wishing to develop performant and maintainable applications with MongoDB need to exercise the same diligence in schema modeling as they would for any other database system.