MongoDB Atlas Auto-Index Creation

As you may know, an index is a database object with its own storage that provides a fast access path into a collection. Indexes exist primarily to enhance performance, so using indexes effectively is paramount when optimizing MongoDB performance.

In MongoDB, indexes are essential to enable fast lookups, implement joins (via $lookup), and optimize sorts. Compared to traditional SQL databases, MongoDB relies more on indexes to optimize sorts because of the tight default limits on the amount of memory that a sort might consume. Joins implemented via the $lookup command also need indexes, as there's no equivalent of the SQL sort-merge or hash-join algorithms—to be efficient, a MongoDB join must employ an index.

Not all indexes are created equal. Indexes that include all of the attributes in a query and which are based on highly selective attributes (attributes with many distinct values) can be hundreds or thousands of times more effective than indexes that include a subset of query attributes or are based on non-selective attributes.

Since time immemorial—or at least all the time I've been working with database systems—creating the best set of indexes has been a primary goal of database performance tuning.  However, I've often wondered why databases need a human being to recognize that a given index would be advantageous. Wouldn't it be better if the database recognized that an index is required and even auto-created it?

MongoDB Atlas auto-indexing does just that. MongoDB auto-indexing analyses the recent query workload and automatically creates indexes where it believes they are appropriate.  Atlas identifies slow queries running within the instance to see if an index might accelerate the query performance.

Slow queries are defined as those that take more than 100ms to complete. For each of those queries, Altas considers the number of scanned documents compared to the number of documents returned. When the number of documents scanned is much higher than the number returned, no existing index is probably sufficiently selective. 

Atlas also must consider how the new index will affect existing indexes. In many cases, a new index might cause an existing index to become redundant. For instance, if an index on columnA and columnB is created, a proposed new index on columnA will no longer be necessary so that Atlas won't create both indexes. However, if an index on columnA already exists, MongoDB will not delete it, so it's still possible that redundant indexes will be created.

The advantages that indexes give to queries are not cost-free. Each index creates overhead for DML operations (insert, delete, and sometimes updates). For instance, in a simple test, I found that insert time increased by about 20% for each additional index created. Atlas will create at most four indexes on each collection to avoid creating too much DML overhead.    Even with this limitation in place, you should consider DML's relative importance to query performance before turning auto-indexing on.  

Watching auto-indexing work is exciting. As the new index is created, query performance accelerates dramatically and does so without any manual intervention. I'm pleased to see that this feature—which I first dreamed about more than 20 years ago—become a reality.