MongoDB 6.0 Introduces New NoSQL-SQL Engine

In previous columns, we’ve noted that the SQL language is in the ascendant. New SQL native databases such as CockroachDB and Yugabyte are showing robust adoption, while non-relational (NoSQL) databases increasingly provide SQL interfaces to their data.

In light of this increasing trend, it’s no surprise to see the introduction of a new SQL capability within the latest release of MongoDB—the Atlas SQL framework. 

MongoDB has supported a SQL “bridge” for some time. The MongoDB BI connector provides a means by which business intelligence tools can read MongoDB data via SQL. The BI connector appears to SQL clients as a MySQL database.

The chief problem in allowing SQL queries to run against MongoDB collections is that SQL expects data to be organised relationally—represented in tabular format. In MongoDB, embedded arrays are often used where a SQL database would use separate parent and child tables.  The BI connector solves this problem by logically restructuring the MongoDB collections during an initial setup.For instance, if a MongoDB collection has a nested array of “child” documents, the BI connector will create two virtual MySQL tables—one representing the collection and another for its nested “children.”  Users can perform a join between the two tables to consolidate the data.  Users can also perform a join between two “tables” as well as other SQL queries such as GROUP BY or UNION.

The BI connector is okay when the MongoDB collections have predictable structures but fails to deal well with evolving schema designs. Furthermore, the BI connector suffers from performance limitations, partially due to having to flatten then un-flatten the data during everyday operations.

The new MongoDB SQL attempts to get around these problems. Most significantly, there is now no need to create a relational schema in advance of writing a query. SQL queries can manipulate nonrelational structures within MongoDB collections using operators such as FLATTEN and UNWIND.  These operators allow the query to expand arrays and nested documents within a single statement.  Theoretically, this provides better flexibility and performance.  

It remains to be seen how this new SQL interface performs. As with the BI connector, the SQL is ultimately resolved by being translated into aggregation framework commands. The aggregation framework is powerful, but experience tells us that performance optimisation is more difficult whenever one language is converted into another before being executed.

Initially, Atlas SQL framework only supports JDBC driver connections and the Tableau BI tool. An upcoming ODBC driver should extend the framework’s reach and usefulness. 

The SQL dialect is currently based on the SQL-92 ANSI standard. While it is true that SQL-92 represents a widely implemented “core” of the SQL language, it is nevertheless 30 years old.  The absence of widely implemented features from subsequent ANSI specifications—such as the SQL2003 analytic functions—is lamentable. However, I’d be surprised if MongoDB did not evolve the language specification in subsequent releases.

The most significant restriction on this new capability is that it is limited to MongoDB’s Atlas cloud. While it is perfectly understandable that MongoDB wants to provide as much incentive as possible for customers to migrate to the Atlas, it also remains true that there are vast deployments running on-premise or on non-Atlas cloud platforms.  For those deployments, the BI connector will remain the only means of SQL access.

In the early days of NoSQL databases, some believed that SQL’s time had passed. They expected BI tools and analysts to learn to speak natively to NoSQL databases. By and large, that hasn’t happened. SQL remains the dominant language for data analysis. It’s great to see MongoDB improve its SQL capabilities, even though it’s disappointing to see those capabilities limited to their proprietary Atlas cloud.