Business Intelligence with MongoDB

MongoDB has become a favorite among developers in no small part because of its alignment with modern software development practices.  Its flexible schemas are compatible with agile software development and the JSON-based document structure is well-matched with modern JavaScript-centric web architectures.

However, databases don’t exist solely for the convenience of software developers: Data in a database is a critical business asset.  Data is used to support decision making, operational efficiency and perhaps become the basis for competitive data driven products.

Prior to the development of the relational database, data was typically “locked” in pre-relational systems to which only programmers held the keys.   Extracting data meant requesting a new COBOL program and IT departments were typically swamped with such reporting requests.  The SQL language unlocked these data assets and a generation of Business Intelligence (BI) tools leveraged the ubiquitous SQL interface to allow data to be leveraged by all who needed it.

No one wants to go back to the pre-relational era, but by default MongoDB databases are inaccessible to those who cannot master JavaScript syntax.  The MongoDB company has released a commercial graphical product – “Compass” – that allows for data query without JavaScript, but this is still disconnected from modern BI products such as Tableau or even Excel.

Obviously, the solution is to provide SQL access to MongoDB databases.  One option for doing this is the MongoDB BI connector, included with the MongoDB Enterprise Advanced product.

The BI connector presents a MySQL compatible interface to selected MongoDB collections.  BI tools that can connect to MySQL can interrogate the connector as if it were a MySQL database.

Of course, MySQL is a relational system, while MongoDB collections consist of JSON documents that do not necessarily have a tabular structure.   However, the BI connector can perform normalization of MongoDB documents into a relational format.   For instance, if a MongoDB collection has a nested array of “child” documents, the BI connector will create two virtual MySQL tables – one representing the collection and another for its nested “children.” Users can perform a join between the two tables to consolidate the data.   Users can also perform a join between two “tables” derived from distinct connections as well as performing other SQL queries such as GROUP BY or UNION.

Under the hood, the BI connector is translating most of the SQL operations into aggregation framework commands.  The aggregation framework is a JavaScript interface which allows complex queries including grouping operations and limited joins between collections.

The MongoDB connector will usually perform acceptably for operations that involve only a single collection, even if it appears as multiple MySQL “tables”.  Unfortunately, performance can be disappointing when joining across large collections or when attempting complex SQL operations such as sub-queries. 

There are several third-party alternatives to the MongoDB BI connector.  The Apache Drill project provides a SQL engine that can query across multiple disparate systems, both SQL-based and NoSQL.  Drill can operate across multiple distributed data stores such as HDFS or Amazon S3, relational databases that support JDBC or ODBC, as well as NoSQL systems such as MongoDB and HBase.  Drill’s architecture is based on Google’s Dremel system which provides the foundation for the Google BigQuery product. 

Drill’s syntax is a little idiosyncratic but its FLATTEN operator can be used to “tabularize” embedded documents in MongoDB collections. Drill has the advantage of being highly scalable and can parallelize query execution across multiple servers.

Several other SQL engines can interact with MongoDB.  Presto – developed at Facebook – is conceptually similar to Apache Drill and also has a MongoDB connector while the Teradata QueryGrid product allows MongoDB data to be queried from within a Teradata Data warehouse.  There are also a multitude of products allowing MongoDB data to be migrated into Data Lakes built on Hadoop or similar technologies. 

People don’t choose MongoDB for its business intelligence or data warehousing features, but luckily there are options for accessing MongoDB data using SQL.