A long, long time ago—in the early 90s—I first worked as a DBA with responsibility for enterprise databases. I will never forget how surprised and disappointed I was to discover that the data in the database files were completely unencrypted. A system administrator with access to the database files could read the data in those files, even if they didn’t have a database username and password.
This seemed like a huge security hole to the younger me and reinforced for me how little protection enterprise databases had from system admins and DBAs.
Things improved slightly over the next few decades as database vendors introduced “encryption at rest” features that would prevent an attacker from bypassing database security by reading the files directly. However, the encryption keys were typically kept on the same host as the database, and a determined attacker might still get access to them. Since an attacker would have to break into the database server, most companies felt that their border security would mitigate any serious breaches of this type.
Even so, an adversary who gained root access to a system and its encryption keys could conceivably read all database contents. Worse still, even without the keys, some unencrypted data would probably be present in memory, since the database would often decrypt the data while retrieving it from disk and loading it into memory. And of course, a DBA with full access to a database would have access to the unencrypted data.
Although these issues were significant in the age of the on-premises database, they became critical as enterprise databases migrated to fully managed cloud services. Virtually all fully managed cloud databases offer encryption at rest, but the keys are usually provided by, or at least known to, the cloud service provider. Consequently, the cloud provider would potentially be able to access even the encrypted data within a database. And while cloud service providers might, overall, be trustworthy, it became impossible for organizations to fully control database access.
In response to demand for a more robust encryption solution, MongoDB introduced Client-Side Field-Level Encryption (CSFLE) in version 4.2. CSFLE encrypts and decrypts data before it reaches the database server, ensuring unencrypted data is never exposed on the server side. Although somewhat complex, CSFLE is a big step forward, particularly for fully managed cloud databases.
CSFLE does force a trade-off, however. If data is deterministically encrypted, meaning the same input always produces the same encrypted output, then we can search on encrypted fields. For instance, we could encrypt a Social Security number (SSN) and then search for that SSN. However, because the encryption is deterministic, identical values produce identical ciphertext, so an attacker may be able to infer relationships without decrypting anything. For instance, if we see the encrypted value for a Social Security number in one collection, we might be able to find matching records in another collection.
If, on the other hand, data is randomly encrypted, meaning the same input does not produce the same output, then it’s difficult to search the encrypted data, since the driver cannot predict the encrypted values.
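The difference between the two modes is easier to see in a few lines of code. What follows is only a toy sketch using the Python standard library: it is not the cipher MongoDB uses (CSFLE's documented algorithms are AES-based), it is not secure, and the key, the function names, and the sample SSN are all made up for illustration. The only point is to show why deterministic ciphertext can be matched for equality and randomized ciphertext cannot.

```python
import hashlib
import hmac
import os

# Hypothetical demo key; a real deployment would fetch keys from a key vault.
KEY = b"demo-key-not-for-production-use!"


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and IV (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes, deterministic: bool) -> bytes:
    """Toy encryption: XOR the plaintext with a keystream and prepend the IV."""
    if deterministic:
        # IV derived from the plaintext: same input -> same IV -> same output.
        iv = hmac.new(key, b"iv" + plaintext, hashlib.sha256).digest()[:16]
    else:
        # Fresh random IV: the same input encrypts differently every time.
        iv = os.urandom(16)
    stream = _keystream(key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Reverse of encrypt(): split off the IV and XOR with the same keystream."""
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, iv, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


ssn = b"123-45-6789"  # made-up value

# The "server" stores only ciphertext.
stored_det = encrypt(KEY, ssn, deterministic=True)
stored_rnd = encrypt(KEY, ssn, deterministic=False)

# The "client" encrypts the search term and asks for an equality match.
query_det = encrypt(KEY, ssn, deterministic=True)
query_rnd = encrypt(KEY, ssn, deterministic=False)

det_match = stored_det == query_det  # deterministic: ciphertexts line up
rnd_match = stored_rnd == query_rnd  # randomized: they do not

print("deterministic match:", det_match)
print("randomized match:   ", rnd_match)
print("both decrypt correctly:",
      decrypt(KEY, stored_det) == ssn and decrypt(KEY, stored_rnd) == ssn)
```

The same property that makes the deterministic lookup work is also the weakness: anyone who can see two collections can line up equal ciphertexts without ever holding the key.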
This is where MongoDB Queryable Encryption, currently in preview in MongoDB 6.0, comes into play. This new feature uses innovative cryptographic techniques (a “functional search index based on a novel Structured Encryption construction”) to allow the driver to generate an encrypted search key that the database can use to retrieve the correct data. This allows us to use much stronger randomized encryption even on fields that must be queryable.
Database encryption has certainly come of age since my days as a young (well, younger) DBA. Queryable encryption is a big step forward for MongoDB and places the database at the forefront of secure computing platforms.