Sponsored Content: AI Agents are Querying Your Document Data, Here’s What They Can See 

Page 1 of 2 next >>

By Donovan Kinney, technical enablement specialist at 3T Software Labs

We hear a lot about AI’s benefits to productivity for data teams, but less about the potential risks. 

The moment the problem becomes real usually looks something like this: an engineer connects an AI assistant to a NoSQL database, such as MongoDB, through an MCP server or a similar tool-calling interface. At the time, they consider it safe to let the AI authenticate directly with their own credentials instead of following best practices such as delegated authorization. The setup takes twenty minutes. The agent can now query live data, return results, and feed them into whatever workflow it's part of. Nobody updates the access policy, scopes the authorization down, or asks what the agent can actually reach.

Across many industries, there is an accelerating and deeply concerning trend toward ungoverned, over-privileged AI access to data stores such as MongoDB deployments, where agents can see everything the connection string allows. Research by Kurtz and Krawiecka found that the industry is experiencing an identity explosion: machine and agent identities outnumber human identities by ratios exceeding 80 to 1, yet the mechanisms used to secure them are lagging severely.

The Access Model was Designed for Humans

The access model was designed assuming a human would be on the other end. AI agents aren't human. They lack the context to know what not to do, and that is where the exposure lives.

MongoDB's access controls are well designed for their intended user: an engineer who knows the schema and is connecting to a production environment. That engineer has likely been briefed on which collections contain sensitive data and can exercise judgement about what to query, when, and why, even when the tooling doesn't enforce it.

This implicit governance layer is invisible until it disappears. AI agents don't have institutional knowledge, and even when that knowledge is within their context window, their prompt may contain no directive to apply it to data safety. They don't know that the customers collection contains PII, that correspondence_history shouldn't be surfaced to external systems, or that the accounts collection is subject to data residency requirements. AI agents simply follow instructions to query, or even change, data according to their interpretation of the prompt, constrained only by the authorization they've been granted.

What an AI Agent Actually Sees

A realistic MongoDB customer data collection may contain name, email address, date of birth, account number, correspondence history, and a handful of internal flags the engineering team uses for support triage. An engineer querying this collection knows to pull only what they need. They wouldn't return the full document to a user-facing interface without stripping sensitive fields first.

But an AI agent asked to "find recent signups" or "get customers in this region" may return the full documents, including every field, because MongoDB's role-based access control grants privileges on databases and collections, not on individual fields. That could include the date of birth, the account number, and a correspondence history containing free-text notes that a support agent wrote three years ago.
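
The difference between the two query styles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a MongoDB client: the `find` helper is a hypothetical in-memory stand-in, and the field names and sample values are invented to match the fields listed above. With a real driver such as PyMongo, the field list would be passed as the projection argument to `find()`.

```python
# Hypothetical customer document using the fields described above.
customers = [
    {
        "_id": 1,
        "name": "Ada Example",
        "email": "ada@example.com",
        "date_of_birth": "1990-01-01",
        "account_number": "ACC-0001",
        "correspondence_history": ["2021: customer called about a billing dispute"],
        "support_flags": ["vip"],
        "region": "EU",
        "signup_date": "2024-05-01",
    }
]


def find(collection, query, projection=None):
    """In-memory stand-in for a document query (assumption, not a driver API).

    Matches on field equality. If a projection is given, only those fields
    are returned. Unlike MongoDB, this helper does not add _id automatically.
    """
    results = []
    for doc in collection:
        if all(doc.get(key) == value for key, value in query.items()):
            if projection is None:
                results.append(dict(doc))
            else:
                results.append({key: doc[key] for key in projection if key in doc})
    return results


# A filter and nothing else: every field of every matching document comes back.
full = find(customers, {"region": "EU"})

# The same filter plus an explicit field list: only the requested fields come back.
scoped = find(customers, {"region": "EU"}, projection=["name", "signup_date"])

print(sorted(full[0]))
print(sorted(scoped[0]))
```

The unfiltered call is also the shorter one to write, which is part of why it tends to be what an agent produces.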

That's the default, and the exposure runs across three tiers, each worth thinking about. 

What the agent can read is determined entirely by the connection credentials and whatever resource-level permissions are in place. In most engineering tool configurations, that access is broad, because agents inherit access configured for professional use rather than governed access.

What the agent can return is whatever it reads. The recipient could be a user in a chat interface, another system in an automated pipeline, or a context window that persists beyond the session and may be logged by the AI provider. There's no implicit filter between a query result and its destination.

What the agent can write or modify depends on whether write access has been explicitly granted for a resource. This is the tier most teams have thought about. The read tier is where the gap typically is.

Why the Obvious Fixes Have Gaps

There are three things teams typically reach for, and each falls short in the agent context:

Read-only access

This is necessary, but it's not enough. Granting only the built-in read role (rather than readWrite) prevents the agent from making changes to data, but has no impact on what it can read and surface. An agent with read-only access to a customer collection can still return every PII field it contains.
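
To see why, it helps to look at the shape of a read-only role. The sketch below builds the kind of document MongoDB's `createRole` command accepts, as a plain Python dictionary. The database, collection, and role names are hypothetical placeholders, and nothing here connects to a server.

```python
# Hypothetical names: "crm", "customers", and "agentReadOnly" are placeholders.
read_only_role = {
    "createRole": "agentReadOnly",
    "privileges": [
        {
            # A privilege names a resource (here a database and collection pair) ...
            "resource": {"db": "crm", "collection": "customers"},
            # ... and the actions allowed on that resource.
            "actions": ["find"],
        }
    ],
    "roles": [],
}

privilege = read_only_role["privileges"][0]

# No write actions are granted, so the agent cannot change data.
can_write = any(
    action in privilege["actions"] for action in ("insert", "update", "remove")
)

# The resource is scoped by database and collection only; nothing in it
# narrows the grant to particular fields.
field_level_keys = set(privilege["resource"]) - {"db", "collection"}

print("can write:", can_write)
print("field-level keys in resource:", field_level_keys)
```

The role answers the question "can the agent change this collection?" and leaves "which fields can it see?" unaddressed. With PyMongo, a document like this could be passed to `Database.command()`, though whether a custom role or the built-in read role suits a given deployment is a judgement call this sketch does not settle.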

Application-layer masking

This works when the agent goes through the application, but agents using tool-calling interfaces often don't. If an agent calls a MongoDB tool directly, or even via MCP, it may bypass the application layer entirely, and any masking logic that lives there goes with it.
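
The bypass can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the in-memory store, the function names, and the list of sensitive fields are hypothetical and stand in for a real application, a real database, and a real agent tool.

```python
# In-memory stand-in for the customers collection (hypothetical data).
CUSTOMERS = {
    1: {
        "name": "Ada Example",
        "email": "ada@example.com",
        "date_of_birth": "1990-01-01",
        "account_number": "ACC-0001",
    }
}

# Fields the application is assumed to mask before returning a document.
SENSITIVE_FIELDS = {"email", "date_of_birth", "account_number"}


def db_find_one(customer_id):
    """Data layer: returns the stored document as-is."""
    return dict(CUSTOMERS[customer_id])


def app_get_customer(customer_id):
    """Application layer: masks sensitive fields before anything leaves the app."""
    doc = db_find_one(customer_id)
    return {
        key: ("***" if key in SENSITIVE_FIELDS else value)
        for key, value in doc.items()
    }


def agent_tool_find_one(customer_id):
    """Tool-calling path: wired straight to the data layer, so masking never runs."""
    return db_find_one(customer_id)


via_app = app_get_customer(1)
via_tool = agent_tool_find_one(1)

print(via_app)
print(via_tool)
```

Both paths read the same record, but only the one that passes through the application applies the masking.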

Existing access policies

As mentioned above, access policies written for human users assume human judgement. "Don't return full customer documents to external interfaces" is a reasonable instruction for an engineer. It does not constrain an agent, which acts according to permissions, not policies.
