Bias, Transparency, and the Future of AI: Data Summit 2024 Day 2 Keynote

On the second day of the Data Summit 2024 conference, Beth Rudden, CEO, and Lalit Ahuja, chief technology officer of GridGain Systems, Inc., explored themes surrounding modern data management challenges, including AI-driven versioning, implementing taxonomies and ontologies, going real time, and more.

The annual Data Summit conference returned to Boston, May 8-9, 2024, with pre-conference workshops on May 7.

Rudden’s session, entitled “Mastering the Data Evolution: AI, Graph Modeling, & Tactical Curation,” focused on confronting the toughest data management challenges head-on. With an innovative approach, Rudden empowered attendees to effectively select, organize, and manage the right datasets, ultimately driving the creation of robust, diverse data architectures strong enough to withstand rapid technological evolution.

Rudden sought to convince attendees of one thing: Adding versioned data and ontologies to explainable AI can revolutionize how we interpret AI behavior. With this thesis, Rudden declared that explainability and transparency are key to understanding AI.

“The shame of not knowing has driven me my entire career,” said Rudden. “People are afraid to ask questions about AI.”

Rudden ruminated on data and reality, explaining that all data is an artifact of the human experience. They further explained that an ontology, then, is the study of the nature of your reality based on the language you use.

Our current reality is defined by the potential of personalization; AI promises to bring this personalization to the processes we conduct every day.

The envisioned future, Rudden explained, is summed up by a quote from Amy Webb: “This is the worst that our technology will ever be.” Meaning, while the AI ball is rolling fast, the promise of today is that AI will only continue to get better.

Defining AI, Rudden explained that artificial intelligence is a system capable of simulating human intelligence and thought processes. However, humans are full of bias—188 cognitive biases (and counting), in fact.

To combat these biases, information must be organized well—through knowledge graphs—and transparent enough to carry inherent contexts that help us understand the reasoning behind AI.

For transparency, “we can use ontologies to carry metadata in a way that nothing else can,” said Rudden. “AI systems that leverage ontologies can explain the logic behind AI reasoning. Using ontologies as a ‘Rosetta Stone’ of AI helps us understand our reality.”
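To make the idea concrete, here is a minimal sketch of how an ontology can carry the reasoning chain behind a classification. This is an illustration only, not code from the talk; the class names and the triple format are hypothetical.

```python
# Minimal sketch (not from the talk): an ontology as subject-predicate-object
# triples, used to explain why an AI label applies. All class names here are
# hypothetical.
ONTOLOGY = {
    ("ServiceDog", "subClassOf", "Dog"),
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}

def explain(label, target):
    """Walk subClassOf links to show the reasoning chain from label to target."""
    chain = [label]
    current = label
    while current != target:
        parents = [o for (s, p, o) in ONTOLOGY
                   if s == current and p == "subClassOf"]
        if not parents:
            return None  # no path: the ontology cannot justify this inference
        current = parents[0]
        chain.append(current)
    return " -> ".join(chain)

print(explain("ServiceDog", "Animal"))
# ServiceDog -> Dog -> Mammal -> Animal
```

Because every step in the output is a triple a human can read, the system can show *why* it reached a conclusion rather than merely asserting it, which is the transparency Rudden described.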

Rudden transitioned to the importance of versioning, which ensures auditability and traceability of data and transformations, enhancing trust and compliance. Being able to go back in time and render lineage and provenance for every prediction made by the AI offers proof of authenticity, according to Rudden. This is part of that vital transparency necessary to nurture AI to be the technology we envision it to be.
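One common way to implement this kind of auditability, shown here as an assumption rather than as Rudden's or GridGain's method, is to hash each dataset snapshot together with a pointer to its parent version, so the full lineage behind any prediction can be walked back to its origin.

```python
# Illustrative sketch (an assumption, not a described implementation):
# versioned dataset snapshots, each hashed and linked to its parent, so any
# prediction can be traced back to the exact data it was made from.
import hashlib
import json

def snapshot(records, parent_hash=None):
    """Hash a dataset together with its parent version to form a lineage link."""
    payload = json.dumps({"records": records, "parent": parent_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def lineage(version_hash, history):
    """Walk parent pointers to reconstruct the full chain of versions."""
    chain = []
    while version_hash is not None:
        chain.append(version_hash)
        version_hash = history[version_hash]
    return chain

history = {}  # version hash -> parent hash
v1 = snapshot([{"id": 1, "label": "cat"}])
history[v1] = None
v2 = snapshot([{"id": 1, "label": "dog"}], parent_hash=v1)
history[v2] = v1

# A prediction tagged with v2 can prove exactly which data produced it.
print(lineage(v2, history))  # [v2, v1]
```

Because each hash covers both the records and the parent hash, tampering with any earlier version changes every hash downstream, which is what makes the lineage trustworthy as proof of authenticity.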

However, Rudden offered a harrowing warning: Without transparent and versioned data, humans will continue to solely rely on information that confirms their existing beliefs or biases.

Ultimately, in this way, AI has the potential to “exacerbate the continuation of societal bias…[and] this will continue unless we demand lineage and provenance. You have to be a part of the ecosystem to be able to change it.”

Being part of that ecosystem involves unabashedly asking questions, forcing AI to become a transparent process and inviting user trust by showing how evolving knowledge shapes AI decisions.

Ahuja, in their presentation, “Modern Data & Analytics Architecture: Solving the Real-Time Challenge,” explained that today’s high-speed operational and AI-driven decision making necessitates ultra-fast analytics.

The definition of “relevant” data has changed, noted Ahuja. This evolution has left traditional data architectures poorly optimized for processing streaming data, and the advantages of real time are often lost when analytics run against far-off datastores or data lakehouses.

Data problems are multi-dimensional, involving data processing and aggregation, requiring low latency and high-performance compute, all while managing scale. There are many options that solve one or two of these data problems, “but what is missing is a solution that addresses them all,” explained Ahuja.

A unified, real-time data platform is the solution to this challenge, offering the following advantages:

  • Stores the data, executes stream processing, and hosts app logic and various types of analytics
  • Addresses multi-dimensional business needs
  • Executes complex workloads against streaming and transactional data
  • Integrates with multiple systems of record and makes curated data available in real time
  • Enables advanced analytics and ML/AI-based decisioning in real time
  • Achieves sub-second latencies at massive scale
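The stream-processing workload in the list above can be pictured with a toy example. This is an illustration of the general technique, not GridGain's platform: a tumbling-window aggregation over a stream of timestamped events, the kind of continuous computation a real-time platform must run alongside transactional queries.

```python
# Toy illustration (not GridGain's platform): a tumbling-window aggregation
# over a stream of (timestamp_ms, value) events.
from collections import defaultdict

def tumbling_window_sum(events, window_ms=1000):
    """Group events into fixed-size, non-overlapping windows and sum each."""
    windows = defaultdict(int)
    for ts, value in events:
        windows[ts // window_ms] += value  # window index = ts // width
    return dict(windows)

events = [(120, 5), (450, 3), (1010, 7), (1999, 1), (2500, 2)]
print(tumbling_window_sum(events))
# {0: 8, 1: 8, 2: 2}
```

Executing this kind of aggregation where the data already lives, rather than shipping events to a distant datastore first, is how a unified platform keeps latencies sub-second.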

GridGain’s unified, real-time data platform empowers enterprises to make well-informed decisions at the rapid speed of business, solving a multitude of problems and eliminating a variety of latencies. With a simplified, non-intrusive architecture, GridGain supports transaction processing, stream processing, advanced analytics, and AI/ML operations, behaving as a data hub and system of record.

Many Data Summit 2024 presentations are available for review at