SAP and Google Cloud Collaboration Advance Enterprise AI


SAP and Google Cloud are strengthening their partnership, enabling SAP to provide the next wave of enterprise AI by contributing to the new Agent2Agent (A2A) interoperability protocol, which establishes a foundation for AI agents to securely interact and collaborate across platforms.

This work is complemented by two additional areas of progress: first, the expansion of Google Gemini models in SAP’s generative AI hub on SAP Business Technology Platform (SAP BTP); second, the use of Google’s video and speech intelligence capabilities to support multimodal retrieval-augmented generation (RAG) for video-based learning and knowledge discovery in SAP products.

Together, these efforts reflect a shared commitment to deliver enterprise-ready AI that is open, flexible, and deeply grounded in business context, according to the vendors.

The A2A protocol is an open standard designed to ensure that agents from different vendors can interact, share context, and work together, enabling seamless automation across traditionally disconnected systems.

The A2A protocol is designed to enable exactly this kind of cross-platform collaboration: AI agents working together to accelerate business outcomes, reduce friction, and free people to focus on more strategic work. It also reinforces SAP’s vision for Joule as an agent orchestrator that works across enterprise workflows and is interoperable, proactive, and deeply connected to business context.

Beyond agent interoperability, SAP is furthering its commitment to openness and flexibility by expanding access to Google models in the generative AI hub, a key capability of the AI Foundation on SAP BTP.

Through the generative AI hub, customers gain enterprise-grade access to a curated portfolio of leading foundation models. That portfolio now includes Google Gemini 2.0 Flash and Flash-Lite, which join the Gemini 1.5 models already available through the hub.

This expanded model choice gives customers the flexibility to build and extend AI-driven solutions using high-performance, low-latency models optimized for enterprise workloads, while staying within SAP’s secure, business context-rich environment.

As part of the continued collaboration with Google Cloud, SAP is also advancing multimodal RAG, a highly requested capability among SAP customers, especially for video-based learning content.

Multimodal RAG enhances information retrieval and generation by integrating multiple data modalities—text, images, audio, and video—into a single, structured process. This approach enriches knowledge sourcing and elevates how users interact with training and support material, according to the vendors.

To address the complexity of extracting meaningful insights from video content, SAP leverages Google Video Intelligence for on-screen text detection across video frames, and Google’s Speech-to-Text API for accurate transcription of spoken audio. During the indexing process, these outputs are stored with corresponding timestamps, creating a structured foundation for retrieving relevant video segments with precision.
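The timestamped indexing idea described above can be illustrated with a minimal Python sketch. This is not SAP's or Google's implementation: the `Segment` type, the `build_index` and `search` helpers, and the sample data are all hypothetical, and simple keyword overlap stands in for whatever retrieval method the real system uses. The sketch only shows how on-screen text and transcribed speech, each stored with start and end times, can be merged into one index and queried to return the relevant video segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timestamped piece of extracted video content (hypothetical type)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    source: str   # "speech" or "on_screen_text"
    text: str


def build_index(speech, on_screen):
    """Merge both extraction outputs into one list ordered by start time."""
    return sorted([*speech, *on_screen], key=lambda s: (s.start, s.end))


def search(index, query, top_k=3):
    """Return the segments sharing the most words with the query.

    Keyword overlap is an assumption made for brevity; a production
    RAG pipeline would more likely rank segments by embedding similarity.
    """
    terms = set(query.lower().split())
    scored = []
    for seg in index:
        overlap = len(terms & set(seg.text.lower().split()))
        if overlap:
            scored.append((overlap, seg))
    # Best overlap first; ties are broken by earliest start time.
    scored.sort(key=lambda pair: (-pair[0], pair[1].start))
    return [seg for _, seg in scored[:top_k]]


# Made-up sample outputs standing in for transcription and text detection.
speech = [
    Segment(0.0, 6.5, "speech", "welcome to the purchase order training"),
    Segment(6.5, 14.0, "speech", "first open the approval workflow screen"),
]
on_screen = [
    Segment(7.0, 12.0, "on_screen_text", "Approval Workflow"),
    Segment(20.0, 25.0, "on_screen_text", "Submit Purchase Order"),
]

index = build_index(speech, on_screen)
hits = search(index, "approval workflow")
for hit in hits:
    print(f"{hit.start:.1f}s-{hit.end:.1f}s [{hit.source}] {hit.text}")
```

Because every segment carries its own time range, a matching result can point the user directly to the relevant moment in the video rather than to the video as a whole.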

“As agentic AI evolves, seamless handling of multi-modal data—text, voice, enterprise videos, and images—becomes paramount,” said Miku Jha, director of AI/ML and Generative AI at Google Cloud. “This introduces significant challenges for agent interoperability. An open protocol like A2A is therefore indispensable, providing the necessary framework and flexibility for agents to effectively communicate and collaborate across these diverse modalities. Multi-modality is not simply a capability; it is a foundational requirement driving the next generation of interconnected agentic systems.”

These efforts reflect a broader strategic alignment between SAP and Google Cloud: a shared belief in AI that is open, composable, and grounded in real business context, according to the vendors.

For more information about this news, visit www.sap.com.

