Google I/O 2026: AI agents take centre stage

Google unveils Gemini Omni and Spark to pivot towards AI agents
Gemini Omni enables multimodal video editing via natural language
Spark agent rolls out to Google AI Ultra subscribers as a beta test

AI's Next Frontier

At Google I/O 2026, CEO Sundar Pichai underscored the company's unwavering commitment to artificial intelligence, specifically highlighting the transformative potential of AI agents. He emphasized that AI is no longer just about processing information but is evolving into systems capable of reasoning, creating, and actively performing tasks. This strategic pivot marks a new era, with Google processing 3.2 quadrillion tokens per month, a sevenfold increase since the previous year's Google I/O. This surge in AI usage reflects Google's continuous innovation, from integrating Gemini Intelligence into millions of Android phones to enhancing Google Maps with conversational search capabilities. Pichai articulated Google's long-standing AI-first vision, emphasizing its role in advancing the company's mission and improving lives at a global scale. This vision is underpinned by a full-stack approach to AI development, encompassing custom silicon, foundational security, research, and models, all culminating in products and platforms that reach billions of users worldwide. The conference served as a platform to communicate this evolving strategy, reinforcing Google's position at the forefront of AI development.

Gemini Omni's Creative Power

A major highlight of Google I/O 2026 was the introduction of Gemini Omni, a natural language creation model. Described as the point where Gemini's reasoning meets its creative prowess, Omni accepts combinations of images, audio, video, and text as input and generates high-quality videos grounded in Gemini's real-world knowledge. Gemini Omni is rolling out across various platforms including the Gemini app, Google Flow, and YouTube Shorts. Koray Kavukcuoglu, CTO of Google DeepMind, elaborated on Omni's capabilities, noting that it supports video editing through natural language commands. Users can provide sequential instructions, and the model maintains character consistency, realistic physics, and contextual memory within scenes. A user can therefore edit a video by simply describing the changes they want, with the model interpreting and executing each instruction in relation to the previous ones, which makes complex video manipulation more accessible. The advancement narrows the gap between human creative intent and AI-driven content creation.

Gemini 3.5 Flash: Agentic Execution

Complementing Gemini Omni's creative focus, Gemini 3.5 Flash was presented as a powerhouse for task execution within agentic workflows. Jeff Dean, Chief Scientist at Google DeepMind and Google Research, explained that this model merges cutting-edge intelligence with practical action capabilities. Gemini 3.5 Flash is designed to deliver performance comparable to larger flagship models, but at the characteristic high speeds associated with the Flash series. It has been recognized as Google's most capable agentic and coding model to date. In rigorous benchmarks, such as the Terminal-bench 2.1 agentic terminal coding benchmark, Gemini 3.5 Flash achieved a score of 76.2%, outperforming its predecessors and competitors like Anthropic's Claude Opus 4.7, though closely trailing OpenAI's GPT-5.5. Google's internal data further indicates that Gemini 3.5 Flash excels in complex, multi-step workflows, general tool utilization in real-world scenarios, financial analysis, interpreting intricate charts, and multimodal reasoning. This positions it as a critical tool for developers and users requiring efficient and intelligent task automation.

Spark: The Active Partner

The Gemini app is undergoing a significant transformation with the introduction of the Gemini Spark agent, a key component of Google's agentic push. Josh Woodward, VP of Google Labs for Gemini app and AI Studio, highlighted that Spark represents a fundamental shift, moving the Gemini app from a reactive assistant to a proactive partner that actively works on behalf of the user, under their direction. With a substantial user base of 900 million, the Gemini app is poised to leverage Spark's capabilities. Being web-based, Spark is designed for background task execution and seamless integration with other Google applications, enabling recurring tasks or actions even when devices are offline. Spark is accessible across Android and macOS via the Gemini app. Crucially, Spark operates with user consent, allowing individuals to control its activation and the apps it connects to. It is engineered to seek explicit permission before undertaking high-stakes actions, such as financial transactions or sending communications, ensuring user control and safety. Initially, Spark will be available to Google AI Ultra subscribers as part of a beta test, with plans for broader rollout. Furthermore, Spark is enabling new Model Context Protocol connections with creative platforms like Canva, with Adobe, Xiaomi, Samsung, and others slated for future integrations, expanding its utility across a wide ecosystem.