#multimodal

Google DeepMind Launches Gemini 3.5 Live Translate with Near-Real-Time Speech-to-Speech Across 70+ Languages

LLMs Jun 10, 2026

Gemini 3.5 Live Translate enables fluid, continuous speech-to-speech translation without manual language configuration, rolling out to Google Meet, Translate, and the Gemini Live API.

Google DeepMind's Gemma 4 12B Brings Encoder-Free Multimodal AI to Consumer Laptops

LLMs Jun 10, 2026

Google DeepMind releases Gemma 4 12B, a 12-billion-parameter model with unified vision and audio processing that runs on 16GB consumer hardware.

Google Unveils Gemini Omni Video Generation and Gemini 3.5 Flash for Agentic AI

LLMs May 30, 2026

Google announced Gemini Omni, a multimodal model that generates and edits video through natural language, and Gemini 3.5 Flash, optimized for complex agent workflows.

Google I/O 2026: Gemini Omni, Multimodal Search, and AI Agents debut

LLMs May 29, 2026

Google unveiled Gemini Omni for video generation, Gemini 3.5 Flash for agents and coding, and autonomous Search agents that monitor the web 24/7.

Google launches Gemini 3.5 Flash with agentic benchmarks outpacing Pro, rolls out Pro in June

LLMs May 22, 2026

Google unveiled Gemini 3.5 Flash at I/O 2026 as an agent-first model claiming frontier intelligence at sub-flagship latency, with Gemini Omni adding physics-aware video generation.

Google's Gemini Omni blurs the line between text prompt and video simulation

LLMs May 21, 2026

At I/O 2026, Google DeepMind unveiled Gemini Omni, a multimodal family that generates video from combined image, audio, and text inputs, signaling a shift from generative to simulational AI.

Google launches Gemini 3.5 models and Omni multimodal family at I/O 2026

LLMs May 20, 2026

Google unveiled Gemini 3.5 Flash as the new default model, introduced Gemini Omni for text-to-video generation, and previewed always-on agents powered by Gemini Spark.

Google DeepMind Launches Gemini Omni Flash for AI-Powered Video Generation and Editing

LLMs May 20, 2026

Gemini Omni Flash enables users to generate and edit videos through natural language prompts, combining multimodal inputs with real-world knowledge.

Google Search Gets Agentic AI Overhaul With Gemini 3.5 Flash Default

LLMs May 20, 2026

Google debuts AI agents, a redesigned search box, and Gemini 3.5 Flash integration in Search, targeting a billion-user base.

Google I/O 2026: Gemini Omni and Agent-First Development Mark Shift Toward Agentic AI

LLMs May 20, 2026

Google unveiled Gemini Omni, a multimodal model capable of video creation, alongside Gemini 3.5 Flash and expanded agent capabilities across Search, Gmail, and shopping.

Elmer Data's Watch Test Exposes a Gap Between Conversational AI and Visual Reasoning

Research May 18, 2026

A new analysis shows that large language models excel at language tasks but struggle with seemingly simple visual reasoning, such as reading analog clocks.