How does Gemini 3.5 Live Translate differ from traditional turn-by-turn translation systems?

Gemini 3.5 Live Translate generates translated speech continuously while the speaker is still talking, staying just seconds behind instead of waiting for the speaker to finish. This avoids the long pauses of turn-by-turn systems and keeps the conversation flowing.

Which languages does Gemini 3.5 Live Translate support?

The model automatically detects and translates across 70+ languages without requiring manual language configuration.

Where can I access Gemini 3.5 Live Translate?

It is rolling out through three channels: public preview for developers via the Gemini Live API and Google AI Studio, private preview for enterprises in Google Meet, and public availability for consumers in Google Translate on Android and iOS.

Does the model preserve the speaker's voice characteristics?

Yes. Gemini 3.5 Live Translate preserves intonation, pacing, and pitch in the translated speech output.

Google DeepMind Launches Gemini 3.5 Live Translate with Near-Real-Time Speech-to-Speech Translation Across 70+ Languages

The Release

Google DeepMind launched Gemini 3.5 Live Translate on June 9, a speech-to-speech translation model that performs continuous, low-latency translation across 70+ languages. According to the DeepMind Blog, the model automatically detects input languages and generates natural-sounding translated speech while preserving the speaker's intonation, pacing, and pitch. Unlike traditional translation systems that wait for each speaker turn to finish before translating, Gemini 3.5 Live Translate streams translated audio as input arrives, staying just seconds behind the speaker to balance quality and responsiveness.

Deployment Pathway and Integration

The rollout spans three distribution channels. According to DeepMind, developers get public preview access via the Gemini Live API and Google AI Studio; enterprises can test the model in private preview within Google Meet this month; and consumers can use it in Google Translate on Android and iOS. The DeepMind Blog notes that platform partners including Agora, Fishjam, LiveKit, Pipecat, and Vision Agents have integrated Gemini 3.5 Live Translate into their real-time communication infrastructure, so downstream developers can build multilingual voice applications without managing low-level media streaming themselves.

Real-World Deployment: Grab

Grab, the Southeast Asian mobility platform, is testing Gemini 3.5 Live Translate to enable multilingual communication between drivers and passengers during pickups. According to DeepMind, Grab's users place over 10 million voice calls monthly, which makes the trial a high-volume test of the model's robustness in noisy, real-world environments.

Technical Capabilities

The model's core advantage is that it streams output continuously instead of buffering until a speaker's turn is complete. DeepMind reports that Gemini 3.5 Live Translate handles multilingual input without manual configuration and includes noise-robustness features for unpredictable acoustic environments. Because it translates speech as it arrives, its near-real-time output suits live interpretation in calls, meetings, lessons, and broadcasts.
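To make the streaming-versus-buffering difference concrete, here is a minimal toy timing model. It is a sketch under stated assumptions, not DeepMind's method and not the Gemini Live API: the 8-second utterance, 1-second chunks, 1-second turn-based processing time, and 2-second streaming lag are illustrative numbers chosen for this example, and the names `Chunk`, `turn_based_playback`, and `streaming_playback` are hypothetical. It also assumes translated audio for each chunk is as long as the source chunk.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical toy timing model; not the Gemini Live API and not DeepMind's method.


@dataclass
class Chunk:
    start: float  # seconds since the speaker began talking
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def turn_based_playback(chunks: List[Chunk], processing_s: float) -> List[float]:
    """Playback start time of each translated chunk when the system waits for the turn to end."""
    t = chunks[-1].end + processing_s  # nothing plays until the whole turn is processed
    starts = []
    for c in chunks:
        starts.append(t)
        t += c.duration  # assumption: translated audio is as long as the source chunk
    return starts


def streaming_playback(chunks: List[Chunk], lag_s: float) -> List[float]:
    """Playback start time of each chunk when output trails the input by a fixed lag."""
    starts = []
    prev_end = 0.0
    for c in chunks:
        start = max(c.end + lag_s, prev_end)  # never overlap the previous chunk's audio
        starts.append(start)
        prev_end = start + c.duration
    return starts


# Illustrative numbers only: an 8 s utterance split into 1 s chunks.
utterance = [Chunk(float(i), float(i + 1)) for i in range(8)]
turn = turn_based_playback(utterance, processing_s=1.0)
stream = streaming_playback(utterance, lag_s=2.0)

print(f"turn-based: first audio at {turn[0]:.1f}s, finishes at {turn[-1] + utterance[-1].duration:.1f}s")
print(f"streaming:  first audio at {stream[0]:.1f}s, finishes at {stream[-1] + utterance[-1].duration:.1f}s")
```

Under these assumptions the listener hears the first translated audio at 3.0 s with streaming versus 9.0 s with turn-based buffering, and the streamed translation finishes at 11.0 s rather than 17.0 s. The fixed lag stands in for the "just seconds behind" behavior the article describes; real systems adjust lag dynamically to trade quality against responsiveness.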

Why This Matters

Real-time speech translation at this latency and scale reduces friction in cross-language collaboration. Teams coordinating across geographies, in customer service, emergency response, or international commerce, would face a lag of a few seconds rather than waiting for each full turn to be translated. The continuous streaming approach matters most in mobility and logistics use cases, such as Grab's pickup trial, where conversation timing and safety communication depend on minimal lag. For developers, access to the Gemini Live API through existing media-streaming platforms lowers the barrier to embedding translation, which is likely to accelerate adoption in video conferencing, live streaming, and voice-enabled applications. If the 70+ language coverage holds up under production load at Grab's scale, it could set a new baseline for accessibility in voice applications across emerging markets.