What is Google's new avatar feature and who can use it?

The avatar tool is available exclusively to Google Gemini AI Pro subscribers ($20/month). It uses the Omni video model to generate photorealistic video clips of up to 10 seconds featuring a digital clone of the user.

How does Google's avatar approach differ from OpenAI's Sora?

Google restricts avatar generation to the user's own likeness only—preventing others from creating deepfakes of you. OpenAI's Sora previously allowed users to decide whether their likeness could be used by others, but that functionality is no longer available.

What are the practical limitations of the tool?

Usage limits reset only every 5 hours, which caps how many clips heavy users can generate, and avatar setup requires about 5 minutes of facial scanning in well-lit conditions. Generated videos sometimes contain visual artifacts, including misaligned teeth and inconsistent clothing details.

How realistic are the outputs?

According to Wired's testing, the backgrounds are remarkably photorealistic—Google's mapping infrastructure allows accurate recreation of real locations. However, facial details and motion are imperfect, with some 'jumbled moments' and anatomical inconsistencies.

Google's Gemini Avatar Tool Generates Photorealistic Video Clones—With a Catch

Google’s Omni Powers Photorealistic Self-Cloning

Google has embedded a new avatar generation tool directly into its Gemini app, allowing AI Pro subscribers to create photorealistic video deepfakes of themselves. According to Wired, the feature taps into Google’s Omni video model and is included in the AI Pro subscription tier, which costs $20 per month. Clips run up to 10 seconds, and usage limits reset only every 5 hours, a significant throughput constraint for power users.

The setup process is relatively frictionless—approximately 5 minutes of facial scanning in a well-lit room, involving directional head movements and number recitation via smartphone camera. However, the practical tradeoff is immediate: garments visible during enrollment appear in generated outputs, limiting wardrobe flexibility across multiple video projects.

Photorealism in Backgrounds, Imprecision in Anatomy

What distinguishes Google’s implementation is the accuracy of spatial rendering. According to the Wired article, backgrounds faithfully recreate actual locations—complete with landmark details like specific trees, building skylines, and landscape contours. This advantage stems from Google’s planet-scale mapping infrastructure, which other video models cannot replicate. When tested on a Dolores Park scene, the generated video included recognizable architectural elements (the Salesforce building in the distance, palm tree-lined paths) that confirm geographic authenticity.

Facial fidelity, by contrast, reveals the current limits of the technology. The review noted “jumbled moments,” misaligned teeth, and anatomical inconsistencies that leave the avatar recognizable but subtly uncanny. The chin geometry and overall likeness register as accurate, but micro-movements in the mouth and fine facial details fall short of photorealism.

Restrictive Safeguards: Self-Use Only

Google has implemented a deliberate restriction absent from OpenAI’s now-defunct Sora interface. According to Wired, Gemini avatars can be generated only by the account holder—preventing third parties from synthesizing videos of a given person’s likeness without consent. This contrasts with Sora’s approach, which gave users agency over whether their likeness could be deployed by others, creating a potential misuse vector that Google has preemptively closed.

Avatar generation is also limited to adult account holders, an age gate that OpenAI did not employ in its avatar framework.

Why This Matters

The debut of avatar generation inside a consumer product signals that deepfake video synthesis has moved from research prototype to monetized feature. The $20/month paywall and 5-hour reset cycles suggest Google is treating avatar generation as a compute-intensive, rationed service rather than an unlimited utility, using price and rate limits together to keep demand within its infrastructure capacity.

For content creators and synthetic media producers, the accuracy of spatial rendering (powered by Google’s mapping assets) creates a competitive moat that other video models struggle to replicate. However, the anatomical imprecision and usage throttling will limit enterprise adoption until fidelity improves and rate limits scale. The self-only restriction, while privacy-protective, also narrows the tool’s creative applications—professional voice actors, digital influencers, and educational content producers cannot easily deploy it at scale.