Google's Gemini Omni Raises Questions About Video Generation Quality and Consistency

Google released Omni Flash, the first model in its anything-to-anything Gemini family, but early tests reveal significant flaws in character consistency and object rendering.

Gemini Omni Flash, the inaugural model in Google’s planned anything-to-anything generation family, demonstrates tangible improvements over its predecessor while exposing persistent weaknesses in maintaining visual consistency across video sequences. According to The Verge’s hands-on testing, the model—now available in Google’s Flow platform—generates more coherent video than Veo but still struggles with character stability and object continuity in ways that undermine its claimed real-world knowledge improvements.

Video Generation Improvements on a Shaky Foundation

Google’s Omni Flash can accept both video uploads and text prompts as starting points, a workflow expansion beyond Veo’s text-only input. The model reportedly achieves better semantic understanding—translating narrative prompts into sequences where actions build logically and character appearance remains stable across shots. A test sequence asking the model to generate a character packing for vacation and boarding a cruise ship, complete with a humorous payoff involving misidentified luggage, produced what The Verge describes as “not a bad bit,” suggesting Omni can execute multi-beat narratives.

Yet execution gaps surface immediately. In the same sequence, a character named Buddy packs a honey jar that appears first as an ordinary jar, then as a clear water bottle, then as a squeeze bottle, all within minutes of video and with no transformation called for by the prompt. The final frame devolves into what The Verge characterizes as incoherent, as if the model “just barfed up a bunch of elements,” signaling a breakdown in sequence coherence at the boundary between planned shots.

Consistency Artifacts Persist

Character orientation instability remains a signature flaw. Omni generates scenes where animated subjects “suddenly switch orientation” mid-action—in one test, while skydiving—creating jarring discontinuities that contradict the model’s positioning as improved at maintaining character consistency. These are not edge cases; they appear across multiple test runs, suggesting the consistency gains are incremental rather than fundamental.

Text-based editing capabilities perform better than in Veo but remain unreliable enough that regenerating full sequences from scratch often proves faster than attempting targeted fixes. This undercuts the editing workflow Google likely intended, forcing users into a trial-and-error loop of full video regeneration.

Why This Matters

Video synthesis tools are advancing toward consumer-grade usability, but Omni’s real-world performance suggests claims about “anything-to-anything” conversion remain aspirational. The gap between Google’s positioning and demonstrated output matters for three audiences: content creators evaluating whether AI video tools can replace traditional workflows (the answer is still “not reliably”), platform teams building guardrails around synthetic media (object continuity failures may reduce deepfake persuasiveness, a double-edged safety implication), and regulators assessing whether these tools require licensing or disclosure frameworks (if the quality is visibly flawed, regulatory urgency may decrease). The trajectory shows progress, since Omni was visibly better than Veo in The Verge’s hands-on testing, but the remaining inconsistencies suggest Google’s “omni” ambitions still require significant engineering work before the model becomes a production-grade tool.

Frequently Asked Questions

What is Gemini Omni Flash?

Omni Flash is the first released model from Google's Omni family, which is intended eventually to convert any input type (photo, video, text) into any other format. Currently, it generates videos and is available through Google's Flow platform.

How does Omni compare to Veo?

Omni reportedly incorporates more real-world knowledge and maintains character consistency better than Veo. However, both models still exhibit artifacts like sudden character orientation shifts and inconsistent object rendering across frames.

What are the main limitations identified in testing?

Character and object consistency remain problematic: props change form mid-video, characters experience abrupt orientation changes, and final frames sometimes appear incoherent. Text-based editing works better than in Veo but remains unreliable enough that regenerating a full sequence from scratch is often faster than attempting a targeted fix.
