AIL Player Card #012 — Gemini Omni Flash: The World Model

92 OVR · WM · Google National

Conversational video editing. Any input → any output. Physics-grounded world knowledge baked in. Google National just fielded a player with a position nobody in this league has played before. #AILeague

Figure: Google's benchmark comparison chart for Gemini 3.5 Flash against rival models across coding, agentic, multimodal, and reasoning categories (Omni Flash shares the same Flash-tier architecture). [1]

The scouting report

Google National's roster has been building toward this for two seasons. Card #003 sent Gemini 2.5 Pro out as a Multimodal Wing — comprehension-first, reasoning-forward. Card #009 deployed Gemini 3.5 Flash as an Agentic Sprinter — four times faster than any rival, built for sub-agent loops. Both were real contributions. Neither changed how the league thinks about what "playing" looks like.

Gemini Omni Flash does.

Announced May 29, 2026, and rolling out now to Google AI Plus, Pro, and Ultra subscribers globally [2], Omni Flash is the league's first World Model: a player whose primary capability isn't understanding input but creating output from any combination of inputs. Text, image, video, audio: hand Omni Flash a mix of references and it produces a coherent video with physics that holds, characters that stay consistent, and scenes that know what happened before.

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-omni/


The position tag is WM. It's new. It's earned.

The 92 OVR: how the card scores

No Arena Elo for this one: the AI League text/reasoning leaderboards don't run video generation models, and Omni Flash's video quality benchmarks sit on the Artificial Analysis Video Arena, where it hasn't yet posted an official score [3]. That's not a knock on the model. It's a sign that this is genuinely new territory for the league's scoring infrastructure.

So we score on what we know:

Dimension	Score	Basis
RZN — Reasoning	86	World-knowledge physics simulation; biology + history + science grounding in outputs
CRE — Creativity	97	Conversational multi-turn video editing; any-to-any multimodal generation; highest creative ceiling in the current league roster
SPD — Speed	88	Flash-tier architecture; faster than Omni Pro variant
MLT — Multimodal	99	Text + image + video + audio as simultaneous inputs → video output; no other league player has this input breadth
SAF — Safety	82	SynthID watermarking on all outputs [4]; red-team evaluations completed; avatar feature restricted to users' own likeness; deepfake policy enforcement ongoing
VAL — Value	74	Subscription-gated (Google AI Plus/Pro/Ultra); API pricing TBD; high quota consumption in early testing; no open pricing for developers yet

OVR: 92. The creative ceiling and multimodal breadth push it into elite territory. The value score holds it back, leaving it just one point above Gemini 3.5 Flash (#009, 91 OVR), though the two sit on different axes entirely: this isn't a speed-vs-cost play; it's a capability-expansion play.
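The card doesn't spell out how the six dimension scores roll up into OVR. As a quick sanity check, the sketch below takes a plain unweighted average of the table's scores. This is not the league's formula (the source never gives one), and the helper name `plain_average` is made up for illustration. It only shows that a simple mean lands well short of 92, so the OVR presumably leans harder on some dimensions than others.

```python
# Dimension scores copied from the scoring table above.
scores = {"RZN": 86, "CRE": 97, "SPD": 88, "MLT": 99, "SAF": 82, "VAL": 74}

PUBLISHED_OVR = 92  # the card's stated overall rating


def plain_average(dimension_scores):
    """Unweighted mean of the dimension scores (illustrative helper only)."""
    return sum(dimension_scores.values()) / len(dimension_scores)


average = plain_average(scores)
gap = PUBLISHED_OVR - average
print(f"plain average: {average:.1f}, published OVR: {PUBLISHED_OVR}, gap: {gap:+.1f}")
```

The plain average comes out near 87.7, about four points under the published rating. A position-weighted formula that favors CRE and MLT for a WM card would close that gap, but that is a guess, not something the card confirms.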

What the WM position actually means

The AI League taxonomy has needed this slot for a while. "Multimodal Wing" covers models that take in multiple formats. "World Model" describes something different: a model whose output generation is grounded in real-world causal understanding.

When Omni Flash edits a video to make a mirror ripple like liquid when touched, it isn't applying a filter. It's reasoning about what liquid-mirror physics would look like, frame by frame. When it adds sound effects synchronized to leaf touches, it's connecting tactile events to expected audio signatures. When it runs a claymation explainer of protein folding (accurate, no hands, stop motion), it's drawing on biology knowledge to script the visual [4].

That's a World Model. The editing interface is just the delivery mechanism.

https://deepmind.google/models/gemini-omni


Comparison with the current video generation field [3]:

Capability	Gemini Omni Flash	Seedance 2.0
Motion realism	★★★★☆	★★★★★
Prompt adherence	★★★★★	★★★★☆
Cross-shot character consistency	★★★☆☆	★★★★☆
Cinematic quality	★★★☆☆	★★★★★
Conversational video editing	★★★★★	★☆☆☆☆
World-knowledge grounding	★★★★★	★★★☆☆

Omni loses the beauty contest. Seedance 2.0 holds the Artificial Analysis Video Arena's image-to-video top rank (Elo 1,351 globally as of May 2026). But Seedance can't let you edit a video through natural conversation, can't synchronize outputs to your reference audio, and doesn't understand that a protein folding animation should follow biochemistry. Different games.
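For readers who want a feel for what an Elo number like 1,351 implies, here is a minimal sketch using the classic logistic Elo formula. Two assumptions are baked in: the source doesn't describe how the Artificial Analysis Video Arena computes its ratings, so the standard 400-point scale is assumed, and the rival rating is a made-up illustration, since Omni Flash has no posted score.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


SEEDANCE_ELO = 1351        # figure quoted in the text above
HYPOTHETICAL_RIVAL = 1251  # illustrative only, not a published score

preference_rate = elo_expected_score(SEEDANCE_ELO, HYPOTHETICAL_RIVAL)
print(f"expected preference rate for the higher-rated model: {preference_rate:.2f}")
```

Under those assumptions, a 100-point lead works out to the higher-rated model being preferred in roughly 64% of head-to-head comparisons. Equal ratings give an even 50%.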

Season highlights

The five plays that define this card's debut season:

1. Conversational video editing from the Gemini app — Users edit through natural, incremental prompts. Each instruction builds on the last. Characters stay consistent. Physics holds across cuts. Launched globally to Google AI Plus/Pro/Ultra subscribers May 29, 2026 [2].

2. Any-input reference synthesis — Text, image, video, and audio references combine into a single cohesive output. No other model in the current league takes all four simultaneously. Voice-only audio references supported at launch; other audio types coming.

3. Physics-grounded generation — Gravity, kinetic energy, fluid dynamics. The marble-on-chain-reaction-track demo has made the rounds on social because it works — the physics is continuous and coherent, not stylized approximation.

4. Integration into Google Flow and YouTube Shorts — Immediate distribution play. Not API-first; consumer-first [2]. YouTube Shorts users get free access starting this week. This is Google National deploying their media infrastructure advantage at full force.

5. SynthID provenance on every frame — All Omni Flash outputs carry Google's imperceptible digital watermark. Verifiable through Gemini app, Gemini in Chrome, and Google Search. The league's most thorough content provenance system at launch.

Head-to-head: WM class comparison

Model	OVR	RZN	CRE	SPD	MLT	SAF	VAL	Position
Gemini Omni Flash	92	86	97	88	99	82	74	WM
Gemini 2.5 Pro (#003)	93	92	88	76	96	85	81	MW
Gemini 3.5 Flash (#009)	91	88	82	97	90	88	90	AS

Google National's bench depth is real. Three cards in the league, three distinct positions. The problem? None of them is a top-tier reasoning player yet. The MW (#003) leads Google's three on reasoning. The WM (#012) leads on creativity and multimodal breadth. The AS (#009) leads on speed and value. Together they're a specialist-heavy squad, not a complete team.
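The "who leads what" claims can be checked straight from the head-to-head table. The sketch below copies the three rows and picks the top card per dimension. The function name `leaders` and the short card labels are illustrative choices, not league terminology.

```python
# Rows copied from the head-to-head table above, in dimension order.
DIMENSIONS = ["RZN", "CRE", "SPD", "MLT", "SAF", "VAL"]
cards = {
    "WM (#012)": [86, 97, 88, 99, 82, 74],
    "MW (#003)": [92, 88, 76, 96, 85, 81],
    "AS (#009)": [88, 82, 97, 90, 88, 90],
}


def leaders(table, dimensions):
    """Map each dimension to the card with the highest score in it."""
    result = {}
    for index, dimension in enumerate(dimensions):
        result[dimension] = max(table, key=lambda name: table[name][index])
    return result


for dimension, card in leaders(cards, DIMENSIONS).items():
    print(dimension, card)
```

The output matches the paragraph: MW on reasoning, WM on creativity and multimodal, AS on speed and value. It also shows the AS card leading on safety (88), which the write-up doesn't call out.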

The Gemini 3.5 Pro — widely expected as the orchestrator that ties this roster together — is still slated for June 2026 but has not appeared. Google National's tactical ceiling depends heavily on that one player showing up.

The broadcast take

Mike Breen voice: Omni Flash steps to the arc — pulls up from the logo — BANG. Google National has been chasing the next evolution of their Nano Banana image system for two full seasons. They finally get there, and the upgrade isn't an iteration, it's a position change.

This card doesn't compete with Claude Opus 4.8 on enterprise reliability. It doesn't compete with GPT-5.5 on agentic task orchestration. It competes with Seedance, Sora, Kling, and Runway — the video generation specialists — and it does so with a completely different attacking style: world-knowledge reasoning baked in, conversational controls on top, Google's ecosystem as the delivery system.

The 74 on value is a real limit. Subscription-gated access and unclear API pricing make it inaccessible for the developer community that drives adoption for most league players. Until that opens up, Omni Flash stays a consumer play. A spectacular consumer play.

The Pro variant is in the works. The API is coming. When this player gets full deployment, Google National's WM slot becomes one of the more interesting positions to watch in the league.

92 OVR. WM. Welcome to the AI League. #AILeague