Showroom by Speechbox

Kulani Davaja, Product Marketing Manager for Gen Media

Generative AI, AI Video, AI Music

Nano Banana offers granular control over image generation aesthetics.
VEO 3.1 Light delivers cost-effective, high-speed video creation.
Gemini 3.1 Flash Live introduces interactive, real-time AI avatars.

Google is dramatically expanding its generative media capabilities, empowering creators with an integrated suite of AI models that redefine artistic control and efficiency. A recent conference session showcased the full power of Nano Banana, VEO, Lyria, and Gemini, demonstrating how these tools can transform creative workflows.

Kulani Davaja, Product Marketing Manager for Gen Media, provided a comprehensive overview of Google's generative media ecosystem. She clarified that "Gen Media" encompasses Nano Banana for image generation and editing, VEO for video creation, Gemini Audio for transcription and text-to-speech, and Lyria for music generation. The session highlighted the rapid pace of innovation, with new updates frequently rolling out across these models.

Key Moment

Deep dive into Nano Banana

The demonstration opened with a charming AI-generated short film about working from home and snacking, which showed how all of the models work together. Davaja then turned to Nano Banana and showed how creators can steer it toward highly specific artistic styles. Her examples ranged from 3D renders with rounded geometry to precise photographic qualities such as 33mm film, glossy highlights, and halation. She emphasized that AI, particularly Gemini, can help creators distill complex artistic terminology into effective prompts.

Key Moment

AI music, perfectly timed

Next, the spotlight shifted to VEO 3.1 Light, which was praised for its cost-effectiveness and speed: it generates video in under 60 seconds. Davaja demonstrated VEO's ability to understand creative intent. It generated dynamic sequences from just a first frame, and also between a specified first and last frame. For audio, Lyria 3 Pro was presented as a powerful music-generation tool that can time changes precisely, shifting mood at specified timestamps. VEO 3.1 Light also generates sound effects, which further streamlines audio production. Gemini's multimodal understanding proved invaluable for writing suitable music prompts for Lyria.
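To make the timestamp idea concrete, here is a minimal Python sketch of how a creator might assemble a timed music brief before handing it to a model such as Lyria 3 Pro. The session did not show Lyria's actual prompt format. The `[M:SS] description` layout and the names `MusicCue` and `build_timed_music_prompt` are therefore hypothetical. The sketch only builds and validates text and does not call any Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicCue:
    """Hypothetical cue: when (in seconds) the music should shift, and into what mood."""
    start_s: int
    description: str


def format_timestamp(seconds: int) -> str:
    """Render a number of seconds as M:SS, e.g. 75 -> '1:15'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def build_timed_music_prompt(overall: str, cues: list[MusicCue]) -> str:
    """Combine an overall brief with one timestamped line per cue, sorted by time.

    The '[M:SS] description' layout is an assumed format for illustration;
    Lyria 3 Pro's real prompt conventions were not shown in the session.
    """
    if not cues:
        raise ValueError("at least one cue is required")
    ordered = sorted(cues, key=lambda c: c.start_s)
    if ordered[0].start_s != 0:
        raise ValueError("the first cue should start at 0:00")
    lines = [overall]
    for cue in ordered:
        lines.append(f"[{format_timestamp(cue.start_s)}] {cue.description}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented cues loosely inspired by the work-from-home snacking film.
    prompt = build_timed_music_prompt(
        "Playful instrumental score for a short film about working from home.",
        [
            MusicCue(0, "light, bouncy pizzicato strings"),
            MusicCue(12, "sudden tense pause as the snack cupboard turns out to be empty"),
            MusicCue(20, "triumphant brass when the snack is found"),
        ],
    )
    print(prompt)
```

Keeping the cues as structured data, rather than hand-editing one long string, makes it simple to retime the music when the video edit changes.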

Key Moment

AI voices, truly human

The session culminated with a look at Gemini 3.1 Flash Text-to-Speech, which offers fine-grained control over vocal expressiveness through 200 distinct tags. These tags make comedic, panicked, or even British-accented narration possible. The most anticipated reveal was Gemini 3.1 Flash Live, a new live avatar feature that can interact in real time and pull live data from Google Search. This opens the door to dynamic applications in education, training, and live streaming. Looking ahead, Davaja expressed excitement about "world models" such as Genie 3, which promise to turn creators into operators inside AI-generated environments. She was also excited about continued reductions in generation latency, which help keep creators in their creative flow.
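The tag-driven approach to expressive speech can be sketched as a small script-preparation step. This is a hypothetical Python example. The session mentioned 200 tags but not their exact names or syntax. The bracketed `[tag] text` form and the four-tag `ALLOWED_TAGS` set below are therefore assumptions. No text-to-speech API is called.

```python
import re

# Hypothetical subset of style tags. The real tag names and syntax used by
# Gemini 3.1 Flash Text-to-Speech were not given in the session.
ALLOWED_TAGS = {"comedic", "panicked", "british accent", "whisper"}

# Matches the contents of any [bracketed] tag that has no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def tag_line(tag: str, text: str) -> str:
    """Prefix one line of narration with a normalized, known style tag."""
    normalized = tag.strip().lower()
    if normalized not in ALLOWED_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return f"[{normalized}] {text}"


def find_unknown_tags(script: str) -> list[str]:
    """Return every bracketed tag in the script that is not in ALLOWED_TAGS."""
    return [t for t in TAG_PATTERN.findall(script) if t.strip().lower() not in ALLOWED_TAGS]


if __name__ == "__main__":
    script = "\n".join([
        tag_line("comedic", "Day 47 of working from home."),
        tag_line("Panicked", "Wait. Where did all the snacks go?"),
    ])
    print(script)
    # A hand-typed tag that is not in the allowed set gets flagged before synthesis.
    print(find_unknown_tags(script + "\n[sarcastic] Great."))
```

Validating tags before sending a script anywhere catches typos early. With 200 tags to choose from, typos are an easy mistake to make.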

Key Moment

Beyond video: World Models

“The amazing thing about this model is that it connects to Google search. So, it can answer you with the live data from Google search.”
- Kulani Davaja, Product Marketing Manager for Gen Media

Google Unveils Next-Gen AI Media Stack: From Pixel-Perfect Images to Live Avatars
