Focus on the principles and applications of generative AI, including multimodal models, and their use in creating new content like video, music, and text-to-speech, as well as broader media automation.
Fixing audio with words
Beyond video: World Models
Talk your app into existence!
AI voices, truly human
Code generation explosion
Same dog, new scene
Interactive AI avatars
AI writes your prompts!
Deep dive into Nano Banana
Lulu's AI-generated debut
AI music, perfectly timed
Instant app ideas!
LLMs evaluate images
Video AI, redefined
AI's surprising data skill!
Text-to-speech, but better
All Google's Gen Media
“The amazing thing about this model is that it connects to Google search. So, it can answer you with the live data from Google search.”
“Gemini's really awesome at multimodality, so we're able to kind of analyze those images even with Gemini, and then fact-check a lot of these questions to make sure that everything's aligned.”
“I like to have fairly long conversation about my idea, the tech stack, and then ask it to do the research if maybe, you know, my idea is not perfect or try to optimize it.”
Google is dramatically expanding its generative media capabilities, empowering creators with an integrated suite of AI models that redefine artistic control and efficiency. A recent conference session showcased the full power of Nano Banana, Veo, Lyria, and Gemini, demonstrating how these tools can transform creative workflows.
At a recent Google Cloud session, Developer Relations Engineer Katie Wynn demonstrated how to build sophisticated Generative Media agents capable of automating complex creative tasks, from character design to full story production, using Google's Agent Development Kit (ADK) and the Model Context Protocol (MCP).
In an insightful session, Google Developer Expert Tomek Wierzchowski demystifies the process of building AI applications, sharing his journey from a multi-voice audiobook concept to a deployable solution. He emphasizes practical tools and a strategic mindset for navigating the fast-evolving AI landscape.




