
Google Cloud Next 2026 Developer Livestreams

Unleash AI's Full Power: Gemma 4, Baseten, NVIDIA Secrets Revealed!

“To me, what inference means is being able to actually deliver on the promise of AI applications.”
- Jason Davenport, Google Cloud

Discover how Baseten and NVIDIA are revolutionizing AI inference at scale on Google Cloud. Learn about cutting-edge hardware, software optimizations, and multi-region deployments that make AI applications faster and more reliable than ever.

Tags: AI Inference, Gemma 4, NVIDIA, Baseten, Google Cloud, LLM Optimization, GPU, Kubernetes, Machine Learning, Cloud Computing

Chapters

00:00 Chapter 1. Welcome: The Future of AI Inference Starts Now
00:43 Chapter 2. NVIDIA's Next-Gen Hardware: Vera Rubin & Blackwell GPUs
01:28 Chapter 3. Unpacking Inference: Delivering on AI's Promise
02:00 Chapter 4. Baseten's Scale: Billions of Inferences with NVIDIA
02:50 Chapter 5. Global Compute: Multi-Region Deployments Explained
03:25 Chapter 6. Gemma 4 Unveiled: Multimodality & Model Sizes
04:30 Chapter 7. Optimizing LLMs: TensorRT-LLM & NVFP4 Secrets
05:20 Chapter 8. Live Demo: Baseten's Inference Platform in Action
06:10 Chapter 9. Deploying Gemma 4: From Hugging Face to Production
07:00 Chapter 10. Auto-Scaling Magic: Meeting SLAs for AI Applications
08:00 Chapter 11. Cost-Efficient Inference: NeMo Triton Models
08:40 Chapter 12. GPU Selection: Optimizing Total Cost of Ownership
09:15 Chapter 13. GKE's Role: Powering Multi-Model AI Agents
10:10 Chapter 14. Decoding Inference Engineering: A New Book
10:50 Chapter 15. Future Horizons: Community & Developer Insights
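Chapter 7 pairs TensorRT-LLM with NVFP4, NVIDIA's 4-bit floating-point format. The session summary does not explain how NVFP4 works, so what follows is only a minimal sketch of block-scaled 4-bit quantization. It assumes the publicly described NVFP4 layout: 16-value blocks, each value stored as a 4-bit E2M1 number (magnitudes 0 to 6), and one scale factor per block. To keep it short, the sketch stores block scales as ordinary Python floats rather than FP8 (E4M3) and omits the extra per-tensor scale. The function names are hypothetical, and this is not TensorRT-LLM's implementation.

```python
"""Simplified NVFP4-style block quantization (illustrative sketch only).

Assumptions: 16-element blocks, E2M1 4-bit values, one scale per block.
The real format stores each block scale as FP8 (E4M3) plus a per-tensor
FP32 scale; here scales stay as Python floats for clarity.
"""
from typing import List, Tuple

# Non-negative magnitudes representable by a 4-bit E2M1 value (sign kept separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # assumed block size


def quantize_block(block: List[float]) -> Tuple[float, List[float]]:
    """Return (scale, codes), where each code is a signed E2M1 grid value."""
    amax = max((abs(x) for x in block), default=0.0)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    scale = amax / E2M1_MAX  # map the block's largest magnitude onto 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest grid point (ties go to the smaller value).
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    """Reconstruct approximate values by multiplying codes by the block scale."""
    return [c * scale for c in codes]


def quantize(values: List[float]) -> List[Tuple[float, List[float]]]:
    """Split a flat tensor into 16-element blocks and quantize each one independently."""
    return [quantize_block(values[i:i + BLOCK_SIZE]) for i in range(0, len(values), BLOCK_SIZE)]


def dequantize(blocks: List[Tuple[float, List[float]]]) -> List[float]:
    """Concatenate the dequantized blocks back into a flat list."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    weights = [0.01 * i - 0.15 for i in range(32)]
    restored = dequantize(quantize(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max abs error: {worst:.4f}")
```

Scaling each small block independently is what keeps 4-bit error manageable: a single outlier inflates the scale, and therefore the rounding error, of only its own 16 values rather than the whole tensor.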

Topics: AI Infrastructure & Performance, AI Development & MLOps, Cloud Computing & Serverless, Future of Tech & Innovation, Google AI Models & Platforms


NVIDIA and Baseten Unveil Next-Gen AI Inference Capabilities on Google Cloud

During the Google Cloud Next 2026 developer livestreams, NVIDIA and Baseten leaders detailed their partnership with Google Cloud on AI inference. The session covered next-generation NVIDIA hardware (Vera Rubin and Blackwell GPUs), software optimizations such as TensorRT-LLM and NVFP4, auto-scaling and multi-region deployments on Google Cloud, and day-zero support for serving Gemma 4, all aimed at making AI applications faster, more reliable, and more scalable.
