
Andrew Ng announces short course "AI Agents for Image and Video Generation", developed with Google Cloud, through DeepLearning.AI

Curriculum covers agents that auto-evaluate outputs with LLM judges and similarity scoring.

Original post

New course: Build AI agents that generate images and videos -- an under-explored frontier. A key to performance is having the agent evaluate its own output, and iterate to improve quality.

This short course is built together with @googlecloudtech and taught by Katie Nguyen and Wafae Bakkali. You'll learn three evaluation techniques and combine them in an agent: image-text similarity scoring to check the output matches the prompt, an LLM judge that scores against custom criteria like brand consistency, and structured rubrics that break a prompt into verifiable yes/no questions like "is the subject in the frame?" and "does the camera motion match?"

Skills you'll gain:
- Learn image and video prompt engineering
- Build an image agent that turns brand guidelines into UI mockups
- Build a video agent that plans multi-scene explainers and animates reference frames with synchronized audio

Join and build agents that create images and video!
https://www.deeplearning.ai/courses/ai-agents-for-image-and-video-generation
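The post names three evaluation signals and an agent that iterates on its own output, but the course's code is not shown here. What follows is only a minimal Python sketch under stated assumptions: the image and text embeddings are taken as precomputed vectors (a real agent would obtain them from an image-text model such as a CLIP-style encoder), the LLM judge's score is assumed to be already normalized to [0, 1], and each rubric question is assumed to have been answered yes/no by some model. The names `Evaluation`, `combined_score`, and `refine_loop`, the weights, the 0.8 threshold, and the stubs `fake_generate` and `fake_evaluate` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Image-text similarity as the cosine of two (assumed precomputed) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rubric_score(answers: Dict[str, bool]) -> float:
    """Structured rubric: fraction of yes/no questions that passed."""
    return sum(answers.values()) / len(answers) if answers else 0.0


@dataclass
class Evaluation:
    similarity: float        # cosine similarity in [-1, 1]
    judge: float             # hypothetical LLM-judge score, assumed in [0, 1]
    rubric: Dict[str, bool]  # rubric question -> passed?


def combined_score(
    ev: Evaluation, weights: Tuple[float, float, float] = (0.3, 0.3, 0.4)
) -> float:
    """Hypothetical weighted blend of the three signals, landing in [0, 1]."""
    w_sim, w_judge, w_rubric = weights
    sim01 = (ev.similarity + 1) / 2  # rescale cosine from [-1, 1] to [0, 1]
    return w_sim * sim01 + w_judge * ev.judge + w_rubric * rubric_score(ev.rubric)


def refine_loop(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Evaluation],
    prompt: str,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[Optional[str], float]:
    """Generate, self-evaluate, and retry with feedback; return the best attempt."""
    best_output: Optional[str] = None
    best_score = -1.0
    feedback = ""
    for _ in range(max_rounds):
        output = generate(prompt, feedback)
        ev = evaluate(prompt, output)
        score = combined_score(ev)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break
        # Turn failed rubric questions into feedback for the next attempt.
        failed = [q for q, ok in ev.rubric.items() if not ok]
        feedback = "Fix: " + "; ".join(failed) if failed else "Improve overall quality."
    return best_output, best_score


# Stubs standing in for a real image/video model and real evaluators.
def fake_generate(prompt: str, feedback: str) -> str:
    return "draft-fixed" if feedback else "draft"


def fake_evaluate(prompt: str, output: str) -> Evaluation:
    fixed = output.endswith("fixed")
    return Evaluation(
        similarity=0.6,
        judge=0.9 if fixed else 0.5,
        rubric={
            "is the subject in the frame?": True,
            "does the camera motion match?": fixed,
        },
    )


if __name__ == "__main__":
    out, score = refine_loop(fake_generate, fake_evaluate, "a cat walking left")
    print(out, round(score, 3))  # draft-fixed 0.91
```

In the usage run, the first draft scores 0.59 because the camera-motion check fails; that failed question becomes feedback, and the second attempt clears the threshold at 0.91. A weighted sum is just one way to combine the signals; gating on the rubric (requiring every yes/no check to pass) is an equally plausible design, and feeding failed rubric questions back to the generator is likewise an assumption rather than something the post specifies.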

10:08 AM · May 20, 2026

Evaluation is one of the hardest parts of building agents that work at scale, especially when there's no single correct answer. If you're thinking about this for conversational agents, we built ArkSim for that. It simulates realistic interactions, evaluates outputs turn-by-turn, and catches failures before deployment.

Give it a try: https://github.com/arklexai/arksim

Andrew Ng @AndrewYNg

5:08 PM · May 20, 2026 · 30K Views
5:56 PM · May 20, 2026 · 155 Views