You must be an active AI Tinkerers member to view these talks and demos.
Automated Evals for LLM Agents
Learn how to build automated evals for LLM‑based agents, covering dataset structuring, stability checks, edge‑case handling, and reproducible regression detection.
In this live demo, I’ll show how we built an automated evaluation system (“evals”) to test LLM-powered AI agents running in production. We’ll quickly walk through our setup: how we structured test datasets, automated stability checks, handled edge cases, and implemented reproducible quality benchmarks. I’ll then run an eval live from the terminal to show how we detect regressions and keep our agents’ performance consistent.
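To make the moving parts concrete, here is a minimal sketch of what such an eval loop could look like. It is not the system shown in the talk. Every name in it (`EvalCase`, `run_evals`, `find_regressions`, `stub_agent`, the cases and the baseline scores) is a hypothetical placeholder, and the deterministic `stub_agent` stands in for a real LLM-backed agent. Each case is run several times to get a pass rate (a simple stability check), one case covers an edge case (blank input), and the scores are compared against a stored baseline to flag regressions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type alias: an "agent" is anything that maps a prompt to a reply.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    expected: str  # substring the reply must contain (case-insensitive)


def pass_rate(agent: Agent, case: EvalCase, runs: int = 5) -> float:
    """Run one case several times; the fraction of passes measures stability."""
    passes = sum(
        case.expected.lower() in agent(case.prompt).lower() for _ in range(runs)
    )
    return passes / runs


def run_evals(agent: Agent, cases: list[EvalCase], runs: int = 5) -> dict[str, float]:
    """Return a pass rate per case id."""
    return {case.case_id: pass_rate(agent, case, runs) for case in cases}


def find_regressions(
    current: dict[str, float], baseline: dict[str, float], tolerance: float = 0.1
) -> list[str]:
    """List case ids whose pass rate fell more than `tolerance` below baseline."""
    return sorted(
        case_id
        for case_id, base in baseline.items()
        if current.get(case_id, 0.0) < base - tolerance
    )


def stub_agent(prompt: str) -> str:
    """Deterministic stand-in for a real agent (assumption for this sketch)."""
    if not prompt.strip():
        return "Please provide a question."
    if "refund" in prompt.lower():
        return "Refunds are processed within 5 days."
    return "I don't know."


# Illustrative dataset: two ordinary cases and one edge case (blank input).
CASES = [
    EvalCase("refund-basic", "How long do refunds take?", "5 days"),
    EvalCase("empty-input", "   ", "provide a question"),
    EvalCase("shipping-basic", "When will my order ship?", "2 days"),
]

# Illustrative baseline, as if saved from an earlier, known-good run.
BASELINE = {"refund-basic": 1.0, "empty-input": 1.0, "shipping-basic": 1.0}

if __name__ == "__main__":
    scores = run_evals(stub_agent, CASES)
    for case_id, score in scores.items():
        print(f"{case_id}: {score:.2f}")
    print("regressions:", find_regressions(scores, BASELINE))
```

With a real, non-deterministic agent the pass rates would usually fall between 0 and 1, which is why the comparison uses a tolerance rather than exact equality. Keeping the dataset and baseline in version control is one way to make runs reproducible.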
Autonomous AI analyzes customer videos for insights, automating workflows.