Whatfinger Startup And Small Business
    Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

    By webmaster | September 25, 2025 | All Videos | 3 Mins Read

    Hamel Husain and Shreya Shankar teach the world’s most popular course on AI evals and have trained over 2,000 PMs and engineers (including many teams at OpenAI and Anthropic). In this conversation, they demystify the process of developing effective evals, walk through real examples, and share practical techniques that’ll help you improve your AI product.

    *What you’ll learn:*
    1. WTF evals are
    2. Why they’ve become the most important new skill for AI product builders
    3. A step-by-step walkthrough of how to create an effective eval
    4. A deep dive into error analysis, open coding, and axial coding
    5. Code-based evals vs. LLM-as-judge
    6. The most common pitfalls and how to avoid them
    7. Practical tips for implementing evals with minimal time investment (30 minutes per week after initial setup)
    8. Insight into the debate between “vibes” and systematic evals

    *Brought to you by:*
    Fin—The #1 AI agent for customer service: https://fin.ai/lenny
    Dscout—The UX platform to capture insights at every stage: from ideation to production: https://www.dscout.com/
    Mercury—The art of simplified finances: https://mercury.com/

    *Transcript:* https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

    *My biggest takeaways (for paid newsletter subscribers):* https://www.lennysnewsletter.com/i/173871171/my-biggest-takeaways-from-this-conversation

    *Where to find Shreya Shankar*
    • X: https://x.com/sh_reya
    • LinkedIn: https://www.linkedin.com/in/shrshnk/
    • Website: https://www.sh-reya.com/
    • Maven course: https://bit.ly/4myp27m

    *Where to find Hamel Husain*
    • X: https://x.com/HamelHusain
    • LinkedIn: https://www.linkedin.com/in/hamelhusain/
    • Website: https://hamel.dev/
    • Maven course: https://bit.ly/4myp27m

    *Where to find Lenny:*
    • Newsletter: https://www.lennysnewsletter.com
    • X: https://twitter.com/lennysan
    • LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

    *In this episode, we cover:*
    (00:00) Introduction to Hamel and Shreya
    (04:57) What are evals?
    (09:56) Demo: Examining real traces from a property management AI assistant
    (16:51) Writing notes on errors
    (23:54) Why LLMs can’t replace humans in the initial error analysis
    (25:16) The concept of a “benevolent dictator” in the eval process
    (28:07) Theoretical saturation: when to stop
    (31:39) Using axial codes to help categorize and synthesize error notes
    (44:39) The results
    (46:06) Building an LLM-as-judge to evaluate specific failure modes
    (48:31) The difference between code-based evals and LLM-as-judge
    (52:10) Example: LLM-as-judge
    (54:45) Testing your LLM judge against human judgment
    (01:00:51) Why evals are the new PRDs for AI products
    (01:05:09) How many evals you actually need
    (01:07:41) What comes after evals
    (01:09:57) The great evals debate
    (01:15:15) Why dogfooding isn’t enough for most AI products
    (01:18:23) OpenAI’s Statsig acquisition
    (01:23:02) The Claude Code controversy and the importance of context
    (01:24:13) Common misconceptions around evals
    (01:22:28) Tips and tricks for implementing evals effectively
    (01:30:37) The time investment
    (01:33:38) Overview of their comprehensive evals course
    (01:37:57) Lightning round and final thoughts
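
The chapter list above includes testing an LLM judge against human judgment. The sketch below is one simple way to make that comparison, not necessarily the method shown in the episode. It assumes both the human reviewer and the judge give a binary pass/fail label for the same traces. The function name `judge_agreement` and the sample labels are hypothetical.

```python
def judge_agreement(human_labels, judge_labels):
    """Compare binary pass/fail labels (True = pass) from a human and an LLM judge.

    Returns overall agreement plus how often the judge matches the human
    on traces the human passed (true positive rate) and on traces the
    human failed (true negative rate). A rate is None when undefined.
    """
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must be the same length")

    pairs = list(zip(human_labels, judge_labels))
    judge_on_human_pass = [j for h, j in pairs if h]
    judge_on_human_fail = [j for h, j in pairs if not h]

    agreement = sum(h == j for h, j in pairs) / len(pairs) if pairs else None
    tpr = (sum(judge_on_human_pass) / len(judge_on_human_pass)
           if judge_on_human_pass else None)
    tnr = (sum(not j for j in judge_on_human_fail) / len(judge_on_human_fail)
           if judge_on_human_fail else None)
    return {
        "agreement": agreement,
        "true_positive_rate": tpr,
        "true_negative_rate": tnr,
    }


# Made-up labels for five traces, for illustration only.
HUMAN = [True, True, False, False, True]
JUDGE = [True, False, False, True, True]

if __name__ == "__main__":
    print(judge_agreement(HUMAN, JUDGE))
```

Reporting the two rates separately, rather than a single agreement number, shows whether the judge tends to miss real failures or to flag good traces.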

    *LLM Log Open Codes Analysis Prompt:*
    _Please analyze the following CSV file. There is a metadata field which has a nested field called z_note that contains open codes for analysis of LLM logs that we are conducting. Please extract all of the different open codes. From the z_note field, propose 5-6 categories that we can create axial codes from._
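
Before handing a file like this to an LLM, it can help to pull the open codes out locally and see how often each one appears. The sketch below is a minimal illustration. It assumes the CSV has a column named `metadata` whose cells hold JSON objects with a `z_note` key. The prompt names both fields, but the JSON encoding, the function name `extract_open_codes`, and the sample rows are assumptions.

```python
import csv
import io
import json
from collections import Counter


def extract_open_codes(csv_text, metadata_column="metadata", note_key="z_note"):
    """Return the non-empty open-code notes found in a CSV export of LLM logs."""
    notes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get(metadata_column) or ""
        try:
            metadata = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip rows whose metadata cell is not valid JSON
        if not isinstance(metadata, dict):
            continue  # skip metadata that is not a JSON object
        note = str(metadata.get(note_key) or "").strip()
        if note:
            notes.append(note)
    return notes


# Made-up rows for illustration only; real exports will look different.
SAMPLE_CSV = '''trace_id,metadata
t1,"{""z_note"": ""did not hand off to a human""}"
t2,"{""z_note"": """"}"
t3,"{""z_note"": ""did not hand off to a human""}"
t4,"{""z_note"": ""wrong tour date""}"
'''

if __name__ == "__main__":
    codes = extract_open_codes(SAMPLE_CSV)
    # Tally identical notes so the most frequent failure descriptions surface first.
    for note, count in Counter(codes).most_common():
        print(count, note)
```

Exact-match counting only groups identical notes. Grouping differently worded notes into 5-6 axial categories is the part the prompt above asks the LLM to propose.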

    *Referenced:*
    • Building eval systems that improve your AI product: https://www.lennysnewsletter.com/p/building-eval-systems-that-improve
    • Mercor: https://mercor.com/
    • Brendan Foody on LinkedIn: https://www.linkedin.com/in/brendan-foody-2995ab10b
    • Nurture Boss: https://nurtureboss.io/
    • Braintrust: https://www.braintrust.dev/
    • Andrew Ng on X: https://x.com/andrewyng
    • Carrying Out Error Analysis: https://www.youtube.com/watch?v=JoAxZsdw_3w
    • Julius AI: https://julius.ai/
    • Brendan Foody on X—“evals are the new PRDs”: https://x.com/BrendanFoody/status/1939764763485171948
    …References continued at: https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

    *Recommended books:*
    • Pachinko: https://www.amazon.com/Pachinko-National-Book-Award-Finalist/dp/1455563935
    • Apple in China: The Capture of the World’s Greatest Company: https://www.amazon.com/Apple-China-Capture-Greatest-Company/dp/1668053373/
    • Machine Learning: https://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/1259096955
    • Artificial Intelligence: A Modern Approach: https://www.amazon.com/Artificial-Intelligence-Modern-Approach-Global/dp/1292401133/

    _Production and marketing by https://penname.co/._
    _For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._

    Lenny may be an investor in the companies discussed.
