Whatfinger Startup And Small Business

    Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

    By webmaster | September 25, 2025 | All Videos | 3 Mins Read

    Hamel Husain and Shreya Shankar teach the world’s most popular course on AI evals and have trained over 2,000 PMs and engineers (including many teams at OpenAI and Anthropic). In this conversation, they demystify the process of developing effective evals, walk through real examples, and share practical techniques that’ll help you improve your AI product.

    *What you’ll learn:*
    1. WTF evals are
    2. Why they’ve become the most important new skill for AI product builders
    3. A step-by-step walkthrough of how to create an effective eval
    4. A deep dive into error analysis, open coding, and axial coding
    5. Code-based evals vs. LLM-as-judge
    6. The most common pitfalls and how to avoid them
    7. Practical tips for implementing evals with minimal time investment (30 minutes per week after initial setup)
    8. Insight into the debate between “vibes” and systematic evals
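
To make the "code-based evals" item concrete, here is a minimal sketch of what a deterministic, code-based check could look like. The specific checks (leaked template placeholders, a length limit) and all names such as `EvalResult` and `run_code_evals` are hypothetical illustrations, not checks taken from the episode.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    passed: bool
    detail: str


def check_no_raw_template_vars(response: str) -> EvalResult:
    # Hypothetical failure mode: unrendered placeholders like {{tenant_name}}
    # leaking into the assistant's reply.
    leaked = re.findall(r"\{\{\s*\w+\s*\}\}", response)
    return EvalResult("no_raw_template_vars", not leaked, f"leaked={leaked}")


def check_max_length(response: str, limit: int = 600) -> EvalResult:
    # Hypothetical limit; pick a value that matches your own product's channel.
    return EvalResult("max_length", len(response) <= limit, f"chars={len(response)}")


def run_code_evals(response: str) -> list[EvalResult]:
    # Each check is plain code: cheap, deterministic, and needs no LLM call.
    return [check_no_raw_template_vars(response), check_max_length(response)]


if __name__ == "__main__":
    for result in run_code_evals("Hi {{tenant_name}}, your tour is booked."):
        print(result)
```

Checks like these suit failure modes that can be stated as an exact rule. Failure modes that need judgment about tone or intent are where an LLM-as-judge would typically be considered instead.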

    *Brought to you by:*
    Fin—The #1 AI agent for customer service: https://fin.ai/lenny
    Dscout—The UX platform to capture insights at every stage: from ideation to production: https://www.dscout.com/
    Mercury—The art of simplified finances: https://mercury.com/

    *Transcript:* https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

    *My biggest takeaways (for paid newsletter subscribers):* https://www.lennysnewsletter.com/i/173871171/my-biggest-takeaways-from-this-conversation

    *Where to find Shreya Shankar*
    • X: https://x.com/sh_reya
    • LinkedIn: https://www.linkedin.com/in/shrshnk/
    • Website: https://www.sh-reya.com/
    • Maven course: https://bit.ly/4myp27m

    *Where to find Hamel Husain*
    • X: https://x.com/HamelHusain
    • LinkedIn: https://www.linkedin.com/in/hamelhusain/
    • Website: https://hamel.dev/
    • Maven course: https://bit.ly/4myp27m

    *Where to find Lenny:*
    • Newsletter: https://www.lennysnewsletter.com
    • X: https://twitter.com/lennysan
    • LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

    *In this episode, we cover:*
    (00:00) Introduction to Hamel and Shreya
    (04:57) What are evals?
    (09:56) Demo: Examining real traces from a property management AI assistant
    (16:51) Writing notes on errors
    (23:54) Why LLMs can’t replace humans in the initial error analysis
    (25:16) The concept of a “benevolent dictator” in the eval process
    (28:07) Theoretical saturation: when to stop
    (31:39) Using axial codes to help categorize and synthesize error notes
    (44:39) The results
    (46:06) Building an LLM-as-judge to evaluate specific failure modes
    (48:31) The difference between code-based evals and LLM-as-judge
    (52:10) Example: LLM-as-judge
    (54:45) Testing your LLM judge against human judgment
    (01:00:51) Why evals are the new PRDs for AI products
    (01:05:09) How many evals you actually need
    (01:07:41) What comes after evals
    (01:09:57) The great evals debate
    (01:15:15) Why dogfooding isn’t enough for most AI products
    (01:18:23) OpenAI’s Statsig acquisition
    (01:23:02) The Claude Code controversy and the importance of context
    (01:24:13) Common misconceptions around evals
    (01:22:28) Tips and tricks for implementing evals effectively
    (01:30:37) The time investment
    (01:33:38) Overview of their comprehensive evals course
    (01:37:57) Lightning round and final thoughts
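
The "Testing your LLM judge against human judgment" chapter can be sketched roughly as follows. This is an assumption-laden illustration, not the guests' implementation: it assumes each trace has a binary human label and a binary judge label (True meaning "this failure mode is present"), and the function name `judge_agreement` is hypothetical.

```python
def judge_agreement(human: list[bool], judge: list[bool]) -> dict:
    """Compare LLM-judge labels with human labels on the same traces.

    True means the failure mode is present. Returns raw agreement plus the
    true positive rate and true negative rate, since raw agreement alone can
    look good when failures are rare.
    """
    if len(human) != len(judge):
        raise ValueError("human and judge must label the same traces")

    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    positives = sum(human)
    negatives = len(human) - positives

    return {
        "agreement": (tp + tn) / len(human) if human else None,
        "true_positive_rate": tp / positives if positives else None,
        "true_negative_rate": tn / negatives if negatives else None,
    }


if __name__ == "__main__":
    human_labels = [True, True, False, False, False]
    judge_labels = [True, False, False, False, True]
    print(judge_agreement(human_labels, judge_labels))
```

Reporting the two rates separately shows whether the judge misses real failures or flags good traces, which a single agreement number would hide.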

    *LLM Log Open Codes Analysis Prompt:*
    _Please analyze the following CSV file. There is a metadata field which has a nested field called z_note that contains open codes for analysis of LLM logs that we are conducting. Please extract all of the different open codes. From the z_note field, propose 5-6 categories that we can create axial codes from._
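
If you want to pull the open codes out yourself before handing them to an LLM, something like the sketch below could work. It assumes the CSV has a column named `metadata` whose cells hold JSON objects with a `z_note` key; the prompt only says the field is nested, so the JSON encoding, the column name, and the function name `extract_open_codes` are all assumptions.

```python
import csv
import io
import json
from collections import Counter


def extract_open_codes(csv_text: str,
                       metadata_column: str = "metadata",
                       note_key: str = "z_note") -> Counter:
    """Count each distinct open code found in the nested note field."""
    codes: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row.get(metadata_column) or ""
        try:
            meta = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip rows whose metadata is empty or not valid JSON
        note = meta.get(note_key) if isinstance(meta, dict) else None
        if isinstance(note, str) and note.strip():
            codes[note.strip()] += 1
    return codes


if __name__ == "__main__":
    sample = io.StringIO()
    writer = csv.writer(sample)
    writer.writerow(["trace_id", "metadata"])
    writer.writerow(["1", json.dumps({"z_note": "did not hand off to a human"})])
    writer.writerow(["2", json.dumps({"z_note": "did not hand off to a human"})])
    writer.writerow(["3", json.dumps({})])
    for code, count in extract_open_codes(sample.getvalue()).most_common():
        print(count, code)
```

The counts only tally identical notes. Grouping differently worded notes into 5-6 axial categories is still the step the prompt above asks the LLM to propose.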

    *Referenced:*
    • Building eval systems that improve your AI product: https://www.lennysnewsletter.com/p/building-eval-systems-that-improve
    • Mercor: https://mercor.com/
    • Brendan Foody on LinkedIn: https://www.linkedin.com/in/brendan-foody-2995ab10b
    • Nurture Boss: https://nurtureboss.io/
    • Braintrust: https://www.braintrust.dev/
    • Andrew Ng on X: https://x.com/andrewyng
    • Carrying Out Error Analysis: https://www.youtube.com/watch?v=JoAxZsdw_3w
    • Julius AI: https://julius.ai/
    • Brendan Foody on X—“evals are the new PRDs”: https://x.com/BrendanFoody/status/1939764763485171948
    …References continued at: https://www.lennysnewsletter.com/p/why-ai-evals-are-the-hottest-new-skill

    *Recommended books:*
    • Pachinko: https://www.amazon.com/Pachinko-National-Book-Award-Finalist/dp/1455563935
    • Apple in China: The Capture of the World’s Greatest Company: https://www.amazon.com/Apple-China-Capture-Greatest-Company/dp/1668053373/
    • Machine Learning: https://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/1259096955
    • Artificial Intelligence: A Modern Approach: https://www.amazon.com/Artificial-Intelligence-Modern-Approach-Global/dp/1292401133/

    _Production and marketing by https://penname.co/._
    _For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com._

    Lenny may be an investor in the companies discussed.
