Ever wonder what it actually takes to train a frontier AI model?
Ankit Gupta, YC General Partner, sits down with Nick Joseph, Anthropic's Head of Pretraining, to explore the engineering challenges behind training Claude: managing thousands of GPUs, debugging cursed bugs, and balancing compute between pretraining and RL. We cover scaling laws, data strategies, team composition, and why the hardest problems in AI are often infrastructure problems, not ML problems.
Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs
Chapters:
00:00 – Introduction
01:05 – From Vicarious to OpenAI to Anthropic
06:40 – What pretraining is
11:20 – Why next-word prediction won out
16:05 – Scaling laws and the feedback loop of compute → models → revenue
21:50 – Building Anthropic’s early infrastructure
27:35 – Efficiency hacks and debugging at scale
33:10 – Generalists vs. specialists on the pretraining team
38:45 – Challenges of training across thousands of GPUs
44:15 – Working with new chips: GPUs vs. TPUs
49:00 – Pretraining vs. post-training (RLHF and reasoning models)
54:25 – The future of data quality and availability
59:10 – Where pretraining goes next
1:03:00 – Closing reflections