OpenAI recently released its first open-weights model since GPT-2, entering a field led by DeepSeek and Alibaba’s Qwen.
YC’s Ankit Gupta breaks down everything you need to know about these leading open-weights models, GPT OSS, Qwen-3, and DeepSeek V3, including what sets them apart under the hood. He’ll compare their approaches to mixture-of-experts, long-context training, and the post-training techniques that shape reasoning and alignment, and explore how different design choices lead to surprisingly similar performance.
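If mixture-of-experts is new to you, here is a minimal, hypothetical sketch of top-k expert routing in PyTorch. The class name, dimensions, and expert count are illustrative assumptions only, not the implementation used by GPT OSS, Qwen-3, or DeepSeek V3:

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # scores each token for each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (num_tokens, d_model)
        scores = self.router(x)                         # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize weights over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out
```

The point of the sparsity: every token only runs through its top-k experts, so total parameters can grow far beyond the compute spent per token, which is the trade-off all three model families exploit.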
Apply to Y Combinator: https://www.ycombinator.com/apply
Work at a startup: https://www.ycombinator.com/jobs
00:00 – OpenAI OSS Launch
01:00 – Comparing Open Source LLM Architectures
01:46 – GPT OSS Overview
02:37 – Under The Hood of GPT OSS
03:25 – Qwen-3 Architecture
04:17 – Qwen-3 Training
05:12 – Qwen-3 Post-Training
06:08 – Qwen-3 Reasoning & RL Innovations
06:52 – DeepSeek V3 Overview
07:40 – DeepSeek V3.1 Updates
08:39 – Attention Mechanism (MLA)
09:39 – Comparing Model Sizes
10:35 – Long Context Strategies
11:25 – Reflections on Methods
12:00 – Takeaways