DeepSeek: What Happened, What Matters, and Why It’s Interesting
Artificiality: Minds Meeting Machines - A podcast by Helen and Dave Edwards

First:
- Apologies for the audio! We had a production error…

What’s new:
- DeepSeek has created breakthroughs in both how AI systems are trained (making it much more affordable) and how they run in real-world use (making them faster and more efficient)

Details
- FP8 Training: Working With Less Precise Numbers (code sketch below)
  - Traditional AI training requires extremely precise numbers
  - DeepSeek found you can use less precise numbers (like rounding $10.857643 to $10.86)
  - This cuts memory and computation needs significantly, with minimal impact on quality
  - Like teaching someone math using rounded numbers instead of carrying every decimal place
- Learning from Other AIs (Distillation) (code sketch below)
  - Traditional approach: the AI learns everything from scratch by studying massive amounts of data
  - DeepSeek's approach: use existing AI models as teachers
  - Like having experienced programmers mentor new developers
- Trial & Error Learning (for their R1 model) (code sketch below)
  - Started with some basic "tutoring" from advanced models
  - Then let it practice solving problems on its own
  - When it found good solutions, these were fed back into training
  - Led to "aha moments" where R1 discovered better ways to solve problems
  - Finally, polished its ability to explain its thinking clearly to humans
- Smart Team Management (Mixture of Experts) (code sketch below)
  - Instead of one massive system that does everything, DeepSeek built a team of specialists
  - Like running a software company with:
    - 256 specialists who focus on different areas
    - 1 generalist who helps with everything
    - a smart project manager who assigns work efficiently
  - For each task, only 8 specialists plus the generalist are needed
  - More efficient than having everyone work on everything
- Efficient Memory Management (Multi-head Latent Attention) (code sketch below)
  - Traditional AI is like keeping complete transcripts of every conversation
  - DeepSeek's approach is like taking smart meeting minutes
  - Captures key information in a compressed format
  - Similar to how JPEG compresses images
- Looking Ahead (Multi-Token Prediction) (code sketch below)
  - Traditional AI predicts one word at a time
  - DeepSeek looks ahead and predicts two words at once
  - Like a skilled reader who can read ahead while maintaining comprehension

Why This Matters
- Cost Revolution: Training costs of $5.6M (vs hundreds of millions) suggest a future where AI development isn't limited to tech giants.
- Working Around Constraints: Shows how limitations can drive innovation. DeepSeek achieved state-of-the-art results without access to the most powerful chips (at least, that's the best conclusion at the moment).

What’s Interesting
- Efficiency vs Power: Challenges the assumption that advancing AI requires ever-increasing computing power; sometimes smarter engineering beats brute force.
- Self-Teaching AI: R1's ability to develop reasoning capabilities through pure reinforcement learning suggests AIs can discover problem-solving methods on their own.
- AI Teaching AI: The success of distillation shows how knowledge can be transferred between AI models, potentially leading to compounding improvements over time.
- IP for Free: If DeepSeek can be such a fast follower through distillation, what's the advantage for OpenAI, Google, or another company in releasing a novel model?
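
Code sketches

Low-precision numbers (FP8 idea). A minimal sketch of the trade-off behind training with less precise numbers. Plain NumPy has no FP8 type, so float16 stands in here as the "less precise" format; the shapes and variable names are illustrative, not DeepSeek's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# A weight matrix stored at full precision (float32).
weights_fp32 = rng.normal(size=(1024, 1024)).astype(np.float32)

# The same values rounded to a lower-precision format (float16 as a stand-in for FP8).
weights_low = weights_fp32.astype(np.float16)

print("float32 bytes:", weights_fp32.nbytes)   # ~4 MB
print("float16 bytes:", weights_low.nbytes)    # ~2 MB -- half the memory

# The rounding error is tiny relative to the values themselves,
# which is why training can often tolerate the loss of precision.
max_abs_error = np.max(np.abs(weights_fp32 - weights_low.astype(np.float32)))
print("max rounding error:", max_abs_error)
```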
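
Distillation. A minimal sketch of the "AI teaching AI" idea: a student model is trained to match a teacher's output distribution instead of learning only from raw data. This is the classic soft-label version of distillation; DeepSeek's R1 distillation instead fine-tunes smaller models on text generated by the larger model, but the "teacher supervises student" idea is the same. All values below are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits over a 5-token vocabulary.
teacher_logits = np.array([2.0, 1.0, 0.2, -1.0, -2.0])
student_logits = np.array([1.5, 0.8, 0.5, -0.5, -1.5])

T = 2.0  # temperature softens the teacher's distribution
teacher_probs = softmax(teacher_logits, T)
student_probs = softmax(student_logits, T)

# Cross-entropy of the student against the teacher's soft targets:
# minimizing this pushes the student toward the teacher's behavior.
distill_loss = -np.sum(teacher_probs * np.log(student_probs))
print("distillation loss:", distill_loss)
```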
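
Trial & error learning. A heavily simplified sketch of the feedback loop described above: sample several candidate answers, score them with a verifiable reward, and keep only the good ones as new training examples. R1's actual training uses reinforcement learning with policy-gradient updates rather than this literal loop, and the generator and reward below are stand-ins, but the "good solutions get fed back in" idea is the same.

```python
import random

random.seed(0)

def generate_candidate_answer(question):
    # Stand-in for sampling an answer from the model: guess a number.
    return random.randint(0, 20)

def reward(question, answer):
    # Stand-in for a verifiable reward: 1 if the answer is correct, else 0.
    return 1.0 if answer == 12 else 0.0

question = "What is 7 + 5?"
new_training_data = []

for _ in range(50):                          # try many times
    answer = generate_candidate_answer(question)
    if reward(question, answer) > 0:         # keep only the good solutions...
        new_training_data.append((question, answer))

# ...and feed them back in as fresh training examples.
print(f"collected {len(new_training_data)} good solutions to train on")
```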
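
Mixture of Experts. A minimal sketch of the routing idea: 256 "specialist" experts plus 1 always-on "generalist" (shared) expert, with a router that activates only 8 specialists per token. Dimensions, names, and the toy experts are illustrative, not DeepSeek's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16               # hidden size of a token representation (toy value)
NUM_EXPERTS = 256    # routed "specialists"
TOP_K = 8            # specialists activated per token

# In this sketch each expert is just a small weight matrix.
experts = [rng.normal(size=(D, D)) * 0.1 for _ in range(NUM_EXPERTS)]
shared_expert = rng.normal(size=(D, D)) * 0.1     # the "generalist"
router = rng.normal(size=(D, NUM_EXPERTS)) * 0.1  # the "project manager"

def moe_layer(token):
    # The router scores every expert, but only the top 8 actually run.
    scores = token @ router                       # (NUM_EXPERTS,)
    top_idx = np.argsort(scores)[-TOP_K:]         # indices of the best experts
    gate = np.exp(scores[top_idx])
    gate = gate / gate.sum()                      # normalized gate weights

    out = token @ shared_expert                   # generalist always runs
    for w, i in zip(gate, top_idx):
        out += w * (token @ experts[i])           # only 8 of 256 run
    return out

token = rng.normal(size=D)
print(moe_layer(token).shape)   # (16,) -- same output shape, far less compute
```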
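
Multi-head Latent Attention. A minimal sketch of the "meeting minutes" idea: instead of caching a full key and value vector for every past token (the complete transcript), cache one much smaller latent vector and expand it back when attention needs it. Sizes and projection names are assumptions for illustration, not the model's real dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 1024          # full hidden size per token
LATENT = 64       # compressed latent size -- a much smaller cache

W_down = rng.normal(size=(D, LATENT)) * 0.03   # compress hidden state -> latent
W_up_k = rng.normal(size=(LATENT, D)) * 0.03   # expand latent -> key
W_up_v = rng.normal(size=(LATENT, D)) * 0.03   # expand latent -> value

hidden = rng.normal(size=D)        # one token's hidden state

# What gets stored in the cache: only the small latent vector.
latent = hidden @ W_down           # shape (64,) instead of two (1024,) vectors

# At attention time, keys and values are reconstructed from the latent.
key = latent @ W_up_k              # (1024,)
value = latent @ W_up_v            # (1024,)

full_cache_floats = 2 * D          # uncompressed key + value per token
mla_cache_floats = LATENT
print(f"cache per token: {full_cache_floats} floats -> {mla_cache_floats} floats")
```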
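
Multi-token prediction. A minimal sketch of "looking ahead": at each position the model is trained to predict not just the next token but also the one after it, via a second output head, so every step carries a richer training signal. Toy shapes and names; not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D, VOCAB = 32, 100
hidden = rng.normal(size=D)             # hidden state at the current position

head_next = rng.normal(size=(D, VOCAB)) * 0.1      # predicts token t+1
head_after = rng.normal(size=(D, VOCAB)) * 0.1     # predicts token t+2

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

probs_next = softmax(hidden @ head_next)
probs_after = softmax(hidden @ head_after)

# Training signal comes from both targets, so each step teaches the model
# to look ahead; the extra head can also help speed up generation.
target_next, target_after = 42, 7       # hypothetical ground-truth token ids
loss = -np.log(probs_next[target_next]) - np.log(probs_after[target_after])
print("combined two-token loss:", loss)
```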