Post-training extends pre-trained models with dynamic learning capabilities.
Post-training demands more resources than pre-training or traditional fine-tuning, requiring parallel inference at scale to evaluate and refine outputs.
Access on-demand GPU infrastructure to run these large-scale inference workloads; the sketch below illustrates the core evaluate-and-refine loop.
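To make that loop concrete, here is a minimal Python sketch, under the assumption that generation and scoring can be fanned out across parallel workers. generate_candidates and score are hypothetical stand-ins for a model server and a reward model, not APIs of any real system.

# Minimal sketch of the parallel evaluate-and-refine loop behind post-training.
# generate_candidates and score are hypothetical stand-ins for a model server
# and a reward model; a real system would batch these calls across many GPUs.
from concurrent.futures import ThreadPoolExecutor
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n completions from the policy model.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def score(candidate: str) -> float:
    # Stand-in for a reward model or verifier.
    return random.random()

def refine_step(prompts: list[str], samples_per_prompt: int = 8) -> list[str]:
    # Sample many completions per prompt in parallel, keep the best of each.
    with ThreadPoolExecutor(max_workers=32) as pool:
        batches = list(pool.map(
            lambda p: generate_candidates(p, samples_per_prompt), prompts))
    # The highest-scoring completion per prompt feeds the next model update.
    return [max(batch, key=score) for batch in batches]

if __name__ == "__main__":
    print(refine_step(["e4 e5: find the strongest reply"]))

The fan-out of sampling and scoring, rather than the gradient step itself, is what makes post-training so inference-heavy.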
We applied post-training to one of AI's oldest benchmarks: chess.
Unlike traditional chess engines that rely on search trees, our post-trained DeepSeek R1 reasons about moves the way a human does, using reinforcement learning on self-generated chains of thought.
This first large-scale post-training effort on an open-source reasoning model demonstrates techniques applicable to other structured problem domains.
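As a hedged illustration of reinforcement learning on self-generated thought chains, the toy Python loop below rewards chains that end in a legal move. sample_chain, reward, and update_policy are hypothetical stand-ins; the actual experiment trains a full reasoning model against game play, not this toy verifier.

# Toy sketch of RL on self-generated thought chains (hypothetical stand-ins
# throughout; the real setup trains a reasoning model against game outcomes).
import random

LEGAL_MOVES = {"e4", "d4", "Nf3", "c4"}  # toy substitute for a real move generator

def sample_chain(position: str) -> tuple[str, str]:
    # Stand-in for the model emitting a chain of thought ending in a move.
    move = random.choice(["e4", "d4", "Nf3", "Ke9"])  # last one is illegal
    thought = f"In position {position}, I consider {move} because ..."
    return thought, move

def reward(move: str) -> float:
    # Simplest possible verifier: legal moves score 1, illegal moves score 0.
    # A real reward would also account for game outcomes, not just legality.
    return 1.0 if move in LEGAL_MOVES else 0.0

def update_policy(chain: str, r: float) -> None:
    # Stand-in for a policy-gradient-style update reinforcing high-reward chains.
    print(f"reward={r:.0f}  reinforce: {chain[:60]}...")

for _ in range(4):
    thought, move = sample_chain("startpos")
    update_policy(thought, reward(move))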
Read the Whitepaper

Post-training is evolving rapidly. We're developing new approaches, running experiments at scale, and simplifying access to compute for reasoning models.