
Post-training

Post-training extends a pre-trained model through additional learning stages, such as reinforcement learning, applied after initial training. Post-trained models can:

  • Self-improve through reinforcement learning
  • Develop multi-step reasoning abilities
  • Adapt to specific domains while maintaining generalization

Scale your post-training workloads

Post-training demands more resources than pre-training or traditional fine-tuning, requiring parallel inference at scale to evaluate and refine outputs.
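As a minimal sketch of this evaluate-and-refine pattern: sample many candidate outputs in parallel, score each, and keep the best (best-of-n sampling). Here `generate` and `score` are hypothetical stand-ins for a model inference call and a reward function, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

def generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a model inference call."""
    return f"{prompt} -> candidate-{seed}"

def score(output: str) -> float:
    """Hypothetical stand-in for a verifier or reward model."""
    return float(len(output) % 7)  # toy scoring, for illustration only

def best_of_n(prompt: str, n: int = 8) -> str:
    # Evaluate-and-refine: run n inference calls in parallel,
    # then keep the highest-scoring candidate.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate(prompt, s), range(n)))
    return max(candidates, key=score)

print(best_of_n("e2e4"))
```

At production scale each `generate` call is a full model rollout, which is why this workload is dominated by parallel inference rather than gradient updates.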

Access on-demand GPU infrastructure to:

  • Run post-training without long-term infrastructure commitments
  • Scale compute usage up or down as needed
  • Deploy distributed clusters for large-scale reinforcement learning
Get Compute

Case Study

Chess: post-training in action

We applied post-training to one of AI's oldest benchmarks: chess.

Why Chess?

Unlike traditional chess engines, which rely on search trees, our approach produces a model that reasons about moves the way humans do. We applied post-training to DeepSeek R1, using reinforcement learning on the model's self-generated thought chains.

This first large-scale post-training effort on an open-source reasoning model demonstrates techniques applicable to other structured problem domains.

Read the Whitepaper

Stay updated on post-training research

Post-training is evolving rapidly. We're developing new approaches, running experiments at scale, and simplifying access to compute for reasoning models.