Post-training extends pre-trained models with dynamic learning capabilities.
Post-training demands more resources than pre-training or traditional fine-tuning, requiring parallel inference at scale to evaluate and refine outputs.
Access on-demand GPU infrastructure to run these large-scale inference workloads; the sketch below illustrates the core evaluate-and-refine loop.
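To make that loop concrete, here is a minimal Python sketch, under the assumption that generation and scoring can be fanned out across parallel workers. generate_candidates and score are hypothetical stand-ins for a model server and a reward model, not APIs of any real system.

# Minimal sketch of the parallel evaluate-and-refine loop behind post-training.
# generate_candidates and score are hypothetical stand-ins for a model server
# and a reward model; a real system would batch these calls across many GPUs.
from concurrent.futures import ThreadPoolExecutor
import random

def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n completions from the policy model.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def score(candidate: str) -> float:
    # Stand-in for a reward model or verifier.
    return random.random()

def refine_step(prompts: list[str], samples_per_prompt: int = 8) -> list[str]:
    # Sample many completions per prompt in parallel, keep the best of each.
    with ThreadPoolExecutor(max_workers=32) as pool:
        batches = list(pool.map(
            lambda p: generate_candidates(p, samples_per_prompt), prompts))
    # The highest-scoring completion per prompt feeds the next model update.
    return [max(batch, key=score) for batch in batches]

if __name__ == "__main__":
    print(refine_step(["e4 e5: find the strongest reply"]))

The fan-out of sampling and scoring, rather than the gradient step itself, is what makes post-training so inference-heavy.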
We applied post-training to one of AI's oldest benchmarks: chess.
Unlike traditional chess engines that rely on search trees, our post-trained DeepSeek R1 reasons about moves the way a human does, using reinforcement learning on self-generated chains of thought.
This first large-scale post-training effort on an open-source reasoning model demonstrates techniques applicable to other structured problem domains.
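As a hedged illustration of reinforcement learning on self-generated thought chains, the toy Python loop below rewards chains that end in a legal move. sample_chain, reward, and update_policy are hypothetical stand-ins; the actual experiment trains a full reasoning model against game play, not this toy verifier.

# Toy sketch of RL on self-generated thought chains (hypothetical stand-ins
# throughout; the real setup trains a reasoning model against game outcomes).
import random

LEGAL_MOVES = {"e4", "d4", "Nf3", "c4"}  # toy substitute for a real move generator

def sample_chain(position: str) -> tuple[str, str]:
    # Stand-in for the model emitting a chain of thought ending in a move.
    move = random.choice(["e4", "d4", "Nf3", "Ke9"])  # last one is illegal
    thought = f"In position {position}, I consider {move} because ..."
    return thought, move

def reward(move: str) -> float:
    # Simplest possible verifier: legal moves score 1, illegal moves score 0.
    # A real reward would also account for game outcomes, not just legality.
    return 1.0 if move in LEGAL_MOVES else 0.0

def update_policy(chain: str, r: float) -> None:
    # Stand-in for a policy-gradient-style update reinforcing high-reward chains.
    print(f"reward={r:.0f}  reinforce: {chain[:60]}...")

for _ in range(4):
    thought, move = sample_chain("startpos")
    update_policy(thought, reward(move))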
Read the Whitepaper

Post-training is evolving rapidly. We're developing new approaches, running experiments at scale, and simplifying access to compute for reasoning models.