Open R1

A fully open reproduction of DeepSeek-R1. This repo is a work in progress; let's build it together!

Overview

The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it. The project is simple by design and mostly consists of:

  • src/open_r1 contains the scripts to train and evaluate models, as well as generate synthetic data:
    • grpo.py: trains a model with GRPO on a given dataset.
    • sft.py: runs simple SFT of a model on a dataset.
    • evaluate.py: evaluates a model on the R1 benchmarks.
    • generate.py: generates synthetic data from a model using Distilabel.
  • Makefile contains easy-to-run commands for each step in the R1 pipeline, leveraging the scripts above.

Plan of attack

We will use the DeepSeek-R1 tech report as a guide, which can roughly be broken down into three main steps:

  • Step 1: replicate the R1-Distill models by distilling a high-quality corpus from DeepSeek-R1.
  • Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will likely involve curating new, large-scale datasets for math, reasoning, and code.
  • Step 3: show we can go from base model to RL-tuned via multi-stage training.

Installation

To run the code in this project, first create a Python virtual environment, e.g. with Conda:

conda create -n openr1 python=3.11 && conda activate openr1

Next, install vLLM:

pip install vllm==0.6.6.post1

# For the Hugging Face cluster (which only has CUDA 12.1)
pip install vllm==0.6.6.post1 --extra-index-url https://download.pytorch.org/whl/cu121

This will also install PyTorch v2.5.1, and it is very important to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via pip install -e .[LIST OF MODES]. For most contributors, we recommend:

pip install -e ".[dev]"
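To verify the environment resolved correctly, here is a quick sanity check (a minimal sketch; exact CUDA suffixes depend on your platform):

# Sanity check that vLLM and the matching PyTorch build were installed.
import torch
import vllm

print(f"PyTorch: {torch.__version__}")  # expected: 2.5.1
print(f"vLLM: {vllm.__version__}")      # expected: 0.6.6.post1
print(f"CUDA available: {torch.cuda.is_available()}")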

Next, log into your Hugging Face and Weights & Biases accounts as follows:

huggingface-cli login
wandb login

Finally, check that your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub:

git-lfs --version

If it isn't installed, run:

sudo apt-get install git-lfs

Training models

We support training models with either DDP or DeepSpeed ZeRO-2 and ZeRO-3. To switch between methods, simply change the path to the accelerate YAML config in configs.

Note

The training commands below are configured for a node of 8 x H100s (80GB). For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps.
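When retuning these values, note that the effective (global) batch size is the product of the per-device batch size, the number of gradient accumulation steps, and the number of GPUs; keeping this product constant preserves the training recipe. A quick worked example for the SFT command below:

# Effective batch size for the SFT command below on a node of 8 x H100s.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 8

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 128 sequences per optimizer step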

SFT

To run SFT on a dataset of reasoning traces distilled from DeepSeek-R1, such as Bespoke-Stratos-17k, run:

accelerate launch --config_file=configs/zero3.yaml src/open_r1/sft.py \
    --model_name_or_path Qwen/Qwen2.5-Math-1.5B-Instruct \
    --dataset_name HuggingFaceH4/Bespoke-Stratos-17k \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --max_seq_length 4096 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --gradient_checkpointing \
    --bf16 \
    --logging_steps 5 \
    --eval_strategy steps \
    --eval_steps 100 \
    --output_dir data/Qwen2.5-1.5B-Open-R1-Distill

To launch a Slurm job, run:

sbatch --output=/path/to/logs/%x-%j.out --err=/path/to/logs/%x-%j.err slurm/sft.slurm {model} {dataset} {accelerator}

Here {model} and {dataset} refer to the model and dataset IDs on the Hugging Face Hub, while {accelerator} refers to the choice of 🤗 Accelerate config in configs.
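For orientation, here is a minimal sketch of roughly what src/open_r1/sft.py does, assuming it wraps TRL's SFTTrainer as its name suggests (the real script adds argument parsing, evaluation, and Hub pushing on top):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Reasoning traces distilled from DeepSeek-R1.
dataset = load_dataset("HuggingFaceH4/Bespoke-Stratos-17k", split="train")

training_args = SFTConfig(
    output_dir="data/Qwen2.5-1.5B-Open-R1-Distill",
    learning_rate=2.0e-5,
    num_train_epochs=1,
    packing=True,  # pack short samples into full 4096-token sequences
    max_seq_length=4096,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    bf16=True,
    logging_steps=5,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Math-1.5B-Instruct",  # model ID is resolved to an AutoModel internally
    args=training_args,
    train_dataset=dataset,
)
trainer.train()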

GRPO

To train a model with GRPO on a dataset such as NuminaMath-TIR, run:

accelerate launch --config_file configs/zero3.yaml src/open_r1/grpo.py \
    --output_dir DeepSeek-R1-Distill-Qwen-7B-GRPO \
    --model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
    --dataset_name AI-MO/NuminaMath-TIR \
    --max_prompt_length 256 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --logging_steps 10 \
    --bf16
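GRPO samples several completions per prompt and reinforces those that score well under one or more reward functions. A minimal sketch of how this is wired up with TRL's GRPOTrainer (the reward below is a toy stand-in; the repo's script scores answer correctness and reasoning format instead):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train")
# GRPOTrainer expects a "prompt" column; build it from the dataset's "problem" column.
dataset = dataset.map(lambda x: {"prompt": x["problem"]})

def toy_reward(completions, **kwargs):
    # Toy reward: mildly prefer longer completions, capped at 1.0.
    return [min(len(c) / 1000.0, 1.0) for c in completions]

training_args = GRPOConfig(
    output_dir="DeepSeek-R1-Distill-Qwen-7B-GRPO",
    max_prompt_length=256,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    logging_steps=10,
    bf16=True,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    reward_funcs=toy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()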

Evaluating models

We use lighteval to evaluate models, with custom tasks defined in src/open_r1/evaluate.py. For models which fit on a single GPU, run:

MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="pretrained=$MODEL,dtype=float16,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=aime24
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
    --output-dir $OUTPUT_DIR 

To increase throughput across multiple GPUs, use data parallelism as follows:

NUM_GPUS=8
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
MODEL_ARGS="pretrained=$MODEL,dtype=float16,data_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=aime24
OUTPUT_DIR=data/evals/$MODEL

lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
    --output-dir $OUTPUT_DIR 

For large models which require sharding across GPUs, use tensor parallelism and run:

NUM_GPUS=8
MODEL=deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
MODEL_ARGS="pretrained=$MODEL,dtype=float16,tensor_parallel_size=$NUM_GPUS,max_model_length=32768,gpu_memory_utilisation=0.8"
TASK=aime24
OUTPUT_DIR=data/evals/$MODEL

export VLLM_WORKER_MULTIPROC_METHOD=spawn
lighteval vllm $MODEL_ARGS "custom|$TASK|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --system-prompt="Please reason step by step, and put your final answer within \boxed{}." \
    --output-dir $OUTPUT_DIR 
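Because the system prompt asks the model to put its final answer inside \boxed{}, scoring a generation comes down to extracting that span and comparing it to the reference. A simplified illustration of the idea (the name extract_boxed_answer is hypothetical; the actual parsing in src/open_r1/evaluate.py is more robust, e.g. to nested braces):

import re

def extract_boxed_answer(completion: str) -> str | None:
    # Return the contents of the last \boxed{...} span, if any.
    # Simplified: a plain regex fails on nested braces such as \boxed{\frac{1}{2}}.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

print(extract_boxed_answer(r"... so the answer is \boxed{42}."))  # 42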

Data generation

Generate data from a smol distilled R1 model

The following example can be run on a single H100. First, install the following dependencies:

pip install "distilabel[vllm]>=1.5.2"

Now save the following snippet into a file named pipeline.py and run it with python pipeline.py. It will produce 4 generations for each of the 10 examples (change the repository username to your org/user name):

from datasets import load_dataset
from distilabel.models import vLLM
from distilabel.pipeline import Pipeline
from distilabel.steps.tasks import TextGeneration


# Note: "\\boxed" is escaped so Python doesn't interpret "\b" as a backspace character.
prompt_template = """\
You will be given a problem. Please reason step by step, and put your final answer within \\boxed{}:
{{ instruction }}"""

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train").select(range(10))

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # Swap in another smol distilled R1 model

with Pipeline(
    name="distill-qwen-7b-r1",
    description="A pipeline to generate data from a distilled r1 model",
) as pipeline:

    llm = vLLM(
        model=model_id,
        tokenizer=model_id,
        extra_kwargs={
            "tensor_parallel_size": 1,
            "max_model_len": 8192,
        },
        generation_kwargs={
            "temperature": 0.6,
            "max_new_tokens": 8192,
        },
    )
    prompt_column = "problem"
    text_generation = TextGeneration(
        llm=llm, 
        template=prompt_template,
        num_generations=4,
        input_mappings={"instruction": prompt_column} if prompt_column is not None else {}
    )


if __name__ == "__main__":
    distiset = pipeline.run(dataset=dataset)
    distiset.push_to_hub(repo_id="username/numina-deepseek-r1-qwen-7b")

Take a look at the sample dataset at HuggingFaceH4/numina-deepseek-r1-qwen-7b.
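Once the run finishes, the generated dataset can be loaded back like any other Hub dataset (a small usage sketch; the "generation" column name assumes distilabel's default TextGeneration outputs):

from datasets import load_dataset

# Swap in the repo ID you pushed to above; depending on distilabel's layout,
# you may also need to pass the pipeline's config name to load_dataset.
distilled = load_dataset("username/numina-deepseek-r1-qwen-7b", split="train")
print(distilled[0]["generation"])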

Generate data from DeepSeek-R1

To run the bigger DeepSeek-R1, we used two nodes of 8 x H100s each, using the Slurm file in this repo at slurm/generate.slurm. First, install the dependencies:

For now, we need to install a vLLM dev wheel that fixes R1 CUDA graph capture:

pip install https://wheels.vllm.ai/221d388cc5a836fa189305785ed7e887cea8b510/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu121

pip install "distilabel[vllm,ray,openai]>=1.5.2"

Then place the generate.slurm file at the same level as src/open_r1/generate.py (the Slurm script invokes it via a relative path) and run the following command:

sbatch generate.slurm \
    --hf-dataset AI-MO/NuminaMath-TIR \
    --temperature 0.6 \
    --prompt-column problem \
    --model deepseek-ai/DeepSeek-R1 \
    --hf-output-dataset username/r1-dataset
