nanogpt-speedrun

Reproducing GPT-2 (124M) as fast as possible on an RTX 4090.

karpathy:

The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite accessible today, even for the GPU poor...You can train the model with a single GPU too, it would just take proportionally longer (e.g. ~4-24 hours depending on the GPU).

This repo is heavily influenced by https://github.com/KellerJordan/modded-nanogpt. The initial baseline here was taken directly from the initial commit of that repo, with minor modifications.

setup

uv sync --all-extras
uv run python src/data/cached_fineweb10B.py
./run.sh
