Skip to content

Latest commit

 

History

History
18 lines (12 loc) · 764 Bytes

README.md

File metadata and controls

18 lines (12 loc) · 764 Bytes

nanogpt-speedrun

Reproducing GPT-2 (124M) as fast as possible on an RTX 4090.

karparthy:

The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite accessible today, even for the GPU poor...You can train the model with a single GPU too, it would just take proportionally longer (e.g. ~4-24 hours depending on the GPU).

This repo is heavily influced by https://github.com/KellerJordan/modded-nanogpt. The initial baseline here was taken directly from the initial commit of that repo, with minor modifications.

See also: karpathy/llm.c#481 and https://github.com/tysam-code/hlb-gpt

setup

uv sync --all-extras
uv run python src/data/cached_fineweb10B.py
./run.sh