
Do it yourself GPT

A GPT implementation in PyTorch, written step by step to understand the GPT architecture.


Notes:

  • nn1.py Basic 3-layer neural net (see the sketch after this list)
  • nn2.py Adam optimizer; extracting (saving/loading) the model
  • nn3.py Text encoding/decoding
  • nn4.py Embeddings
  • nn5.py Attention head
  • nn6.py Adding self-attention
  • main.py Using reweighting
  • gpt.py GPT model, following Karpathy
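
A minimal sketch of the ideas in nn1.py/nn2.py: a 3-layer feed-forward net with ReLU activations, trained with Adam on an MSE loss, then saved and reloaded. The data, shapes, and hyperparameters here are illustrative assumptions, not the repo's actual code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data (assumed): learn y = sum(x) from 8-dimensional inputs.
x = torch.randn(64, 8)
y = x.sum(dim=1, keepdim=True)

# A basic 3-layer net, as in nn1.py.
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

# Adam optimizer and MSE loss, as in nn2.py.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())  # should approach 0 on this toy task

# "Extract model": save the trained weights and load them back.
torch.save(model.state_dict(), "model.pt")
model.load_state_dict(torch.load("model.pt"))
```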

Terms:

  • SGD = Stochastic Gradient Descent
  • Adam = Adaptive Moment Estimation
  • MSE = Mean Squared Error
  • ReLU = Rectified Linear Unit
  • Masked self-attention = each position may only attend to positions before it (see the sketch below)
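
A minimal sketch of one masked (causal) self-attention head, in the spirit of nn5.py/nn6.py and Karpathy's nanoGPT. The class and parameter names (MaskedSelfAttentionHead, head_size, block_size) are illustrative assumptions, not the repo's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttentionHead(nn.Module):
    def __init__(self, embed_dim, head_size, block_size):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_size, bias=False)
        self.query = nn.Linear(embed_dim, head_size, bias=False)
        self.value = nn.Linear(embed_dim, head_size, bias=False)
        # Lower-triangular mask: position t may attend to positions <= t only.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        # Scaled dot-product attention scores, shape (B, T, T).
        att = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        # Mask out future positions before the softmax ("only look back").
        att = att.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v  # (B, T, head_size)

head = MaskedSelfAttentionHead(embed_dim=32, head_size=16, block_size=8)
out = head(torch.randn(4, 8, 32))
print(out.shape)  # torch.Size([4, 8, 16])
```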

Papers:

GPT-3 paper (Language Models are Few-Shot Learners), 2020

Attention Is All You Need (the Transformer architecture, 2017)

The Annotated Transformer

References:

GPT source from Andrej Karpathy

nanoGPT

Videos:

From Zero To GPT & Beyond - Fast Paced Beginner Friendly Tutorial On Neural Networks

Let's build GPT: from scratch, in code, spelled out.

How do transformers work? (Attention is all you need)
