GPT implementation using PyTorch to understand the GPT architecture
- nn1.py Basic 3-layer neural net
- nn2.py Adam optimizer, extract model
- nn3.py Text encoding/decoding
- nn4.py Embeddings
- nn5.py Attention Head
- nn6.py Add Self Attention
- main.py Using Reweight
- gpt.py GPT Model Karpathy
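The first two files (nn1.py, nn2.py) cover a basic 3-layer net trained with Adam and MSE. A minimal sketch of that setup (layer sizes and the random dummy batch are illustrative assumptions, not the repo's actual values):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical shapes: 4 input features, 16 hidden units, 1 output
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM
loss_fn = nn.MSELoss()                                     # MSE

x = torch.randn(64, 4)   # dummy inputs
y = torch.randn(64, 1)   # dummy targets

losses = []
for step in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(x), y)    # forward pass + MSE loss
    loss.backward()                # backpropagate
    optimizer.step()               # Adam parameter update
    losses.append(loss.item())
```

Swapping `torch.optim.Adam` for `torch.optim.SGD` gives the plain stochastic-gradient-descent variant.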
- SGD = Stochastic Gradient Descent
- ADAM = Adaptive Moment Estimation
- MSE = Mean Squared Error
- ReLU = Rectified Linear Unit
- Masked Self Attention: each token may only attend to tokens before it (no looking at the future)
"Attention Is All You Need" (the Transformer architecture underlying GPT, 2017)
GPT source from Andrej Karpathy
From Zero To GPT & Beyond - Fast Paced Beginner Friendly Tutorial On Neural Networks