You've seen openai/gpt-2.
You've seen karpathy/minGPT.
You've also seen karpathy/nanoGPT!
You've even seen picoGPT!!
But you probably don't want to see stupidGPT...
stupidGPT
is exactly what it sounds like: a really stupid partial rewrite of the (unlike this project) excellent and well-written picoGPT.
What makes it so stupid?
Well...
stupidGPT "features":
- NumPy? ❌ Nope! I've ripped it out completely. This is a pure Python implementation.
- Fast? ❌ Couldn't be further from the truth. stupidGPT takes around 1 minute per token with the smallest model and a relatively short prompt. With the largest model, expect 10 or more minutes per token.
- Usable? ✅ Technically, yes! stupidGPT is fully functional. But I wouldn't consider it usable in any other sense of the word.
- Training code? ❌ Error, 4️⃣0️⃣4️⃣ not found
- Batch inference? ❌ stupidGPT is civilized, single file line, one at a time only
- top-p sampling? ❌ top-k? ❌ temperature? ❌ categorical sampling?! ❌ greedy? ✅
- Readable? ❌ No. I've taken special care to ensure all of my NumPy replacement functions are single-line list comprehensions. (with a couple exceptions)
- Smol??? ❌ HAHA NO
I want to fundamentally understand the underlying mathematics behind GPT, which means I'm going without the luxury of NumPy abstraction. I generally learn that sort of thing better this way.
Unfortunately for you, that means this is the last option you should consider for GPT-2 inference. But who knows, maybe you might learn something too?