
pos_embedding_layer question #11

Answered by rasbt
nicolaleo asked this question in Q&A
Dec 28, 2023 · 1 comment · 2 replies

Good question. It should be equal to the maximum context length, which is usually smaller than the vocabulary size. E.g., for GPT-2 that would be 1024, but for modern LLMs it's usually somewhere above 2048. I think in the recent GPT-4 model it's >100k now.

I will modify this to use a separate parameter to make it clearer. E.g.,

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)   # one embedding vector per vocabulary token
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)    # one embedding vector per position, up to the max context length
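
For a fuller picture, here is a minimal runnable sketch of how the two layers combine into the input embeddings. The sizes (a GPT-2-like vocab_size=50257 and context_len=1024, plus output_dim=256) and the batch of token IDs are made up for illustration, not taken from the discussion above:

import torch

vocab_size = 50257    # assumed GPT-2 BPE vocabulary size
context_len = 1024    # assumed maximum context length (GPT-2)
output_dim = 256      # assumed embedding dimension for this example

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# A made-up batch of token IDs: 2 sequences of length 4
token_ids = torch.tensor([[40, 367, 2885, 1464],
                          [1807, 3619, 402, 271]])

token_embeddings = token_embedding_layer(token_ids)                     # shape: (2, 4, 256)
pos_embeddings = pos_embedding_layer(torch.arange(token_ids.shape[1]))  # shape: (4, 256)

# Broadcasting adds the same positional vectors to every sequence in the batch
input_embeddings = token_embeddings + pos_embeddings                    # shape: (2, 4, 256)
print(input_embeddings.shape)   # torch.Size([2, 4, 256])

Sized this way, pos_embedding_layer provides a distinct positional vector for every position the model can attend to, which is why its first dimension is the context length rather than the vocabulary size.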
