Is it correct that the embedding layers for tokens and positions have the same input size, equal to vocab_size? E.g., token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
Good question. No, the input size of the positional embedding layer should be equal to the maximum context length, which is usually smaller than the vocabulary size. E.g., for GPT-2 that would be 1024, while for modern LLMs it's usually somewhere above 2048; I think in the recent GPT-4 models it's >100k now. I will modify the code to use a separate parameter to make this clearer, e.g.,
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)
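To illustrate how the two layers fit together, here's a minimal sketch; the concrete values for vocab_size, context_len, output_dim, and the batch of token IDs are just placeholders for illustration:

import torch

vocab_size = 50257   # tokenizer vocabulary size (GPT-2-like, illustrative)
context_len = 1024   # maximum context length (GPT-2)
output_dim = 256     # embedding dimension (illustrative)

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# A batch of token IDs with shape (batch_size, seq_len)
token_ids = torch.randint(0, vocab_size, (8, 4))

token_embeddings = token_embedding_layer(token_ids)                     # (8, 4, 256)
pos_embeddings = pos_embedding_layer(torch.arange(token_ids.shape[1]))  # (4, 256)

# Positional embeddings broadcast across the batch dimension
input_embeddings = token_embeddings + pos_embeddings                    # (8, 4, 256)
print(input_embeddings.shape)  # torch.Size([8, 4, 256])

So the token embedding layer has one row per vocabulary entry, while the positional embedding layer only needs one row per position up to the maximum context length.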