MultiHeadAttention Module #167
Inside the MultiHeadAttention module in chapter 3, can we convert the following code:
to
Does it change anything when calculating the final context vectors using contiguous()?
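The code snippets from the original post did not carry over. As a rough sketch of the kind of change being asked about, assuming it concerns the head-splitting reshape in MultiHeadAttention (the names and concrete dimensions below are illustrative, not the book's exact code):

```python
import torch

# Hypothetical shapes for illustration only; names mirror the chapter's conventions
b, num_tokens, num_heads, head_dim = 2, 4, 3, 5
x = torch.randn(b, num_tokens, num_heads * head_dim)

# Variant A (assumed to be the book's approach): split heads, then swap the token and head axes
keys_a = x.view(b, num_tokens, num_heads, head_dim).transpose(1, 2)

# Variant B (assumed to be the question's proposal): reshape straight to the target shape
keys_b = x.view(b, num_heads, num_tokens, head_dim)

print(keys_a.shape, keys_b.shape)    # both torch.Size([2, 3, 4, 5])
print(torch.equal(keys_a, keys_b))   # False: same shape, different element layout
```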
That's a great question. At first glance, it definitely looks like these should be relatively similar. However, reshaping (viewing) and transposing are not the same. E.g., you can try reshaping a small tensor with view and compare the result to transposing it.
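The snippets referenced in the reply were not preserved either; a minimal example in the same spirit, assuming a small 2x3 integer tensor, would be:

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# Reshape (view): elements are re-read in row-major order
print(a.view(3, 2))
# tensor([[1, 2],
#         [3, 4],
#         [5, 6]])

# Transpose: rows and columns are swapped
print(a.transpose(0, 1))
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])
```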
Both will be 3x2 tensors, but the contents are arranged differently.

One might think the LLM might still pretrain fine. However, when you try to load the pretrained weights from OpenAI in chapter 5 or chapter 6, you will see that the outputs of the LLM will be garbled.

As an experiment, you can change the lines here (LLMs-from-scratch/ch06/01_main-chapter-code/previous_chapters.py, lines 85 to 94 in commit 451a629) and then run the chapter 6 code. You will see that the response will be nonsensical after the modification (below I am showing the original text so you can see which code I mean):