
MultiHeadAttention Module #167

Answered by rasbt
vamsikumbuf asked this question in Q&A · May 20, 2024

That's a great question. At first glance, the two operations look like they should produce similar results. However, reshaping (viewing) and transposing are not the same.

E.g., you can try:

import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])
a.view(3, 2)

and compare it to

a.transpose(0, 1)

Both will be 3x2 tensors, but the contents are arranged differently.
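Printing both makes the difference concrete:

print(a.view(3, 2))
# tensor([[1, 2],
#         [3, 4],
#         [5, 6]])

print(a.transpose(0, 1))
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])

view keeps the elements in their original memory order and simply re-chunks them into new rows, whereas transpose swaps the two axes.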

One might think the LLM would still pretrain fine. A model trained from scratch can learn around any consistent (if scrambled) arrangement of the values, but OpenAI's weights assume the correct head splitting. So when you try to load the pretrained weights from OpenAI in chapter 5 or chapter 6, you will see that the LLM's outputs are garbled.

As an experiment, you can change the corresponding lines in the MultiHeadAttention class and compare the results yourself.
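Here is a minimal sketch of the kind of change meant, assuming the head-splitting pattern used in the book's MultiHeadAttention class; the names num_heads and head_dim and the toy shapes are illustrative, not the exact code from the chapter:

import torch

b, num_tokens, d_out = 2, 4, 8   # batch size, sequence length, embedding dim
num_heads = 2
head_dim = d_out // num_heads

queries = torch.randn(b, num_tokens, d_out)

# Correct: split the last dimension into heads, then swap the token and
# head axes so each head sees a (num_tokens, head_dim) slice per token.
q_correct = queries.view(b, num_tokens, num_heads, head_dim).transpose(1, 2)

# Incorrect "shortcut": viewing straight to the target shape gives the
# same shape but mixes values across tokens and heads.
q_wrong = queries.view(b, num_heads, num_tokens, head_dim)

print(q_correct.shape == q_wrong.shape)  # True  -> shapes match
print(torch.equal(q_correct, q_wrong))   # False -> contents differ

Because both tensors have the same shape, the forward pass runs without errors; the bug only shows up in the values, which is why loading OpenAI's weights exposes it.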

Answer selected by vamsikumbuf