Hope you can help with this. I'm trying to implement ring attention on top of the Llama 3 architecture, and I'm starting with the blockwise parallel transformer piece. My question is: when should I start breaking the input sequence into chunks, 1) after projecting the inputs to Q, K, and V, or 2) before self-attention in the block? (See the sketch below for how I currently picture option 1.)
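
To make the question concrete, here is a minimal single-head sketch of how I currently picture option 1: project the full sequence to Q, K, and V first, then split into blocks and compute attention with an online (streaming) softmax over key/value blocks. Function and argument names are just placeholders I made up for illustration, and I've left out the causal mask and multi-head handling for brevity:

```python
import torch
import torch.nn.functional as F

def blockwise_attention(q, k, v, block_size):
    # q, k, v: (batch, seq_len, head_dim) -- single head, no causal mask, for simplicity.
    batch, seq_len, head_dim = q.shape
    scale = head_dim ** -0.5

    # Option 1: chunk *after* the Q/K/V projections have run on the full sequence.
    q_blocks = q.split(block_size, dim=1)
    k_blocks = k.split(block_size, dim=1)
    v_blocks = v.split(block_size, dim=1)

    outputs = []
    for qi in q_blocks:
        # Running statistics for the online softmax over key/value blocks.
        acc = torch.zeros_like(qi)                                        # weighted sum of values
        row_max = torch.full(qi.shape[:-1], float("-inf"), device=q.device)
        row_sum = torch.zeros(qi.shape[:-1], device=q.device)

        for kj, vj in zip(k_blocks, v_blocks):
            scores = torch.einsum("bqd,bkd->bqk", qi, kj) * scale
            block_max = scores.max(dim=-1).values
            new_max = torch.maximum(row_max, block_max)

            # Rescale the previous accumulator to the new running max, then add this block.
            correction = torch.exp(row_max - new_max)
            probs = torch.exp(scores - new_max.unsqueeze(-1))
            acc = acc * correction.unsqueeze(-1) + torch.einsum("bqk,bkd->bqd", probs, vj)
            row_sum = row_sum * correction + probs.sum(dim=-1)
            row_max = new_max

        outputs.append(acc / row_sum.unsqueeze(-1))

    return torch.cat(outputs, dim=1)

# Quick sanity check against full attention.
q, k, v = (torch.randn(2, 128, 64) for _ in range(3))
ref = F.softmax(torch.einsum("bqd,bkd->bqk", q, k) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(blockwise_attention(q, k, v, block_size=32), ref, atol=1e-4)
```

If I've understood the blockwise parallel transformer idea correctly this corresponds to option 1 (only the attention itself is chunked), but please correct me if the chunking is supposed to happen earlier.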
Any feedback would be much appreciated :)