Description
TLDR;
Enable application of staggered rope embeddings to different sequences within the same batch.
During generation tasks, different sequences in a batch might have different start positions (technically different end positions as well, but those are bounded by the max sequence length in the batch, so we can afford to ignore them for now). This change modifies the RoPE kernel to apply the embeddings in a staggered manner to different sequences in the batch, using a `start_positions` argument. (The `start_positions` argument and related changes are directly adapted from #829, which was authored by @pggPL.)
Fixes # (issue)
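To illustrate the idea, here is a minimal pure-Python reference sketch of staggered RoPE. It is not the fused kernel from this PR: the function name `rope_reference`, the nested-list layout `[batch][seq][dim]`, the rotate-half pairing, and the `base` value are all assumptions made for the example. The only point it demonstrates is that each sequence's positions are offset by its own entry in `start_positions`.

```python
import math


def rope_reference(x, start_positions=None, base=10000.0):
    """Hypothetical reference RoPE (not the library API).

    x: nested list shaped [batch][seq][dim], with dim even.
    start_positions: optional list of per-sequence position offsets.
    """
    batch = len(x)
    if start_positions is None:
        # Default behaviour: every sequence starts at position 0.
        start_positions = [0] * batch

    out = []
    for b, seq in enumerate(x):
        rotated_seq = []
        for t, vec in enumerate(seq):
            dim = len(vec)
            half = dim // 2
            # Staggering: the absolute position depends on the sequence.
            pos = start_positions[b] + t
            new_vec = [0.0] * dim
            for i in range(half):
                theta = pos * base ** (-2.0 * i / dim)
                c, s = math.cos(theta), math.sin(theta)
                # Rotate the pair (i, i + half) by angle theta.
                x1, x2 = vec[i], vec[i + half]
                new_vec[i] = x1 * c - x2 * s
                new_vec[i + half] = x2 * c + x1 * s
            rotated_seq.append(new_vec)
        out.append(rotated_seq)
    return out


# Two identical sequences; the second one resumes generation at position 3.
x = [[[1.0, 2.0, 3.0, 4.0] for _ in range(4)] for _ in range(2)]
default_out = rope_reference(x)
staggered_out = rope_reference(x, start_positions=[0, 3])
```

In this sketch, token 0 of the second sequence in `staggered_out` is rotated exactly like token 3 in `default_out`, while the first sequence is unchanged. Omitting `start_positions` reproduces the non-staggered behaviour, which is why adding it as a default keyword argument does not break existing callers.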
Type of change
Changes
- `apply_rotary_pos_emb` function, but this is non-breaking since `start_positions` is a default kwarg here.
- `FusedRoPEFunc` and all the extensions/kernels that are called internally by this function.