-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix] Multi-sequence broken #11898
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Andy Lo <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
vllm/sequence.py
Outdated
n = self.sampling_params.n | ||
assert isinstance(n, int) | ||
if n > self.num_seqs(): | ||
# At prompt stage, the sequence group is not yet filled up | ||
# and only have one sequence running. However, in the | ||
# generation stage, we will have `n` sequences | ||
# running. | ||
return n | ||
# At sampling stages, return the number of actual sequences | ||
# that are not finished yet. | ||
return self.num_seqs() - self.num_finished_seqs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will we hit this? I think the engine will only see single-sequence request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you construct the output when n > 1
you access the "master group".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, you construct the RequestOutput with multiple sequences here:
Line 180 in ef725fe
return cls.from_seq_group(assembled_seq_group, use_cache, |
Then call master_seq_group.is_finished()
here:
Line 170 in ef725fe
finished = seq_group.is_finished() |
Which currently already becomes True
when the first sequence terminates (regardless of whether the other sequences has terminated)
params = copy.deepcopy(original_params) | ||
params.n = 1 | ||
if params.seed is not None: | ||
params.seed += i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part makes sense to me.
@andylolu2 thanks for the fix! can you add a test case for |
0be80f4
to
7c31b9c
Compare
@youkaichao I added new asserts in the current tests to ensure each sample in the same parallel-sampling group gives different results. |
Fixes the bugs introduced in #9569
SequenceGroup
does not necessarily contain only one sequence (e.g. whenn
> 1), so many of the optimisations don't make sense.n
> 1 with seed set, all completions give the same output.