[Bugfix] Multi-sequence broken #11898

andylolu2 · 2025-01-09T13:08:42Z

Fixes the bugs introduced in #9569

SequenceGroup does not necessarily contain only one sequence (e.g. when n > 1), so many of the optimisations that assume a single sequence don't make sense.
Currently the seed is duplicated across all completions, so when n > 1 and a seed is set, all completions give the same output.
Currently only the first sequence in a ParallelSampleSequenceGroup yields responses, and once that sequence finishes it no longer receives new chunks. As a result, responses from the other sequences are not sent if the first sequence terminates first.
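
To make the seed issue concrete, here is a minimal sketch that uses Python's `random.Random` as a stand-in for a seeded sampler. This is an assumption for illustration only: `sample_tokens` is a hypothetical helper, not vLLM code.

```python
import random


def sample_tokens(seed: int, length: int = 8) -> list[int]:
    """Hypothetical stand-in for a seeded sampler: draw `length` token ids."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(length)]


n, seed = 3, 1234

# Reusing the same seed for every completion yields identical outputs.
duplicated = [sample_tokens(seed) for _ in range(n)]

# Giving each completion its own seed (here seed + i) decorrelates them
# while keeping the whole request reproducible.
offset = [sample_tokens(seed + i) for i in range(n)]
```

With a shared seed, every entry of `duplicated` is the same list, which is the symptom described above for n > 1.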

Signed-off-by: Andy Lo <[email protected]>

github-actions · 2025-01-09T13:08:57Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

andylolu2 · 2025-01-09T13:09:12Z

@youkaichao

youkaichao · 2025-01-10T08:13:57Z

vllm/sequence.py

+            n = self.sampling_params.n
+            assert isinstance(n, int)
+            if n > self.num_seqs():
+                # At prompt stage, the sequence group is not yet filled up
+                # and only have one sequence running. However, in the
+                # generation stage, we will have `n` sequences
+                # running.
+                return n
+        # At sampling stages, return the number of actual sequences
+        # that are not finished yet.
+        return self.num_seqs() - self.num_finished_seqs()


When will we hit this? I think the engine will only see single-sequence requests.

When you construct the output for n > 1, you access the "master group".

For example, you construct the RequestOutput with multiple sequences here:

vllm/vllm/outputs.py

Line 180 in ef725fe

return cls.from_seq_group(assembled_seq_group, use_cache,

Then call master_seq_group.is_finished() here:

vllm/vllm/outputs.py

Line 170 in ef725fe

finished = seq_group.is_finished()

This currently becomes True as soon as the first sequence terminates, regardless of whether the other sequences have terminated.
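
The difference can be sketched with toy classes. `ToySeq` and `ToyGroup` are hypothetical names used only for illustration; the buggy variant models the behaviour described in this thread, not the actual vLLM implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToySeq:
    finished: bool = False


@dataclass
class ToyGroup:
    seqs: list[ToySeq] = field(default_factory=list)

    def is_finished_buggy(self) -> bool:
        # Models the reported behaviour: only the first sequence is consulted.
        return self.seqs[0].finished

    def is_finished(self) -> bool:
        # The group is finished only once every sequence has terminated.
        return all(seq.finished for seq in self.seqs)


# The first sequence has terminated, the second is still generating.
group = ToyGroup([ToySeq(finished=True), ToySeq(finished=False)])
```

Here `group.is_finished_buggy()` reports completion while one sequence is still running, whereas `group.is_finished()` waits for all of them.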

youkaichao · 2025-01-10T08:14:13Z

vllm/sequence.py

+            params = copy.deepcopy(original_params)
+            params.n = 1
+            if params.seed is not None:
+                params.seed += i


this part makes sense to me.
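
For readers following along, the diff above can be exercised in isolation with a small sketch. `ToySamplingParams` and `fan_out` are hypothetical stand-ins, assuming only the two fields the diff touches (`n` and `seed`).

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySamplingParams:
    n: int = 1
    seed: Optional[int] = None


def fan_out(original: ToySamplingParams) -> list[ToySamplingParams]:
    """Build one single-sequence params object per requested completion."""
    per_sample = []
    for i in range(original.n):
        params = copy.deepcopy(original)  # leave the caller's object untouched
        params.n = 1
        if params.seed is not None:
            params.seed += i  # distinct but reproducible seed per completion
        per_sample.append(params)
    return per_sample


children = fan_out(ToySamplingParams(n=3, seed=42))
```

Because each child is a deep copy, the original request still carries `n = 3` and the unmodified seed after the fan-out.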

youkaichao · 2025-01-10T12:02:01Z

@andylolu2 Thanks for the fix! Can you add a test case for n > 1 with a seed set to make sure the completions are different?

andylolu2 · 2025-01-12T22:01:05Z

@youkaichao I added new asserts in the current tests to ensure each sample in the same parallel-sampling group gives different results.
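
A check of this kind might look like the sketch below. `assert_all_distinct` is a hypothetical helper, not the name used in the vLLM test suite; it assumes the completions of one request are available as a list of strings.

```python
def assert_all_distinct(texts: list[str]) -> None:
    """Fail if any two completions in a parallel-sampling group are identical."""
    assert len(set(texts)) == len(texts), f"duplicate completions: {texts}"
```

It would be called once per request on the texts of its n completions.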

Fix multi-sequence bugs

f9a2eb2

Signed-off-by: Andy Lo <[email protected]>

youkaichao reviewed Jan 10, 2025


andylolu2 force-pushed the main branch 2 times, most recently from 0be80f4 to 7c31b9c Compare January 12, 2025 21:58

Add test and fix non-streaming multi-sequence

debff7f

andylolu2 force-pushed the main branch from 7c31b9c to debff7f Compare January 12, 2025 22:00
