
Implement selective batching for vllm #9659

Merged: 26 commits merged into intel:main on Dec 22, 2023

Conversation

@gc-fu (Contributor) commented on Dec 12, 2023

Description

Implement selective batching for vLLM.

Users can pass the argument VLLM_ENABLE_SELECTIVE_BATCHING to control whether the selective_batching feature is enabled.
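
For context, here is a minimal illustrative sketch (not this PR's actual code) of what such a toggle typically does, assuming the setting is exposed as an environment variable read as a true/false flag; the names `run_decoder_layer`, `attention_fn`, `mlp_fn`, and `seq_lens` are hypothetical:

```python
import os

import torch

# The variable name comes from this PR; reading it as a boolean env flag is an assumption.
ENABLE_SELECTIVE_BATCHING = (
    os.environ.get("VLLM_ENABLE_SELECTIVE_BATCHING", "false").lower() == "true"
)


def run_decoder_layer(hidden_states, attention_fn, mlp_fn, seq_lens):
    """Hypothetical selective-batching step, for illustration only.

    With selective batching, attention runs per request (each request has its
    own sequence length and KV cache), while token-wise ops such as the MLP
    stay batched over all requests' tokens concatenated along dim 0.
    """
    if ENABLE_SELECTIVE_BATCHING:
        # Split the flat token tensor back into per-request chunks for attention.
        per_request = torch.split(hidden_states, seq_lens, dim=0)
        attn_out = torch.cat([attention_fn(h) for h in per_request], dim=0)
    else:
        # Without the flag: treat the whole input as a single (padded) batch.
        attn_out = attention_fn(hidden_states)
    # Non-attention ops remain batched across requests in both paths.
    return mlp_fn(attn_out)
```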

@gc-fu marked this pull request as ready for review on December 21, 2023, 07:45
@gc-fu requested review from glorysdj and xiangyuT on December 21, 2023, 08:02
@glorysdj (Contributor) left a comment

LGTM

@xiangyuT (Contributor) left a comment

LGTM

@gc-fu merged commit 11987e6 into intel:main on Dec 22, 2023
liu-shaojun pushed a commit that referenced this pull request Mar 25, 2024
* add control to load hf model

* finish initial version of selective_batching

* temp

* finish

* Remove print statement

* fix error

* Apply yang's optimization

* a version that works

* We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path

* format

* temp solution: not batching prefill requests

* a version that works for prefill batching

* format

* a solid version: works normally

* a temp version

* Solid version: remove redundant functions

* fix format

* format

* solid: add option to enable selective_batching

* remove logic for using transformer models

* format

* format

* solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING

* format

* finish

* format