
Implement selective batching for vllm #9659

Merged: 26 commits merged into intel:main on Dec 22, 2023

Conversation

@gc-fu (Contributor) commented on Dec 12, 2023

Description

Implement selective batching for vLLM.

Users can pass the argument VLLM_ENABLE_SELECTIVE_BATCHING to control whether the selective_batching feature is enabled.
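
For context, here is a minimal illustrative sketch (not this PR's actual code) of what such a toggle typically does, assuming the setting is exposed as an environment variable read as a true/false flag; the names `run_decoder_layer`, `attention_fn`, `mlp_fn`, and `seq_lens` are hypothetical:

```python
import os

import torch

# The variable name comes from this PR; reading it as a boolean env flag is an assumption.
ENABLE_SELECTIVE_BATCHING = (
    os.environ.get("VLLM_ENABLE_SELECTIVE_BATCHING", "false").lower() == "true"
)


def run_decoder_layer(hidden_states, attention_fn, mlp_fn, seq_lens):
    """Hypothetical selective-batching step, for illustration only.

    With selective batching, attention runs per request (each request has its
    own sequence length and KV cache), while token-wise ops such as the MLP
    stay batched over all requests' tokens concatenated along dim 0.
    """
    if ENABLE_SELECTIVE_BATCHING:
        # Split the flat token tensor back into per-request chunks for attention.
        per_request = torch.split(hidden_states, seq_lens, dim=0)
        attn_out = torch.cat([attention_fn(h) for h in per_request], dim=0)
    else:
        # Without the flag: treat the whole input as a single (padded) batch.
        attn_out = attention_fn(hidden_states)
    # Non-attention ops remain batched across requests in both paths.
    return mlp_fn(attn_out)
```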

@gc-fu marked this pull request as ready for review on December 21, 2023, 07:45
@gc-fu requested review from glorysdj and xiangyuT on December 21, 2023, 08:02
@glorysdj (Contributor) left a comment

LGTM

@xiangyuT (Contributor) left a comment

LGTM

@gc-fu merged commit 11987e6 into intel:main on Dec 22, 2023
liu-shaojun pushed a commit that referenced this pull request Mar 25, 2024
* add control to load hf model

* finish initial version of selective_batching

* temp

* finish

* Remove print statement

* fix error

* Apply yang's optimization

* a version that works

* We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path

* format

* temp solution: not batching prefill requests

* a version that works for prefill batching

* format

* a solid version: works normally

* a temp version

* Solid version: remove redundant functions

* fix format

* format

* solid: add option to enable selective_batching

* remove logic for using transformer models

* format

* format

* solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING

* format

* finish

* format