Add batched prefill via VLLM_SCHED_PREFILL_COUNT
To ensure we don't run prefills repeatedly during decode, provide a mechanism to queue up a certain number of prefills before executing them. VLLM_SCHED_PREFILL_COUNT specifies the minimum number of queued prefills required before execution. One caveat: --scheduler-delay-factor should be used to enforce a longer prefill scheduling delay; if not explicitly provided, it is set to the value of VLLM_SCHED_PREFILL_COUNT. This is needed because an uneven number of prefills can leave the queue permanently short of VLLM_SCHED_PREFILL_COUNT, causing the server to hang.
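A minimal sketch of the gating logic, assuming a simplified standalone scheduler rather than the actual vLLM internals (the class, the names `min_prefill_count` and `_passed_delay`, and the latency bookkeeping are illustrative; only VLLM_SCHED_PREFILL_COUNT and the --scheduler-delay-factor idea come from this change):

```python
import os
import time


class PrefillBatchingScheduler:
    """Illustrative sketch: hold prefills until a minimum batch size is queued."""

    def __init__(self, delay_factor: float = 0.0):
        # Minimum number of queued prefills before any are scheduled.
        self.min_prefill_count = int(os.getenv("VLLM_SCHED_PREFILL_COUNT", "0"))
        # Fall back to the prefill count if no explicit delay factor was given,
        # so leftover requests still get scheduled instead of hanging forever.
        self.delay_factor = delay_factor or float(self.min_prefill_count)
        self.waiting = []             # (arrival_time, request) pairs
        self.last_prompt_latency = 1.0

    def add_request(self, request) -> None:
        self.waiting.append((time.monotonic(), request))

    def _passed_delay(self, now: float) -> bool:
        # Same idea as --scheduler-delay-factor: wait a multiple of the last
        # prompt latency before scheduling the next batch of prefills.
        earliest_arrival = min(t for t, _ in self.waiting)
        return (now - earliest_arrival) > self.delay_factor * self.last_prompt_latency

    def schedule_prefills(self):
        if not self.waiting:
            return []
        now = time.monotonic()
        # Run prefills only once enough are queued, or once the delay has
        # elapsed, so an uneven request count cannot stall the server.
        if len(self.waiting) >= self.min_prefill_count or self._passed_delay(now):
            batch = [request for _, request in self.waiting]
            self.waiting.clear()
            return batch
        return []
```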