What is the problem you're trying to solve

Background

The development and application of large language models are growing explosively, with open-source models such as DeepSeek-R1 emerging continuously and driving demand from developers to deploy large models in local environments. However, as model parameter counts keep growing, the memory capacity of a single device is no longer sufficient to hold a complete model, and some inference frameworks have begun actively exploring multi-node distributed inference solutions:

New API for multi-node distributed inference

LeaderWorkerSet

The Kubernetes SIGs have designed a new API for the multi-node distributed inference scenario, called LeaderWorkerSet:
https://github.com/kubernetes-sigs/lws

KServe ServingRuntime/ClusterServingRuntime WorkerSpec

KServe has also modified its serving API, adding a new field called WorkerSpec to support multi-node distributed inference.

After discussing with @Monokaix and @hwdef, we agreed it is best to implement LeaderWorkerSet support first and gather end users' feedback.
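For reference, a minimal LeaderWorkerSet manifest looks roughly like the following sketch. The apiVersion, kind, and spec layout follow the upstream lws API; the name, image, and sizes are illustrative placeholders, not part of this proposal:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm                     # illustrative name
spec:
  replicas: 2                    # number of leader+worker groups
  leaderWorkerTemplate:
    size: 4                      # total Pods per group: 1 leader + 3 workers
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: vllm/vllm-openai   # illustrative image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: vllm/vllm-openai   # illustrative image
```

Each of the two replicas above is one logical group of 4 Pods, which is exactly the unit Volcano would want to gang-schedule as a PodGroup.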
Describe the solution you'd like
LeaderWorkerSet was designed with the concept of a logical PodGroup, corresponding to 1 Leader + n Workers. Volcano needs to keep this logical PodGroup concept consistent with Volcano's own PodGroup: the replicas field in a LeaderWorkerSet represents the number of Volcano PodGroups to be created, one task is the Leader Pod with a replica count of 1, and the other task is the Workers. The following tasks therefore need to be adapted:
Add a LeaderWorkerSet controller that reconciles LeaderWorkerSet objects and creates PodGroups for them
Implement network-topology-aware scheduling for worker Pods
Adapt to the LeaderWorkerSet RestartPolicy
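The replica-to-PodGroup mapping the controller would implement can be sketched as below. This is a minimal illustration, not Volcano's actual types or controller code: the PodGroupSpec struct, the naming scheme, and the helper function are all hypothetical, and the assumption is that each PodGroup's minimum member count equals the full group size so that the leader and its workers are gang-scheduled together.

```go
package main

import "fmt"

// PodGroupSpec captures the fields a controller would need for each
// PodGroup created on behalf of one LeaderWorkerSet replica.
// This struct is illustrative, not Volcano's real API type.
type PodGroupSpec struct {
	Name      string // hypothetical naming scheme: "<lws-name>-<replica-index>"
	MinMember int    // 1 leader + (size-1) workers, gang-scheduled together
}

// podGroupsFor sketches the mapping described above: each LeaderWorkerSet
// replica maps to one Volcano PodGroup, and the PodGroup's MinMember is
// the group size, so no group starts until all its Pods can be placed.
func podGroupsFor(lwsName string, replicas, size int) []PodGroupSpec {
	groups := make([]PodGroupSpec, 0, replicas)
	for i := 0; i < replicas; i++ {
		groups = append(groups, PodGroupSpec{
			Name:      fmt.Sprintf("%s-%d", lwsName, i),
			MinMember: size,
		})
	}
	return groups
}

func main() {
	// A LeaderWorkerSet "vllm" with replicas=2 and size=4 would yield
	// two PodGroups, each gang-scheduling 4 Pods (1 leader + 3 workers).
	for _, pg := range podGroupsFor("vllm", 2, 4) {
		fmt.Printf("%s minMember=%d\n", pg.Name, pg.MinMember)
	}
}
```

Setting MinMember to the full group size is the choice that prevents a partially scheduled group from holding devices while its peers wait, which is the usual failure mode multi-node inference workloads hit without gang scheduling.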
Additional context
No response