What is the problem you're trying to solve

Background

The development and application of large language models are growing explosively, with open-source models such as DeepSeek-R1 emerging continuously and driving demand from developers to deploy large models in local environments. However, as model parameter counts keep growing, the memory capacity of a single device is no longer sufficient to hold a complete model, and some inference frameworks have begun actively exploring multi-node distributed inference solutions:

New API for multi-node distributed inference

LeaderWorkerSet

The Kubernetes SIGs have designed a new API for the multi-node distributed inference scenario, called LeaderWorkerSet:
https://github.com/kubernetes-sigs/lws

KServe ServingRuntime/ClusterServingRuntime WorkerSpec

KServe has also modified its serving API, adding a new field called WorkerSpec to support multi-node distributed inference.

After discussing with @Monokaix and @hwdef, we agreed it is best to implement LeaderWorkerSet support first and gather end users' feedback.
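For reference, a minimal LeaderWorkerSet manifest looks roughly like the following sketch. The apiVersion, kind, and spec layout follow the upstream lws API; the name, image, and sizes are illustrative placeholders, not part of this proposal:

```yaml
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm                     # illustrative name
spec:
  replicas: 2                    # number of leader+worker groups
  leaderWorkerTemplate:
    size: 4                      # total Pods per group: 1 leader + 3 workers
    leaderTemplate:
      spec:
        containers:
        - name: leader
          image: vllm/vllm-openai   # illustrative image
    workerTemplate:
      spec:
        containers:
        - name: worker
          image: vllm/vllm-openai   # illustrative image
```

Each of the two replicas above is one logical group of 4 Pods, which is exactly the unit Volcano would want to gang-schedule as a PodGroup.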
Describe the solution you'd like
LeaderWorkerSet was designed with the concept of a logical PodGroup, corresponding to 1 Leader + n Workers. Volcano needs to keep this logical PodGroup concept consistent with Volcano's own PodGroup: the replicas field in a LeaderWorkerSet represents the number of Volcano PodGroups to be created, one task is the Leader Pod with a replica count of 1, and the other task is the Workers. The following tasks therefore need to be adapted:
Add a LeaderWorkerSet controller that reconciles LeaderWorkerSet objects and creates PodGroups for them
Implement network-topology-aware scheduling for worker Pods
Adapt to the LeaderWorkerSet RestartPolicy
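The replica-to-PodGroup mapping the controller would implement can be sketched as below. This is a minimal illustration, not Volcano's actual types or controller code: the PodGroupSpec struct, the naming scheme, and the helper function are all hypothetical, and the assumption is that each PodGroup's minimum member count equals the full group size so that the leader and its workers are gang-scheduled together.

```go
package main

import "fmt"

// PodGroupSpec captures the fields a controller would need for each
// PodGroup created on behalf of one LeaderWorkerSet replica.
// This struct is illustrative, not Volcano's real API type.
type PodGroupSpec struct {
	Name      string // hypothetical naming scheme: "<lws-name>-<replica-index>"
	MinMember int    // 1 leader + (size-1) workers, gang-scheduled together
}

// podGroupsFor sketches the mapping described above: each LeaderWorkerSet
// replica maps to one Volcano PodGroup, and the PodGroup's MinMember is
// the group size, so no group starts until all its Pods can be placed.
func podGroupsFor(lwsName string, replicas, size int) []PodGroupSpec {
	groups := make([]PodGroupSpec, 0, replicas)
	for i := 0; i < replicas; i++ {
		groups = append(groups, PodGroupSpec{
			Name:      fmt.Sprintf("%s-%d", lwsName, i),
			MinMember: size,
		})
	}
	return groups
}

func main() {
	// A LeaderWorkerSet "vllm" with replicas=2 and size=4 would yield
	// two PodGroups, each gang-scheduling 4 Pods (1 leader + 3 workers).
	for _, pg := range podGroupsFor("vllm", 2, 4) {
		fmt.Printf("%s minMember=%d\n", pg.Name, pg.MinMember)
	}
}
```

Setting MinMember to the full group size is the choice that prevents a partially scheduled group from holding devices while its peers wait, which is the usual failure mode multi-node inference workloads hit without gang scheduling.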
Additional context
No response