Enhancing LLM Serving with ZenTorch on AMD Gen5 CPUs #13174
Manoj-red-hat started this conversation in Ideas
Recent advances in ZenTorch have delivered significant speedups for PyTorch workloads, particularly on AMD's latest EPYC CPUs, Genoa and Turin (see the Hugging Face + AMD blog). This presents a strong opportunity for optimizing LLM inference in CPU-based deployments.
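For context, zentorch plugs into PyTorch as a `torch.compile` backend. Here is a minimal sketch of what that looks like standalone, before any vLLM integration; the model name and generation parameters are illustrative, not part of this proposal:

```python
# Minimal sketch: enabling ZenTorch for a Hugging Face causal LM on an AMD EPYC CPU.
# Assumes the zentorch wheel is installed (pip install zentorch); the model and
# prompt below are placeholders chosen only to keep the example small.
import torch
import zentorch  # importing registers the "zentorch" torch.compile backend
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Compile with the ZenTorch backend so ZenDNN-optimized CPU kernels are used.
model = torch.compile(model, backend="zentorch")

inputs = tokenizer("CPU inference with ZenTorch:", return_tensors="pt")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```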
I am already working on this and can lead the effort to integrate ZenTorch into vLLM, improving serving performance for users on AMD's latest hardware. This could offer an efficient, cost-effective option for CPU-based LLM inference, especially in environments where GPUs are scarce. A rough sketch of one possible hook follows.
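The sketch below uses hypothetical names; the function and its call site stand in for vLLM's real CPU model runner, whose internals would need discussion with maintainers. Only the compile step mirrors zentorch's documented usage:

```python
# Hypothetical sketch of where ZenTorch could hook into a CPU serving path.
# `maybe_compile_with_zentorch` is an illustrative name, not an existing vLLM
# API; the idea is to wrap the loaded model after weight loading, falling back
# cleanly when zentorch is not installed.
import torch

def maybe_compile_with_zentorch(model: torch.nn.Module) -> torch.nn.Module:
    """Wrap a loaded model with the ZenTorch backend when it is available."""
    try:
        import zentorch  # noqa: F401  (import side effect registers the backend)
    except ImportError:
        return model  # fall back to the stock eager CPU path
    return torch.compile(model, backend="zentorch")
```

Keeping the hook optional and import-guarded would let the feature ship without adding a hard dependency for non-AMD CPU users.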
Would love to discuss how we can collaborate on this!