- Can you share your command? Thank you!
- I discovered something interesting: by placing the MoE experts' weights in CPU pinned memory during loading, the Triton kernels for FusedMoE can operate on pinned CPU tensors. This allows the DeepSeek-R1 model to run on a GPU with only 40 GB of VRAM. I successfully ran the AWQ model (cognitivecomputations/DeepSeek-R1-AWQ) on an A800 GPU, although performance was relatively poor. A minimal sketch of the pinning step is below.
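For illustration, here is a minimal sketch of the loading-time step described above, assuming the expert weights are already materialized as CPU tensors. `pin_expert_weights` is a hypothetical helper, not vLLM's actual loader API; the approach relies on CUDA pinned host memory being addressable from GPU kernels.

```python
# Hypothetical sketch: keep MoE expert weights in page-locked ("pinned")
# CPU memory rather than GPU VRAM. Pinned host memory is mapped into the
# CUDA address space, so GPU kernels can read it over PCIe.
import torch

def pin_expert_weights(expert_weights: list[torch.Tensor]) -> list[torch.Tensor]:
    """Copy each expert's weights into page-locked CPU memory."""
    pinned = []
    for w in expert_weights:
        # .pin_memory() returns a copy of the tensor in page-locked
        # host memory, which the GPU can then address directly.
        pinned.append(w.cpu().pin_memory())
    return pinned

# Toy example: 8 fake "experts", each a small fp16 weight matrix.
experts = [torch.randn(256, 512, dtype=torch.float16) for _ in range(8)]
experts = pin_expert_weights(experts)
assert all(w.is_pinned() for w in experts)
```

Since every expert read then goes over PCIe instead of HBM, decode throughput drops sharply, which is consistent with the "relatively poor" performance noted above.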