- Can you share your command? Thank you!
- I discovered something interesting: by placing the MoE experts' weights in CPU pinned memory during loading, the Triton kernels for FusedMoE can operate on pinned CPU tensors. This allows the DeepSeek-R1 model to run on a GPU with only 40 GB of VRAM. I successfully ran the AWQ model (cognitivecomputations/DeepSeek-R1-AWQ) on an A800 GPU, although performance was relatively poor. A minimal sketch of the pinning step is below.
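For illustration, here is a minimal sketch of the loading-time step described above, assuming the expert weights are already materialized as CPU tensors. `pin_expert_weights` is a hypothetical helper, not vLLM's actual loader API; the approach relies on CUDA pinned host memory being addressable from GPU kernels.

```python
# Hypothetical sketch: keep MoE expert weights in page-locked ("pinned")
# CPU memory rather than GPU VRAM. Pinned host memory is mapped into the
# CUDA address space, so GPU kernels can read it over PCIe.
import torch

def pin_expert_weights(expert_weights: list[torch.Tensor]) -> list[torch.Tensor]:
    """Copy each expert's weights into page-locked CPU memory."""
    pinned = []
    for w in expert_weights:
        # .pin_memory() returns a copy of the tensor in page-locked
        # host memory, which the GPU can then address directly.
        pinned.append(w.cpu().pin_memory())
    return pinned

# Toy example: 8 fake "experts", each a small fp16 weight matrix.
experts = [torch.randn(256, 512, dtype=torch.float16) for _ in range(8)]
experts = pin_expert_weights(experts)
assert all(w.is_pinned() for w in experts)
```

Since every expert read then goes over PCIe instead of HBM, decode throughput drops sharply, which is consistent with the "relatively poor" performance noted above.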