Add FlashMLA #74
let compute_cap = compute_cap()?;
// assert compute cap is sm90
// TODO TODO TODO
// assert!(compute_cap == 90, "Compute capability must be 90 (90a)");
If you want an alternative approach, I'm pretty sure you could use `get_device_prop` from cudarc. The returned `sys::cudaDeviceProp` will contain `major` and `minor` fields.
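As a rough sketch of that idea, the snippet below combines `major` and `minor` into the two-digit value the commented-out assertion compares against. `DeviceProp`, `combined_compute_cap`, and `require_sm90` are hypothetical names used for illustration only, not cudarc types or functions; in practice the two values would be read from the `cudaDeviceProp` returned for the chosen device ordinal.

```rust
/// Hypothetical stand-in for the two `cudaDeviceProp` fields of interest.
/// (Assumption: the real struct exposes `major` and `minor` as integers.)
#[derive(Debug, Clone, Copy)]
pub struct DeviceProp {
    pub major: i32,
    pub minor: i32,
}

/// Combine major/minor into the two-digit form, e.g. 9.0 becomes 90.
pub fn combined_compute_cap(prop: DeviceProp) -> i32 {
    prop.major * 10 + prop.minor
}

/// Return an error (rather than panicking) when the device is not sm90.
pub fn require_sm90(prop: DeviceProp) -> Result<(), String> {
    let cap = combined_compute_cap(prop);
    if cap == 90 {
        Ok(())
    } else {
        Err(format!("Compute capability must be 90 (90a), got {cap}"))
    }
}
```

Returning a `Result` instead of calling `assert!` would let the caller decide whether an unsupported device is fatal, but that is a design choice rather than something this PR settles.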
I think this would require us to "assume" ordinal 0, which is probably fine. Also, maybe there is a case for using the output of `nvidia-smi`, because it's what we usually do 🤔? Not sure what is best though.
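For comparison, a minimal sketch of the `nvidia-smi` route might look like the following. It assumes a driver recent enough to support the `compute_cap` query field, reads only the first GPU listed, and uses hypothetical helper names (`parse_compute_cap`, `query_compute_cap`) that do not come from this PR.

```rust
use std::process::Command;

/// Parse a line such as "9.0" into the two-digit form 90.
/// Returns `None` for anything that is not `<major>.<minor>`.
pub fn parse_compute_cap(line: &str) -> Option<u32> {
    let (major, minor) = line.trim().split_once('.')?;
    let major: u32 = major.parse().ok()?;
    let minor: u32 = minor.parse().ok()?;
    Some(major * 10 + minor)
}

/// Ask `nvidia-smi` for the compute capability of the first GPU it lists.
/// Returns `None` if the tool is missing, fails, or prints something unexpected.
pub fn query_compute_cap() -> Option<u32> {
    let output = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8(output.stdout).ok()?;
    // Only the first line is used, which mirrors the "assume ordinal 0" caveat.
    parse_compute_cap(stdout.lines().next()?)
}
```

Keeping the parsing separate from the process call means the string handling can be checked on machines without a GPU.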
Yeah, it shouldn't matter. I was just thinking it was simpler since we already have cudarc available. I assume the bound C functions are the same ones that `nvidia-smi` uses.
https://github.com/deepseek-ai/FlashMLA