Add FlashMLA #74

Open
wants to merge 27 commits into
base: main

EricLBuehler
Owner

Comment on lines +61 to +64
let compute_cap = compute_cap()?;
// assert compute cap is sm90
// TODO TODO TODO
// assert!(compute_cap == 90, "Compute capability must be 90 (90a)");

If you want an alternative approach, I'm pretty sure you could use get_device_prop from cudarc.
The returned sys::cudaDeviceProp contains major and minor fields.
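
As a rough sketch of what the check could look like once the major and minor fields are in hand: `DeviceProps` below is a hypothetical stand-in for those two fields of sys::cudaDeviceProp, and `compute_cap_from_props` and `ensure_sm90` are made-up helper names. No cudarc call is made here, so how the struct gets filled in is left as an assumption.

```rust
/// Hypothetical stand-in for the two fields of interest on `cudaDeviceProp`.
/// In the CUDA headers both are plain C ints.
struct DeviceProps {
    major: i32,
    minor: i32,
}

/// Fold major/minor into the two-digit form used in the diff (9.0 -> 90).
fn compute_cap_from_props(props: &DeviceProps) -> i32 {
    props.major * 10 + props.minor
}

/// Return an error instead of panicking when the device is not sm90.
fn ensure_sm90(props: &DeviceProps) -> Result<(), String> {
    let cap = compute_cap_from_props(props);
    if cap == 90 {
        Ok(())
    } else {
        Err(format!("Compute capability must be 90 (90a), got {cap}"))
    }
}
```

Returning a `Result` rather than using `assert!` is only one option; it lets the caller decide whether a mismatch is fatal.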

Owner Author

I think this would require us to "assume" ordinal 0, which is probably fine. Also, maybe there is a case for using the output of nvidia-smi because it's what we usually do 🤔? Not sure what is best though.
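
For comparison, a minimal sketch of the nvidia-smi route. It assumes the CSV shape printed by `nvidia-smi --query-gpu=compute_cap --format=csv` (a `compute_cap` header line, then one `major.minor` line per GPU). `parse_compute_caps` and `query_compute_caps` are hypothetical names, not existing helpers in this repo, and treating the first row as ordinal 0 is itself an assumption.

```rust
use std::process::Command;

/// Parse the CSV text from `nvidia-smi --query-gpu=compute_cap --format=csv`.
/// Skips the header line and turns each `major.minor` row into e.g. 90.
/// Rows that do not parse are silently dropped.
fn parse_compute_caps(csv: &str) -> Vec<u32> {
    csv.lines()
        .skip(1)
        .filter_map(|line| {
            let (major, minor) = line.trim().split_once('.')?;
            let major: u32 = major.parse().ok()?;
            let minor: u32 = minor.parse().ok()?;
            Some(major * 10 + minor)
        })
        .collect()
}

/// Shell out to nvidia-smi and parse its output; fails if the binary is missing.
fn query_compute_caps() -> std::io::Result<Vec<u32>> {
    let out = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv"])
        .output()?;
    Ok(parse_compute_caps(&String::from_utf8_lossy(&out.stdout)))
}
```

Keeping the parser separate from the process call means the parsing can be tested on a machine without a GPU.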

Yeah, it shouldn't matter. I was just thinking it was simpler since we already have cudarc available.
I assume the bound C functions are the same ones that nvidia-smi uses.


2 participants