Feature 'cvt with .e4m3x2/.e5m2x2' requires .target sm_90 or higher #12

Open
divedb opened this issue Dec 13, 2024 · 1 comment

divedb commented Dec 13, 2024

Description

I encountered an error while compiling from source, specifically in the file src/nn/quant/fp8/fp8_util.cu. The code uses a PTX instruction (cvt with the .e4m3x2 type) that, according to the compiler error, requires a minimum GPU architecture of sm_90, but my current GPU does not support this architecture.

error

The problematic code section is as follows:
    __device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a) {
      uint16_t val;
    #if __CUDA_ARCH__ >= 890
      asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
    #else
      assert(false);
    #endif
      return val;
    }

Question:

Is there any way to modify the code to support GPUs with a lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative approach for quantization that doesn't rely on e4m3x2 or a way to skip this operation for unsupported GPUs?

Collaborator

spetrel commented Dec 16, 2024

What's your CUDA version? The e4m3x2 instruction is guarded by conditional compilation with '#if __CUDA_ARCH__ >= 890'.
You can try setting CMAKE_CUDA_ARCHITECTURES=80 before building.
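
On the question of avoiding the hardware instruction on older GPUs: one possible direction is a software conversion in the `#else` branch instead of `assert(false)`. The following is only a minimal sketch, not the project's code. The helper name `float_to_e4m3_sw` and the `FP8_HD` macro are hypothetical, and it assumes the E4M3 layout used by NVIDIA FP8 (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, maximum finite value 448), round-to-nearest-even, and saturation to ±448. How it treats the NaN sign is also an assumption and may differ from the hardware.

```cpp
#include <math.h>
#include <stdint.h>
#include <string.h>

// Lets the same function compile as plain C++ or as CUDA host/device code.
#if defined(__CUDACC__)
#define FP8_HD __host__ __device__
#else
#define FP8_HD
#endif

// Hypothetical helper (not part of the project): convert a float to an
// FP8 E4M3 bit pattern with round-to-nearest-even and finite saturation.
FP8_HD inline uint8_t float_to_e4m3_sw(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    const uint8_t sign = static_cast<uint8_t>((bits >> 24) & 0x80u);

    if (f != f) return 0x7F;              // NaN; the sign is dropped (assumption)
    const float a = fabsf(f);
    if (a == 0.0f) return sign;           // signed zero
    if (a >= 448.0f) {                    // saturate, including infinity
        return static_cast<uint8_t>(sign | 0x7Eu);
    }

    int e = 0;
    (void)frexpf(a, &e);                  // a = m * 2^e with m in [0.5, 1)
    int exp = e - 1;                      // unbiased exponent of a
    if (exp < -6) exp = -6;               // below 2^-6 the result is subnormal

    // Scale so that one unit equals one step of the 3-bit mantissa, then round.
    // nearbyintf rounds to nearest even under the default rounding mode.
    // q is in [8, 16] for normal results and in [0, 8] for subnormal ones.
    const int q = static_cast<int>(nearbyintf(ldexpf(a, 3 - exp)));

    // Adding q to the shifted exponent field absorbs the implicit leading
    // one (q >= 8) and a mantissa carry (q == 16) in a single step.
    return static_cast<uint8_t>(sign | static_cast<uint8_t>(((exp + 6) << 3) + q));
}
```

A fallback for `half2_to_e4m3` could convert each half of the input with `__low2float` and `__high2float` from `cuda_fp16.h`, run both through this helper, and pack the two bytes into a `uint16_t`. The byte order would need to be checked against the PTX documentation. The `cuda_fp8.h` header also provides conversion functions such as `__nv_cvt_halfraw2_to_fp8x2`, which may be a simpler option if the installed CUDA toolkit includes it.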
