I encountered an error while trying to compile from the source code, specifically in the file src/nn/quant/fp8/fp8_util.cu. The code attempts to use a CUDA assembly instruction (e4m3x2) that requires a minimum GPU architecture of sm_90, but my current GPU does not support this architecture.
The problematic code section is as follows:

    __device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a) {
        uint16_t val;
    #if __CUDA_ARCH__ >= 890
        asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
    #else
        assert(false);
    #endif
        return val;
    }
Question:
Is there any way to modify the code to support GPUs with a lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative approach for quantization that doesn't rely on e4m3x2 or a way to skip this operation for unsupported GPUs?
What's your CUDA version? The e4m3x2 instruction is guarded by conditional compilation with '#if __CUDA_ARCH__ >= 890'.
You can try setting CMAKE_CUDA_ARCHITECTURES=80 before building.
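On the question of an alternative that does not rely on e4m3x2: one option is a software conversion in the `#else` branch instead of `assert(false)`. Below is a minimal sketch, not the project's implementation. It assumes the common E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, largest finite value 448, NaN encoded as 0x7F) and mimics round-to-nearest-even with saturation. The names `HD`, `float_to_e4m3_satfinite`, and `pack_e4m3x2` are hypothetical, and the helper takes `float` inputs, so on the device the two halves would first need to be unpacked (for example with `__half2float`).

```cpp
#include <math.h>
#include <stdint.h>
#include <string.h>

// Hypothetical macro: lets the same code build as plain C++ or as CUDA.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Convert one float to an E4M3 byte: round to nearest even, saturate to +-448.
// Assumes the default floating-point rounding mode (round to nearest even).
HD inline uint8_t float_to_e4m3_satfinite(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    const uint8_t sign = (uint8_t)((bits >> 24) & 0x80u);

    if (x != x) return 0x7F;                        // NaN stays NaN
    const float a = fabsf(x);
    if (a == 0.0f) return sign;                     // signed zero
    if (a >= 448.0f) return (uint8_t)(sign | 0x7E); // saturate (covers inf)

    int e;
    frexpf(a, &e);               // a = f * 2^e with f in [0.5, 1)
    int exp = e - 1;             // floor(log2(a))
    if (exp < -6) exp = -6;      // subnormals share the smallest exponent

    // Scale so that one unit in the last place equals 1.0, then round.
    int m = (int)nearbyintf(ldexpf(a, 3 - exp));
    if (m == 16) { m = 8; ++exp; }          // rounding carried into the exponent
    if (m < 8) return (uint8_t)(sign | m);  // subnormal: exponent field is 0

    return (uint8_t)(sign | ((exp + 7) << 3) | (m - 8));
}

// Pack two converted values, low element in the low byte (an assumption
// about the ordering expected by the caller).
HD inline uint16_t pack_e4m3x2(float lo, float hi) {
    return (uint16_t)(((uint16_t)float_to_e4m3_satfinite(hi) << 8) |
                      float_to_e4m3_satfinite(lo));
}
```

This will be slower than the hardware instruction, and it only addresses the conversion itself; any FP8 matrix-multiply kernels elsewhere in the project would still need a supported GPU. It may also be worth checking the conversion intrinsics in CUDA's `cuda_fp8.h` header (for example `__nv_cvt_halfraw2_to_fp8x2`), which are intended to work across architectures.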