Feature 'cvt with .e4m3x2/.e5m2x2' requires .target sm_90 or higher #12

divedb · 2024-12-13T13:17:37Z

Description

I hit an error while compiling from source, in the file src/nn/quant/fp8/fp8_util.cu. The code uses an inline PTX instruction (cvt with the .e4m3x2 type) that, according to the error message, requires at least sm_90, and my current GPU does not support that architecture.

The problematic code section is as follows:
__device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a) {
    uint16_t val;
#if __CUDA_ARCH__ >= 890
    asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
#else
    assert(false);
#endif
    return val;
}

Question:

Is there any way to modify the code to support GPUs with a lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative approach for quantization that doesn't rely on e4m3x2 or a way to skip this operation for unsupported GPUs?

spetrel · 2024-12-16T03:01:10Z

What's your CUDA version? The e4m3x2 conversion is guarded by conditional compilation with '#if __CUDA_ARCH__ >= 890'.
You can try setting CMAKE_CUDA_ARCHITECTURES=80 before building.
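
On the question of an approach that does not rely on e4m3x2: one possibility is a pure integer software conversion for architectures without the hardware instruction. The sketch below is not code from this repository. The names half_bits_to_e4m3_sw and half2_to_e4m3_sw are hypothetical, and it rests on assumptions: that the intended behavior matches round-to-nearest-even with saturation to the largest finite E4M3 value (448), that NaN inputs map to the E4M3 NaN encoding, and that the low FP16 half of the input goes to the low output byte. It compiles as plain C++ or inside a .cu file.

```cpp
#include <cstdint>

// Usable from host and device when built with nvcc, plain C++ otherwise.
#ifdef __CUDACC__
#define FP8_HD __host__ __device__
#else
#define FP8_HD
#endif

// Hypothetical helper (not part of the repository): convert one FP16 bit
// pattern to an E4M3 byte, rounding to nearest even and saturating to +-448.
FP8_HD inline uint8_t half_bits_to_e4m3_sw(uint16_t h) {
    const uint8_t sign = static_cast<uint8_t>((h >> 8) & 0x80);
    const uint16_t mag = h & 0x7FFF;

    if (mag > 0x7C00) return static_cast<uint8_t>(sign | 0x7F);   // NaN stays NaN
    if (mag >= 0x5F00) return static_cast<uint8_t>(sign | 0x7E);  // >= 448 or inf: clamp

    const uint32_t e = mag >> 10;  // biased FP16 exponent (bias 15)
    if (e >= 9) {
        // Result is a normal E4M3 value: rebias the exponent (15 -> 7), then
        // drop 7 mantissa bits with round-to-nearest-even. A rounding carry
        // propagates into the exponent field naturally.
        const uint32_t v = mag - (8u << 10);
        return static_cast<uint8_t>(sign | ((v + 0x3F + ((v >> 7) & 1)) >> 7));
    }
    if (e < 5) return sign;  // below half of the smallest subnormal: signed zero

    // Result is an E4M3 subnormal: k * 2^-9 with k in 0..8 (k == 8 is the
    // smallest normal value, which has the same bit pattern 0x08).
    const uint32_t sig = 0x400u | (mag & 0x3FF);  // significand with implicit 1
    const uint32_t shift = 16 - e;
    uint32_t k = sig >> shift;
    const uint32_t rem = sig & ((1u << shift) - 1);
    const uint32_t half = 1u << (shift - 1);
    if (rem > half || (rem == half && (k & 1))) ++k;  // round to nearest even
    return static_cast<uint8_t>(sign | k);
}

// Hypothetical drop-in with the same signature shape as half2_to_e4m3:
// input is two packed FP16 values, output is two packed E4M3 bytes.
// Assumption: low half maps to low byte, high half to high byte.
FP8_HD inline uint16_t half2_to_e4m3_sw(uint32_t a) {
    const uint8_t lo = half_bits_to_e4m3_sw(static_cast<uint16_t>(a & 0xFFFF));
    const uint8_t hi = half_bits_to_e4m3_sw(static_cast<uint16_t>(a >> 16));
    return static_cast<uint16_t>((hi << 8) | lo);
}
```

If this matches what the project expects, the #else branch of half2_to_e4m3 could call such a fallback instead of assert(false). It will be slower than the hardware cvt, and the results should be checked against an sm_89 or newer GPU before relying on it.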
