Replies: 1 comment 4 replies
-
the comment is actually not accurate. it helps C++ to initialize the data properly. Not always forcing it to be 0. It is just a trick to fool the compiler. |
Beta Was this translation helpful? Give feedback.
4 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
@kerrmudgeon the comment says that the redundant mov PTX instruction is used to enforce the compiler to initialize data to zero before ld.global. but the code don't init the data to zero. here is the code below.
template
struct global_load<AccessType,
32
> {
CUTLASS_DEVICE
global_load(AccessType &D, void const *ptr, bool pred_guard) {
uint4 *data = reinterpret_cast<uint4 *>(&D);
#if CUTLASS_ENABLE_L2_PREFETCH
" @p ld.global.L2::128B.v4.u32 {%0, %1, %2, %3}, [%8];\n"
" @p ld.global.L2::128B.v4.u32 {%4, %5, %6, %7}, [%18];\n"
#else
" @p ld.global.v4.u32 {%0, %1, %2, %3}, [%8];\n"
" @p ld.global.v4.u32 {%4, %5, %6, %7}, [%18];\n"
#endif
"}\n"
: "=r"(data[0].x), "=r"(data[0].y), "=r"(data[0].z), "=r"(data[0].w),
"=r"(data[1].x), "=r"(data[1].y), "=r"(data[1].z), "=r"(data[1].w)
: "l"(ptr), "r"((int)pred_guard), "r"(data[0].x), "r"(data[0].y),
"r"(data[0].z), "r"(data[0].w), "r"(data[1].x), "r"(data[1].y),
"r"(data[1].z), "r"(data[1].w), "l"(((uint8_t *)ptr) + 16));
}
};
could you explain it in detail? Thanks a lot.
Beta Was this translation helpful? Give feedback.
All reactions