[XPU][OptRed] Define triton_intel_gpu.simd_reduce and use in optimized transposed reduction
#2907
+1,490
−161