Flang can't vectorize the loop in s233 of TSVC while Clang can vectorize the loop written in C.
(Clang doesn't actually vectorize the loop, because its cost model finds vectorizing the strided accesses not beneficial.)
! Fortran version
      subroutine s233 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)
      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)
      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s233 ')
      do 10 i = 2,n
         do 20 j = 2,n
            aa(i,j) = aa(i,j-1) + cc(i,j)
   20    continue
         do 30 j = 2,n
            bb(i,j) = bb(i-1,j) + cc(i,j)
   30    continue
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
$ flang-new -v -O3 -flang-experimental-integer-overflow s233.f -S -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s233.f
path/to/s233.f:13:13: remark: loop not vectorized: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unsafe indirect dependence. Memory location is the same as accessed at s233.f:13:13 [-Rpass-analysis=loop-vectorize]
path/to/s233.f:12:10: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
path/to/s233.f:10:13: remark: loop not vectorized: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unsafe indirect dependence. Memory location is the same as accessed at s233.f:10:13 [-Rpass-analysis=loop-vectorize]
path/to/s233.f:9:10: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
$ clang -O3 s233.c -S -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector
s233.c:15:4: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-analysis=loop-vectorize]
   15 |    for (int j = 1; j < LEN2; j++) {
      |    ^
s233.c:15:4: remark: interleaved loop (interleaved count: 2) [-Rpass=loop-vectorize]
s233.c:13:16: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
   13 |     aa[j][i] = aa[j-1][i] + cc[j][i];
      |                ^
s233.c:12:4: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
   12 |    for (int j = 1; j < LEN2; j++) {
      |    ^
s233.c:16:16: remark: Cannot SLP vectorize list: vectorization was impossible with available vectorization factors [-Rpass-missed=slp-vectorizer]
   16 |     bb[j][i] = bb[j][i-1] + cc[j][i];
      |                ^
One of the causes seems to be the same as #110611. In this case, however, loop optimizations such as loop interchange could help vectorization. Address calculations are already linearized in LLVM IR, which makes the multidimensional subscripts hard to recover at that level, so loop optimizations in MLIR or using the polyhedral model might be necessary.
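To make the interchange idea concrete, here is a minimal hand-written sketch in C. It is not what LLVM or Flang produces. The loop bodies are taken from the Clang remarks above; the nesting order (`i` outer, two inner `j` loops) is assumed to mirror the Fortran version, and the size `N`, the initial values, and all function names are hypothetical. The sketch distributes the two inner loops into separate nests and interchanges only the `aa` nest, so that its inner loop becomes unit-stride and free of loop-carried dependences. The `bb` nest is left as is, since its recurrence runs along `i`.

```c
#include <assert.h>   /* assert, for callers that check the result */
#include <string.h>   /* memcmp */

#define N 8  /* hypothetical small size, chosen only for illustration */

/* Fill the arrays with deterministic values (hypothetical test data). */
static void init(float aa[N][N], float bb[N][N], float cc[N][N]) {
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            aa[j][i] = (float)(j + i);
            bb[j][i] = (float)(j - i);
            cc[j][i] = (float)(j * N + i) * 0.5f;
        }
    }
}

/* Assumed original structure: i outer, two inner j loops. */
static void s233_original(float aa[N][N], float bb[N][N], float cc[N][N]) {
    for (int i = 1; i < N; i++) {
        for (int j = 1; j < N; j++)
            aa[j][i] = aa[j-1][i] + cc[j][i];  /* dependence carried by j */
        for (int j = 1; j < N; j++)
            bb[j][i] = bb[j][i-1] + cc[j][i];  /* dependence carried by i */
    }
}

/* Distributed into two nests, with the aa nest interchanged by hand. */
static void s233_interchanged(float aa[N][N], float bb[N][N], float cc[N][N]) {
    /* aa nest: j outer, i inner. The inner loop is unit-stride and
       carries no dependence, so it is a natural vectorization candidate. */
    for (int j = 1; j < N; j++)
        for (int i = 1; i < N; i++)
            aa[j][i] = aa[j-1][i] + cc[j][i];

    /* bb nest: loop order unchanged, because making i the inner loop
       would put the recurrence on the inner loop. */
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            bb[j][i] = bb[j][i-1] + cc[j][i];
}

/* Returns 1 if both variants produce bitwise-identical aa and bb. */
int s233_variants_agree(void) {
    static float aa1[N][N], bb1[N][N], cc1[N][N];
    static float aa2[N][N], bb2[N][N], cc2[N][N];

    init(aa1, bb1, cc1);
    init(aa2, bb2, cc2);
    s233_original(aa1, bb1, cc1);
    s233_interchanged(aa2, bb2, cc2);

    return memcmp(aa1, aa2, sizeof aa1) == 0 &&
           memcmp(bb1, bb2, sizeof bb1) == 0;
}
```

A caller can check the transformation with `assert(s233_variants_agree());`. Each element is computed by the same sequence of floating-point operations in both variants, so an exact comparison is appropriate here as long as the code is compiled without value-changing floating-point options.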