[Flang][LAA] TSVC s2101, s233: not vectorized because the extents of arrays are not constant #110611

Open
yus3710-fj opened this issue Oct 1, 2024 · 0 comments
Labels
flang Flang issues not falling into any other category loopoptim vectorization

Comments

@yus3710-fj
Contributor

Flang can't vectorize the loops in s2101 and s233 of TSVC, whereas Clang can vectorize the equivalent loops written in C.
(Clang doesn't actually vectorize them, because the cost model indicates that vectorizing the strided accesses is not beneficial; it only interleaves the loop.)

  • Fortran
! Fortran version
      subroutine s2101(ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)

      integer ntimes, ld, n, i, nl
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)

      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s2101')
      do 10 i = 1,n
         aa(i,i) = aa(i,i) + bb(i,i) * cc(i,i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
$ flang-new -v -O3 -flang-experimental-integer-overflow s2101.f -S -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s2101.f
path/to/s2101.f:9:10: remark: loop not vectorized: unsafe dependent memory operations in loop. Use #pragma clang loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unsafe indirect dependence. Memory location is the same as accessed at s2101.f:9:10 [-Rpass-analysis=loop-vectorize]
path/to/s2101.f:8:7: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
  • C
// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];

int s2101() {
  init( "s2101");
  for (int i = 0; i < LEN2; i++) {
    aa[i][i] += bb[i][i] * cc[i][i];
  }
  dummy(a, b, c, d, e, aa, bb, cc, 0.);
  return 0;
}
$ clang -O3 s2101.c -S -Rpass=vector -Rpass-analysis=vector -Rpass-missed=vector
s2101.c:9:3: remark: the cost-model indicates that vectorization is not beneficial [-Rpass-analysis=loop-vectorize]
    9 |                 for (int i = 0; i < LEN2; i++) {
      |                 ^
s2101.c:9:3: remark: interleaved loop (interleaved count: 2) [-Rpass=loop-vectorize]

In Fortran, array extents are sometimes not known at compile time. LAA (LoopAccessAnalysis), on the other hand, requires the pointer stride to be a compile-time constant.
I suspect this constraint is too restrictive. IIUC, it is sufficient for vectorization that the pointer stride is loop-invariant and never zero. SCEV can determine that, but LAA doesn't check it at the moment.
(This might be resolved by MLIR or the polyhedral model.)
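
To make the stride argument concrete, here is a rough C sketch of the access pattern of the Fortran loop. It is not taken from TSVC or from Flang's output: the function name `s2101_runtime` and its signature are hypothetical, and it assumes the usual column-major layout, where `aa(i,i)` sits `(i-1)*(ld+1)` elements from the start of the array. The `restrict` qualifiers stand in for Fortran's no-aliasing rule on dummy arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of the Fortran s2101 loop (not part of TSVC).
 * The leading dimension `ld` is a run-time argument, so the diagonal
 * stride (ld + 1) is loop-invariant but not a compile-time constant. */
void s2101_runtime(size_t ld, size_t n,
                   float *restrict aa,
                   const float *restrict bb,
                   const float *restrict cc)
{
    /* The diagonal element (i,i) must lie inside column i. */
    assert(ld >= n);

    /* Loop-invariant stride in elements; never zero because ld + 1 >= 1. */
    const size_t stride = ld + 1;

    for (size_t i = 0; i < n; i++) {
        /* Zero-based flat index of element (i,i) in a column-major
         * array declared as (ld, n). */
        size_t k = i * stride;
        aa[k] = aa[k] + bb[k] * cc[k];
    }
}
```

Successive iterations touch addresses exactly `ld + 1` elements apart, so no two iterations access the same element of `aa` even though the stride is unknown until run time. In the C version from TSVC the same stride is the constant `LEN2 + 1`.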
