[Flang] TSVC s115: compiler doesn't vectorize the loop considering an initial value of do-variable might overflow #110609

Open
yus3710-fj opened this issue Oct 1, 2024 · 1 comment
@yus3710-fj

Flang can't vectorize the inner loop in `s115` of [TSVC](https://www.netlib.org/benchmark/vectors), while Clang can vectorize the equivalent loop written in C.

  • Fortran
!     Fortran version
      subroutine s115 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)

      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)

      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s115 ')
      do 10 j = 1,n
         do 20 i = j+1, n
            a(i) = a(i) - aa(i,j) * a(j)
  20     continue
  10  continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end
$ flang-new -v -O3 -flang-experimental-integer-overflow s115.f -S -Rpass=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s115.f
  • C
// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];

int s115() {
  init( "s115 ");
  for (int j = 0; j < LEN2; j++) {
    for (int i = j+1; i < LEN2; i++) {
      a[i] -= aa[j][i] * a[j];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, 0.);
  return 0;
}
$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   10 |                         for (int i = j+1; i < LEN2; i++) {
      |                         ^

If `j+1` overflows, the accesses to `a(i)` and `a(j)` may overlap, so vectorization is prevented.
IIRC, compilers don't have to consider this case.

@llvmbot

llvmbot commented Oct 1, 2024

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)
