[Flang] TSVC s115: compiler doesn't vectorize the loop considering an initial value of do-variable might overflow #110609

yus3710-fj · 2024-10-01T01:07:00Z

Flang can't vectorize the loop in s115 of TSVC while Clang can vectorize the loop written in C.

Fortran

!     Fortran version
      subroutine s115 (ntimes,ld,n,ctime,dtime,a,b,c,d,e,aa,bb,cc)

      integer ntimes, ld, n, i, nl, j
      real a(n), b(n), c(n), d(n), e(n), aa(ld,n), bb(ld,n), cc(ld,n)

      call init(ld,n,a,b,c,d,e,aa,bb,cc,'s115 ')
      do 10 j = 1,n
         do 20 i = j+1, n
            a(i) = a(i) - aa(i,j) * a(j)
  20     continue
  10  continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,1.)
      end

$ flang-new -v -O3 -flang-experimental-integer-overflow s115.f -S -Rpass=vector
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu generic -target-feature +outline-atomics -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s115.f

C

// C version
#define LEN 32000
#define LEN2 256
float a[LEN], b[LEN], c[LEN], d[LEN], e[LEN];
float aa[LEN2][LEN2], bb[LEN2][LEN2], cc[LEN2][LEN2];

int s115() {
  init( "s115 ");
  for (int j = 0; j < LEN2; j++) {
    for (int i = j+1; i < LEN2; i++) {
      a[i] -= aa[j][i] * a[j];
    }
  }
  dummy(a, b, c, d, e, aa, bb, cc, 0.);
  return 0;
}

$ clang -O3 s115.c -S -Rpass=vector
s115.c:10:4: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   10 |                         for (int i = j+1; i < LEN2; i++) {
      |                         ^

If j+1 overflows, the accesses to a(i) and a(j) may overlap, so vectorization is prevented.
IIRC, compilers don't have to consider this case.
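
To make the overflow concern concrete, here is a minimal C sketch of the arithmetic, not of what the compiler actually does internally. The helper names `wrapping_succ` and `inner_range_contains_j` are hypothetical and exist only for illustration. The sketch assumes 32-bit default integers and models `j+1` as a wrapping add, then asks whether the inner range `j+1 .. n` can contain `j` itself, which is the only way `a(i)` and `a(j)` could refer to the same element inside the inner loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* j + 1 with two's-complement wraparound. The unsigned add is well defined;
   converting the result back to int32_t is implementation-defined in C but
   wraps on mainstream compilers. */
static int32_t wrapping_succ(int32_t j) {
    return (int32_t)((uint32_t)j + 1u);
}

/* True when the inner range [j+1, n], computed with a wrapping j+1,
   contains j itself, i.e. some iteration would have i == j. */
static bool inner_range_contains_j(int32_t j, int32_t n) {
    int32_t start = wrapping_succ(j);
    return start <= j && j <= n;
}
```

For ordinary values such as `j = 5, n = 100` the check is false, so `a(j)` is never written by the inner loop and can be treated as loop-invariant. It only becomes true when `j` reaches `INT32_MAX`, which in the outer loop `do j = 1,n` requires `n` to be `INT32_MAX` as well.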

llvmbot · 2024-10-01T01:07:20Z

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)

Flang can't vectorize the loop in `s115` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.

yus3710-fj added the flang:ir label Oct 1, 2024
