[Flang][LICM] deferred-shape arrays are not vectorized in some cases #110613

yus3710-fj · 2024-10-01T01:17:56Z

Flang fails to vectorize some loops in [TSVC](https://www.netlib.org/benchmark/vectors) when the arrays are `ALLOCATABLE`. For example, the loop in `s271` is no longer vectorized once I change its explicit-shape arrays to deferred-shape arrays.

! s271_allocatable.f90
subroutine s271 (ld,n,a,b,c)
  implicit none
  integer ld, n, i
  real, allocatable :: a(:), b(:), c(:) ! added ALLOCATABLE attribute

  call init(ld,n,a,b,c,'s271 ')
  do i=1,n
    if (b(i) .gt. 0.) a(i) = a(i) + b(i) * c(i)
  end do
  call dummy(ld,n,a,b,c,1.)
end subroutine s271

$ flang-new -v -O3 -flang-experimental-integer-overflow s271_allocatable.f90 -S -Rpass=vector -mcpu=a64fx
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu a64fx -target-feature +outline-atomics -target-feature +v8.2a -target-feature +aes -target-feature +complxnum -target-feature +crc -target-feature +fp-armv8 -target-feature +fullfp16 -target-feature +lse -target-feature +neon -target-feature +perfmon -target-feature +ras -target-feature +rdm -target-feature +sha2 -target-feature +sve -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s271_allocatable.f90

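The base addresses and lower bounds of `a` and `c` aren't recognized as loop-invariant: they are reloaded from the array descriptors inside the loop (block `16` below), whereas those of `b` are already defined outside the loop body.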

11:                                               ; preds = %.lr.ph, %25
  %indvars.iv = phi i64 [ 1, %.lr.ph ], [ %indvars.iv.next, %25 ] ;; i
  %12 = sub nsw i64 %indvars.iv, %.unpack322.unpack.unpack ;; i - lbound(b,1)
  %13 = getelementptr float, ptr %.unpack266.pre, i64 %12
  %14 = load float, ptr %13, align 4, !tbaa !12
  %15 = fcmp fast ogt float %14, 0.000000e+00 ;; b(i) > 0
  br i1 %15, label %16, label %25

16:                                               ; preds = %11
  %.unpack329 = load ptr, ptr %2, align 8, !tbaa !4 ;; a
  %.unpack343.unpack.unpack = load i64, ptr %.elt342, align 8, !tbaa !4 ;; lbound(a,1)
  %17 = sub nsw i64 %indvars.iv, %.unpack343.unpack.unpack ;; i - lbound(a,1)
  %18 = getelementptr float, ptr %.unpack329, i64 %17
  %19 = load float, ptr %18, align 4, !tbaa !14 ;; a(i)
  %.unpack350 = load ptr, ptr %4, align 8, !tbaa !4 ;; c
  %.unpack364.unpack.unpack = load i64, ptr %.elt363, align 8, !tbaa !4 ;; lbound(c,1)
  %20 = sub nsw i64 %indvars.iv, %.unpack364.unpack.unpack ;; i - lbound(c,1)
  %21 = getelementptr float, ptr %.unpack350, i64 %20
  %22 = load float, ptr %21, align 4, !tbaa !16 ;; c(i)
  %23 = fmul fast float %22, %14
  %24 = fadd fast float %23, %19
  store float %24, ptr %18, align 4, !tbaa !14
  br label %25

25:                                               ; preds = %16, %11
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %10
  br i1 %exitcond.not, label %._crit_edge.loopexit, label %11

If I manually hoist the loads `%.unpack329`, `%.unpack343.unpack.unpack`, `%.unpack350` and `%.unpack364.unpack.unpack` out of the loop, the loop is vectorized.
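
For reference, a source-level analogue of that manual hoisting might look like the sketch below. It assumes the loop can be moved into a helper with explicit-shape dummy arguments, so that no descriptor field has to be read inside the loop. The names `s271_sketch`, `s271_kernel`, `s271_wrapper` and the `demo` driver are hypothetical and not part of TSVC or Flang, and whether Flang actually vectorizes this form has not been checked here.

```fortran
! Hypothetical workaround sketch; all names below are illustrative only.
module s271_sketch
  implicit none
contains
  ! Explicit-shape dummies: the lower bounds are 1 by declaration and the
  ! arrays are typically passed as bare base addresses, so the loop body
  ! does not need to read an array descriptor.
  subroutine s271_kernel(n, a, b, c)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n), c(n)
    integer :: i
    do i = 1, n
      if (b(i) > 0.) a(i) = a(i) + b(i) * c(i)
    end do
  end subroutine s271_kernel

  ! The ALLOCATABLE arrays are handed to the kernel once, outside the loop.
  ! They must already be allocated with at least n elements.
  subroutine s271_wrapper(n, a, b, c)
    integer, intent(in) :: n
    real, allocatable, intent(inout) :: a(:)
    real, allocatable, intent(in) :: b(:), c(:)
    call s271_kernel(n, a, b, c)
  end subroutine s271_wrapper
end module s271_sketch

program demo
  use s271_sketch
  implicit none
  real, allocatable :: a(:), b(:), c(:)
  allocate(a(4), b(4), c(4))
  a = 1.0
  b = [1.0, -1.0, 2.0, 0.0]
  c = 3.0
  call s271_wrapper(4, a, b, c)
  ! Only elements with b(i) > 0 are updated, giving the values 4, 1, 7, 1.
  print *, a
end program demo
```

This only sidesteps the problem in user code; it does not replace hoisting the descriptor loads in the compiler, which is what this issue is about.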

llvmbot · 2024-10-01T01:18:16Z

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)

yus3710-fj added loopoptim flang:ir labels Oct 1, 2024
