[Flang][LICM] deferred-shape arrays are not vectorized in some cases #110613

Open
yus3710-fj opened this issue Oct 1, 2024 · 1 comment


Flang can't vectorize some loops in TSVC (https://www.netlib.org/benchmark/vectors) when the arrays are ALLOCATABLE. For example, Flang can't vectorize the loop in s271 of TSVC if I change the explicit-shape arrays to deferred-shape arrays.

! s271_allocatable.f90
subroutine s271 (ld,n,a,b,c)
  implicit none
  integer ld, n, i
  real, allocatable :: a(:), b(:), c(:) ! added ALLOCATABLE attribute

  call init(ld,n,a,b,c,'s271 ')
  do i=1,n
    if (b(i) .gt. 0.) a(i) = a(i) + b(i) * c(i)
  end do
  call dummy(ld,n,a,b,c,1.)
end subroutine s271
$ flang-new -v -O3 -flang-experimental-integer-overflow s271_allocatable.f90 -S -Rpass=vector -mcpu=a64fx
flang-new version 20.0.0git (https://github.com/llvm/llvm-project.git 2c770675ce36402b51a320ae26f369690c138dc1)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/build/bin
Build config: +assertions
Found candidate GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Selected GCC installation: /usr/lib/gcc/aarch64-redhat-linux/11
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/build/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu a64fx -target-feature +outline-atomics -target-feature +v8.2a -target-feature +aes -target-feature +complxnum -target-feature +crc -target-feature +fp-armv8 -target-feature +fullfp16 -target-feature +lse -target-feature +neon -target-feature +perfmon -target-feature +ras -target-feature +rdm -target-feature +sha2 -target-feature +sve -fversion-loops-for-stride -flang-experimental-integer-overflow -Rpass=vector -resource-dir /path/to/build/lib/clang/20 -mframe-pointer=non-leaf -O3 -o /dev/null -x f95-cpp-input s271_allocatable.f90

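The base addresses and the lower bounds of a and c aren't recognized as loop-invariant. In the IR below they are reloaded from the descriptors in block 16 on every iteration that takes the branch, whereas the base address and lower bound of b are already available from outside the loop.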

11:                                               ; preds = %.lr.ph, %25
  %indvars.iv = phi i64 [ 1, %.lr.ph ], [ %indvars.iv.next, %25 ] ;; i
  %12 = sub nsw i64 %indvars.iv, %.unpack322.unpack.unpack ;; i - lbound(b,1)
  %13 = getelementptr float, ptr %.unpack266.pre, i64 %12
  %14 = load float, ptr %13, align 4, !tbaa !12
  %15 = fcmp fast ogt float %14, 0.000000e+00 ;; b(i) > 0
  br i1 %15, label %16, label %25

16:                                               ; preds = %11
  %.unpack329 = load ptr, ptr %2, align 8, !tbaa !4 ;; a
  %.unpack343.unpack.unpack = load i64, ptr %.elt342, align 8, !tbaa !4 ;; lbound(a,1)
  %17 = sub nsw i64 %indvars.iv, %.unpack343.unpack.unpack ;; i - lbound(a,1)
  %18 = getelementptr float, ptr %.unpack329, i64 %17
  %19 = load float, ptr %18, align 4, !tbaa !14 ;; a(i)
  %.unpack350 = load ptr, ptr %4, align 8, !tbaa !4 ;; c
  %.unpack364.unpack.unpack = load i64, ptr %.elt363, align 8, !tbaa !4 ;; lbound(c,1)
  %20 = sub nsw i64 %indvars.iv, %.unpack364.unpack.unpack ;; i - lbound(c,1)
  %21 = getelementptr float, ptr %.unpack350, i64 %20
  %22 = load float, ptr %21, align 4, !tbaa !16 ;; c(i)
  %23 = fmul fast float %22, %14
  %24 = fadd fast float %23, %19
  store float %24, ptr %18, align 4, !tbaa !14
  br label %25

25:                                               ; preds = %16, %11
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %10
  br i1 %exitcond.not, label %._crit_edge.loopexit, label %11

If I move %.unpack329, %.unpack343.unpack.unpack, %.unpack350 and %.unpack364.unpack.unpack outside the loop manually, the loop is vectorized.
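
A rough source-level analogue of that manual hoisting, as a sketch only: keep the ALLOCATABLE dummies in s271 but run the loop in an internal procedure with explicit-shape dummies, so the descriptors are read once at the call site. The file name and the name s271_kernel are hypothetical, and I have not checked that Flang vectorizes this form. It also assumes that a, b and c are allocated with lower bound 1 and at least n elements, since the kernel indexes from the first element rather than from lbound.

```fortran
! s271_workaround.f90 (hypothetical file name)
subroutine s271 (ld,n,a,b,c)
  implicit none
  integer ld, n
  real, allocatable :: a(:), b(:), c(:)

  call init(ld,n,a,b,c,'s271 ')
  ! Assumption: lbound 1 and size >= n for a, b and c.
  ! The descriptors are read here, once, instead of inside the loop.
  call s271_kernel(n,a,b,c)
  call dummy(ld,n,a,b,c,1.)
contains
  ! Hypothetical helper: same loop body, explicit-shape dummies.
  subroutine s271_kernel (n,a,b,c)
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    real, intent(in) :: b(n), c(n)
    integer i

    do i=1,n
      if (b(i) .gt. 0.) a(i) = a(i) + b(i) * c(i)
    end do
  end subroutine s271_kernel
end subroutine s271
```

This can be compiled with the same command line as above, replacing the file name, to see what -Rpass=vector reports. It only works around the problem at the source level and doesn't change what LICM does with the original loop.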

llvmbot commented Oct 1, 2024

@llvm/issue-subscribers-flang-ir

Author: Yusuke MINATO (yus3710-fj)
