Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Implement NEON vscale intrinsics #100347

Merged
merged 3 commits into from
Sep 26, 2024
Merged

Conversation

Lukacma
Copy link
Contributor

@Lukacma Lukacma commented Jul 24, 2024

This patch implements following intrinsics:

float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi [email protected]

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen llvm:ir labels Jul 24, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-llvm-ir

Author: None (Lukacma)

Changes

This patch implements following intrinsics:

float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>


Full diff: https://github.com/llvm/llvm-project/pull/100347.diff

7 Files Affected:

  • (modified) clang/include/clang/Basic/arm_neon.td (+6)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+8)
  • (added) clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c (+58)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+21)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+1-1)
  • (added) llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll (+54)
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3098fa67e6a51..f930c62a79280 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -2096,3 +2096,9 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
   def VLDAP1_LANE : WInst<"vldap1_lane", ".(c*!).I", "QUlQlUlldQdPlQPl">;
   def VSTL1_LANE  : WInst<"vstl1_lane", "v*(.!)I", "QUlQlUlldQdPlQPl">;
 }
+
+let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8" in {
+  // fscale
+  def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
+  def FSCALE_V64 : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
\ No newline at end of file
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5639239359ab8..816899e5c11e3 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -13491,6 +13491,14 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Int = Intrinsic::aarch64_neon_suqadd;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vuqadd");
   }
+  case NEON::BI__builtin_neon_vscale_f16:
+  case NEON::BI__builtin_neon_vscaleq_f16:
+  case NEON::BI__builtin_neon_vscale_f32:
+  case NEON::BI__builtin_neon_vscaleq_f32:
+  case NEON::BI__builtin_neon_vscaleq_f64: {
+    Int = Intrinsic::aarch64_neon_fp8_fscale;
+    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "fscale");
+  }
   }
 }
 
diff --git a/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
new file mode 100644
index 0000000000000..b50d30876a7c5
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+#include <arm_neon.h>
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -S -O3 -o /dev/null %s
+
+// CHECK-LABEL: define dso_local <4 x half> @test_vscale_f16(
+// CHECK-SAME: <4 x half> noundef [[VN:%.*]], <4 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> [[VN]], <4 x i16> [[VM]])
+// CHECK-NEXT:    ret <4 x half> [[FSCALE2_I]]
+//
+float16x4_t test_vscale_f16(float16x4_t vn, int16x4_t vm) {
+  return vscale_f16(vn, vm);
+}
+
+// CHECK-LABEL: define dso_local <8 x half> @test_vscaleq_f16(
+// CHECK-SAME: <8 x half> noundef [[VN:%.*]], <8 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> [[VN]], <8 x i16> [[VM]])
+// CHECK-NEXT:    ret <8 x half> [[FSCALE2_I]]
+//
+float16x8_t test_vscaleq_f16(float16x8_t vn, int16x8_t vm) {
+  return vscaleq_f16(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x float> @test_vscale_f32(
+// CHECK-SAME: <2 x float> noundef [[VN:%.*]], <2 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> [[VN]], <2 x i32> [[VM]])
+// CHECK-NEXT:    ret <2 x float> [[FSCALE2_I]]
+//
+float32x2_t test_vscale_f32(float32x2_t vn, int32x2_t vm) {
+  return vscale_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <4 x float> @test_vscaleq_f32(
+// CHECK-SAME: <4 x float> noundef [[VN:%.*]], <4 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> [[VN]], <4 x i32> [[VM]])
+// CHECK-NEXT:    ret <4 x float> [[FSCALE2_I]]
+//
+float32x4_t test_vscaleq_f32(float32x4_t vn, int32x4_t vm) {
+  return vscaleq_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x double> @test_vscale_f64(
+// CHECK-SAME: <2 x double> noundef [[VN:%.*]], <2 x i64> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> [[VN]], <2 x i64> [[VM]])
+// CHECK-NEXT:    ret <2 x double> [[FSCALE2_I]]
+//
+float64x2_t test_vscale_f64(float64x2_t vn, int64x2_t vm) {
+  return vscaleq_f64(vn, vm);
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 3735bf5222fce..1f1691a6235b8 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -563,6 +563,13 @@ let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
   def int_aarch64_neon_vcmla_rot90  : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot180 : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot270 : AdvSIMD_3VectorArg_Intrinsic;
+  
+  // FP8 fscale
+  def int_aarch64_neon_fp8_fscale : DefaultAttrsIntrinsic<
+                                    [llvm_anyvector_ty],
+                                    [LLVMMatchType<0>,
+                                    LLVMVectorOfBitcastsToInt<0>],
+                                    [IntrNoMem]>;
 }
 
 let TargetPrefix = "aarch64" in {  // All intrinsics start with "llvm.aarch64.".
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index e1ecc5a57dd26..46902fd9f8b0b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -5985,6 +5985,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
         [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
 }
 
+// As above, but only floating point elements supported.
+let mayRaiseFPException = 1, Uses = [FPCR] in
+multiclass SIMDThreeVectorFP<bit U, bit S, bits<3> opc,
+                                 string asm, SDPatternOperator OpNode> {
+  def v4f16 : BaseSIMDThreeSameVector<0, U, {S,0b10}, {0b00,opc}, V64,
+                                      asm, ".4h",
+        [(set (v4f16 V64:$Rd), (OpNode (v4f16 V64:$Rn), (v4i16 V64:$Rm)))]>;
+  def v8f16 : BaseSIMDThreeSameVector<1, U, {S,0b10}, {0b00,opc}, V128,
+                                      asm, ".8h",
+        [(set (v8f16 V128:$Rd), (OpNode (v8f16 V128:$Rn), (v8i16 V128:$Rm)))]>;
+  def v2f32 : BaseSIMDThreeSameVector<0, U, {S,0b01}, {0b11,opc}, V64,
+                                      asm, ".2s",
+        [(set (v2f32 V64:$Rd), (OpNode (v2f32 V64:$Rn), (v2i32 V64:$Rm)))]>;
+  def v4f32 : BaseSIMDThreeSameVector<1, U, {S,0b01}, {0b11,opc}, V128,
+                                      asm, ".4s",
+        [(set (v4f32 V128:$Rd), (OpNode (v4f32 V128:$Rn), (v4i32 V128:$Rm)))]>;
+  def v2f64 : BaseSIMDThreeSameVector<1, U, {S,0b11}, {0b11,opc}, V128,
+                                      asm, ".2d",
+        [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2i64 V128:$Rm)))]>;
+}
+
 let mayRaiseFPException = 1, Uses = [FPCR] in
 multiclass SIMDThreeSameVectorFPCmp<bit U, bit S, bits<3> opc,
                                     string asm,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 1053ba9242768..1fa21278657ae 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -10128,7 +10128,7 @@ let Predicates = [HasFP8] in {
   defm BF2CVTL : SIMDMixedTwoVectorFP8<0b11, "bf2cvtl">;
   defm FCVTN_F16_F8 : SIMDThreeSameSizeVectorCvt<"fcvtn">;
   defm FCVTN_F32_F8 : SIMDThreeVectorCvt<"fcvtn">;
-  defm FSCALE : SIMDThreeSameVectorFP<0b1, 0b1, 0b111, "fscale", null_frag>;
+  defm FSCALE : SIMDThreeVectorFP<0b1, 0b1, 0b111, "fscale", int_aarch64_neon_fp8_fscale>;
 } // End let Predicates = [HasFP8]
 
 let Predicates = [HasFAMINMAX] in {
diff --git a/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
new file mode 100644
index 0000000000000..da0e365db2d31
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux -mattr=+neon,+fp8 < %s | FileCheck %s
+
+
+define <4 x half> @test_fscale_f16(<4 x half> %vn, <4 x i16> %vm) {
+; CHECK-LABEL: test_fscale_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    ret
+  %res = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> %vn, <4 x i16> %vm)
+  ret <4 x half> %res
+}
+
+define <8 x half> @test_fscaleq_f16(<8 x half> %vn, <8 x i16> %vm) {
+; CHECK-LABEL: test_fscaleq_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+  %res = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> %vn, <8 x i16> %vm)
+  ret <8 x half> %res
+}
+
+define <2 x float> @test_fscale_f32(<2 x float> %vn, <2 x i32> %vm) {
+; CHECK-LABEL: test_fscale_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    ret
+  %res = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> %vn, <2 x i32> %vm)
+  ret <2 x float> %res
+}
+
+define <4 x float> @test_fscaleq_f32(<4 x float> %vn, <4 x i32> %vm) {
+; CHECK-LABEL: test_fscaleq_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+  %res = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> %vn, <4 x i32> %vm)
+  ret <4 x float> %res
+}
+
+define <2 x double> @test_fscaleq_f64(<2 x double> %vn, <2 x i64> %vm) {
+; CHECK-LABEL: test_fscaleq_f64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+  %res = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> %vn, <2 x i64> %vm)
+  ret <2 x double> %res
+}
+
+declare <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half>, <4 x i16>)
+declare <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half>, <8 x i16>)
+declare <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float>, <2 x i32>)
+declare <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float>, <4 x i32>)
+declare <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double>, <2 x i64>)

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-backend-aarch64

Author: None (Lukacma)

Changes

This patch implements following intrinsics:

float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>


Full diff: https://github.com/llvm/llvm-project/pull/100347.diff

7 Files Affected:

  • (modified) clang/include/clang/Basic/arm_neon.td (+6)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+8)
  • (added) clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c (+58)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+21)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+1-1)
  • (added) llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll (+54)
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3098fa67e6a51..f930c62a79280 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -2096,3 +2096,9 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
   def VLDAP1_LANE : WInst<"vldap1_lane", ".(c*!).I", "QUlQlUlldQdPlQPl">;
   def VSTL1_LANE  : WInst<"vstl1_lane", "v*(.!)I", "QUlQlUlldQdPlQPl">;
 }
+
+let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8" in {
+  // fscale
+  def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
+  def FSCALE_V64 : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
\ No newline at end of file
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5639239359ab8..816899e5c11e3 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -13491,6 +13491,14 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Int = Intrinsic::aarch64_neon_suqadd;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vuqadd");
   }
+  case NEON::BI__builtin_neon_vscale_f16:
+  case NEON::BI__builtin_neon_vscaleq_f16:
+  case NEON::BI__builtin_neon_vscale_f32:
+  case NEON::BI__builtin_neon_vscaleq_f32:
+  case NEON::BI__builtin_neon_vscaleq_f64: {
+    Int = Intrinsic::aarch64_neon_fp8_fscale;
+    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "fscale");
+  }
   }
 }
 
diff --git a/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
new file mode 100644
index 0000000000000..b50d30876a7c5
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+#include <arm_neon.h>
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -S -O3 -o /dev/null %s
+
+// CHECK-LABEL: define dso_local <4 x half> @test_vscale_f16(
+// CHECK-SAME: <4 x half> noundef [[VN:%.*]], <4 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> [[VN]], <4 x i16> [[VM]])
+// CHECK-NEXT:    ret <4 x half> [[FSCALE2_I]]
+//
+float16x4_t test_vscale_f16(float16x4_t vn, int16x4_t vm) {
+  return vscale_f16(vn, vm);
+}
+
+// CHECK-LABEL: define dso_local <8 x half> @test_vscaleq_f16(
+// CHECK-SAME: <8 x half> noundef [[VN:%.*]], <8 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> [[VN]], <8 x i16> [[VM]])
+// CHECK-NEXT:    ret <8 x half> [[FSCALE2_I]]
+//
+float16x8_t test_vscaleq_f16(float16x8_t vn, int16x8_t vm) {
+  return vscaleq_f16(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x float> @test_vscale_f32(
+// CHECK-SAME: <2 x float> noundef [[VN:%.*]], <2 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> [[VN]], <2 x i32> [[VM]])
+// CHECK-NEXT:    ret <2 x float> [[FSCALE2_I]]
+//
+float32x2_t test_vscale_f32(float32x2_t vn, int32x2_t vm) {
+  return vscale_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <4 x float> @test_vscaleq_f32(
+// CHECK-SAME: <4 x float> noundef [[VN:%.*]], <4 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> [[VN]], <4 x i32> [[VM]])
+// CHECK-NEXT:    ret <4 x float> [[FSCALE2_I]]
+//
+float32x4_t test_vscaleq_f32(float32x4_t vn, int32x4_t vm) {
+  return vscaleq_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x double> @test_vscale_f64(
+// CHECK-SAME: <2 x double> noundef [[VN:%.*]], <2 x i64> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> [[VN]], <2 x i64> [[VM]])
+// CHECK-NEXT:    ret <2 x double> [[FSCALE2_I]]
+//
+float64x2_t test_vscale_f64(float64x2_t vn, int64x2_t vm) {
+  return vscaleq_f64(vn, vm);
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 3735bf5222fce..1f1691a6235b8 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -563,6 +563,13 @@ let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
   def int_aarch64_neon_vcmla_rot90  : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot180 : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot270 : AdvSIMD_3VectorArg_Intrinsic;
+  
+  // FP8 fscale
+  def int_aarch64_neon_fp8_fscale : DefaultAttrsIntrinsic<
+                                    [llvm_anyvector_ty],
+                                    [LLVMMatchType<0>,
+                                    LLVMVectorOfBitcastsToInt<0>],
+                                    [IntrNoMem]>;
 }
 
 let TargetPrefix = "aarch64" in {  // All intrinsics start with "llvm.aarch64.".
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index e1ecc5a57dd26..46902fd9f8b0b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -5985,6 +5985,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
         [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
 }
 
+// As above, but only floating point elements supported.
+let mayRaiseFPException = 1, Uses = [FPCR] in
+multiclass SIMDThreeVectorFP<bit U, bit S, bits<3> opc,
+                                 string asm, SDPatternOperator OpNode> {
+  def v4f16 : BaseSIMDThreeSameVector<0, U, {S,0b10}, {0b00,opc}, V64,
+                                      asm, ".4h",
+        [(set (v4f16 V64:$Rd), (OpNode (v4f16 V64:$Rn), (v4i16 V64:$Rm)))]>;
+  def v8f16 : BaseSIMDThreeSameVector<1, U, {S,0b10}, {0b00,opc}, V128,
+                                      asm, ".8h",
+        [(set (v8f16 V128:$Rd), (OpNode (v8f16 V128:$Rn), (v8i16 V128:$Rm)))]>;
+  def v2f32 : BaseSIMDThreeSameVector<0, U, {S,0b01}, {0b11,opc}, V64,
+                                      asm, ".2s",
+        [(set (v2f32 V64:$Rd), (OpNode (v2f32 V64:$Rn), (v2i32 V64:$Rm)))]>;
+  def v4f32 : BaseSIMDThreeSameVector<1, U, {S,0b01}, {0b11,opc}, V128,
+                                      asm, ".4s",
+        [(set (v4f32 V128:$Rd), (OpNode (v4f32 V128:$Rn), (v4i32 V128:$Rm)))]>;
+  def v2f64 : BaseSIMDThreeSameVector<1, U, {S,0b11}, {0b11,opc}, V128,
+                                      asm, ".2d",
+        [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2i64 V128:$Rm)))]>;
+}
+
 let mayRaiseFPException = 1, Uses = [FPCR] in
 multiclass SIMDThreeSameVectorFPCmp<bit U, bit S, bits<3> opc,
                                     string asm,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 1053ba9242768..1fa21278657ae 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -10128,7 +10128,7 @@ let Predicates = [HasFP8] in {
   defm BF2CVTL : SIMDMixedTwoVectorFP8<0b11, "bf2cvtl">;
   defm FCVTN_F16_F8 : SIMDThreeSameSizeVectorCvt<"fcvtn">;
   defm FCVTN_F32_F8 : SIMDThreeVectorCvt<"fcvtn">;
-  defm FSCALE : SIMDThreeSameVectorFP<0b1, 0b1, 0b111, "fscale", null_frag>;
+  defm FSCALE : SIMDThreeVectorFP<0b1, 0b1, 0b111, "fscale", int_aarch64_neon_fp8_fscale>;
 } // End let Predicates = [HasFP8]
 
 let Predicates = [HasFAMINMAX] in {
diff --git a/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
new file mode 100644
index 0000000000000..da0e365db2d31
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux -mattr=+neon,+fp8 < %s | FileCheck %s
+
+
+define <4 x half> @test_fscale_f16(<4 x half> %vn, <4 x i16> %vm) {
+; CHECK-LABEL: test_fscale_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    ret
+  %res = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> %vn, <4 x i16> %vm)
+  ret <4 x half> %res
+}
+
+define <8 x half> @test_fscaleq_f16(<8 x half> %vn, <8 x i16> %vm) {
+; CHECK-LABEL: test_fscaleq_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+  %res = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> %vn, <8 x i16> %vm)
+  ret <8 x half> %res
+}
+
+define <2 x float> @test_fscale_f32(<2 x float> %vn, <2 x i32> %vm) {
+; CHECK-LABEL: test_fscale_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    ret
+  %res = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> %vn, <2 x i32> %vm)
+  ret <2 x float> %res
+}
+
+define <4 x float> @test_fscaleq_f32(<4 x float> %vn, <4 x i32> %vm) {
+; CHECK-LABEL: test_fscaleq_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+  %res = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> %vn, <4 x i32> %vm)
+  ret <4 x float> %res
+}
+
+define <2 x double> @test_fscaleq_f64(<2 x double> %vn, <2 x i64> %vm) {
+; CHECK-LABEL: test_fscaleq_f64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+  %res = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> %vn, <2 x i64> %vm)
+  ret <2 x double> %res
+}
+
+declare <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half>, <4 x i16>)
+declare <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half>, <8 x i16>)
+declare <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float>, <2 x i32>)
+declare <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float>, <4 x i32>)
+declare <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double>, <2 x i64>)

@llvmbot
Copy link
Collaborator

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-clang

Author: None (Lukacma)

Changes

This patch implements following intrinsics:

float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>


Full diff: https://github.com/llvm/llvm-project/pull/100347.diff

7 Files Affected:

  • (modified) clang/include/clang/Basic/arm_neon.td (+6)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+8)
  • (added) clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c (+58)
  • (modified) llvm/include/llvm/IR/IntrinsicsAArch64.td (+7)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrFormats.td (+21)
  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+1-1)
  • (added) llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll (+54)
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3098fa67e6a51..f930c62a79280 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -2096,3 +2096,9 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
   def VLDAP1_LANE : WInst<"vldap1_lane", ".(c*!).I", "QUlQlUlldQdPlQPl">;
   def VSTL1_LANE  : WInst<"vstl1_lane", "v*(.!)I", "QUlQlUlldQdPlQPl">;
 }
+
+let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8" in {
+  // fscale
+  def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
+  def FSCALE_V64 : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
\ No newline at end of file
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5639239359ab8..816899e5c11e3 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -13491,6 +13491,14 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     Int = Intrinsic::aarch64_neon_suqadd;
     return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "vuqadd");
   }
+  case NEON::BI__builtin_neon_vscale_f16:
+  case NEON::BI__builtin_neon_vscaleq_f16:
+  case NEON::BI__builtin_neon_vscale_f32:
+  case NEON::BI__builtin_neon_vscaleq_f32:
+  case NEON::BI__builtin_neon_vscaleq_f64: {
+    Int = Intrinsic::aarch64_neon_fp8_fscale;
+    return EmitNeonCall(CGM.getIntrinsic(Int, Ty), Ops, "fscale");
+  }
   }
 }
 
diff --git a/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
new file mode 100644
index 0000000000000..b50d30876a7c5
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
@@ -0,0 +1,58 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 4
+#include <arm_neon.h>
+
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -S -O3 -o /dev/null %s
+
+// CHECK-LABEL: define dso_local <4 x half> @test_vscale_f16(
+// CHECK-SAME: <4 x half> noundef [[VN:%.*]], <4 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> [[VN]], <4 x i16> [[VM]])
+// CHECK-NEXT:    ret <4 x half> [[FSCALE2_I]]
+//
+float16x4_t test_vscale_f16(float16x4_t vn, int16x4_t vm) {
+  return vscale_f16(vn, vm);
+}
+
+// CHECK-LABEL: define dso_local <8 x half> @test_vscaleq_f16(
+// CHECK-SAME: <8 x half> noundef [[VN:%.*]], <8 x i16> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> [[VN]], <8 x i16> [[VM]])
+// CHECK-NEXT:    ret <8 x half> [[FSCALE2_I]]
+//
+float16x8_t test_vscaleq_f16(float16x8_t vn, int16x8_t vm) {
+  return vscaleq_f16(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x float> @test_vscale_f32(
+// CHECK-SAME: <2 x float> noundef [[VN:%.*]], <2 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> [[VN]], <2 x i32> [[VM]])
+// CHECK-NEXT:    ret <2 x float> [[FSCALE2_I]]
+//
+float32x2_t test_vscale_f32(float32x2_t vn, int32x2_t vm) {
+  return vscale_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <4 x float> @test_vscaleq_f32(
+// CHECK-SAME: <4 x float> noundef [[VN:%.*]], <4 x i32> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> [[VN]], <4 x i32> [[VM]])
+// CHECK-NEXT:    ret <4 x float> [[FSCALE2_I]]
+//
+float32x4_t test_vscaleq_f32(float32x4_t vn, int32x4_t vm) {
+  return vscaleq_f32(vn, vm);
+
+}
+
+// CHECK-LABEL: define dso_local <2 x double> @test_vscale_f64(
+// CHECK-SAME: <2 x double> noundef [[VN:%.*]], <2 x i64> noundef [[VM:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[FSCALE2_I:%.*]] = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> [[VN]], <2 x i64> [[VM]])
+// CHECK-NEXT:    ret <2 x double> [[FSCALE2_I]]
+//
+float64x2_t test_vscale_f64(float64x2_t vn, int64x2_t vm) {
+  return vscaleq_f64(vn, vm);
+}
diff --git a/llvm/include/llvm/IR/IntrinsicsAArch64.td b/llvm/include/llvm/IR/IntrinsicsAArch64.td
index 3735bf5222fce..1f1691a6235b8 100644
--- a/llvm/include/llvm/IR/IntrinsicsAArch64.td
+++ b/llvm/include/llvm/IR/IntrinsicsAArch64.td
@@ -563,6 +563,13 @@ let TargetPrefix = "aarch64", IntrProperties = [IntrNoMem] in {
   def int_aarch64_neon_vcmla_rot90  : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot180 : AdvSIMD_3VectorArg_Intrinsic;
   def int_aarch64_neon_vcmla_rot270 : AdvSIMD_3VectorArg_Intrinsic;
+  
+  // FP8 fscale
+  def int_aarch64_neon_fp8_fscale : DefaultAttrsIntrinsic<
+                                    [llvm_anyvector_ty],
+                                    [LLVMMatchType<0>,
+                                    LLVMVectorOfBitcastsToInt<0>],
+                                    [IntrNoMem]>;
 }
 
 let TargetPrefix = "aarch64" in {  // All intrinsics start with "llvm.aarch64.".
diff --git a/llvm/lib/Target/AArch64/AArch64InstrFormats.td b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
index e1ecc5a57dd26..46902fd9f8b0b 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrFormats.td
@@ -5985,6 +5985,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
         [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
 }
 
+// As above, but only floating point elements supported.
+let mayRaiseFPException = 1, Uses = [FPCR] in
+multiclass SIMDThreeVectorFP<bit U, bit S, bits<3> opc,
+                                 string asm, SDPatternOperator OpNode> {
+  def v4f16 : BaseSIMDThreeSameVector<0, U, {S,0b10}, {0b00,opc}, V64,
+                                      asm, ".4h",
+        [(set (v4f16 V64:$Rd), (OpNode (v4f16 V64:$Rn), (v4i16 V64:$Rm)))]>;
+  def v8f16 : BaseSIMDThreeSameVector<1, U, {S,0b10}, {0b00,opc}, V128,
+                                      asm, ".8h",
+        [(set (v8f16 V128:$Rd), (OpNode (v8f16 V128:$Rn), (v8i16 V128:$Rm)))]>;
+  def v2f32 : BaseSIMDThreeSameVector<0, U, {S,0b01}, {0b11,opc}, V64,
+                                      asm, ".2s",
+        [(set (v2f32 V64:$Rd), (OpNode (v2f32 V64:$Rn), (v2i32 V64:$Rm)))]>;
+  def v4f32 : BaseSIMDThreeSameVector<1, U, {S,0b01}, {0b11,opc}, V128,
+                                      asm, ".4s",
+        [(set (v4f32 V128:$Rd), (OpNode (v4f32 V128:$Rn), (v4i32 V128:$Rm)))]>;
+  def v2f64 : BaseSIMDThreeSameVector<1, U, {S,0b11}, {0b11,opc}, V128,
+                                      asm, ".2d",
+        [(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2i64 V128:$Rm)))]>;
+}
+
 let mayRaiseFPException = 1, Uses = [FPCR] in
 multiclass SIMDThreeSameVectorFPCmp<bit U, bit S, bits<3> opc,
                                     string asm,
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 1053ba9242768..1fa21278657ae 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -10128,7 +10128,7 @@ let Predicates = [HasFP8] in {
   defm BF2CVTL : SIMDMixedTwoVectorFP8<0b11, "bf2cvtl">;
   defm FCVTN_F16_F8 : SIMDThreeSameSizeVectorCvt<"fcvtn">;
   defm FCVTN_F32_F8 : SIMDThreeVectorCvt<"fcvtn">;
-  defm FSCALE : SIMDThreeSameVectorFP<0b1, 0b1, 0b111, "fscale", null_frag>;
+  defm FSCALE : SIMDThreeVectorFP<0b1, 0b1, 0b111, "fscale", int_aarch64_neon_fp8_fscale>;
 } // End let Predicates = [HasFP8]
 
 let Predicates = [HasFAMINMAX] in {
diff --git a/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
new file mode 100644
index 0000000000000..da0e365db2d31
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/neon-fp8-fscale.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64-linux -mattr=+neon,+fp8 < %s | FileCheck %s
+
+
+define <4 x half> @test_fscale_f16(<4 x half> %vn, <4 x i16> %vm) {
+; CHECK-LABEL: test_fscale_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    ret
+  %res = tail call <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half> %vn, <4 x i16> %vm)
+  ret <4 x half> %res
+}
+
+define <8 x half> @test_fscaleq_f16(<8 x half> %vn, <8 x i16> %vm) {
+; CHECK-LABEL: test_fscaleq_f16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+  %res = tail call <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half> %vn, <8 x i16> %vm)
+  ret <8 x half> %res
+}
+
+define <2 x float> @test_fscale_f32(<2 x float> %vn, <2 x i32> %vm) {
+; CHECK-LABEL: test_fscale_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    ret
+  %res = tail call <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float> %vn, <2 x i32> %vm)
+  ret <2 x float> %res
+}
+
+define <4 x float> @test_fscaleq_f32(<4 x float> %vn, <4 x i32> %vm) {
+; CHECK-LABEL: test_fscaleq_f32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+  %res = tail call <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float> %vn, <4 x i32> %vm)
+  ret <4 x float> %res
+}
+
+define <2 x double> @test_fscaleq_f64(<2 x double> %vn, <2 x i64> %vm) {
+; CHECK-LABEL: test_fscaleq_f64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    fscale v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+  %res = tail call <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double> %vn, <2 x i64> %vm)
+  ret <2 x double> %res
+}
+
+declare <4 x half> @llvm.aarch64.neon.fp8.fscale.v4f16(<4 x half>, <4 x i16>)
+declare <8 x half> @llvm.aarch64.neon.fp8.fscale.v8f16(<8 x half>, <8 x i16>)
+declare <2 x float> @llvm.aarch64.neon.fp8.fscale.v2f32(<2 x float>, <2 x i32>)
+declare <4 x float> @llvm.aarch64.neon.fp8.fscale.v4f32(<4 x float>, <4 x i32>)
+declare <2 x double> @llvm.aarch64.neon.fp8.fscale.v2f64(<2 x double>, <2 x i64>)

@@ -5985,6 +5985,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
[(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
}

// As above, but only floating point elements supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I believe we can remove this comments.
Looks like a c&p from the class above, that is also only supporting FP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wrong comment was removed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

@@ -2096,3 +2096,9 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
def VLDAP1_LANE : WInst<"vldap1_lane", ".(c*!).I", "QUlQlUlldQdPlQPl">;
def VSTL1_LANE : WInst<"vstl1_lane", "v*(.!)I", "QUlQlUlldQdPlQPl">;
}

let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8" in {
Copy link
Contributor

@SpencerAbson SpencerAbson Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be TargetGuard = "fp8,neon"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -5985,6 +5985,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
[(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
}

// As above, but only floating point elements supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wrong comment was removed here.

@@ -5985,6 +5984,27 @@ multiclass SIMDThreeSameVectorFP<bit U, bit S, bits<3> opc,
[(set (v2f64 V128:$Rd), (OpNode (v2f64 V128:$Rn), (v2f64 V128:$Rm)))]>;
}

// As above, but only floating point elements supported.
let mayRaiseFPException = 1, Uses = [FPCR] in
multiclass SIMDThreeVectorFP<bit U, bit S, bits<3> opc,
Copy link
Contributor

@SpencerAbson SpencerAbson Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the name of this class be changed to reflect that it's second operand's elements must be integers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed and moved.

Copy link
Contributor

@SpencerAbson SpencerAbson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

@Lukacma Lukacma merged commit c511cc0 into llvm:main Sep 26, 2024
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-ubuntu-fast running on sie-linux-worker while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/8043

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/clang -cc1 -internal-isystem /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder clang-ve-ninja running on hpce-ve-main while building clang,llvm at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/12/builds/6660

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/ve-linux.py ...' (failure)
...
[293/299] Linking CXX executable tools/clang/unittests/Driver/ClangDriverTests
[294/299] Linking CXX executable tools/clang/unittests/Tooling/ToolingTests
[295/299] Linking CXX executable tools/clang/unittests/Frontend/FrontendTests
[296/299] Linking CXX executable tools/clang/unittests/CodeGen/ClangCodeGenTests
[297/299] Linking CXX executable tools/clang/unittests/Interpreter/ExceptionTests/ClangReplInterpreterExceptionTests
[298/299] Linking CXX executable tools/clang/unittests/Interpreter/ClangReplInterpreterTests
[298/299] Running the Clang regression tests
-- Testing: 21108 tests, 48 workers --
llvm-lit: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using clang: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang
Testing:  0.. 10.. 20.
FAIL: Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c (5672 of 21108)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c


Testing Time: 55.57s

Total Discovered Tests: 38083
  Skipped          :    10 (0.03%)
  Unsupported      :  3777 (9.92%)
  Passed           : 34270 (89.99%)
  Expectedly Failed:    25 (0.07%)
  Failed           :     1 (0.00%)
FAILED: tools/clang/test/CMakeFiles/check-clang /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test/CMakeFiles/check-clang 
cd /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test && /home/buildbot/sandbox/bin/python3 /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test
ninja: build stopped: subcommand failed.
make: *** [check-llvm] Error 1
['make', '-f', '/scratch/buildbot/bothome/clang-ve-ninja/llvm-zorg/zorg/buildbot/builders/annotated/ve-linux-steps.make', 'check-llvm', 'BUILDROOT=/scratch/buildbot/bothome/clang-ve-ninja/build'] exited with return code 2.
The build step threw an exception...
Step 8 (check-llvm) failure: check-llvm (failure)
...
[293/299] Linking CXX executable tools/clang/unittests/Driver/ClangDriverTests
[294/299] Linking CXX executable tools/clang/unittests/Tooling/ToolingTests
[295/299] Linking CXX executable tools/clang/unittests/Frontend/FrontendTests
[296/299] Linking CXX executable tools/clang/unittests/CodeGen/ClangCodeGenTests
[297/299] Linking CXX executable tools/clang/unittests/Interpreter/ExceptionTests/ClangReplInterpreterExceptionTests
[298/299] Linking CXX executable tools/clang/unittests/Interpreter/ClangReplInterpreterTests
[298/299] Running the Clang regression tests
-- Testing: 21108 tests, 48 workers --
llvm-lit: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/llvm/utils/lit/lit/llvm/config.py:506: note: using clang: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang
Testing:  0.. 10.. 20.
FAIL: Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c (5672 of 21108)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/clang -cc1 -internal-isystem /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/bin/FileCheck /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c


Testing Time: 55.57s

Total Discovered Tests: 38083
  Skipped          :    10 (0.03%)
  Unsupported      :  3777 (9.92%)
  Passed           : 34270 (89.99%)
  Expectedly Failed:    25 (0.07%)
  Failed           :     1 (0.00%)
FAILED: tools/clang/test/CMakeFiles/check-clang /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test/CMakeFiles/check-clang 
cd /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test && /home/buildbot/sandbox/bin/python3 /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /scratch/buildbot/bothome/clang-ve-ninja/build/build_llvm/tools/clang/test
ninja: build stopped: subcommand failed.
make: *** [check-llvm] Error 1
['make', '-f', '/scratch/buildbot/bothome/clang-ve-ninja/llvm-zorg/zorg/buildbot/builders/annotated/ve-linux-steps.make', 'check-llvm', 'BUILDROOT=/scratch/buildbot/bothome/clang-ve-ninja/build'] exited with return code 2.
The build step threw an exception...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-sles-build-only running on rocm-worker-hw-04-sles while building clang,llvm at step 6 "Add check check-clang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/140/builds/7532

Here is the relevant piece of the build log for the reference
Step 6 (Add check check-clang) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/clang -cc1 -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/clang -cc1 -internal-isystem /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.build/bin/FileCheck /home/botworker/bbot/builds/openmp-offload-sles-build/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder clang-cmake-x86_64-avx512-linux running on avx512-intel64 while building clang,llvm at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/4312

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/clang -cc1 -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/FileCheck /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/clang -cc1 -internal-isystem /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/FileCheck /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/stage1/bin/FileCheck /localdisk2/buildbot/llvm-worker/clang-cmake-x86_64-avx512-linux/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-sie-win running on sie-win-worker while building clang,llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/46/builds/5582

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 4
z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe -cc1 -internal-isystem Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c | z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\clang.exe' -cc1 -internal-isystem 'Z:\b\llvm-clang-x86_64-sie-win\build\lib\clang\20\include' -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c'
# .---command stderr------------
# | Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
# |     2 | #include <arm_neon.h>
# |       |          ^~~~~~~~~~~~
# | 1 error generated.
# `-----------------------------
# error: command failed with exit status: 1
# executed command: 'z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe' 'Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c'
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  z:\b\llvm-clang-x86_64-sie-win\build\bin\filecheck.exe Z:\b\llvm-clang-x86_64-sie-win\llvm-project\clang\test\CodeGen\aarch64-neon-fp8-intrinsics\acle_neon_fscale.c
# `-----------------------------
# error: command failed with exit status: 2

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder arc-builder running on arc-worker while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/5294

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /buildbot/worker/arc-folder/build/bin/clang -cc1 -internal-isystem /buildbot/worker/arc-folder/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /buildbot/worker/arc-folder/build/bin/clang -cc1 -internal-isystem /buildbot/worker/arc-folder/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
+ /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /buildbot/worker/arc-folder/build/bin/FileCheck /buildbot/worker/arc-folder/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder clang-armv8-quick running on linaro-clang-armv8-quick while building clang,llvm at step 5 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/5082

Here is the relevant piece of the build log for the reference
Step 5 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/FileCheck /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
RUN: at line 5: /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -S -O3 -o /dev/null /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/bin/clang -cc1 -internal-isystem /home/tcwg-buildbot/worker/clang-armv8-quick/stage1/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -S -O3 -o /dev/null /home/tcwg-buildbot/worker/clang-armv8-quick/llvm/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
error: unable to create target: 'No available targets are compatible with triple "aarch64-none-linux-gnu"'
1 error generated.

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder openmp-offload-libc-amdgpu-runtime running on omp-vega20-1 while building clang,llvm at step 7 "Add check check-clang".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/73/builds/6203

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-clang) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/clang -cc1 -internal-isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/clang -cc1 -internal-isystem /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.build/bin/FileCheck /home/ompworker/bbot/openmp-offload-libc-amdgpu-runtime/llvm.src/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************

Step 10 (Add check check-offload) failure: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (866 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (867 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug47654.cpp (868 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (869 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug50022.cpp (870 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (871 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (872 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (873 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (874 of 879)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (875 of 879)
command timed out: 1200 seconds without output running [b'ninja', b'-j 32', b'check-offload'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1236.197204

@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 26, 2024

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-darwin running on doug-worker-3 while building clang,llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/3385

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
RUN: at line 4: /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c | /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/20/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +neon -target-feature +fp8 -O3 -emit-llvm -o - /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
+ /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c
/Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c:2:10: fatal error: 'arm_neon.h' file not found
    2 | #include <arm_neon.h>
      |          ^~~~~~~~~~~~
1 error generated.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/CodeGen/aarch64-neon-fp8-intrinsics/acle_neon_fscale.c

--

********************


augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:ir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants