Merge upstream (Xilinx#191)
* install target.h for the memory allocator as well (Xilinx#606)

This is redundant, but reflects the true dependencies better.

* Fix path used for tests

Peano should come before the regular Vitis path due to name collisions.

* Add basic AIE2 tests.

* [AIE] Add decoding of DMA status

The test library now does this for both AIE1 and AIE2 DMAs.
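
As a rough illustration of what such decoding looks like (the bit layout below is a placeholder, not the real AIE register layout):

```cpp
#include <cstdint>

// Illustrative only: unpack a raw DMA status word into named fields.
// The field offsets here are hypothetical; the real AIE1/AIE2 layouts
// come from the architecture manuals.
struct DmaChannelStatus {
  unsigned state;     // e.g. idle / running
  bool stalledOnLock; // waiting on a lock acquire
};

DmaChannelStatus decodeDmaStatus(uint32_t raw) {
  DmaChannelStatus s;
  s.state = raw & 0x3;              // placeholder bit field
  s.stalledOnLock = (raw >> 2) & 1; // placeholder bit field
  return s;
}
```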

* [AIE] Add packet stream tests for ShimDMAs

This test looks at a common scenario where three tensors are input to a
tile from three independent DMAs, but only two receiving tile DMA channels
are available. Using packet routing, this scenario can be accommodated by
time-sharing one of the destination DMAs, as the sketch below illustrates.

Obsoletes Xilinx#85
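
A minimal, illustrative model of the time-sharing idea (not the actual test code; packet IDs and names are placeholders):

```cpp
#include <cstdint>
#include <vector>

// One packet-switched word: an ID tag selects the logical stream.
struct Packet {
  uint8_t id;
  uint32_t data;
};

// Channel 0 stays dedicated to the first tensor; the second channel is
// shared by the remaining two streams and demultiplexed by packet ID.
void demuxSharedChannel(const std::vector<Packet> &sharedChannel,
                        std::vector<uint32_t> &stream1,
                        std::vector<uint32_t> &stream2) {
  for (const Packet &p : sharedChannel) {
    if (p.id == 1)
      stream1.push_back(p.data); // words tagged for the second tensor
    else if (p.id == 2)
      stream2.push_back(p.data); // words tagged for the third tensor
  }
}
```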

* Fix error message if no device.

We need to return with an error message, or later code will
segfault.
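
A minimal sketch of the guard, assuming a test-harness entry point; the handle type and message are placeholders:

```cpp
#include <cstdio>

struct DeviceHandle; // placeholder for the harness's device type

// Return an error instead of falling through with a null handle, which is
// what previously led to the segfault in later code.
int requireDevice(DeviceHandle *dev) {
  if (!dev) {
    std::fprintf(stderr, "Error: no AIE device found.\n");
    return 1;
  }
  return 0;
}
```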

* Put exp lookup table into run_time_lib/AIE2 (Xilinx#604)
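
For context (grounded in the getExpBf16 code shown later in this diff): the table-based exponential converts the input to fixed point with 8 fractional bits, then uses the high bits for an integer-part lookup and the low bits for a fractional-part lookup, since

$$e^{x} = e^{i+f} = e^{i} \cdot e^{f},$$

so two parallel lookups and a single multiply reconstruct the result.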

* Update chess_intrinsic_wrapper.cpp (Xilinx#610)

Remove event intrinsic declarations from AIEv1 wrapper

* Add TOSA tensor broadcast and mixed precision tests (Xilinx#609)

* Add the following TOSA integration tests to test/Integration/Dialect/TOSA/
* List of PASS tests:
i16xi16_add_elem (lane=32)
i16xi16_mul_elem (lane=32)
i16xi16_sel (lane=32)
i16xi16_sub_elem (lane=32)
i8xi8_add_elem (lane=64)
i8xi8_mul_elem (lane=32)
i8xi8_sel (lane=64)
i8xi8_sub_elem (lane=64)
bf16xbf16_sub_elem_2d_broadcast_1d (lane=16)
bf16xbf16_sub_elem_2d_broadcast_1d_reshape (lane=16)
* List of XFAIL tests:
i8xi16_sub_elem (lane=32)
bf16xbf16_sub_elem_2d_broadcast_2d (lane=16)
bf16xbf16_sub_elem_2d_broadcast_1d_unit_dim (lane=16)

* Fix include order (Xilinx#613)

* [aievec] Add hoisting patterns for arith.extsi

Hoisting cast operations as close as possible to the source of data can
make later patterns more robust to typical variations in the source
code.

We might need to revisit this one if, in the future, this process
causes unintended consequences.

* Implement inverse of a float by lookup tables (Xilinx#612)
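
The trick (visible in getInvBf16 further down this diff) works on the float's bit pattern: for $x = 2^{\,e-127}(1+f)$ with biased exponent $e$ and mantissa fraction $f$,

$$\frac{1}{x} = 2^{\,(253-e)-127} \cdot \frac{2}{1+f}, \qquad \frac{2}{1+f} \in (1, 2],$$

so the result's biased exponent is $253-e$ (bumped by one in the power-of-two case $f=0$, where $2/(1+f)=2$), and the new mantissa comes from the 128-entry m_inv_lut table.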

* Fix some test failures (Xilinx#614)

* Move aiecc.py implementation to python library (Xilinx#387)

* Use correct macros for C API (Xilinx#615)

* capi

* reformat

* Re-export MLIR target set in CMake (Xilinx#617)

* mlirconfig

* reformat

* Disable `-Wno-unknown-warning-option` on windows (Xilinx#620)

* unknownwarning

* reformat

* win32 (Xilinx#618)

* Use new policy CMP0091 for MSVC (Xilinx#619)

* msvc

* reformat

* Revert "Use new policy CMP0091 for MSVC (Xilinx#619)" (Xilinx#622)

This reverts commit 1520898.

* Use upstream CMake macros to find python (Xilinx#616)

* cmake

* reformat

* Bump cmakeModules (Xilinx#624)

* ObjFifo unroll dependency fixes (Xilinx#621)

* Fixes for objFifo unrolling algorithm.

* EOF

* clang-format (Xilinx#625)

* Use target model functions to get number of DMA channels.

* Clang format

* Fix function call

* Add shim tiles that are not in NOC columns to the getNumDestShimMuxConnections() functions.

* Add isShimNOCorPLTile() to the target model.

* Add missing target model.

* Add improvements to the doc

* Add the isShimNOCorPLTile() virtual function (sketched below).

* Clang format
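
A hedged sketch of the new target-model query; the exact class name and signature in the repository may differ:

```cpp
// Convenience predicate combining the two shim-tile kinds the routing code
// cares about. Assumes isShimNOCTile/isShimPLTile already exist on the
// target model, as in the mlir-aie target model interface.
bool AIETargetModel::isShimNOCorPLTile(int col, int row) const {
  return isShimNOCTile(col, row) || isShimPLTile(col, row);
}
```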

---------

Co-authored-by: abisca <[email protected]>
Co-authored-by: Joseph Melber <[email protected]>

* need ONLY option to make cmake find numpy (Xilinx#630)

* Split ccache database according to the parallel jobs (Xilinx#600)

This fixes a race condition in ccache database writes that happen at the
end of each job running in parallel. By using a unique key per job, each
database is written correctly and can be reused by the next CI run.
Also use the real LLVM commit hash as the ccache database key in CI,
instead of the previous hack that assumed the textual commit hash was
present inside utils/clone-llvm.sh.

* Fix TOSA broadcast and mixed precision tests (Xilinx#631)

Fix the following TOSA tests:
- bf16xbf16_sub_elem_2d_broadcast_2d
- i8xi16_sub_elem
Add the following new TOSA tests:
- i16xi16_sub_elem_2d_broadcast_scalar (pass)
- i16xi16_sub_elem_2d_broadcast_1d_unit_dim (pass)
- bf16xbf16_sub_elem_2d_broadcast_scalar (xfail)

* Fix ordering of putStream intrinsic.

The argument order for the intrinsic didn't match:
Argument 0: channel #
Argument 1: value
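
A hedged illustration of the corrected ordering (the wrapper name is a placeholder for the real intrinsic):

```cpp
// Argument 0 selects the stream channel; argument 1 carries the value.
extern "C" void putStream(unsigned channel, int value); // placeholder decl

void sendWord(int value) {
  putStream(/*channel=*/0, /*value=*/value); // channel first, then value
}
```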

* Fix decoding of tile status for stream stalls.

These were previously nonsensical.

* Add end-to-end tests for CPU stream access.

* Fix intrinsic wrapper for aie2 acquire/release

* Explicitly compile intrinsic-dependent code with the chess frontend.

* [tests] Remove address.

This address is ignored, resulting in a warning.

* catch up to TOM MLIR (Xilinx#590)

* catch up to llvm TOM

* Update VectorToAIEVecConversions.cpp

* Get `VectorType` instead of `Type`

* format

* xfail opaque pointer related tests and update test

* update finalize-memref-to-llvm

---------

Co-authored-by: Javier Setoain <[email protected]>

* Add softmax test cases (Xilinx#635)

* Revised the xchess compilation commands for lut test cases. (Xilinx#636)

* Add more combined precision tosa tests (Xilinx#637)

Add the following passing element-wise tosa tests:
- i32xi32_add_elem (lane=32)
- i32xi32_mul_elem (lane=16)
- i32xi32_sel (lane=16)
- i32xi32_sub_elem (lane=32)

Add the following passing combined precision element-wise tosa tests:
- i8xi16_add_elem (lane=32)
- i8xi16_sub_elem (lane=32)
- i8xi32_add_elem (lane=32)
- i8xi32_sub_elem (lane=32)
- i16xi32_add_elem_v16 (lane=16)
- i16xi32_sub_elem_v16 (lane=16)

Add the following XFAIL combined precision element-wise tosa tests:
- i16xi32_add_elem_v32 (lane=32)
- i16xi32_sub_elem_v32 (lane=32)

* [aievec] Generalize vector passes

Right now, vectorization passes are anchored on FuncOp, which prevents
conversion to AIEVec within other top-level operations, like AIE.device
ops.

This patch makes all passes generic and allows for conversion within
AIE.device.

* Implement tanh(x) based on linear approximation lookup tables (Xilinx#639)

* Refactor conversion of aievec.mul_elem to support combined precision (Xilinx#643)

* Refactor AIE-ML acc datatype emission
* Refactor arith.muli/mulf to aievec.mul_elem conversion pattern to make it extensible and clean
  - Reorganize the existing case-by-case patterns and decouple the pattern that requires two inputs to be the same type
  - Make it a cleaner pattern considering lhs/rhs/out datatype
  - Verified that all the dut.cc are identical before/after the refactor
* Add convertValueToTargetTypeAieML() which can be helpful for handling the vector lane mismatch issue later on.
* Add CPP emission for aievec.unpack op
* Add VectorToAIEVec lit tests to cover the lowering patterns
* Add new combined precision tosa tests for element-wise multiply:
  - i8xi16_mul_elem_v32 (out=i32, lane=32) (cycle count=144, PM=272), PASS
  - i8xi16_mul_elem_v16 (out=i32, lane=16) (cycle count=792, PM=368), XFAIL
    - No intent to work on this at the moment, but keep a record there
  - i16xi32_mul_elem (out=i32, lane=16) (cycle count=408, PM=384), PASS
  - i8xi32_mul_elem (out=i32, lane=16) (cycle count=728, PM=368), PASS

* Compute memref sizes by multiplying all shape sizes. (Xilinx#641)

Co-authored-by: abisca <[email protected]>
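
A minimal sketch of the corrected arithmetic, assuming a plain vector of dimension sizes:

```cpp
#include <cstdint>
#include <vector>

// A memref's element count is the product of all of its shape dimensions,
// not just the first one.
int64_t memrefNumElements(const std::vector<int64_t> &shape) {
  int64_t count = 1;
  for (int64_t dim : shape)
    count *= dim; // multiply every dimension size
  return count;
}
```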

* [aievec][nfc] Clean-up aievec to llvm conversion

This code needed to update its use of a couple of constructs and
namespaces.

* Add tosa-to-tensor pass to fix regression (Xilinx#645)

* Add tosa-to-tensor pass to fix regression of tosa broadcast tests

* Convert math.sqrt to a function call getSqrtBf16() for v16bfloat16 and v32bfloat16 types (Xilinx#646)

* Add comments for sqrt.h (Xilinx#648)

* Adding more tosa tests for combined precision inputs and broadcast (Xilinx#650)

* Add floatxfloat_sub_elem tosa test
* Add floatxfloat_add_elem tosa test
* Add floatxfloat_sel tosa test
* Add bf16xfloat_sub_elem tosa test
* Add bf16xfloat_add_elem tosa test
* Add i16xi16_sub_elem broadcast tests
* Add i8xi8_sub_elem broadcast tests
* Reorganize bf16xbf16 broadcast tosa tests
* Add floatxfloat_sub_elem broadcast tests
* Fix tosa lowering pipeline for bf16xbf16 sub_elem broadcast tests

* [aievec] Add missing conversion warnings for mac_elem and broadcast

This patch is a first step towards enabling AIEVec to LLVM Dialect
conversion for AIEml intrinsics.

* Add support of broadcast with vector width = 256 or 1024 and fix TOSA tests (Xilinx#653)

* Add support of broadcast_elem/broadcast_to_vxx for vector width == 256 (e.g. v16bf16) or 1024 (e.g. v32int32).
* Since we lower the vector.broadcast op to multiple aievec ops, we have to fix the FoldMulAddChainToConv pass to recognize the new aievec.broadcast patterns.
* Add the following list of PASS tests for implicit broadcast:
i32xi32_sub_elem_16x1024_broadcast_1
i32xi32_sub_elem_2d_broadcast_1d_unit_dim_v16 (out=i32, lane=16)
i32xi32_sub_elem_2d_broadcast_1d_unit_dim_v32 (out=i32, lane=32)
i32xi32_sub_elem_2d_broadcast_scalar_v16 (out=i32, lane=16)
i32xi32_sub_elem_2d_broadcast_scalar_v32 (out=i32, lane=32)
i32xi32_sub_elem_16x1024_broadcast_1024
i32xi32_sub_elem_2d_broadcast_1d_reshape_v16 (out=i32, lane=16)
i32xi32_sub_elem_2d_broadcast_1d_reshape_v32 (out=i32, lane=32)
i32xi32_sub_elem_2d_broadcast_1d_v16 (out=i32, lane=16)
i32xi32_sub_elem_2d_broadcast_1d_v32 (out=i32, lane=32)
i32xi32_sub_elem_2d_broadcast_2d_v16 (out=i32, lane=16)
i32xi32_sub_elem_2d_broadcast_2d_v32 (out=i32, lane=32)
* Add a dut.cc reference for the bf16xbf16_sub_elem_16x1024_broadcast_1 tests. The resulting dut.cc is legal, but it is blocked by a "broadcast_elem() of v32bfloat16" bug, so the tests remain marked XFAIL.
* Add conversion test coverage for aievec.broadcast and aievec.broadcast_scalar in test_broadcast.mlir.
* Fix the i8xi16_mul_elem_v32 mlir script.

* Convert tosa.erf and math.erf to a function call getErfBf16() for v16bfloat16 and v32bfloat16 types (Xilinx#652)

* Enable use of mlir pass manager in aiecc (Xilinx#628)

* Enable use of mlir pass manager in aiecc

* clang-format

* limit scope of mlir context, rebase

* fixup

* Revert "catch up to TOM MLIR (Xilinx#590)" (Xilinx#656)

This reverts commit 47ff7d3.

* Make pathfinder aware of the arch-specific routing constraints (Xilinx#657)

* Convert math.rsqrt to a function call getRsqrtBf16() for v16bfloat16 and v32bfloat16 types and reorganize files in aie_runtime_lib (Xilinx#655)

* Add more add/sub/mul mixed precision tests (Xilinx#659)

* Refactor the tosa-to-vector pipeline script in each test into a central place at test/Integration/lit.local.cfg for better maintainability. Also, make each .mlir test run in a unique workdir, so that multiple .mlir tests can live in a single directory.
* For the add/sub/mul mixed precision tests, add tests with swapped inputs.
* Per the TOSA spec at https://www.mlplatform.org/tosa/tosa_spec.html#_mul, add test coverage for i16xi16_mul_elem_i32 and i8xi8_mul_elem_i32. Our refactored mul_elem lowering pattern handles these two cases directly, and the acctype for the i8/i16 mac intrinsics we use is i32 (see the sketch below).
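
A minimal sketch of the widening multiply these tests exercise (scalar form of the i32-accumulating semantics; names are illustrative):

```cpp
#include <cstdint>

// TOSA-style element-wise multiply: i16 (or i8) operands, i32 accumulator.
// Widening before the multiply avoids overflow in the product.
int32_t mul_elem_i32(int16_t a, int16_t b) {
  return static_cast<int32_t>(a) * static_cast<int32_t>(b);
}
```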

* Enable AIEX dialect bindings (Xilinx#658)

* Enable AIEX dialect bindings

* Replace 'Aie' prefix with 'AIE' in python cmake

* Fixes after merge

* Apply xca_udm_dbg workaround to new tests

* Change runner to xsj

* XFAIL some tests after merge

Co-authored-by: Stephen Neuendorffer <[email protected]>
Co-authored-by: Hanchen Ye <[email protected]>
Co-authored-by: Lina Yu <[email protected]>
Co-authored-by: James Lin <[email protected]>
Co-authored-by: Javier Setoain <[email protected]>
Co-authored-by: Maksim Levental <[email protected]>
Co-authored-by: Andra Bisca <[email protected]>
Co-authored-by: abisca <[email protected]>
Co-authored-by: Joseph Melber <[email protected]>
Co-authored-by: Kristof Denolf <[email protected]>
Co-authored-by: Ronan Keryell <[email protected]>
Co-authored-by: Javier Setoain <[email protected]>
Co-authored-by: erwei-xilinx <[email protected]>
14 people authored and GitHub Enterprise committed Sep 29, 2023
1 parent b075eee commit a8d67c5
Showing 293 changed files with 7,224 additions and 874 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/buildAndTest.yml
@@ -26,7 +26,7 @@ jobs:
   # cache.
   build-llvm:
     name: Build pynqMLIR-AIE
-    runs-on: xrlabs-xco
+    runs-on: xrlabs-xsj
     steps:
       # - name: Configure Environment
       #   run: echo "$GITHUB_WORKSPACE/llvm/install/bin" >> $GITHUB_PATH
228 changes: 1 addition & 227 deletions aie_runtime_lib/AIE/lut_based_ops.cpp

Large diffs are not rendered by default.

83 changes: 1 addition & 82 deletions aie_runtime_lib/AIE/lut_based_ops.h
@@ -1,82 +1 @@
//===--- exp_lut.h - get exponential values from loopup tables ---===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
// (c) Copyright 2023 Xilinx Inc.
//
//
//===----------------------------------------------------------------------===//
// This is the implementation of getting exponential values for a bfloat16
// vector from exponential lookup tables.
//===----------------------------------------------------------------------===//
#ifndef __LUT_BASED_OPS_H__
#define __LUT_BASED_OPS_H__

#include "aie_api/aie.hpp"

alignas(aie::vector_decl_align) extern int16 exp_ilut_ab[512];
alignas(aie::vector_decl_align) extern int16 exp_ilut_cd[512];
alignas(aie::vector_decl_align) extern int16 exp_flut_ab[512];
alignas(aie::vector_decl_align) extern int16 exp_flut_cd[512];
alignas(aie::vector_decl_align) extern unsigned char m_inv_lut[128];

__attribute__((always_inline)) v16accfloat getExpBf16(v16bfloat16 x) {
  bfloat16 __aie_dm_resource_a *ilut_ab =
      (bfloat16 __aie_dm_resource_a *)exp_ilut_ab;
  bfloat16 __aie_dm_resource_b *ilut_cd =
      (bfloat16 __aie_dm_resource_b *)exp_ilut_cd;
  bfloat16 __aie_dm_resource_a *flut_ab =
      (bfloat16 __aie_dm_resource_a *)exp_flut_ab;
  bfloat16 __aie_dm_resource_b *flut_cd =
      (bfloat16 __aie_dm_resource_b *)exp_flut_cd;

  using lut_type = aie::lut<4, bfloat16, bfloat16>;
  const int LUT_elems = 256;
  const int step_i = 8;
  const int step_f = 0;

  lut_type lut_i(LUT_elems, ilut_ab, ilut_cd);
  lut_type lut_f(LUT_elems, flut_ab, flut_cd);
  aie::parallel_lookup<uint16, lut_type, aie::lut_oor_policy::truncate>
      lookup_i(lut_i, step_i);
  aie::parallel_lookup<uint16, lut_type, aie::lut_oor_policy::truncate>
      lookup_f(lut_f, step_f);

  aie::vector<bfloat16, 16> I_val_vec, F_val_vec;
  aie::accum<accfloat, 16> exp_val;
  aie::vector<bfloat16, 16> input_bf16 = x;

  // position of output decimal point = 8, making input become 8 bits, and for
  // LUT_elems = 256 lookup. aie::vector<int16, 16>
  // input=aie::to_fixed<int16>(input_bf16,8);
  aie::vector<int16, 32> input0 = v32int16(bfloat16_to_int(input_bf16, 8));
  aie::vector<int16, 16> input = aie::filter_even(input0);

  I_val_vec = lookup_i.fetch(input.cast_to<uint16>());
  F_val_vec = lookup_f.fetch(input.cast_to<uint16>());
  exp_val = aie::mul(I_val_vec, F_val_vec);
  return v16accfloat(exp_val);
}

__attribute__((always_inline)) bfloat16 getInvBf16(float x) {
  unsigned int *B_x;
  unsigned int exp_mask = 0x7F800000;
  unsigned int mantissa_mask = 0x007FFFFF;
  unsigned int mantissa_Q = 0x00008000;
  unsigned char exponent, mantissa;
  unsigned inv_exponent;
  unsigned short inv_x_val;
  unsigned int B_Q;
  bfloat16 *inv_x;
  B_x = (unsigned int *)&x;
  B_Q = *B_x + mantissa_Q;
  exponent = (B_Q & exp_mask) >> 23;
  mantissa = (B_Q & mantissa_mask) >> 16;
  inv_exponent = (mantissa == 0) + (253 - exponent);
  inv_x_val = (inv_exponent << 7) + m_inv_lut[mantissa];
  inv_x = (bfloat16 *)&inv_x_val;
  return *inv_x;
}
#endif //__LUT_BASED_OPS_H__
// Unsupported exp_lut.h for AIE1
1 change: 1 addition & 0 deletions aie_runtime_lib/AIE/vec_math.h
@@ -0,0 +1 @@
// Unsupported sqrt.h for AIE1
4 changes: 2 additions & 2 deletions aie_runtime_lib/AIE2/chess_intrinsic_wrapper.cpp
@@ -19,10 +19,10 @@
/// when parsing .ll code containing standard intrinsic names, so these symbols
/// are defined that way.

extern "C" void llvm___aie___lock___acquire___reg(unsigned id, unsigned val) {
extern "C" void llvm___aie2___acquire(unsigned id, unsigned val) {
acquire_equal(id, val);
}
extern "C" void llvm___aie___lock___release___reg(unsigned id, unsigned val) {
extern "C" void llvm___aie2___release(unsigned id, unsigned val) {
release(id, val);
}
extern "C" void llvm___aie___event0() { event0(); }
140 changes: 138 additions & 2 deletions aie_runtime_lib/AIE2/lut_based_ops.cpp
@@ -1,4 +1,4 @@
-//===--- exp_lut.cpp - exponential loopup tables ---===//
+//===--- lut_based_ops.cpp - lookup table based operations ---===//
//
// This file is licensed under the Apache License v2.0 with LLVM Exceptions
// See https://llvm.org/LICENSE.txt for license information.
@@ -8,7 +8,7 @@
//
//
//===----------------------------------------------------------------------===//
-// These are exponential lookup tables for bfloat16 type
+// Lookup table based operations
//===----------------------------------------------------------------------===//

#include "aie_api/aie.hpp"
@@ -225,3 +225,139 @@ alignas(aie::vector_decl_align) unsigned char m_inv_lut[128] = {
22, 22, 21, 20, 20, 19, 18, 18, 17, 16, 16, 15, 14, 14, 13,
13, 12, 11, 11, 10, 10, 9, 9, 8, 7, 7, 6, 6, 5, 5,
4, 4, 3, 3, 2, 2, 1, 1};

// Tanh look up tables: Divides into 32 segments between [-4,4], bank size:
// (32*2*2*4)*2=1k, one lut=512B
float chess_storage(% chess_alignof(v32int8)) tanh_lut_ab[128] = {
0.00000000000000000000000000000000, -1.00000000000000000000000000000000,
0.00283813476562500000000000000000, -0.98828125000000000000000000000000,
0.00000000000000000000000000000000, -1.00000000000000000000000000000000,
0.00283813476562500000000000000000, -0.98828125000000000000000000000000,
0.00509643554687500000000000000000, -0.98046875000000000000000000000000,
0.00750732421875000000000000000000, -0.97265625000000000000000000000000,
0.00509643554687500000000000000000, -0.98046875000000000000000000000000,
0.00750732421875000000000000000000, -0.97265625000000000000000000000000,
0.01269531250000000000000000000000, -0.95703125000000000000000000000000,
0.02124023437500000000000000000000, -0.93359375000000000000000000000000,
0.01269531250000000000000000000000, -0.95703125000000000000000000000000,
0.02124023437500000000000000000000, -0.93359375000000000000000000000000,
0.03540039062500000000000000000000, -0.89843750000000000000000000000000,
0.05639648437500000000000000000000, -0.85156250000000000000000000000000,
0.03540039062500000000000000000000, -0.89843750000000000000000000000000,
0.05639648437500000000000000000000, -0.85156250000000000000000000000000,
0.09179687500000000000000000000000, -0.78125000000000000000000000000000,
0.14550781250000000000000000000000, -0.68750000000000000000000000000000,
0.09179687500000000000000000000000, -0.78125000000000000000000000000000,
0.14550781250000000000000000000000, -0.68750000000000000000000000000000,
0.22949218750000000000000000000000, -0.56250000000000000000000000000000,
0.34765625000000000000000000000000, -0.41601562500000000000000000000000,
0.22949218750000000000000000000000, -0.56250000000000000000000000000000,
0.34765625000000000000000000000000, -0.41601562500000000000000000000000,
0.50390625000000000000000000000000, -0.25976562500000000000000000000000,
0.69140625000000000000000000000000, -0.11962890625000000000000000000000,
0.50390625000000000000000000000000, -0.25976562500000000000000000000000,
0.69140625000000000000000000000000, -0.11962890625000000000000000000000,
0.86718750000000000000000000000000, -0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, -0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, 0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, 0.03076171875000000000000000000000,
0.69140625000000000000000000000000, 0.11962890625000000000000000000000,
0.50390625000000000000000000000000, 0.25976562500000000000000000000000,
0.69140625000000000000000000000000, 0.11962890625000000000000000000000,
0.50390625000000000000000000000000, 0.25976562500000000000000000000000,
0.34765625000000000000000000000000, 0.41601562500000000000000000000000,
0.22949218750000000000000000000000, 0.56250000000000000000000000000000,
0.34765625000000000000000000000000, 0.41601562500000000000000000000000,
0.22949218750000000000000000000000, 0.56250000000000000000000000000000,
0.14550781250000000000000000000000, 0.68750000000000000000000000000000,
0.09179687500000000000000000000000, 0.78125000000000000000000000000000,
0.14550781250000000000000000000000, 0.68750000000000000000000000000000,
0.09179687500000000000000000000000, 0.78125000000000000000000000000000,
0.05639648437500000000000000000000, 0.85156250000000000000000000000000,
0.03540039062500000000000000000000, 0.89843750000000000000000000000000,
0.05639648437500000000000000000000, 0.85156250000000000000000000000000,
0.03540039062500000000000000000000, 0.89843750000000000000000000000000,
0.02124023437500000000000000000000, 0.93359375000000000000000000000000,
0.01269531250000000000000000000000, 0.95703125000000000000000000000000,
0.02124023437500000000000000000000, 0.93359375000000000000000000000000,
0.01269531250000000000000000000000, 0.95703125000000000000000000000000,
0.00750732421875000000000000000000, 0.97265625000000000000000000000000,
0.00509643554687500000000000000000, 0.98046875000000000000000000000000,
0.00750732421875000000000000000000, 0.97265625000000000000000000000000,
0.00509643554687500000000000000000, 0.98046875000000000000000000000000,
0.00283813476562500000000000000000, 0.98828125000000000000000000000000,
0.00000000000000000000000000000000, 1.00000000000000000000000000000000,
0.00283813476562500000000000000000, 0.98828125000000000000000000000000,
0.00000000000000000000000000000000, 1.00000000000000000000000000000000,
};

float chess_storage(% chess_alignof(v32int8)) tanh_lut_cd[128] = {
0.00000000000000000000000000000000, -1.00000000000000000000000000000000,
0.00283813476562500000000000000000, -0.98828125000000000000000000000000,
0.00000000000000000000000000000000, -1.00000000000000000000000000000000,
0.00283813476562500000000000000000, -0.98828125000000000000000000000000,
0.00509643554687500000000000000000, -0.98046875000000000000000000000000,
0.00750732421875000000000000000000, -0.97265625000000000000000000000000,
0.00509643554687500000000000000000, -0.98046875000000000000000000000000,
0.00750732421875000000000000000000, -0.97265625000000000000000000000000,
0.01269531250000000000000000000000, -0.95703125000000000000000000000000,
0.02124023437500000000000000000000, -0.93359375000000000000000000000000,
0.01269531250000000000000000000000, -0.95703125000000000000000000000000,
0.02124023437500000000000000000000, -0.93359375000000000000000000000000,
0.03540039062500000000000000000000, -0.89843750000000000000000000000000,
0.05639648437500000000000000000000, -0.85156250000000000000000000000000,
0.03540039062500000000000000000000, -0.89843750000000000000000000000000,
0.05639648437500000000000000000000, -0.85156250000000000000000000000000,
0.09179687500000000000000000000000, -0.78125000000000000000000000000000,
0.14550781250000000000000000000000, -0.68750000000000000000000000000000,
0.09179687500000000000000000000000, -0.78125000000000000000000000000000,
0.14550781250000000000000000000000, -0.68750000000000000000000000000000,
0.22949218750000000000000000000000, -0.56250000000000000000000000000000,
0.34765625000000000000000000000000, -0.41601562500000000000000000000000,
0.22949218750000000000000000000000, -0.56250000000000000000000000000000,
0.34765625000000000000000000000000, -0.41601562500000000000000000000000,
0.50390625000000000000000000000000, -0.25976562500000000000000000000000,
0.69140625000000000000000000000000, -0.11962890625000000000000000000000,
0.50390625000000000000000000000000, -0.25976562500000000000000000000000,
0.69140625000000000000000000000000, -0.11962890625000000000000000000000,
0.86718750000000000000000000000000, -0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, -0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, 0.03076171875000000000000000000000,
1.00000000000000000000000000000000, 0.00000000000000000000000000000000,
0.86718750000000000000000000000000, 0.03076171875000000000000000000000,
0.69140625000000000000000000000000, 0.11962890625000000000000000000000,
0.50390625000000000000000000000000, 0.25976562500000000000000000000000,
0.69140625000000000000000000000000, 0.11962890625000000000000000000000,
0.50390625000000000000000000000000, 0.25976562500000000000000000000000,
0.34765625000000000000000000000000, 0.41601562500000000000000000000000,
0.22949218750000000000000000000000, 0.56250000000000000000000000000000,
0.34765625000000000000000000000000, 0.41601562500000000000000000000000,
0.22949218750000000000000000000000, 0.56250000000000000000000000000000,
0.14550781250000000000000000000000, 0.68750000000000000000000000000000,
0.09179687500000000000000000000000, 0.78125000000000000000000000000000,
0.14550781250000000000000000000000, 0.68750000000000000000000000000000,
0.09179687500000000000000000000000, 0.78125000000000000000000000000000,
0.05639648437500000000000000000000, 0.85156250000000000000000000000000,
0.03540039062500000000000000000000, 0.89843750000000000000000000000000,
0.05639648437500000000000000000000, 0.85156250000000000000000000000000,
0.03540039062500000000000000000000, 0.89843750000000000000000000000000,
0.02124023437500000000000000000000, 0.93359375000000000000000000000000,
0.01269531250000000000000000000000, 0.95703125000000000000000000000000,
0.02124023437500000000000000000000, 0.93359375000000000000000000000000,
0.01269531250000000000000000000000, 0.95703125000000000000000000000000,
0.00750732421875000000000000000000, 0.97265625000000000000000000000000,
0.00509643554687500000000000000000, 0.98046875000000000000000000000000,
0.00750732421875000000000000000000, 0.97265625000000000000000000000000,
0.00509643554687500000000000000000, 0.98046875000000000000000000000000,
0.00283813476562500000000000000000, 0.98828125000000000000000000000000,
0.00000000000000000000000000000000, 1.00000000000000000000000000000000,
0.00283813476562500000000000000000, 0.98828125000000000000000000000000,
0.00000000000000000000000000000000, 1.00000000000000000000000000000000,
};
27 changes: 27 additions & 0 deletions aie_runtime_lib/AIE2/lut_based_ops.h
@@ -79,4 +79,31 @@ __attribute__((always_inline)) bfloat16 getInvBf16(float x) {
  inv_x = (bfloat16 *)&inv_x_val;
  return *inv_x;
}

extern float tanh_lut_ab[];
extern float tanh_lut_cd[];

inline __attribute__((always_inline)) v16bfloat16
getTanhBf16(v16bfloat16 vInput) {
  aie::vector<bfloat16, 16> input = vInput;

  int step_bits = -2;
  int bias = 16;
  int data_size = 16;
  int LUT_elems = 32;
  int shift_offset = 0; // unused

  using lut_type = aie::lut<4, float, bfloat16>;

  lut_type test_lut(LUT_elems, (bfloat16 *)tanh_lut_ab,
                    (bfloat16 *)tanh_lut_cd);

  aie::linear_approx<bfloat16, lut_type> lin_aprox(test_lut, step_bits, bias,
                                                   shift_offset);

  aie::vector<bfloat16, 16> output =
      lin_aprox.compute(input).to_vector<bfloat16>();

  return (v16bfloat16)output;
}
#endif //__LUT_BASED_OPS_H__
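
A hypothetical usage sketch of the new helper, assuming an AIE2 kernel context where the native bfloat16 vector types are available; the kernel name and pointer casts are illustrative only:

```cpp
// Apply the LUT-based linear-approximation tanh across a bf16 buffer,
// 16 lanes at a time. Assumes n is a multiple of 16 and the buffers are
// suitably aligned for vector access.
void tanh_kernel(const bfloat16 *in, bfloat16 *out, int n) {
  for (int i = 0; i < n; i += 16) {
    v16bfloat16 v = *(const v16bfloat16 *)(in + i); // load 16 bf16 lanes
    *(v16bfloat16 *)(out + i) = getTanhBf16(v);     // table-based tanh
  }
}
```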