Move aiecc.py implementation to library #387
This also adds a |
@@ -102,37 +102,30 @@ add_subdirectory(docs)
add_dependencies(docs mlir-doc)

# python install directory
if (AIE_ENABLE_BINDINGS_PYTHON)
I'm a little nervous about python being a hard dependency here. MLIR practice has been for that to be optional, if only because getting it to build has lots of pitfalls.
I guess one advantage of requiring python bindings is that aiecc.py could use them to walk the IR, rather than relying on aie-translate in a few places to extract information.
Status of this PR?
2540e86 to f8d57b6
* install target.h for the memory allocator as well (Xilinx#606) This is redundant, but reflects the true dependencies better. * Fix path used for tests Peano should come before the regular vitis path, due to name collisions. * Add basic AIE2 tests. * [AIE] Add decoding of DMA status The test library now does this for both AIE1 and AIE2 DMAs. * [AIE] Add packet stream tests for ShimDMAs This test looks at a common scenario where we have 3 tensors input to a tile from 3 independent DMAs, but only 2 receiving tile DMAs. Using packet routing, this scenario can be accommodated by time-sharing one of the destination DMAs. Obsoletes Xilinx#85 * Fix error message if no device. We need to return with an error message, or later code will segfault. * Put exp lookup table into run_time_lib/AIE2 (Xilinx#604) * Update chess_intrinsic_wrapper.cpp (Xilinx#610) Remove event intrinsic declarations from AIEv1 wrapper * Add TOSA tensor broadcast and mixed precision tests (Xilinx#609) * Add the following TOSA integration tests to test/Integration/Dialect/TOSA/ * List of PASS tests: i16xi16_add_elem (lane=32) i16xi16_mul_elem (lane=32) i16xi16_sel (lane=32) i16xi16_sub_elem (lane=32) i8xi8_add_elem (lane=64) i8xi8_mul_elem (lane=32) i8xi8_sel (lane=64) i8xi8_sub_elem (lane=64) bf16xbf16_sub_elem_2d_broadcast_1d (lane=16) bf16xbf16_sub_elem_2d_broadcast_1d_reshape (lane=16) * List of XFAIL tests: i8xi16_sub_elem (lane=32) bf16xbf16_sub_elem_2d_broadcast_2d (lane=16) bf16xbf16_sub_elem_2d_broadcast_1d_unit_dim (lane=16) * Fix include order (Xilinx#613) * [aievec] Add hoisting patterns for arith.extsi Hoisting cast operations as close as possible to the source of data can make later patterns more robust to typical variations in the source code. We might need to revisit this one if, in the future, this process causes unintended consequences. 
* Implement inverse of a float by lookup tables (Xilinx#612) * Fix some test failures (Xilinx#614) * Move aiecc.py implementation to python library (Xilinx#387) * Use correct macros for C API (Xilinx#615) * capi * reformat * Re-export MLIR target set in CMake (Xilinx#617) * mlirconfig * reformat * Disable `-Wno-unknown-warning-option` on windows (Xilinx#620) * unknownwarning * reformat * win32 (Xilinx#618) * Use new policy CMP0091 for MSVC (Xilinx#619) * msvc * reformat * Revert "Use new policy CMP0091 for MSVC (Xilinx#619)" (Xilinx#622) This reverts commit 1520898. * Use upstream CMake macros to find python (Xilinx#616) * cmake * reformat * Bump cmakeModules (Xilinx#624) * ObjFifo unroll dependency fixes (Xilinx#621) * Fixes for objFifo unrolling algorithm. * EOF * clang-format (Xilinx#625) * Use target model functions to get number of DMA channels. * Clang format * Fix function call * Add shim tiles that are not in noc columns in the getNumDestShimMuxConnections() functions. * Add isShimNOCorPLTile () to the target model. * Add missing target model. * Add improvements to the doc * Add isShimNOCorPLTile() virtual function. * Clang format --------- Co-authored-by: abisca <[email protected]> Co-authored-by: Joseph Melber <[email protected]> * Add missing function implementation in IPU target model. * Fix aiecc configure path * Revert path change. * Update paths in xclbin generation. * Typo * Fixed aie unit tests. * Fix aievec test. * xfail failing test. Co-authored-by: Stephen Neuendorffer <[email protected]> Co-authored-by: Hanchen Ye <[email protected]> Co-authored-by: Lina Yu <[email protected]> Co-authored-by: Jeff Fifield <[email protected]> Co-authored-by: James Lin <[email protected]> Co-authored-by: Javier Setoain <[email protected]> Co-authored-by: Maksim Levental <[email protected]> Co-authored-by: Andra Bisca <[email protected]> Co-authored-by: Joseph Melber <[email protected]>
* install target.h for the memory allocator as well (Xilinx#606) This is redundant, but reflects the true dependencies better. * Fix path used for tests Peano should come before the regular vitis path, due to name collisions. * Add basic AIE2 tests. * [AIE] Add decoding of DMA status The test library now does this for both AIE1 and AIE2 DMAs. * [AIE] Add packet stream tests for ShimDMAs This test looks at a common scenario where we have 3 tensors input to a tile from 3 independent DMAs, but only 2 receiving tile DMAs. Using packet routing, this scenario can be accommodated by time-sharing one of the destination DMAs. Obsoletes Xilinx#85 * Fix error message if no device. We need to return with an error message, or later code will segfault. * Put exp lookup table into run_time_lib/AIE2 (Xilinx#604) * Update chess_intrinsic_wrapper.cpp (Xilinx#610) Remove event intrinsic declarations from AIEv1 wrapper * Add TOSA tensor broadcast and mixed precision tests (Xilinx#609) * Add the following TOSA integration tests to test/Integration/Dialect/TOSA/ * List of PASS tests: i16xi16_add_elem (lane=32) i16xi16_mul_elem (lane=32) i16xi16_sel (lane=32) i16xi16_sub_elem (lane=32) i8xi8_add_elem (lane=64) i8xi8_mul_elem (lane=32) i8xi8_sel (lane=64) i8xi8_sub_elem (lane=64) bf16xbf16_sub_elem_2d_broadcast_1d (lane=16) bf16xbf16_sub_elem_2d_broadcast_1d_reshape (lane=16) * List of XFAIL tests: i8xi16_sub_elem (lane=32) bf16xbf16_sub_elem_2d_broadcast_2d (lane=16) bf16xbf16_sub_elem_2d_broadcast_1d_unit_dim (lane=16) * Fix include order (Xilinx#613) * [aievec] Add hoisting patterns for arith.extsi Hoisting cast operations as close as possible to the source of data can make later patterns more robust to typical variations in the source code. We might need to revisit this one if, in the future, this process causes unintended consequences. 
* Implement inverse of a float by lookup tables (Xilinx#612) * Fix some test failures (Xilinx#614) * Move aiecc.py implementation to python library (Xilinx#387) * Use correct macros for C API (Xilinx#615) * capi * reformat * Re-export MLIR target set in CMake (Xilinx#617) * mlirconfig * reformat * Disable `-Wno-unknown-warning-option` on windows (Xilinx#620) * unknownwarning * reformat * win32 (Xilinx#618) * Use new policy CMP0091 for MSVC (Xilinx#619) * msvc * reformat * Revert "Use new policy CMP0091 for MSVC (Xilinx#619)" (Xilinx#622) This reverts commit 1520898. * Use upstream CMake macros to find python (Xilinx#616) * cmake * reformat * Bump cmakeModules (Xilinx#624) * ObjFifo unroll dependency fixes (Xilinx#621) * Fixes for objFifo unrolling algorithm. * EOF * clang-format (Xilinx#625) * Use target model functions to get number of DMA channels. * Clang format * Fix function call * Add shim tiles that are not in noc columns in the getNumDestShimMuxConnections() functions. * Add isShimNOCorPLTile () to the target model. * Add missing target model. * Add improvements to the doc * Add isShimNOCorPLTile() virtual function. * Clang format --------- Co-authored-by: abisca <[email protected]> Co-authored-by: Joseph Melber <[email protected]> * need ONLY option to make cmake find numpy (Xilinx#630) * Split ccache database according to the parallel jobs (Xilinx#600) This fixes a race condition in the ccache database writing happening at the end of each job running in parallel. By using a unique key per job, each database is correctly written and can be used for a next CI run. Use also the real LLVM commit hash used by ccache database key in CI instead of previous hack assuming the textual commit was present inside utils/clone-llvm.sh. 
* Fix TOSA broadcast and mixed precision tests (Xilinx#631) Fix the following TOSA tests: - bf16xbf16_sub_elem_2d_broadcast_2d - i8xi16_sub_elem Add the following new TOSA tests: - i16xi16_sub_elem_2d_broadcast_scalar (pass) - i16xi16_sub_elem_2d_broadcast_1d_unit_dim (pass) - bf16xbf16_sub_elem_2d_broadcast_scalar (xfail) * Fix ordering of putStream intrinsic. The argument order for the intrinsic didn't match. Argument 0: channel # Argument 1: value * Fix decoding of tile status for stream stalls. These were just non-sensical. * Add end-to-end tests for CPU stream access. * Fix intrinsic wrapper for aie2 acquire/release * Explictly compile intrinsic-dependent code with chess frontend. * [tests] Remove address. This address is ignored, resulting in a warning. * catch up to TOM MLIR (Xilinx#590) * catch up to llvm TOM * Update VectorToAIEVecConversions.cpp * Get `VectorType` instead of `Type` * format * xfail opaque pointer related tests and update test * update finalize-memref-to-llvm --------- Co-authored-by: Javier Setoain <[email protected]> * Add softmax test cases (Xilinx#635) * Revised the xchess compilation commands for lut test cases. (Xilinx#636) * Add more combined precision tosa tests (Xilinx#637) Add the following passing element-wise tosa tests: - i32xi32_add_elem (lane=32) - i32xi32_mul_elem (lane=16) - i32xi32_sel (lane=16) - i32xi32_sub_elem (lane=32) Add the following passing combined precision element-wise tosa tests: - i8xi16_add_elem (lane=32) - i8xi16_sub_elem (lane=32) - i8xi32_add_elem (lane=32) - i8xi32_sub_elem (lane=32) - i16xi32_add_elem_v16 (lane=16) - i16xi32_sub_elem_v16 (lane=16) Add the following XFAIL combined precision element-wise tosa tests: - i16xi32_add_elem_v32 (lane=32) - i16xi32_sub_elem_v32 (lane=32) * [aievec] Generalize vector passes Right now, vectorization passes hook to FuncOp, which prevents conversion to AIEVec within other top level operations, like AIE.device ops. 
This patch makes all passes generic and allows for conversion within AIE.device. * Implement tanh(x) based on linear approximation lookup tables (Xilinx#639) * Refactor conversion of aievec.mul_elem to support combined precision (Xilinx#643) * Refactor AIE-ML acc datatype emission * Refactor arith.muli/mulf to aievec.mul_elem conversion pattern to make it extensible and clean - Reorganize the existing case-by-case patterns and decouple the pattern that requires two inputs to be the same type - Make it a cleaner pattern considering lhs/rhs/out datatype - Verified that all the dut.cc are identical before/after the refactor * Add convertValueToTargetTypeAieML() which can be helpful for handling the vector lane mismatch issue later on. * Add CPP emission for aievec.unpack op * Add VectorToAIEVec lit tests to cover the lowering patterns * Add new combined precision tosa tests for element-wise multiply: - i8xi16_mul_elem_v32 (out=i32, lane=32) (cycle count=144, PM=272), PASS - i8xi16_mul_elem_v16 (out=i32, lane=16) (cycle count=792, PM=368), XFAIL - No intent to work on this at the moment, but keep a record there - i16xi32_mul_elem (out=i32, lane=16) (cycle count=408, PM=384), PASS - i8xi32_mul_elem (out=i32, lane=16) (cycle count=728, PM=368), PASS * Compute memref sizes by multiplying all shape sizes. (Xilinx#641) Co-authored-by: abisca <[email protected]> * [aievec][nfc] Clean-up aievec to llvm conversion This code needed updating its use of a couple of constructs, and namespaces. 
* Add tosa-to-tensor pass to fix regression (Xilinx#645) * Add tosa-to-tensor pass to fix regression of tosa broadcast tests * Convert math.sqrt to a function call getSqrtBf16() for v16bfloat16 and v32bfloat16 types (Xilinx#646) * Add comments for sqrt.h (Xilinx#648) * Adding more tosa tests for combined precision inputs and broadcast (Xilinx#650) * Add floatxfloat_sub_elem tosa test * Add floatxfloat_add_elem tosa test * Add floatxfloat_sel tosa test * Add bf16xfloat_sub_elem tosa test * Add bf16xfloat_add_elem tosa test * Add i16xi16_sub_elem broadcast tests * Add i8xi8_sub_elem broadcast tests * Reorganize bf16xbf16 broadcast tosa tests * Add floatxfloat_sub_elem broadcast tests * Fix tosa lowering pipeline for bf16xbf16 sub_elem broadcast tests * [aievec] Add missing conversion warnings for mac_elem and broadcast This patch is a first step towards enabling AIEVec to LLVM Dialect conversion for AIEml intrinsics. * Add support of broadcast with vector width = 256 or 1024 and fix TOSA tests (Xilinx#653) *Add support of broadcast_elem/broadcast_to_vxx for vector width == 256 (e.g. v16bf16) or 1024 (e.g. v32int32). 
*Since we lower vector.broadcast op to multiple aievec ops, we have to fix FoldMulAddChainToConv pass to recognize the new aievec.broadcast patterns *Add the following list of PASS tests for implicit broadcast: i32xi32_sub_elem_16x1024_broadcast_1 i32xi32_sub_elem_2d_broadcast_1d_unit_dim_v16 (out=i32, lane=16) i32xi32_sub_elem_2d_broadcast_1d_unit_dim_v32 (out=i32, lane=32) i32xi32_sub_elem_2d_broadcast_scalar_v16 (out=i32, lane=16) i32xi32_sub_elem_2d_broadcast_scalar_v32 (out=i32, lane=32) i32xi32_sub_elem_16x1024_broadcast_1024 i32xi32_sub_elem_2d_broadcast_1d_reshape_v16 (out=i32, lane=16) i32xi32_sub_elem_2d_broadcast_1d_reshape_v32 (out=i32, lane=32) i32xi32_sub_elem_2d_broadcast_1d_v16 (out=i32, lane=16) i32xi32_sub_elem_2d_broadcast_1d_v32 (out=i32, lane=32) i32xi32_sub_elem_2d_broadcast_2d_v16 (out=i32, lane=16) i32xi32_sub_elem_2d_broadcast_2d_v32 (out=i32, lane=32) *Add dut.cc reference for bf16xbf16_sub_elem_16x1024_broadcast_1 tests. The resulting dut.cc is legal, but it's blocked by "broadcast_elem() of v32bfloat16" bug. Hence, the tests are still marked XFAIL. *Add conversion test coverage for aievec.broadcast and aievec.broadcast_scalar in test_broadcast.mlir *Fix i8xi16_mul_elem_v32 mlir script * Convert tosa.erf and math.erf to a function call getErfBf16() for v16bfloat16 and v32bfloat16 types (Xilinx#652) * Enable use of mlir pass manager in aiecc (Xilinx#628) * Enable use of mlir pass manager in aiecc * clang-format * limit scope of mlir context, rebase * fixup * Revert "catch up to TOM MLIR (Xilinx#590)" (Xilinx#656) This reverts commit 47ff7d3. 
* Make pathfinder aware of the arch-specific routing constraints (Xilinx#657) * Convert math.rsqrt to a function call getRsqrtBf16() for v16bfloat16 and v32bfloat16 types and reorganize files in aie_runtime_lib (Xilinx#655) * Add more add/sub/mul mixed precision tests (Xilinx#659) * Refactor the tosa-to-vector pipelines script in each test to a central place at test/Integration/lit.local.cfg for better maintainability. Also, make sure each .mlir test is running in a unique workdir for placing multiple .mlir test in a single directory. * For add/sub/mul mixed precision tests, we add tests with swapped inputs * Per the TOSA spec at https://www.mlplatform.org/tosa/tosa_spec.html#_mul, we add test coverage for i16xi16_mul_elem_i32, and i8xi8_mul_elem_i32. Our refactored mul_elem lowering pattern works on these two cases directly, and the acctype for the i8/i16 mac intrinsics we used is i32. * Enable AIEX dialect bindings (Xilinx#658) * Enable AIEX dialect bindings * Replace 'Aie' prefix with 'AIE' in python cmake * Fixs after merge * Apply xca_udm_dbg workaround to new tests * Change runner to xsj * XFAIL some tests after merge Co-authored-by: Stephen Neuendorffer <[email protected]> Co-authored-by: Hanchen Ye <[email protected]> Co-authored-by: Lina Yu <[email protected]> Co-authored-by: James Lin <[email protected]> Co-authored-by: Javier Setoain <[email protected]> Co-authored-by: Maksim Levental <[email protected]> Co-authored-by: Andra Bisca <[email protected]> Co-authored-by: abisca <[email protected]> Co-authored-by: Joseph Melber <[email protected]> Co-authored-by: Kristof Denolf <[email protected]> Co-authored-by: Ronan Keryell <[email protected]> Co-authored-by: Javier Setoain <[email protected]> Co-authored-by: erwei-xilinx <[email protected]>
This allows access to aiecc from other Python code. It also removes the `AIE_ENABLE_BINDINGS_PYTHON` flag (it's always on). For example,
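A minimal sketch of the script-to-library pattern this PR describes, assuming nothing about the real aiecc module layout: the names `build_parser`, `run`, and `main`, the `tool` program name, and the `--verbose` option are all hypothetical and are not taken from aiecc.py. The idea is only that the command-line script becomes a thin wrapper, so other Python code can call the same entry point directly.

```python
import argparse
import sys


def build_parser():
    # Hypothetical options for illustration; these are NOT the real aiecc.py flags.
    parser = argparse.ArgumentParser(prog="tool")
    parser.add_argument("input", help="input file to process")
    parser.add_argument("--verbose", action="store_true", help="print each step")
    return parser


def run(argv):
    """Library entry point: takes an argument list instead of reading sys.argv."""
    opts = build_parser().parse_args(argv)
    # Placeholder for the real work: just record which steps would be performed.
    steps = ["parse " + opts.input, "lower " + opts.input]
    if opts.verbose:
        for step in steps:
            print(step)
    return steps


def main(argv=None):
    """What the thin command-line script would call; returns a process exit code."""
    run(sys.argv[1:] if argv is None else argv)
    return 0
```

With this split, another Python program could call something like `run(["design.mlir"])` in-process rather than spawning the script as a subprocess, and the installed script would reduce to `sys.exit(main())`.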