diff --git a/aie_kernels/README.md b/aie_kernels/README.md new file mode 100644 index 0000000000..acc9ddcd8d --- /dev/null +++ b/aie_kernels/README.md @@ -0,0 +1,57 @@ + + +# AIE Kernels + +These kernels are provided as example building blocks for larger designs, and also as illustrations of how to write single core programs for AIEs which can then be duplicated or mixed into multi-core designs using the structural IRON API. + +In some cases, the kernels are just generic C code, and will run on any family of AI Engines with varying performance. Other kernels are then optimized for the AIE1 and AIE2 architectures. Finally, some kernels use the AIE API, which is a C++ header-only library providing types and operations that get translated into efficient low-level intrinsics, and whose documentation can be found [here](https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_api/aie_api/doc/index.html), while others use the architecture specific low-level intrinsics directly + +> **NOTE:** this set of AIE kernels are meant for demonstration along with the programming examples. The goal is not to be 100% performant, there may be room for further improvement. The kernels are provided as-is with no guarantees of support of AMD or AMD Research and Advanced Development. + +## Generic +| Class | Name | Coding style | Purpose | Datatypes | +|-|-|-|-|-| +| basic | [passThrough.cc](./generic/passThrough.cc) | AIE API | A simple memcpy operation | `uint8_t`, `int16_t`, `int32_t` | + +## AIE1 +| Name | Coding style | Purpose | +|-|-|-| + +## AIE2 +| Class | Name | Coding style | Purpose | Datatypes | +|-|-|-|-|-| +| basic | [zero.cc](../../aie_kernels/aie2/zero.cc) | AIE API | Fill a tensor with zeroes | template | +| basic | [add.cc](../../aie_kernels/aie2/add.cc) | AIE API | Pointwise addition of 2 tensors | `bfloat16` | +| basic | [mul.cc](../../aie_kernels/aie2/mul.cc) | AIE API | Pointwise multiplication of 2 tensors | `bfloat16` | +| basic | [scale.cc](../../aie_kernels/aie2/scale.cc) | AIE API | Scale all elements of a tensor with a scale factor | `int32_t` | +| basic | [bitwiseOR.cc](../../aie_kernels/aie2/bitwiseOR.cc) | AIE API | Bitwise OR of fixed point tensors | `uint8_t`,`int16_t`,`int32_t`| +| basic | [bitwiseAND.cc](../../aie_kernels/aie2/bitwiseAND.cc) | AIE API | Bitwise AND of fixed point tensors | `uint8_t`,`int16_t`,`int32_t` | +| gemm | [mm.cc](../../aie_kernels/aie2/mm.cc) | AIE API | Matrix/Matrix multiplication | `int16_t`,`bfloat16_t` | +| gemm | [mv.cc](../../aie_kernels/aie2/mv.cc) | AIE API | Matrix/Vector multiplication | `bfloat16_t` | +| | +| reduction | [reduce_add.cc](../../aie_kernels/aie2/reduce_add.cc) | Intrinsics | Find the sum of elements in a tensor | `int32 _t` | +| reduction| [reduce_max.cc](../../aie_kernels/aie2/reduce_max.cc) | Intrinsics | Find max value across a tensor | `int32 _t` | +| reduction | [reduce_min.cc](../../aie_kernels/aie2/reduce_min.cc) | Intrinsics | Find min value across a tensor | `int32 _t` | +|| +| ml | [conv2dk1_i8.cc](../../aie_kernels/aie2/conv2dk1_i8.cc) | AIE API | 1x1 Conv2D | `int8_t` | +| ml | [conv2dk1.cc](../../aie_kernels/aie2/conv2dk1.cc) | AIE API | 1x1 Conv2D with fused ReLU | `int8_t`, `uint8_t` | +| ml | [conv2dk3.cc](../../aie_kernels/aie2/conv2dk3.cc) | AIE API | 3x3 Conv2D with fused ReLU | `int8_t`, `uint8_t` | +| ml | [conv2dk1_skip.cc](../../aie_kernels/aie2/conv2dk1_skip.cc) | AIE API| 1x1 Conv2D with fused skip addition | `int8_t`, `uint8_t` | +| ml | [conv2dk1_skip_init.cc](../../aie_kernels/aie2/conv2dk1_skip_init.cc) | AIE API | 1x1 Conv2D with fused 1x1 Conv2D skip addition | `int8_t`, `uint8_t` | +| ml |[relu.cc](../../aie_kernels/aie2/relu.cc) | Intrinsics | ReLU activation function | `bfloat16_t` | +| ml | [bf16_exp.cc](../../aie_kernels/aie2/bf16_exp.cc) | AIE API | Raise all elements in a `bfloat` tensor to $e^x$ | `bfloat16_t` | +| | +| vision | [gray2rgba.cc](../../aie_kernels/aie2/gray2rgba.cc) | AIE API | Convert from grayscale to RGBA format | `uint8_t` | +| vision |[rgba2gray.cc](../../aie_kernels/aie2/rgba2gray.cc) | AIE API | Convert from RGBA format to grayscale | `uint8_t` | +| vision | [rgba2hue.cc](../../aie_kernels/aie2/rgba2hue.cc) | AIE API | Convert from RGBA to hue | `uint8_t` | +| vision | [addWeighted.cc](../../aie_kernels/aie2/addWeighted.cc) | AIE API | Fixed point weighted sum of two tensors | `uint8_t` | +| vision | [threshold.cc](../../aie_kernels/aie2/threshold.cc) | AIE API | Clipping | `uint8_t` | +| vision | [filter2d.cc](../../aie_kernels/aie2/filter2d.cc) | AIE API | Fixed point 2D image processing filter | `uint8_t` | diff --git a/programming_examples/ml/eltwise_add/Makefile b/programming_examples/ml/eltwise_add/Makefile index 294fc902d8..38b1eef61f 100644 --- a/programming_examples/ml/eltwise_add/Makefile +++ b/programming_examples/ml/eltwise_add/Makefile @@ -13,7 +13,6 @@ all: build/final.xclbin targetname = myEltwiseAdd trace_size = 8192 - VPATH := ../../../aie_kernels/aie2 build/%.o: %.cc @@ -29,6 +28,10 @@ build/aie_trace.mlir: aie2.py python3 $< ${trace_size} > $@ +build/aie_trace.mlir: aie2.py + mkdir -p ${@D} + python3 $< ${trace_size} > $@ + build/final.xclbin: build/aie.mlir build/add.o mkdir -p ${@D} cd ${@D} && aiecc.py --aie-generate-cdo --aie-generate-ipu --no-compile-host \ diff --git a/programming_examples/ml/eltwise_mul/aie2.py b/programming_examples/ml/eltwise_mul/aie2.py index b18994e180..7a0a0670a9 100644 --- a/programming_examples/ml/eltwise_mul/aie2.py +++ b/programming_examples/ml/eltwise_mul/aie2.py @@ -160,6 +160,7 @@ def sequence(A, B, C): trace_size = 0 if (len(sys.argv) < 2) else int(sys.argv[1]) except ValueError: print("Argument is not an integer") + with mlir_mod_ctx() as ctx: my_eltwise_mul(trace_size) res = ctx.module.operation.verify() diff --git a/programming_examples/ml/relu/aie2.py b/programming_examples/ml/relu/aie2.py index bb7d1e16d9..0580a0113d 100644 --- a/programming_examples/ml/relu/aie2.py +++ b/programming_examples/ml/relu/aie2.py @@ -131,6 +131,7 @@ def sequence(A, C): trace_size = 0 if (len(sys.argv) != 2) else int(sys.argv[1]) except ValueError: print("Argument is not an integer") + with mlir_mod_ctx() as ctx: my_relu(trace_size) res = ctx.module.operation.verify()