[MERGE] Merge
pjr committed Apr 22, 2024
2 parents 23e5f92 + 0a0ff6c commit 69d63ef
Showing 15 changed files with 91 additions and 52 deletions.
25 changes: 14 additions & 11 deletions docs/conferenceDescriptions/asplos24TutorialDescription.md
@@ -22,18 +22,19 @@ Prerequisite: please bring your laptop, so that you can ssh into our Ryzen AI en

| Time | Topic | Presenter | Slides or Code |
|------|-------|-----------|----------------|
| 08:30am | Intro to spatial compute and explicit data movement | Kristof | tbd |
| 08:45am | "Hello World" from Ryzen AI | Jack | tbd |
| 09:00am | Data movement on Ryzen AI with objectFIFOs | Joe | tbd |
| 09:30am | Exersise 1: Build and run your first program | All | tbd |
| 09:45am | Exersise 2: Vector-scalar | All |tbd |
| 08:30am | Intro to spatial compute and explicit data movement | Kristof | [Programming Guide](../../programming_guide/) |
| 08:45am | "Hello World" from Ryzen AI | Joe | [AI Engine Basic Building Blocks](../../programming_guide/section-1/) |
| 09:00am | Data movement on Ryzen AI with objectFIFOs | Joe | [Data Movement](../../programming_guide/section-2/) |
| 09:30am | Your First Program | Kristof | [My First Program](../../programming_guide/section-3) |
| 09:50am | Exercise 1: Build and run your first program | All | [Passthrough](../../programming_examples/basic/passthrough_kernel/) |
| 10:00am | Break | | |
| 11:00am | Tracing and performance analysis | Jack | tbd |
| 11:10am | Exercise 3: Tracing vector-scalar | All | tbd |
| 11:30am | Vectorizing on AIE | Kristof | tbd |
| 11:40am | Exercise 4: Vectorized vector-scalar | All | tbd |
| 12:00pm | Dataflow and larger designs | Joe | tbd |
| 12:15pm | Exercises | All | |
| 10:30am | Exercise 2: Vector-Scalar Mul | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 10:40am | Tracing and performance analysis | Jack | [Timers](../../programming_guide/section-4/section-4a/) and [Tracing](../../programming_guide/section-4/section-4b/) |
| 11:10am | Exercise 3: Tracing vector-scalar | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 11:30am | Vectorizing on AIE | Jack | [Kernel Vectorization](../../programming_guide/section-4/section-4c/) |
| 11:40am | Exercise 4: Vectorized vector-scalar | All | [Vector Scalar Mul](../../programming_examples/basic/vector_scalar_mul/) |
| 12:00pm | Dataflow and larger designs | Joe | [Example Vector Designs](../../programming_guide/section-5/) and [Large Example Designs](../../programming_guide/section-6/) |
| 12:15pm | Exercises | All | [Programming Examples](../../programming_examples/) |
| 12:30pm | Close Tutorial | All | |


@@ -46,3 +47,5 @@ Prerequisite: please bring your laptop, so that you can ssh into our Ryzen AI en
*Kristof Denolf* is a Fellow in AMD's Research and Advanced Development group where he is working on energy efficient computer vision and video processing applications to shape future AMD devices. He earned a M.Eng. in electronics from the Katholieke Hogeschool Brugge-Oostende (1998), now part of KULeuven, a M.Sc. in electronic system design from Leeds Beckett University (2000) and a Ph.D. from the Technical University Eindhoven (2007). He has over 25 years of combined research and industry experience at IMEC, Philips, Barco, Apple, Xilinx and AMD. His main research interests are all aspects of the cost-efficient and dataflow oriented design of video, vision and graphics systems.

*Phil James-Roxby* is a Senior Fellow in AMD’s Research and Advanced Development group, working on compilers and runtimes to support current and future AMD devices, particularly in the domain of AI processing. In the past, he has been responsible for a number of software enablement activities for hardware devices, including SDNet and SDAccel at Xilinx, and the original development environment for the AI Engines. He holds a PhD from the University of Manchester on hardware acceleration of embedded machine learning applications, and his main research interest continues to be how to enable users to efficiently use diverse hardware in heterogeneous systems.

*Samuel Bayliss* is a Fellow in the Research and Advanced Development group at AMD. His academic experience includes formative study at Imperial College London, for which he earned MEng and PhD degrees in 2006 and 2012 respectively. He is energized by his current work in advancing compiler tooling using MLIR, developing programming abstractions for parallel compute and evolving hardware architectures for efficient machine learning.
12 changes: 6 additions & 6 deletions programming_examples/basic/vector_exp/Makefile
@@ -10,17 +10,17 @@ include ../../makefile-common

targetname = testExp

KERNELLIB = ${REPO_ROOT}/aie_kernels/aie2

all: build/final.xclbin build/insts.txt

build/exp.o: ${KERNELLIB}/bf16_exp.cc
VPATH := ../../../aie_kernels/aie2

build/exp.o: bf16_exp.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I../../../../aie_runtime_lib/AIE2 -c $< -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I../../../../aie_runtime_lib/AIE2 -c $(<:%=../%) -o ${@F}

build/lut_based_ops.o:
build/lut_based_ops.o: ../../../aie_runtime_lib/AIE2/lut_based_ops.cpp
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I. -c ../../../../aie_runtime_lib/AIE2/lut_based_ops.cpp -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I. -c $(<:%=../%) -o ${@F}

build/kernels.a: build/exp.o build/lut_based_ops.o
ar rvs $@ $+
9 changes: 5 additions & 4 deletions programming_examples/basic/vector_reduce_add/Makefile
@@ -14,19 +14,20 @@ targetname = reduce_add
devicename = ipu
col = 0
CHESS_FLAGS=${CHESSCCWRAP2_FLAGS}
KERNEL_LIB=../../../aie_kernels/aie2/

all: build/final.xclbin build/insts.txt

build/reduce_add.cc.o: ${KERNEL_LIB}/reduce_add.cc
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESS_FLAGS} -c $(<:%=../%) -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
python3 $< ${devicename} ${col} > $@

build/final.xclbin: build/aie.mlir build/reduce_add.cc.o
build/final.xclbin: build/aie.mlir build/reduce_add.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--aie-generate-ipu --ipu-insts-name=insts.txt $(<:%=../%)
2 changes: 1 addition & 1 deletion programming_examples/basic/vector_reduce_add/run.lit
@@ -3,7 +3,7 @@
//
// REQUIRES: ryzen_ai
//
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_add.cc -o reduce_add.cc.o
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_add.cc -o reduce_add.o
// RUN: %python %S/aie2.py ipu 0 | aie-opt -cse -canonicalize -o ./aie.mlir
// RUN: %python aiecc.py --aie-generate-cdo --aie-generate-ipu --no-compile-host --xclbin-name=aie.xclbin --ipu-insts-name=insts.txt ./aie.mlir
// RUN: g++ %S/test.cpp -o test.exe -std=c++23 -Wall -I%S/../../../runtime_lib/test_lib %S/../../../runtime_lib/test_lib/test_utils.cpp %xrt_flags -lrt -lstdc++ -lboost_program_options -lboost_filesystem
9 changes: 5 additions & 4 deletions programming_examples/basic/vector_reduce_max/Makefile
@@ -14,19 +14,20 @@ targetname = reduce_max
devicename = ipu
col = 0
CHESS_FLAGS=${CHESSCCWRAP2_FLAGS}
KERNEL_LIB=../../../aie_kernels/aie2

all: build/final.xclbin build/insts.txt

build/reduce_max.cc.o: ${KERNEL_LIB}/reduce_max.cc
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESS_FLAGS} -c $(<:%=../%) -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
python3 $< ${devicename} ${col} > $@

build/final.xclbin: build/aie.mlir build/reduce_max.cc.o
build/final.xclbin: build/aie.mlir build/reduce_max.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--aie-generate-ipu --ipu-insts-name=insts.txt $(<:%=../%)
2 changes: 1 addition & 1 deletion programming_examples/basic/vector_reduce_max/run.lit
@@ -3,7 +3,7 @@
//
// REQUIRES: ryzen_ai, chess
//
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_max.cc -o reduce_max.cc.o
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_max.cc -o reduce_max.o
// RUN: %python %S/aie2.py ipu 0 | aie-opt -cse -canonicalize -o ./aie.mlir
// RUN: %python aiecc.py --aie-generate-cdo --aie-generate-ipu --no-compile-host --xclbin-name=aie.xclbin --ipu-insts-name=insts.txt ./aie.mlir
// RUN: g++ %S/test.cpp -o test.exe -std=c++23 -Wall -I%S/../../../runtime_lib/test_lib %S/../../../runtime_lib/test_lib/test_utils.cpp %xrt_flags -lrt -lstdc++ -lboost_program_options -lboost_filesystem
9 changes: 5 additions & 4 deletions programming_examples/basic/vector_reduce_min/Makefile
@@ -14,19 +14,20 @@ targetname = reduce_min
devicename = ipu
col = 0
CHESS_FLAGS=${CHESSCCWRAP2_FLAGS}
KERNEL_LIB=../../../aie_kernels/aie2

all: build/final.xclbin build/insts.txt

build/reduce_min.cc.o: ${KERNEL_LIB}/reduce_min.cc
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESS_FLAGS} -c $(<:%=../%) -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
python3 $< ${devicename} ${col} > $@

build/final.xclbin: build/aie.mlir build/reduce_min.cc.o
build/final.xclbin: build/aie.mlir build/reduce_min.o
mkdir -p ${@D}
cd ${@D} && aiecc.py --aie-generate-cdo --no-compile-host --xclbin-name=${@F} \
--aie-generate-ipu --ipu-insts-name=insts.txt $(<:%=../%)
2 changes: 1 addition & 1 deletion programming_examples/basic/vector_reduce_min/run.lit
@@ -3,7 +3,7 @@
//
// REQUIRES: ryzen_ai, chess
//
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_min.cc -o reduce_min.cc.o
// RUN: xchesscc_wrapper aie2 -I %aietools/include -c %S/../../../aie_kernels/aie2/reduce_min.cc -o reduce_min.o
// RUN: %python %S/aie2.py ipu 0 | aie-opt -cse -canonicalize -o ./aie.mlir
// RUN: %python aiecc.py --aie-generate-cdo --aie-generate-ipu --no-compile-host --xclbin-name=aie.xclbin --ipu-insts-name=insts.txt ./aie.mlir
// RUN: g++ %S/test.cpp -o test.exe -std=c++23 -Wall -I%S/../../../runtime_lib/test_lib %S/../../../runtime_lib/test_lib/test_utils.cpp %xrt_flags -lrt -lstdc++ -lboost_program_options -lboost_filesystem
4 changes: 3 additions & 1 deletion programming_examples/ml/conv2d/Makefile
@@ -9,6 +9,8 @@ include ../../makefile-common

mlirFileName = aieWithTrace_1core

VPATH := ../../../aie_kernels/aie2

all: build/conv2dk1_i8.o build/final.xclbin


@@ -20,7 +22,7 @@ build/${mlirFileName}.mlir: aie2.py
insts.txt: build/${mlirFileName}.mlir
aiecc.py -v --aie-only-generate-ipu --ipu-insts-name=$@ $<

build/conv2dk1_i8.o: ../../../aie_kernels/aie2/conv2dk1_i8.cc
build/conv2dk1_i8.o: conv2dk1_i8.cc
xchesscc -d ${CHESSCC2_FLAGS} -DINT8_ACT -c $< -o $@

build/final.xclbin: build/${mlirFileName}.mlir
4 changes: 3 additions & 1 deletion programming_examples/ml/conv2d_fused_relu/Makefile
@@ -9,6 +9,8 @@ include ../../makefile-common

mlirFileName = aieWithTrace_1core

VPATH := ../../../aie_kernels/aie2

all: build/conv2dk1.o build/final.xclbin

build/${mlirFileName}.mlir: aie2.py
@@ -19,7 +21,7 @@ build/${mlirFileName}.mlir: aie2.py
insts.txt: build/${mlirFileName}.mlir
aiecc.py -v --aie-only-generate-ipu --ipu-insts-name=$@ $<

build/conv2dk1.o: ../../../aie_kernels/aie2/conv2dk1.cc
build/conv2dk1.o: conv2dk1.cc
xchesscc -d ${CHESSCC2_FLAGS} -DINT8_ACT -c $< -o $@

build/final.xclbin: build/${mlirFileName}.mlir
6 changes: 4 additions & 2 deletions programming_examples/ml/eltwise_add/Makefile
@@ -14,9 +14,11 @@ targetname = myEltwiseAdd
trace_size = 8192


build/add.o:
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I. -c ../../../../aie_kernels/aie2/add.cc -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
6 changes: 4 additions & 2 deletions programming_examples/ml/eltwise_mul/Makefile
@@ -13,9 +13,11 @@ all: build/final.xclbin
targetname = myEltwiseMul
trace_size = 8192

build/mul.o:
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I. -c ../../../../aie_kernels/aie2/mul.cc -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
6 changes: 4 additions & 2 deletions programming_examples/ml/relu/Makefile
@@ -13,9 +13,11 @@ all: build/final.xclbin
targetname = myReLU
trace_size = 8192

build/relu.o:
VPATH := ../../../aie_kernels/aie2

build/%.o: %.cc
mkdir -p ${@D}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -I. -c ../../../../aie_kernels/aie2/relu.cc -o ${@F}
cd ${@D} && xchesscc_wrapper ${CHESSCCWRAP2_FLAGS} -c $(<:%=../%) -o ${@F}

build/aie.mlir: aie2.py
mkdir -p ${@D}
21 changes: 11 additions & 10 deletions programming_examples/utils/README.md
@@ -13,10 +13,10 @@

These utilities are helpful in the current programming examples context and include helpful C/C++ libraries, and python and shell scripts.

- [Open CV Utilities](#Open-CV-Utilities-(OpenCVUtils.h)) ([OpenCVUtils.h](./OpenCVUtils.h))
- [Clean microcode shell script](#Clean-microcode-shell-script) ([clean_microcode.sh](./clean_microcode.sh))
- [Trace parser - eventIR based](#Trace-parser---eventIR-based-(parse_eventIR.py)) ([parse_eventIR.py](./parse_eventIR.py))
- [Trace parser, custom](#Trace-parser,-custom) ([parse_trace.py](./parse_trace.py))
- [Open CV Utilities](#open-cv-utilities-opencvutilsh) ([OpenCVUtils.h](./OpenCVUtils.h))
- [Clean microcode shell script](#clean-microcode-shell-script-clean_microcodesh) ([clean_microcode.sh](./clean_microcode.sh))
- [Trace parser - eventIR based](#trace-parser---eventir-based-parse_eventirpy) ([parse_eventIR.py](./parse_eventIR.py))
- [Trace parser, custom](#trace-parser-custom-parse_tracepy) ([parse_trace.py](./parse_trace.py))

## <u>Open CV Utilities ([OpenCVUtils.h](./OpenCVUtils.h))</u>
OpenCV utilities used in vision processing pipelines to help read and/or initialize images and video. Currently supported functions include the following. Please view header for more specific function information.
@@ -43,12 +43,13 @@ parse_eventIR.py --filename trace.txt --mlir build/aie_trace.mlir --colshift 1 >
* **--colshift** : runtime column shift. This specifies how much the actual design was shifted from the default position when it was scheduled and called. We need this because even if our design is configured for column 0, the actual loading and execution of the design may place it in column 1, 2, 3 etc. We account for this shift since the parser needs to match the actual column location of the generated trace data. Usually 1 is the right value. **NOTE** - the underlying tools currently default to column 1 to avoid using column 0 on Ryzen AI since that column does not have a shimDMA and is therefore avoided at the moment.

The parse script creates a temporary directory `tmpTrace` and performs the following steps within that folder:
1. Fixes raw trace data
1. Parse MLIR to build event table
1. Create .target file
1. Create config.json
1. Run Vitis/aietools hwfrontend utility to parse raw trace data --> generates eventIR.txt
1. Convert eventIR.txt to perfetto_compatible.json
1. [Fixes raw trace data](#1-fixes-raw-trace-data)
1. [Parse MLIR to build event table](#2-parse-mlir-to-build-event-table)
1. [Create .target file](#3-create-target-file)
1. [Create config.json](#4-create-configjson)
1. [Run Vitis/aietools hwfrontend utility to parse raw trace data --> generates eventIR.txt](#5-run-vitisaietools-hwfrontend-utility-to-parse-raw-trace-data----generates-eventirtxt)
1. [Convert eventIR.txt to perfetto_compatible.json](#6-convert-eventirtxt-to-perfetto_compatiblejson)
* [Additional Tips](#tips)

### <u>1. Fixes raw trace data</u>
We prepend `0x` to each hex line and save the result as `prep.<trace file>`, since the `hwfrontend` utility expects this format.
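
As a rough illustration of this step (not the actual code in `parse_eventIR.py`), the prefixing could be sketched in Python as follows. The function name `prepend_hex_prefix` and the decision to skip blank lines are assumptions made for the example; only the `0x` prefix and the `prep.` output naming come from the description above.

```python
from pathlib import Path


def prepend_hex_prefix(trace_path: str) -> Path:
    """Write a copy of the trace file with `0x` in front of every hex line.

    Hypothetical helper: the output is placed next to the input and named
    `prep.<trace file>`, following the README description.
    """
    src = Path(trace_path)
    dst = src.with_name("prep." + src.name)
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            word = line.strip()
            if word:  # assumption: blank lines carry no trace data
                fout.write("0x" + word + "\n")
    return dst
```

Calling `prepend_hex_prefix("trace.txt")` would produce `prep.trace.txt` in the same directory, leaving the raw trace untouched.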
26 changes: 24 additions & 2 deletions programming_guide/quick_reference.md
@@ -10,6 +10,13 @@

# <ins>IRON Quick Reference</ins>

* [Python Bindings](#python-bindings)
* [Python Helper Functions](#python-helper-functions)
* [Helpful AI Engine Architecture References and Tables](#helpful-ai-engine-architecture-references-and-tables)
* [AI Engine documentation](#ai-engine-documentation)

----

## Python Bindings

| Function Signature | Definition | Parameters | Return Type | Example |
@@ -42,6 +49,22 @@
| `print(ctx.module)` | Converts our ctx wrapped structural code to mlir and prints to stdout|
| `ctx.module.operation.verify()` | Runs additional structural verification on the Python-bound source code and returns the result to stdout |

## Helpful AI Engine Architecture References and Tables
* [AIE2 - Table of supported data types and vector sizes (AIE API)](https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_api/aie_api/doc/group__group__basic__types.html)

* Some useful Tile core Trace Events
| Some common events | event ID | dec value |
|--------------------|----------|-----------|
| True |0x01| 1 |
| Stream stalls |0x18| 24 |
| Core Instruction - Event 0 |0x21| 33|
| Core Instruction - Event 1 |0x22| 34 |
| Vector Instructions (e.g. VMAC, VADD, VCMP) |0x25| 37 |
| Lock acquire requests |0x2C| 44 |
| Lock release requests |0x2D| 45 |
| Lock stall |0x1A| 26 |
| Core Port Running 1 |0x4F| 79 |
| Core Port Running 0 |0x4B| 75 |
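
When reading raw trace output, it can help to keep the event IDs from the table above in a small lookup. The sketch below is only an illustration: `CORE_TRACE_EVENTS` and `describe_event` are hypothetical names, not part of the IRON Python bindings, and the values are copied from the table.

```python
# Core trace event IDs, copied from the table above (hex event ID per name).
CORE_TRACE_EVENTS = {
    "True": 0x01,
    "Stream stalls": 0x18,
    "Lock stall": 0x1A,
    "Core Instruction - Event 0": 0x21,
    "Core Instruction - Event 1": 0x22,
    "Vector Instructions": 0x25,
    "Lock acquire requests": 0x2C,
    "Lock release requests": 0x2D,
    "Core Port Running 0": 0x4B,
    "Core Port Running 1": 0x4F,
}

# Reverse map, so a numeric ID seen in a trace can be turned back into a name.
EVENT_NAMES = {event_id: name for name, event_id in CORE_TRACE_EVENTS.items()}


def describe_event(event_id: int) -> str:
    """Return 'name (0xNN, dec)' for a known ID, or a placeholder otherwise."""
    name = EVENT_NAMES.get(event_id, "unknown event")
    return f"{name} (0x{event_id:02X}, {event_id})"
```

For example, `describe_event(0x25)` yields the vector instruction entry together with its hex and decimal forms, matching the corresponding table row.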

## AI Engine documentation
* [Summary Documentation Links in UG1076](https://docs.amd.com/r/en-US/ug1076-ai-engine-environment/Documentation)
@@ -51,5 +74,4 @@
* [AIE2 Register Reference - AM025](https://docs.amd.com/r/en-US/am025-versal-aie-ml-register-reference/Overview)
* [AIE API User Guide - v2023.2](https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_intrinsics/intrinsics/index.html)

## AIE Detailedd References
* [AIE2 - Table of supported data types and vector sizes (AIE API)](https://www.xilinx.com/htmldocs/xilinx2023_2/aiengine_api/aie_api/doc/group__group__basic__types.html)
