From 1089ded9849de95b111e8a42f9b5dded48bb8821 Mon Sep 17 00:00:00 2001
From: Yuka Ikarashi
Date: Tue, 29 Oct 2024 15:23:16 -0400
Subject: [PATCH] Add more stuff to object_code.md

---
 docs/Imports.md     |  2 +-
 docs/README.md      |  2 +-
 docs/object_code.md | 65 +++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/docs/Imports.md b/docs/Imports.md
index f5385c8f..6bf3ad8d 100644
--- a/docs/Imports.md
+++ b/docs/Imports.md
@@ -89,7 +89,7 @@ Alternatively, users can define their own scheduling operations by composing sch
 
 ## 8. API Cursors
 
-Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass. API Cursors define types that will be useful for user inspection.
+Cursors (see [Cursors.md](./Cursors.md)) are Exo's reference mechanism that allows users to navigate and inspect object code. When users define new scheduling operators using Cursors, they may wish to write their own inspection pass (see [inspection.md](./inspection.md)). API Cursors define types that will be useful for user inspection.
 
 ```python
 from exo.API_cursors import ForCursor, AssignCursor, InvalidCursor
diff --git a/docs/README.md b/docs/README.md
index b7d4c537..32288783 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -7,7 +7,7 @@ This directory provides detailed documentation about Exo's interface and interna
 - For information on writing Exo object code, APIs, and imports, refer to [Procedures.md](Procedures.md), [object_code.md](object_code.md), and [Imports.md](Imports.md).
 - To learn how to define **hardware targets externally to the compiler**, refer to [externs.md](externs.md), [instructions.md](instructions.md), and [memories.md](memories.md).
 - To learn how to define **new scheduling operations externally to the compiler**, refer to [Cursors.md](./Cursors.md) and [inspection.md](./inspection.md).
-- To understand the available scheduling primitives and how to use them, look into the primitives/ directory.
+- To understand the available scheduling primitives and how to use them, look into the [primitives/](./primitives) directory.
 
 The scheduling primitives are classified into six categories:
 
diff --git a/docs/object_code.md b/docs/object_code.md
index 38028674..b4bd24ab 100644
--- a/docs/object_code.md
+++ b/docs/object_code.md
@@ -62,13 +62,14 @@ name: type[size] @ memory
 ```
 
 - **`name`**: The variable name.
-- **`type`**: The data type (e.g., `i32`, `f32`).
+- **`type`**: The data type. Supported precision types are: `f16`, `f32`, `f64`, `i8`, `i32`, `ui8`, and `ui16`.
 - **`[size]`**: The dimensions of the array (optional for scalars).
 - **`@ memory`**: The memory space where the variable resides.
 
+
 #### Procedure Arguments
 
-Procedure arguments are declared with their types, sizes, and memory spaces. They can have dependent sizes based on other arguments.
+Procedure arguments are declared with their types, sizes, and memory spaces. They can have sizes that depend on other arguments.
 
 Example from the code:
 
@@ -81,6 +82,59 @@ data: i32[IC, N] @ DRAM
 - **`[IC, N]`**: A 2D array with dimensions `IC` and `N`.
 - **`@ DRAM`**: Specifies that `data` resides in DRAM memory.
 
+The `data` buffer above represents **tensor** types, which means the stride of the innermost dimension is 1, and the strides of other dimensions are simple multiples of the shapes of the inner dimensions.
+
+Exo allows **window expressions** as well, which are similar to array slicing in Python. Instead of accessing the buffer point-wise (e.g., `x[i]`), users can *window* the array as `x[i:i+2]`. This will create a windowed array of size 2.
+Exo procedures take tensor expressions when annotated with `x:f32[3]` syntax and take window expressions when annotated with `x:[f32][3]`, with square brackets around the types.
+
+```python
+@proc
+def foo(x: [f32][3]):
+    for i in seq(0, 3):
+        x[i] = 0.0
+
+@proc
+def bar(y: f32[10], z: f32[20, 20]):
+    foo(y[2:5])
+    foo(z[1, 10:13])
+```
+
+In this example, `foo` takes a window array of size 3, and `bar` calls `foo` by slicing `y` and `z`, respectively. Running `exocc` on this will generate the following C code:
+
+```c
+#include "tmp.h"
+
+#include
+#include
+
+// bar(
+//     y : f32[10] @DRAM,
+//     z : f32[20, 20] @DRAM
+// )
+void bar(void *ctxt, float* y, float* z) {
+  foo(ctxt, (struct exo_win_1f32){ &y[2], { 1 } });
+  foo(ctxt, (struct exo_win_1f32){ &z[20 + 10], { 1 } });
+}
+
+// foo(
+//     x : [f32][3] @DRAM
+// )
+void foo(void *ctxt, struct exo_win_1f32 x) {
+  for (int_fast32_t i = 0; i < 3; i++) {
+    x.data[i * x.strides[0]] = 0.0f;
+  }
+}
+```
+
+Moreover, Exo checks the consistency of tensor and window bounds in the frontend. If you modify `foo(y[2:5])` to `foo(y[2:6])` in the code above, the bounds check will fail and emit the following error:
+
+```
+TypeError: Errors occurred during effect checking:
+/private/tmp/tmp.py:12:8: type-shape of calling argument may not equal the required type-shape: [Effects.BinOp(op='-', lhs=Effects.Const(val=6, type=LoopIR.Int(), srcinfo=), rhs=Effects.Const(val=2, type=LoopIR.Int(), srcinfo=), type=LoopIR.Index(), srcinfo=)] vs. [Effects.Const(val=3, type=LoopIR.Int(), srcinfo=)]. It could be non equal when:
+  y_stride_0 = 1, z_stride_0 = 20, z_stride_1 = 1
+```
+
+
 #### Allocations
 
 Variables within the procedure are declared similarly to arguments.
@@ -167,6 +221,11 @@ else:
     data[c, j + r]
 ```
 
+- **Window Statements**: Creates a slice (in other words, a _window_) of the buffer and assigns it a new name.
+  ```python
+  y = x[0:3]
+  ```
+
 ## Limitations
 
 Exo has a few limitations that users should be aware of:
@@ -179,6 +238,8 @@ Exo has a few limitations that users should be aware of:
        pass
    ```
 
+   Exo allows quasi-affine indexing by division (e.g., `i/3`) and modulo (e.g., `i%3`) by constants.
+
    To work around this limitation, you may need to restructure your code or use additional variables to represent the non-affine expressions.
 
 2. **Value-dependent control flow**: Exo separates control values from buffer values, which means that it is not possible to write value-dependent control flow. For instance, the following code is not allowed: