[LLM] Add information on full model simulation
Viviane Potocnik committed Jul 26, 2024
1 parent 06890dc commit 0da06dd
44 changes: 31 additions & 13 deletions sw/dnn/README.md
## Requirements
- `torch`
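
The `torch` dependency can be installed with pip, for example:

```bash
pip install torch
```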

# Running ViT and GPT Models on the Snitch Cluster

## Introduction

This repository provides implementations of the Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP) layers for Vision Transformers (ViT) and Generative Pre-trained Transformer (GPT) models. The applications are designed to run on the Snitch cluster, leveraging its unique architecture for efficient execution.
This work stems from a journal paper currently under review at IEEE Transactions on Circuits and Systems for Artificial Intelligence. A preview of the paper can be found [here](https://arxiv.org/pdf/2405.19284).

The below figure shows a block diagram of the basic Attention layer.
To build the hardware, navigate to the `target/snitch_cluster` directory and follow the instructions in the README provided there.
This will set up the necessary environment and build the hardware model of the Snitch cluster.
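
As a rough sketch (the `target/snitch_cluster` README is authoritative; the make target below is assumed from the simulator binary path used later in this document):

```bash
cd target/snitch_cluster
# Build the QuestaSim simulation model of the Snitch cluster
# (target name assumed; consult the README in this directory)
make bin/snitch_cluster.vsim
```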

### Full Model Simulation (Slow)

To simulate the full ViT (encoder) or GPT (decoder) model, build the `encoder` or `decoder` application with the corresponding configuration file.
The configuration files for the ViT models are located in the `sw/dnn/encoder/data` directory, while those for the GPT models are located in `sw/dnn/decoder/data`.
The prefixes of the subdirectories indicate the model architecture. Furthermore, we provide configuration files for `FP32`, `FP16`, and `FP8` precision. The following table summarizes the available configurations:


| Parameter | ViT-B | ViT-L | ViT-H | GPT3-XL    | GPT-J      |
|-----------|-------|-------|-------|------------|------------|
| Blocks    | 12    | 24    | 32    | 40         | 28         |
| Params    | 86M   | 307M  | 632M  | 1.3B       | 6B         |
| E         | 768   | 1024  | 1280  | 2048       | 4096       |
| P         | 64    | 64    | 80    | 128        | 256        |
| S         | 197   | 197   | 197   | [128-2048] | [128-2048] |
| FF        | 3072  | 4096  | 5120  | 8192       | 16384      |
| H         | 12    | 16    | 16    | 16         | 16         |

Here `E` is the embedding dimension, `P` the per-head projection dimension (`E/H`), `S` the sequence length, `FF` the feed-forward hidden dimension, and `H` the number of attention heads.
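
For illustration, the ViT-B encoder configurations would then be laid out as shown below; only `vit-b-fp16.json` is confirmed by the example command later in this section, the sibling files are assumed by analogy:

```
sw/dnn/encoder/data/
└── vit-b/
    ├── vit-b-fp32.json
    ├── vit-b-fp16.json
    └── vit-b-fp8.json
```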

The default configuration from `params.json` can be overridden by setting the `DATA_CFG` environment variable. An example command to build the ViT-B model in `FP16` precision is shown below:

```bash
make DEBUG=ON DATA_CFG=sw/dnn/encoder/data/vit-b/vit-b-fp16.json sw/apps/dnn/encoder
```

After building the software, you can run the applications on the Snitch cluster. Below is an example command using the `QuestaSim` simulator:

```bash
bin/snitch_cluster.vsim sw/apps/dnn/<app_name>/build/<app_name>.elf
```
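
For instance, to run the encoder application built above:

```bash
bin/snitch_cluster.vsim sw/apps/dnn/encoder/build/encoder.elf
```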

### Single Layer Simulation (Fast)

You can follow the above instructions to build the software applications. This will build all of the `dnn` applications, including the MHA and MLP layers for the ViT and GPT models.
If you prefer to build only the MHA and MLP layers, you can run the following command:

```bash
make DEBUG=ON sw/apps/dnn/<app_name>
```
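
For example, to build only the MHA layer application:

```bash
make DEBUG=ON sw/apps/dnn/mha
```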

The parameters of the MHA and MLP layers can be configured in the `data/params.json` file. The current configuration will run a single tile of the MHA and MLP computation.
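
The exact schema of `params.json` is defined in the repository; purely as a hypothetical sketch (all field names are assumed, with values taken from the ViT-B column of the table above), a configuration might look like:

```json
{
    "S": 197,
    "E": 768,
    "P": 64,
    "FF": 3072,
    "H": 12
}
```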
