readme detailed
ismetdagli committed Nov 27, 2023
1 parent 257367b commit ff28fac
Showing 4 changed files with 73 additions and 17 deletions.
82 changes: 73 additions & 9 deletions README.md
Artifact described here includes the source code for HaX-CoNN GPU and DLA runtime ...

## Description

1. Check-list (artifact meta information)
* Hardware: NVIDIA Jetson Xavier AGX 32 GB and NVIDIA Jetson Orin AGX 32 GB
* Software easy installation: [Jetpack 4.5.1](https://developer.nvidia.com/embedded/jetpack-sdk-451-archive) on Xavier AGX and [TODO-Jetpack Version](https://developer.nvidia.com/embedded/jetpack-sdk-451-archive) on Orin AGX
* Architecture: aarch64
* Software details needed: Xavier AGX uses Python 3.6.9, TensorRT 7.1.3, and CUDA 10.2.89; Orin AGX uses Python 3.8, TensorRT 8.4.0, and CUDA 11.2.
* Binary: Binary files are large, so the necessary binaries must be generated using the scripts in this artifact.
* Output: Profiling data (execution time, transition time, memory use) for both individual layers and whole neural networks. The end result is the improved execution time/throughput.
* Experiment workflow: Python and bash scripts

2. Hardware dependencies

We performed our experiments on an NVIDIA Jetson Xavier AGX 32 GB and an NVIDIA Jetson AGX Orin 32 GB. While HaX-CoNN is compatible with any architecture that uses TensorRT with NVIDIA GPUs, we also use the DLA, which exists only in the Jetson family. Therefore, reproducing the current state of the code requires a Xavier AGX or an AGX Orin.
3. Software dependencies

The easiest way to satisfy our dependencies is to use [Jetpack 4.5.1](https://developer.nvidia.com/embedded/jetpack-sdk-451-archive) on Xavier AGX and [TODO-Jetpack Version](https://developer.nvidia.com/embedded/jetpack-sdk-451-archive) on Orin AGX. We mainly use TensorRT as the ML framework in our implementation, since the DLA can be programmed only through TensorRT. Xavier AGX uses TensorRT 7.1.3 and Orin AGX uses TensorRT 8.4.0. Note that manually installing TensorRT/CUDA etc. is not recommended.
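
To double-check that a board matches the versions listed above, a quick sanity check might look like this (assuming a standard JetPack install; `nvcc` may not be on your PATH by default):

```bash
# Verify the toolchain versions on the board (standard JetPack layout assumed).
python3 --version
python3 -c "import tensorrt; print('TensorRT', tensorrt.__version__)"
/usr/local/cuda/bin/nvcc --version   # CUDA toolkit version
```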

4. Installation

We assume installation was done through JetPack. On top of that, run the commands below to install the Python dependencies.

TODO: write a script that installs these

* pip: `sudo apt install -y python3-pip`
* jetson-stats: `sudo pip3 install -U jetson-stats`
* Z3: `pip3 install z3-solver`
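
A minimal helper that installs these dependencies (just the three commands above collected into one script; the file name is hypothetical) could be:

```bash
#!/usr/bin/env bash
# install_deps.sh -- hypothetical helper; runs the three install commands listed above.
set -e
sudo apt install -y python3-pip
sudo pip3 install -U jetson-stats
pip3 install z3-solver
```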



## Experimental Setup

This is an empirical study. We list the details of how we collected data; the data collected through profiling has been encoded into the scripts. Run the makefile to build some of the binaries necessary for collecting data.

TODO_EYMEN: Eymen, you need to explain what is built by the makefile.
My understanding is this (please modify/elaborate/update to make this instruction clear and detailed):
1- Build 22 GoogleNet TensorRT binary files that run only on the GPU or only on the DLA. The first 11 binaries use only the GPU and the next 11 use only the DLA. (Line 17) A quick sanity check is sketched after this list.
2- We iterate through the binary files (.plan/.engine) to collect the total execution time (line 23). (Refer to the "Transition time profiling" section below for further details.)

3- QUESTION_EYMEN: Do we run such things? We built 25 convolution-layer engines with varying input sizes and filter (kernel) sizes. We measure external memory controller (EMC) utilization while running these convolution-layer engines.
4- QUESTION_EYMEN: Do we run EMC profiling here?
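
As a quick sanity check of step 1 above, you can count the engine files under `build/` after make finishes; the exact layout is an assumption here, but 22 GoogleNet GPU/DLA engines are expected among them:

```bash
# Count the TensorRT engine files produced by make (build layout is an assumption).
find build -name "*.plan" -o -name "*.engine" | wc -l
```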



NOTE: Running make takes ~1/2 hours on Xavier AGX.

```bash
cd HaX-CoNN/
export PYTHONPATH="$(pwd):$PYTHONPATH"
make
```


## Layer profiling:
This creates a profiling text file for a DNN. The lines after " [I] GPU Compute" are our target data. We use the *mean* value, i.e., the average over X iterations, where the number of iterations is passed as an argument to our trtexec binary. We generally use 1000 iterations to mitigate any unexpected noise.


```bash
python3 collect_data_single_layers.py
```
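
To pull the target number out of a resulting trtexec log, something like the following works; the log path is a placeholder, and the " [I] GPU Compute" / mean layout is the one described above:

```bash
# Print the block that follows "[I] GPU Compute" and keep only the mean line.
# <trtexec-log-file> is a placeholder for whichever log the script produced.
grep -A 5 "GPU Compute" <trtexec-log-file> | grep -i "mean"
```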
```bash
python3 scripts/layer_analysis/layer_gpu_util.py --profile <profile-path>
python3 scripts/layer_analysis/layer_gpu_util.py --profile build/googlenet_transition_plans/profiles/googlenet_dla_transition_at_24.profile
```


#TODO_EYMEN: We need to add a script/command here showing how we generate the layers' execution times. The command should generate an output file like this:

| Layer group | GPU (ms) | DLA (ms) |
|-------------|----------|----------|
| 0-9         | x        | y        |
| 10-24       | x        | y        |
| 25-38       | x        | y        |
| 39-52       | x        | y        |
| ...         | ...      | ...      |


## Transition time profiling:
The easiest way to profile a layer's transition cost is to generate per-layer transition engines. ([TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#abstract) refers to executable DNN files as engines; we follow the same terminology to prevent confusion.)
```bash
python3 src/build_engine.py --prototxt prototxt_input_files/googlenet.prototxt -
/usr/src/tensorrt/bin/trtexec --iterations=10000 --dumpProfile --exportProfile=build/googlenet_transition_plans/profiles/googlenet_gpu_transition_at_0.profile --avgRuns=1 --warmUp=5000 --duration=0 --loadEngine=build/googlenet_transition_plans/googlenet_gpu_transition_at_0.plan > build/googlenet_transition_plans/profile_logs/googlenet_gpu_transition_at_0.log
```
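
To profile every generated transition engine rather than a single one, the trtexec command above can be wrapped in a loop. A sketch, with the file-name pattern assumed from the example above (the same loop applies to the DLA transition engines):

```bash
# Profile each GPU-transition engine with the same trtexec settings as above.
for plan in build/googlenet_transition_plans/googlenet_gpu_transition_at_*.plan; do
  name=$(basename "$plan" .plan)
  /usr/src/tensorrt/bin/trtexec --iterations=10000 --dumpProfile \
    --exportProfile=build/googlenet_transition_plans/profiles/${name}.profile \
    --avgRuns=1 --warmUp=5000 --duration=0 --loadEngine="$plan" \
    > build/googlenet_transition_plans/profile_logs/${name}.log
done
```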

#TODO_EYMEN: Similar to the above, we need to add a script/command here showing how we generate the layers' transition costs. The command should generate an output file like this:

| Layer group | Transition from GPU to DLA |
|-------------|----------------------------|
| 0-9         | x                          |
| 10-24       | x                          |
| 25-38       | x                          |
| 39-52       | x                          |
| ...         | ...                        |

## EMC utilization profiling:
EMC utilization can be profiled by running the commands below; this is how the data for Figure 3 is collected.

DNNs are generated by running the script below. The script reads prototxt files from `convolution_characterization_prototxts/` and generates a TensorRT engine for each layer in `build/convolution_characterization_plans/`.

#TODO_EYMEN: This script is updated but the command is outdated; a revisit is needed. I guess emc_single_run.sh?

```bash
python3 scripts/emc_analysis/engine_build_convolution_characterization.py
```
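
Before running the full profiling script further below, EMC utilization for a single generated engine can also be observed by hand with tegrastats, which ships with JetPack. A rough sketch, assuming standard tegrastats flags and output; the engine file name and log path are placeholders:

```bash
# Manually watch EMC utilization while one convolution engine runs.
tegrastats --interval 100 --logfile /tmp/emc.log &
TEGRA_PID=$!
/usr/src/tensorrt/bin/trtexec --iterations=1000 --warmUp=5000 --duration=0 \
    --loadEngine=build/convolution_characterization_plans/<engine-name>.plan
kill $TEGRA_PID
grep -o "EMC_FREQ [0-9]*%" /tmp/emc.log | head   # EMC utilization samples
```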

NOTE to Eymen: while updating the code, please update the EMC output data above. emc_util_all.py can use the output data.
To run the generated DNNs and profile them, run the command below.

```bash
python3 scripts/emc_analysis/emc_util_all.py
```

The output is visible in output/emc_results.yaml. (TODO_ISMET: We will add a reference to Figure 3 in the paper.)

```bash
cat output/emc_results.yaml
```
4 changes: 0 additions & 4 deletions build_engine.py
"""
import tensorrt as trt
import sys, os

# from natsort import natsorted
import time
from pathlib import Path
import glob

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

1 change: 0 additions & 1 deletion build_engine_orin.py
import sys, os
import logging
from pathlib import Path
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

3 changes: 0 additions & 3 deletions run_multiple_dnn.py
import glob
import subprocess
import threading
from datetime import datetime
from pathlib import Path

# from natsort import natsorted
import time


