Fix isolation (#186)
IlyasMoutawwakil authored Apr 29, 2024
1 parent d9a8423 commit 10e4ece
Showing 28 changed files with 570 additions and 505 deletions.
6 changes: 2 additions & 4 deletions — .github/workflows/test_cli_rocm_pytorch_single_gpu.yaml

```diff
@@ -26,10 +26,8 @@ jobs:

       - name: Target devices
        run: |
-          echo "DEVICE0: $DEVICE0"
-          echo "DEVICE1: $DEVICE1"
-          echo "DEVICE0=$DEVICE0" >> $GITHUB_ENV
-          echo "DEVICE1=$DEVICE1" >> $GITHUB_ENV
+          echo "DEVICE: $DEVICE"
+          echo "DEVICE=$DEVICE" >> $GITHUB_ENV

       - name: Build image
        run: docker build
```
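The workflow step above relies on the `$GITHUB_ENV` mechanism: a `KEY=value` line appended to the file at `$GITHUB_ENV` becomes an environment variable in subsequent steps of the job. A minimal local simulation of that pattern (the device path here is hypothetical, chosen only for illustration):

```shell
# Simulate GitHub Actions' GITHUB_ENV file with a temp file.
GITHUB_ENV="$(mktemp)"

# Hypothetical device value; in the real workflow this is set by an
# earlier step that picks a free GPU.
DEVICE="/dev/dri/renderD128"

# Same two lines as the workflow: log the value, then persist it for
# later steps by appending KEY=value to the GITHUB_ENV file.
echo "DEVICE: $DEVICE"
echo "DEVICE=$DEVICE" >> "$GITHUB_ENV"

cat "$GITHUB_ENV"
```

GitHub Actions sources this file between steps, which is why the commit switches from two per-index variables (`DEVICE0`, `DEVICE1`) to a single `DEVICE` for the single-GPU job.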
1 change: 0 additions & 1 deletion — Makefile

```diff
@@ -70,7 +70,6 @@ run_rocm_container:
 	docker run \
 	-it \
 	--rm \
-	--pid host \
 	--shm-size 64G \
 	--device /dev/kfd \
 	--device /dev/dri \
```
1 change: 1 addition & 0 deletions — examples/pytorch_bert.yaml

```diff
@@ -11,6 +11,7 @@ experiment_name: pytorch_bert

 launcher:
   device_isolation: true
+  device_isolation_action: warn

 benchmark:
   latency: true
```
7 changes: 4 additions & 3 deletions — examples/pytorch_llama.yaml

```diff
@@ -9,15 +9,16 @@ defaults:

 experiment_name: pytorch_llama

+launcher:
+  device_isolation: true
+  device_isolation_action: warn
+
 backend:
   device: cuda
   device_ids: 0
   no_weights: true
   model: TheBloke/Llama-2-70B-AWQ

-launcher:
-  device_isolation: true
-
 benchmark:
   input_shapes:
     batch_size: 1
```
7 changes: 4 additions & 3 deletions — examples/pytorch_timm.yaml

```diff
@@ -9,14 +9,15 @@ defaults:

 experiment_name: pytorch_timm

+launcher:
+  device_isolation: true
+  device_isolation_action: warn
+
 backend:
   device: cuda
   device_ids: 0
   model: timm/mobilenetv3_large_100.ra_in1k

-launcher:
-  device_isolation: true
-
 benchmark:
   memory: true
   input_shapes:
```
4 changes: 4 additions & 0 deletions — examples/trt_llama.yaml

```diff
@@ -9,6 +9,10 @@ defaults:

 experiment_name: trt_llama

+launcher:
+  device_isolation: true
+  device_isolation_action: warn
+
 backend:
   device: cuda
   device_ids: 0
```
5 changes: 1 addition & 4 deletions — optimum_benchmark/backends/base.py

```diff
@@ -120,11 +120,8 @@ def train(self, **kwargs) -> TrainerState:
         """
         raise NotImplementedError("Backend must implement train method")

-    def delete_pretrained_model(self) -> None:
+    def clean(self) -> None:
         if hasattr(self, "pretrained_model"):
             del self.pretrained_model
-
-    def clean(self) -> None:
         LOGGER.info(f"Cleaning {self.NAME} backend")
-        self.delete_pretrained_model()
         gc.collect()
```
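The base.py change folds the one-line `delete_pretrained_model()` helper into `clean()`, so a single call drops the model attribute (if present) and triggers garbage collection. A minimal, self-contained sketch of the resulting behavior — the class scaffolding and `NAME` value here are stand-ins, not the library's real `Backend` API:

```python
import gc


class Backend:
    # Hypothetical backend name for illustration only.
    NAME = "pytorch"

    def __init__(self):
        # Placeholder standing in for a loaded pretrained model.
        self.pretrained_model = object()

    def clean(self) -> None:
        # Consolidated clean(): the former delete_pretrained_model()
        # body is inlined, then garbage collection is forced.
        if hasattr(self, "pretrained_model"):
            del self.pretrained_model
        gc.collect()


backend = Backend()
backend.clean()
print(hasattr(backend, "pretrained_model"))  # → False
```

Because of the `hasattr` guard, `clean()` stays safe to call repeatedly, even after the model attribute is already gone.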