update readme and hopefully fix trt-llm
IlyasMoutawwakil committed Feb 16, 2024
1 parent fdbb264 commit 9fb2d64
Showing 3 changed files with 37 additions and 58 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test_cli_tensorrt_llm.yaml
@@ -40,4 +40,4 @@ jobs:
--gpus '"device=0,1"'
--entrypoint /bin/bash
opt-bench-tensorrt-llm:latest
-c "pip install pynvml && pip install -e .[testing] && pytest -k 'cli and tensorrt_llm' -x"
-c "pip install -e .[testing] && pip uninstall nvidia-ml-py && pytest -k 'cli and tensorrt_llm' -x"
88 changes: 36 additions & 52 deletions README.md
@@ -5,33 +5,41 @@

Optimum-Benchmark is a unified [multi-backend & multi-device](#backends--devices-) utility for benchmarking [Transformers](https://github.com/huggingface/transformers), [Diffusers](https://github.com/huggingface/diffusers), [PEFT](https://github.com/huggingface/peft), [TIMM](https://github.com/huggingface/pytorch-image-models) and [Optimum](https://github.com/huggingface/optimum) flavors, along with all their supported [optimizations & quantization schemes](#backend-features-), for [inference & training](#benchmark-features-%EF%B8%8F), in [distributed & non-distributed settings](#backend-features-).

## Motivation 🤔

- HF hardware partners wanting to know how their hardware performs compared to other hardware on the same models.
- HF ecosystem users wanting to know how their chosen model performs in terms of latency, throughput, memory usage, energy consumption, etc., compared to another model.
- Experimenting with hardware- and backend-specific optimizations & quantization schemes that can be applied to models to improve their computational/memory/energy efficiency.

*Latest News* 📰
- Added a simple Python API to run benchmarks with all isolation and tracking features supported by the CLI.

## Current status 📈

### API

[![CPU](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cpu.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cpu.yaml)
[![CUDA](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cuda.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_cuda.yaml)
[![ROCM](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_rocm.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_rocm.yaml)
[![MISC](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_misc.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_api_misc.yaml)

### CLI

[![CPU Pytorch Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_pytorch.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_pytorch.yaml)
[![CPU OnnxRuntime Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_onnxruntime.yaml)
[![CPU Intel Neural Compressor Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_neural_compressor.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_neural_compressor.yaml)
[![CPU OpenVINO Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_openvino.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cpu_openvino.yaml)

[![CUDA Pytorch Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_pytorch.yaml)
[![CUDA OnnxRuntime Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_onnxruntime.yaml)
[![CUDA Torch-ORT Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_cuda_torch_ort.yaml)

[![TensorRT OnnxRuntime Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_tensorrt_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_tensorrt_onnxruntime.yaml)
[![TensorRT-LLM Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_tensorrt_llm.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_tensorrt_llm.yaml)

[![ROCm Pytorch Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_pytorch.yaml)
[![ROCm OnnxRuntime Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_onnxruntime.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_rocm_onnxruntime.yaml)

[![MISC Tests](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_misc.yaml/badge.svg)](https://github.com/huggingface/optimum-benchmark/actions/workflows/test_cli_misc.yaml)

## Quickstart 🚀

@@ -64,7 +72,7 @@

Depending on the backends you want to use, you might need to install some extra dependencies.

### Running benchmarks from Python API 🧪

You can run benchmarks from the Python API using the `launch` function from the `optimum_benchmark.experiment` module. Here's an example of how to run a benchmark using the `pytorch` backend, `torchrun` launcher and `inference` benchmark.

```python
from optimum_benchmark.logging_utils import setup_logging
from optimum_benchmark.experiment import launch, ExperimentConfig
from optimum_benchmark.backends.pytorch.config import PyTorchConfig
from optimum_benchmark.launchers.torchrun.config import TorchrunConfig
from optimum_benchmark.benchmarks.inference.config import InferenceConfig

if __name__ == "__main__":
    setup_logging(level="INFO")
    # the launcher, benchmark and backend configs (model and device values here are illustrative)
    launcher_config = TorchrunConfig(nproc_per_node=2)
    benchmark_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model="gpt2", no_weights=True, device="cuda", device_ids="0,1")
    experiment_config = ExperimentConfig(
        experiment_name="api-launch",
        benchmark=benchmark_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = launch(experiment_config)
    experiment_config.push_to_hub("IlyasMoutawwakil/benchmarks") # pushes experiment_config.json to the hub
    benchmark_report.push_to_hub("IlyasMoutawwakil/benchmarks") # pushes benchmark_report.json to the hub
```

Yep, it's that simple! Check the matrix of supported backends, launchers and benchmarks in the [features](#features-) section.

### Running benchmarks from CLI 🏃‍♂️

You can also run a benchmark using the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for [`hydra`](https://hydra.cc/). `--config-dir` is the directory where the configuration files are stored and `--config-name` is the name of the configuration file without its `.yaml` extension.

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert
```

This will run the benchmark using the configuration in [`examples/pytorch_bert.yaml`](examples/pytorch_bert.yaml) and store the results in `runs/pytorch_bert`.
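
Such a configuration file is a plain hydra config. As a rough sketch (the entries and values below are assumptions inferred from the composition example further down, not the verbatim file), it can look like:

```yaml
# sketch of examples/pytorch_bert.yaml (entries and values are assumed, not verbatim)
defaults:
  - backend: pytorch # selects the backend group
  - launcher: process # selects the launcher group
  - benchmark: inference # selects the benchmark group
  - _self_

experiment_name: pytorch_bert
model: bert-base-uncased
device: cpu
```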

The results include `benchmark_report.json`, the program's logs (`cli.log`) and the full configuration that was used (`experiment_config.json`), including backend, launcher, benchmark and environment configurations.

The directory for storing these results can be changed by setting `hydra.run.dir` (and/or `hydra.sweep.dir` in case of a multirun) from the command line or in the config file.
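
For example, to redirect a single run's artifacts (the directory name is illustrative):

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert hydra.run.dir=runs/my_pytorch_bert_run
```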

#### Configuration overrides 🎛️

It's easy to override the default behavior of a benchmark from the command line.

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda
```

#### Configuration multirun sweeps 🧹

You can easily run configuration sweeps using the `-m` or `--multirun` option. By default, configurations are executed serially, but other kinds of executions are supported with hydra's launcher plugins: `hydra/launcher=submitit_slurm`, `hydra/launcher=ray`, `hydra/launcher=joblib`, etc.
Note that the hydra launcher `hydra/launcher` is different from our own `launcher`; specifically, `hydra/launcher` can only be used in `--multirun` mode, and only handles the inter-run behavior (how the sweep's runs are scheduled, not how each benchmark is launched).

```bash
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
```
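
As a sketch, assuming the `hydra-joblib-launcher` plugin is installed, the same sweep's runs can be parallelized instead of executed serially:

```bash
pip install hydra-joblib-launcher
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda hydra/launcher=joblib
```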

Also, for integer parameters like `batch_size`, one can specify a range of values to sweep over:

```bash
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda benchmark.input_shapes.batch_size='range(1,10,step=2)'
```

### Configurations structure 📁

You can create custom configuration files following the [examples here](examples).
You can also use `hydra`'s [composition](https://hydra.cc/docs/0.11/tutorial/composition/) with a base configuration ([`examples/pytorch_bert.yaml`](examples/pytorch_bert.yaml) for example) and override/define parameters.

To create a configuration that uses a `wav2vec2` model and `onnxruntime` backend, it's as easy as:

```yaml
defaults:
  - pytorch_bert
  - _self_
  - override backend: onnxruntime

experiment_name: onnxruntime_wav2vec2
model: bookbot/distil-wav2vec2-adult-child-cls-37m
device: cpu
```
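
Assuming this file is saved as `examples/onnxruntime_wav2vec2.yaml` (the location is an assumption), it runs like any other configuration:

```bash
optimum-benchmark --config-dir examples/ --config-name onnxruntime_wav2vec2
```
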
Other than the [examples](examples), you can also check the [tests](tests/configs/) and the [optimum-benchmark-examples](https://github.com/IlyasMoutawwakil/optimum-benchmark-examples) repository.

## Features 🎨

@@ -161,9 +146,9 @@

Everything else is optional or inferred at runtime, but can be configured to your needs.

### Launchers 🚀

- [x] Process isolation between consecutive runs (`launcher=process`)
- [x] Assert GPU devices (NVIDIA & AMD) isolation (`launcher.device_isolation=true`)
- [x] Distributed inference/training (`launcher=torchrun`, `launcher.n_proc_per_node=2`)
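
As a sketch, these launcher options can be combined from the CLI (reusing the `pytorch_bert` config from the earlier examples):

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.device=cuda launcher=process launcher.device_isolation=true
```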

### Backends & Devices 📱

@@ -181,19 +166,18 @@
### Benchmarking 🏋️

- [x] Memory tracking (`benchmark.memory=true`)
- [x] Latency and throughput tracking (`benchmark.latency=true`)
- [x] Warm up runs before inference (`benchmark.warmup_runs=20`)
- [x] Warm up steps during training (`benchmark.warmup_steps=20`)
- [x] Energy and carbon emissions tracking (`benchmark.energy=true`)
- [x] Inputs shapes control (e.g. `benchmark.input_shapes.sequence_length=128`)
- [x] Dataset shapes control (e.g. `benchmark.dataset_shapes.dataset_size=1000`)
- [x] Prefill latency and Decoding throughput deduced from Generate and Forward pass (auto-enabled for text generation models)
- [x] Forward, Call and Generate pass kwargs control (e.g. for an LLM `benchmark.generate_kwargs.max_new_tokens=100`, for a diffusion model `benchmark.call_kwargs.num_images_per_prompt=4`)
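
As a sketch, several of these trackers and kwargs can be combined in a single call (the `gpt2` override is illustrative):

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 benchmark.memory=true benchmark.energy=true benchmark.generate_kwargs.max_new_tokens=100
```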

### Backend features 🧰

- [x] "No weights" to benchmark models without downloading their weights (`backend.no_weights=true`)
- [x] Onnxruntime Quantization and AutoQuantization (`backend.quantization=true` or `backend.auto_quantization=avx2`, etc)
- [x] Onnxruntime Calibration for Static Quantization (`backend.quantization_config.is_static=true`, etc)
- [x] Onnxruntime Optimization and AutoOptimization (`backend.optimization=true` or `backend.auto_optimization=O4`, etc)
5 changes: 0 additions & 5 deletions tests/configs/_ds_tp_.yaml
@@ -10,8 +10,3 @@ backend:
  deepspeed_inference_config:
    tensor_parallel:
      tp_size: 2
-
- hydra:
-   launcher:
-     n_jobs: 1
-     batch_size: 1
