The recommended way to access Perf Analyzer is to run the pre-built executable from within the Triton SDK docker container, available on the NVIDIA GPU Cloud Catalog. Perf Analyzer can run as long as the SDK container's network can reach the address and port of the inference server.
export RELEASE=<yy.mm> # e.g. export RELEASE=24.12 for the December 2024 release
docker run --rm --gpus=all -it --net=host nvcr.io/nvidia/tritonserver:${RELEASE}-py3-sdk
# inside container
perf_analyzer -m <model>
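As a minimal sketch (the model name `resnet50` and the endpoint are assumptions for illustration), once a Triton server is serving a model reachable from the container, you can point Perf Analyzer at its gRPC endpoint and sweep concurrency:

```bash
# Assumed example: profile a hypothetical model "resnet50" served by a
# tritonserver instance on this host. -u selects the endpoint, -i the
# protocol, and --concurrency-range sweeps request concurrency from 1 to 4.
perf_analyzer -m resnet50 -u localhost:8001 -i grpc --concurrency-range 1:4
```

Because the SDK container was started with `--net=host`, `localhost` here resolves against the host's network, where the server's ports are exposed.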
Alternatively, Perf Analyzer is bundled with the `tritonclient` Python package, so it can be installed with pip:
pip install tritonclient
perf_analyzer -m <model>
Warning: If any runtime dependencies are missing, Perf Analyzer will produce errors identifying them; you will need to install them manually.
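As a sketch of the pip route (the virtual environment name `perf-env` is arbitrary), installing into an isolated environment keeps the client libraries from conflicting with system packages:

```bash
# Install the tritonclient package, which bundles the perf_analyzer
# executable, into a throwaway virtual environment.
python3 -m venv perf-env
source perf-env/bin/activate
pip install tritonclient
perf_analyzer --help  # smoke test: confirms the executable is on PATH
```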
Perf Analyzer can also be built from source, for example inside a plain Ubuntu container:
docker run --rm --gpus=all -it --net=host ubuntu:24.04
# inside container, install build/runtime dependencies
apt update && DEBIAN_FRONTEND=noninteractive apt install -y cmake g++ git libssl-dev nvidia-cuda-toolkit python3 rapidjson-dev zlib1g-dev
git clone --depth=1 https://github.com/triton-inference-server/perf_analyzer.git
mkdir perf_analyzer/build
cmake -B perf_analyzer/build -S perf_analyzer
cmake --build perf_analyzer/build -- -j8
export PATH=$(pwd)/perf_analyzer/build/perf_analyzer/src/perf-analyzer-build:$PATH
perf_analyzer -m <model>
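If you want the binary usable outside the build tree rather than via the `PATH` export above, one option (a sketch, not a required step) is to copy it into a directory that is already on `PATH`:

```bash
# Copy the freshly built binary to /usr/local/bin as an alternative to
# extending PATH; the source path matches the build layout used above.
cp perf_analyzer/build/perf_analyzer/src/perf-analyzer-build/perf_analyzer /usr/local/bin/
perf_analyzer --help  # verify the copied binary runs
```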
- To enable OpenAI mode, add `-D TRITON_ENABLE_PERF_ANALYZER_OPENAI=ON` to the first `cmake` command.
- To enable C API mode, add `-D TRITON_ENABLE_PERF_ANALYZER_C_API=ON` to the first `cmake` command.
- To enable the TorchServe backend, add `-D TRITON_ENABLE_PERF_ANALYZER_TS=ON` to the first `cmake` command.
- To enable the TensorFlow Serving backend, add `-D TRITON_ENABLE_PERF_ANALYZER_TFS=ON` to the first `cmake` command.
- To disable CUDA shared memory support and the dependency on CUDA toolkit libraries, add `-D TRITON_ENABLE_GPU=OFF` to the first `cmake` command.
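For example, to combine several of these options (the particular mix here is just an illustration), reconfigure and rebuild with the flags appended to the first `cmake` command:

```bash
# Sketch: a build with C API mode enabled and the CUDA toolkit dependency
# disabled. Any subset of the -D options above can be combined this way.
cmake -B perf_analyzer/build -S perf_analyzer \
    -D TRITON_ENABLE_PERF_ANALYZER_C_API=ON \
    -D TRITON_ENABLE_GPU=OFF
cmake --build perf_analyzer/build -- -j8
```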