Two Docker images are available from NVIDIA GPU Cloud (NGC) that make it easy to construct customized versions of Triton. Customizing Triton lets you significantly reduce the size of the Triton image by removing functionality that you don't require.
Currently the customization is limited, as described below, but future releases will increase the amount of customization that is available. You can also build Triton yourself to get more exact customization.
The two Docker images used for customization are retrieved using the following commands.
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
Where <xx.yy> is the version of Triton that you want to customize. The <xx.yy>-py3-min image is a minimal, base image that contains the CUDA, cuDNN, etc. dependencies that are required to run Triton. The <xx.yy>-py3 image contains the complete Triton with all options and backends.
To create an image containing the smallest possible Triton, use the following multi-stage Dockerfile. As mentioned above, the amount of customization currently available is limited. As a result, the minimum Triton still contains both HTTP/REST and GRPC endpoints; S3, GCS and Azure Storage filesystem support; and the TensorRT backend.
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3 as full
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
Then use Docker to create the image.
$ docker build -t tritonserver_min .
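To see how much the minimal image actually saves, the two image sizes can be compared. The following is a minimal sketch, not part of Triton: `size_reduction_pct` is a hypothetical helper name, and it assumes both sizes are given in bytes.

```shell
# Hypothetical helper (not part of Triton or Docker).
# Print, as a whole percentage rounded down, how much smaller an image of
# MIN_BYTES is than an image of FULL_BYTES.
# Usage: size_reduction_pct FULL_BYTES MIN_BYTES
size_reduction_pct() {
    full_bytes="$1"
    min_bytes="$2"
    echo $(( ($full_bytes - $min_bytes) * 100 / $full_bytes ))
}
```

The byte counts can be read with docker image inspect, for example:
$ full=$(docker image inspect --format '{{.Size}}' nvcr.io/nvidia/tritonserver:<xx.yy>-py3)
$ min=$(docker image inspect --format '{{.Size}}' tritonserver_min)
$ size_reduction_pct "$full" "$min"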
One or more of the supported PyTorch, TensorFlow1, TensorFlow2, ONNX Runtime, OpenVINO, Python, and DALI backends can be added to the minimum Triton image. The backend can be built from scratch, or the appropriate backend directory can be copied from the full Triton image. For example, to create a Triton image that contains a minimum Triton plus support for TensorFlow1, use the following Dockerfile.
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3 as full
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3-min
COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin
COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib
COPY --from=full /opt/tritonserver/backends/tensorflow1 /opt/tritonserver/backends/tensorflow1
Depending on the backend it may also be necessary to include additional dependencies in the image. For example, the Python backend requires that Python3 be installed in the image.
Then use Docker to create the image.
$ docker build -t tritonserver_custom .
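The minimal and custom Dockerfiles above differ only in their backend COPY lines, so they can be generated from a version and a list of backends. The following is a sketch under stated assumptions: `triton_dockerfile` is a hypothetical helper name, and each backend argument must match a directory name under /opt/tritonserver/backends in the full image. Only `tensorflow1` is confirmed by the example above; check the full image for the other directory names.

```shell
# Hypothetical helper (not part of Triton).
# Print a multi-stage Dockerfile that starts from the -py3-min image, copies
# the Triton bin and lib directories from the full image, and then copies one
# backend directory per extra argument.
# Usage: triton_dockerfile VERSION [BACKEND ...]
triton_dockerfile() {
    version="$1"
    shift
    printf 'FROM nvcr.io/nvidia/tritonserver:%s-py3 as full\n' "$version"
    printf 'FROM nvcr.io/nvidia/tritonserver:%s-py3-min\n' "$version"
    printf 'COPY --from=full /opt/tritonserver/bin /opt/tritonserver/bin\n'
    printf 'COPY --from=full /opt/tritonserver/lib /opt/tritonserver/lib\n'
    # Assumption: each name is a directory under /opt/tritonserver/backends.
    for backend in "$@"; do
        printf 'COPY --from=full /opt/tritonserver/backends/%s /opt/tritonserver/backends/%s\n' "$backend" "$backend"
    done
}
```

For example, to regenerate the TensorFlow1 Dockerfile and build it:
$ triton_dockerfile "<xx.yy>" tensorflow1 > Dockerfile
$ docker build -t tritonserver_custom .
Calling the helper with no backend arguments prints the minimal Dockerfile. Any additional dependencies a backend needs, such as Python3 for the Python backend, still have to be added by hand.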
You can create and build your own Triton backend. The result of that build should be a directory containing your backend shared library and any additional files required by the backend. Assuming your backend is called "mybackend" and that the directory is "./mybackend", the following Dockerfile will create a Triton image that contains all the supported Triton backends plus your custom backend.
FROM nvcr.io/nvidia/tritonserver:<xx.yy>-py3
COPY ./mybackend /opt/tritonserver/backends/mybackend
You also need to install any additional dependencies required by your backend as part of the Dockerfile. Then use Docker to create the image.
$ docker build -t tritonserver_custom .
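Before building, it can help to confirm that the backend directory really contains the shared library. The following is a minimal sketch: `check_backend_dir` is a hypothetical helper name, and it assumes the library follows the libtriton_<name>.so naming convention used for Triton backends. Adjust the file name if your backend is packaged differently.

```shell
# Hypothetical helper (not part of Triton).
# Return 0 if DIR contains the shared library for backend NAME; otherwise
# print a message to stderr and return 1.
# Assumption: the library is named libtriton_<NAME>.so.
# Usage: check_backend_dir DIR NAME
check_backend_dir() {
    dir="$1"
    name="$2"
    lib="${dir}/libtriton_${name}.so"
    if [ -f "$lib" ]; then
        return 0
    fi
    echo "missing backend library: $lib" >&2
    return 1
}
```

For example, to build only when the library is present:
$ check_backend_dir ./mybackend mybackend && docker build -t tritonserver_custom .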