There is a pretty old issue: with the onnxruntime/libtorch etc. backends, nvidia-smi shows only the tritonserver process holding GPU memory. But when I use PyTorch in the python backend, each model instance is a separate process that takes its own GPU memory (and for the same models, 10 libtorch instances take far less memory than 10 python-backend instances, since the libtorch models appear to share a single CUDA context). I remember once discussing this with someone from the Triton team and asking whether all GPU work could run inside the tritonserver process, and he told me it was not possible because of the Python GIL.
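For context, a minimal python-backend `model.py` along these lines is what I mean (a sketch only; the model path and tensor names are placeholders, not my actual setup). The point is that `initialize` runs once per stub process, so every instance pays for its own CUDA context and PyTorch runtime, unlike the in-process onnxruntime/libtorch backends:

```python
# model.py -- minimal Triton python backend model (illustrative sketch).
# Each instance runs in its own Python stub process, so the CUDA context
# and the PyTorch runtime are duplicated per instance.
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Runs once per stub process; "model.pt" is a placeholder path.
        self.device = torch.device("cuda:" + args["model_instance_device_id"])
        self.model = torch.jit.load("model.pt", map_location=self.device).eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = torch.from_numpy(input_tensor.as_numpy()).to(self.device)
            with torch.no_grad():
                out = self.model(data)
            out_tensor = pb_utils.Tensor("OUTPUT0", out.cpu().numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```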
Now that Python 3.13 has been released with an option to disable the GIL, would it be possible to bundle all GPU work into the tritonserver process to save GPU memory? Currently I can serve ~50 models on a server with 16 GB of GPU memory using the onnxruntime or libtorch backend, but with the python backend I can only serve ~10.
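For reference, this is how the free-threaded build can be detected at runtime (assuming Python 3.13+; `python3.13t` is the conventional name of the free-threaded binary):

```python
# Sketch: detect whether the interpreter is a free-threaded (no-GIL) build
# and whether the GIL is actually disabled right now.
import sys
import sysconfig

# Py_GIL_DISABLED is 1 only in free-threaded builds (e.g. python3.13t).
print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# On 3.13+, sys._is_gil_enabled() reports the current state; the GIL can be
# re-enabled at runtime, e.g. by extension modules that still require it.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```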