There is a pretty old issue: with the onnxruntime/libtorch etc. backends, nvidia-smi shows only the tritonserver process holding GPU memory. But when I use PyTorch in the python backend, each model instance is a separate process that takes its own GPU memory (and for the same models, 10 libtorch instances take far less memory than 10 python-backend instances, since the libtorch models appear to share a single CUDA context). I remember once discussing this with someone from the Triton team and asking whether all GPU work could run inside the tritonserver process, and he told me it was not possible because of the Python GIL.
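For context, a minimal python-backend `model.py` along these lines is what I mean (a sketch only; the model path and tensor names are placeholders, not my actual setup). The point is that `initialize` runs once per stub process, so every instance pays for its own CUDA context and PyTorch runtime, unlike the in-process onnxruntime/libtorch backends:

```python
# model.py -- minimal Triton python backend model (illustrative sketch).
# Each instance runs in its own Python stub process, so the CUDA context
# and the PyTorch runtime are duplicated per instance.
import torch
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Runs once per stub process; "model.pt" is a placeholder path.
        self.device = torch.device("cuda:" + args["model_instance_device_id"])
        self.model = torch.jit.load("model.pt", map_location=self.device).eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = torch.from_numpy(input_tensor.as_numpy()).to(self.device)
            with torch.no_grad():
                out = self.model(data)
            out_tensor = pb_utils.Tensor("OUTPUT0", out.cpu().numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out_tensor])
            )
        return responses
```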
Now that Python 3.13 has been released with an option to disable the GIL, would it be possible to bundle all GPU work into the tritonserver process to save GPU memory? Currently I can serve ~50 models on a server with 16 GB of GPU memory using the onnxruntime or libtorch backend, but with the python backend I can only serve ~10.
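For reference, this is how the free-threaded build can be detected at runtime (assuming Python 3.13+; `python3.13t` is the conventional name of the free-threaded binary):

```python
# Sketch: detect whether the interpreter is a free-threaded (no-GIL) build
# and whether the GIL is actually disabled right now.
import sys
import sysconfig

# Py_GIL_DISABLED is 1 only in free-threaded builds (e.g. python3.13t).
print("free-threaded build:", bool(sysconfig.get_config_var("Py_GIL_DISABLED")))

# On 3.13+, sys._is_gil_enabled() reports the current state; the GIL can be
# re-enabled at runtime, e.g. by extension modules that still require it.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```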