Python backend without GIL #8032

Open
zeruniverse opened this issue Feb 25, 2025 · 0 comments

@zeruniverse

There is a pretty old issue: with the onnxruntime/libtorch etc. backends, nvidia-smi shows only the tritonserver process taking GPU memory. But if I use PyTorch in the python backend, each model instance becomes its own process that takes GPU memory (and for the same models, 10 libtorch instances take far less memory than 10 python-backend instances, since libtorch models seem to share some CUDA context memory). I remember once discussing this with someone from the Triton team and asking whether all GPU work could run inside the tritonserver process; he told me it wasn't possible because of the Python GIL.
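
To make the per-process overhead concrete, here is a minimal sketch (assuming a CUDA-enabled PyTorch install) of why every extra process costs GPU memory: merely initializing a CUDA context reserves memory in nvidia-smi before any tensors are allocated, and each python-backend instance process pays that fixed cost again.

```python
import torch

# Initializing the CUDA context alone reserves GPU memory for this process
# (typically a few hundred MB, visible as a separate entry in nvidia-smi),
# before a single tensor is allocated.
torch.cuda.init()

# PyTorch's own allocator has handed out nothing yet...
print("allocated by PyTorch:", torch.cuda.memory_allocated(), "bytes")
# ...but nvidia-smi already charges this process for its CUDA context.
# Running N python-backend model instances means N processes, so this
# fixed context cost is paid N times, unlike in-process backends such
# as libtorch, where all instances live inside tritonserver.
```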

Now that Python 3.13 has been released with an option to disable the GIL, is it possible to bundle all GPU tasks into the tritonserver process to save GPU memory? Currently I can serve ~50 models on a server with 16 GB of GPU memory using the onnxruntime or libtorch backend, but with the python backend I can only serve ~10.
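
For reference, a quick way to check whether an interpreter is actually running without the GIL (this assumes a CPython 3.13+ free-threaded build; `sys._is_gil_enabled()` does not exist on older versions):

```python
import sys

# sys._is_gil_enabled() exists on CPython 3.13+. On a free-threaded build
# started with -X gil=0 (or PYTHON_GIL=0) it returns False; on a standard
# build it always returns True.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())
else:
    print("Pre-3.13 interpreter: the GIL cannot be disabled.")
```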
