
How to get access to the vllm backend model #7916

Closed
lianyiyi opened this issue Jan 3, 2025 · 1 comment
Labels
question Further information is requested

Comments

@lianyiyi

lianyiyi commented Jan 3, 2025

Hi, I want to get access to the backend model. Do you know how to make that happen? Thanks!

@tanmayv25
Contributor

The term "model" within the Triton vLLM backend is quite overloaded now :P

The Python model.py that serves the vLLM engine within Triton's Python backend is now called a "Python-based backend".
If that is what you are looking for, the file can be found here: https://github.com/triton-inference-server/vllm_backend/blob/main/src/model.py

See the vLLM section within: https://github.com/triton-inference-server/backend?tab=readme-ov-file#where-can-i-find-all-the-backends-that-are-available-for-triton

This Python-based backend loads model.json, which essentially contains the EngineArgs that get passed to the vLLM engine. See here to learn more: https://github.com/triton-inference-server/vllm_backend?tab=readme-ov-file#using-the-vllm-backend
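
For reference, a minimal sketch of where model.json sits in a Triton model repository, following the layout in the vllm_backend README (the name vllm_model is just illustrative):

```
model_repository/
└── vllm_model/
    ├── 1/
    │   └── model.json    # vLLM EngineArgs for this model
    └── config.pbtxt      # Triton model configuration
```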

The vLLM backend's model.json can point to a Hugging Face model via the "model" field.
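
A minimal sketch of such a model.json, based on the example in the vllm_backend README (the model ID and values are illustrative; any other vLLM EngineArgs can be added as additional keys):

```json
{
    "model": "facebook/opt-125m",
    "disable_log_requests": true,
    "gpu_memory_utilization": 0.5
}
```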

@tanmayv25 tanmayv25 added the question Further information is requested label Jan 25, 2025