vLLM Integration #1336
I got it to work with a model hosted on Modal running an OpenAI-compatible server (https://modal.com/docs/examples/vllm_inference), and it worked out of the box in TOOLS mode. I tested it last week with Qwen-2-VL. Going to close this issue for now since it isn't really an issue, but feel free to reopen it if you run into the same problems.
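For reference, a minimal sketch of that kind of setup; the endpoint URL, API key, and model name below are placeholders for whatever your vLLM or Modal deployment exposes, not values from this thread:

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel


    class UserInfo(BaseModel):
        name: str
        age: int


    # Any OpenAI-compatible endpoint works, e.g. one started with `vllm serve`
    # or a Modal deployment; URL, key, and model here are placeholders.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")
    instructor_client = instructor.from_openai(client, mode=instructor.Mode.TOOLS)

    user = instructor_client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder model name
        response_model=UserInfo,
        messages=[{"role": "user", "content": "John is 25 years old."}],
    )
    print(user)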
hello @ivanleomk! Sorry to reopen, but I tested it with Qwen 2.5 VL 72B and it didn't work in tool mode.

Here is how I deployed it:

    vllm serve Qwen/Qwen2.5-VL-72B-Instruct --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4 \
        --limit-mm-per-prompt image=5,video=0 --enable-auto-tool-choice --tool-call-parser hermes

(I also tested without …)

This is the Python code:

    from pathlib import Path

    import instructor
    from openai import AsyncOpenAI
    from pydantic import BaseModel


    class Response(BaseModel):
        reasoning: str
        answer: str


    images_path = Path(
        "/Users/juanovalle/Informa Repositories/ingestion_pipeline/data/images_inference/2023"
    )
    image1 = instructor.Image.from_path(images_path / "2023_0002.png")

    vllm_url = "http://192.153.62.139:8000/v1"
    vllm_api_key = "empty"
    model_name = "Qwen/Qwen2.5-VL-72B-Instruct"

    vllm_client = AsyncOpenAI(base_url=vllm_url, api_key=vllm_api_key)
    instructor_client = instructor.from_openai(client=vllm_client)

    response = await instructor_client.chat.completions.create_with_completion(
        model=model_name,
        response_model=Response,
        messages=[
            {
                "role": "user",
                "content": ["How many colleagues does Informa have?", image1],
            },
        ],
        max_tokens=1024,
        temperature=0.0,
    )

And I got this error:

    RetryError: RetryError[<Future at 0x12975dc40 state=finished raised BadRequestError>]
    [...]
    InstructorRetryException: Error code: 400 - {'object': 'error', 'message': 'Expecting value: line 1 column 1 (char 0)', 'type': 'BadRequestError', 'param': None, 'code': 400}

These are the logs from the server:
P.S.: It works with JSON mode.
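For anyone hitting the same 400, a minimal sketch of the JSON-mode variant; the only assumed change from the code above is the mode argument passed when patching the client:

    import instructor
    from openai import AsyncOpenAI

    vllm_client = AsyncOpenAI(base_url="http://192.153.62.139:8000/v1", api_key="empty")

    # Mode.JSON asks the model for plain JSON matching the schema instead of a tool
    # call, so it does not rely on vLLM's --enable-auto-tool-choice / --tool-call-parser.
    instructor_client = instructor.from_openai(vllm_client, mode=instructor.Mode.JSON)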
Hello!

I am wondering if there is a recommended way to use Instructor with vLLM. I have been doing:

But in theory instructor.Mode.TOOLS should work, shouldn't it? What has your experience with this been?
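For reference, a minimal sketch of what the TOOLS-mode setup would look like against a vLLM OpenAI-compatible server; the URL below is a placeholder, and note that instructor.from_openai defaults to Mode.TOOLS in recent versions, so whether it works mostly comes down to the server exposing OpenAI-style tool calling:

    import instructor
    from openai import OpenAI

    # Placeholder endpoint; point this at your own vLLM server.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

    # No mode argument: recent instructor versions default to Mode.TOOLS,
    # i.e. OpenAI-style tool/function calling.
    default_client = instructor.from_openai(client)

    # The same thing, with the mode spelled out explicitly.
    tools_client = instructor.from_openai(client, mode=instructor.Mode.TOOLS)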