vLLM Integration #1336

Closed
jjovalle99 opened this issue Feb 4, 2025 · 2 comments

Labels: help wanted (Extra attention is needed), question (Further information is requested)

Comments

@jjovalle99

Hello!

I am wondering if there is a recommended way to use Instructor with vLLM.

I have been doing:

vllm_client = OpenAI(...)
client = instructor.from_openai(vllm_client, mode=instructor.Mode.JSON)

But in theory instructor.Mode.TOOLS should also work, shouldn't it? What has your experience been with this?

@github-actions bot added the help wanted and question labels on Feb 4, 2025
@ivanleomk (Collaborator)

I got it to work with a model hosted on Modal running an OpenAI-compatible server (https://modal.com/docs/examples/vllm_inference); it worked out of the box with TOOLS mode.

I tested it last week with Qwen-2-VL. Going to close this for now since it doesn't look like a bug, but feel free to reopen if you run into the same problem.
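
For reference, a minimal sketch of that kind of setup (the base URL and model name here are placeholders, not the actual Modal deployment):

import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# vLLM's OpenAI-compatible server only validates the key if it was
# started with --api-key, so a dummy value is fine.
client = instructor.from_openai(
    OpenAI(base_url="http://localhost:8000/v1", api_key="empty"),
    mode=instructor.Mode.TOOLS,
)

user = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder model name
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 25 years old"}],
)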

@jjovalle99 (Author) commented Feb 8, 2025

Hello @ivanleomk!

Sorry to reopen this, but I tested it with Qwen 2.5 VL 72B and it didn't work in TOOLS mode. Here is how I deployed it:

vllm serve Qwen/Qwen2.5-VL-72B-Instruct --port 8000 --host 0.0.0.0 --dtype bfloat16 --tensor-parallel-size 4 \
--limit-mm-per-prompt image=5,video=0 --enable-auto-tool-choice --tool-call-parser hermes

(I also tested without --enable-auto-tool-choice --tool-call-parser hermes)
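
A raw tool-calling request against the same endpoint (bypassing instructor entirely) is a useful sanity check that the server's tool-call parser works at all; a sketch, with get_weather as a made-up example function:

from openai import OpenAI

client = OpenAI(base_url="http://192.153.62.139:8000/v1", api_key="empty")
resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
)
print(resp.choices[0].message.tool_calls)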

This is the Python code:

import instructor
from pathlib import Path

from openai import AsyncOpenAI
from pydantic import BaseModel


class Response(BaseModel):
    reasoning: str
    answer: str


images_path = Path(
    "/Users/juanovalle/Informa Repositories/ingestion_pipeline/data/images_inference/2023"
)
image1 = instructor.Image.from_path(images_path / "2023_0002.png")

vllm_url = "http://192.153.62.139:8000/v1"
vllm_api_key = "empty"  # vLLM ignores the key unless the server was started with --api-key
model_name = "Qwen/Qwen2.5-VL-72B-Instruct"

vllm_client = AsyncOpenAI(base_url=vllm_url, api_key=vllm_api_key)
instructor_client = instructor.from_openai(
    client=vllm_client  # no mode argument, so this defaults to Mode.TOOLS
)

# Run inside an async context (e.g. a notebook); create_with_completion
# returns both the parsed model and the raw completion.
response, completion = await instructor_client.chat.completions.create_with_completion(
    model=model_name,
    response_model=Response,
    messages=[
        {
            "role": "user",
            "content": ["How many colleagues doe sinforma have", image1],
        },
    ],
    max_tokens=1024,
    temperature=0.0,
)

And I got this error:

RetryError: RetryError[<Future at 0x12975dc40 state=finished raised BadRequestError>]
[...]
InstructorRetryException: Error code: 400 - {'object': 'error', 'message': 'Expecting value: line 1 column 1 (char 0)', 'type': 'BadRequestError', 'param': None, 'code': 400}

These are the logs from the server:

INFO 02-08 11:19:07 logger.py:39] Received request chatcmpl-ae0907b2968c4822a9962a1949cae300: prompt: '<|im_start|>system\nYou are a helpful assistant.\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{"type": "function", "function": {"name": "Response", "description": "Correctly extracted `Response` with all the required parameters with correct types", "parameters": {"properties": {"reasoning": {"title": "Reasoning", "type": "string"}, "answer": {"title": "Answer", "type": "string"}}, "required": ["answer", "reasoning"], "type": "object"}}}\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{"name": <function-name>, "arguments": <args-json-object>}\n</tool_call><|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>\nHow many colleagues doe sinforma have<|im_end|>\n<|im_start|>assistant\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=1024, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=GuidedDecodingParams(json={'properties': {'reasoning': {'title': 'Reasoning', 'type': 'string'}, 'answer': {'title': 'Answer', 'type': 'string'}}, 'required': ['answer', 'reasoning'], 'type': 'object'}, regex=None, choice=None, grammar=None, json_object=None, backend=None, whitespace_pattern=None)), prompt_token_ids: None, lora_request: None, prompt_adapter_request: None.
INFO 02-08 11:19:07 async_llm.py:161] Added request chatcmpl-ae0907b2968c4822a9962a1949cae300.
INFO 02-08 11:19:09 loggers.py:72] Avg prompt throughput: 2587.6 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs GPU KV cache usage: 4.2%.
INFO:     100.67.5.15:1886 - "POST /v1/chat/completions HTTP/1.1" 200 OK
WARNING 02-08 11:19:09 chat_utils.py:825] Skipping multimodal part (type: 'text')with empty / unparsable content.
ERROR 02-08 11:19:09 serving_chat.py:193] Error in preprocessing prompt inputs
ERROR 02-08 11:19:09 serving_chat.py:193] Traceback (most recent call last):
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_chat.py", line 177, in create_chat_completion
ERROR 02-08 11:19:09 serving_chat.py:193]     ) = await self._preprocess_chat(
ERROR 02-08 11:19:09 serving_chat.py:193]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/serving_engine.py", line 386, in _preprocess_chat
ERROR 02-08 11:19:09 serving_chat.py:193]     conversation, mm_data_future = parse_chat_messages_futures(
ERROR 02-08 11:19:09 serving_chat.py:193]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.venv/lib/python3.12/site-packages/vllm/entrypoints/chat_utils.py", line 959, in parse_chat_messages_futures
ERROR 02-08 11:19:09 serving_chat.py:193]     _postprocess_messages(conversation)
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.venv/lib/python3.12/site-packages/vllm/entrypoints/chat_utils.py", line 914, in _postprocess_messages
ERROR 02-08 11:19:09 serving_chat.py:193]     item["function"]["arguments"] = json.loads(
ERROR 02-08 11:19:09 serving_chat.py:193]                                     ^^^^^^^^^^^
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/json/__init__.py", line 346, in loads
ERROR 02-08 11:19:09 serving_chat.py:193]     return _default_decoder.decode(s)
ERROR 02-08 11:19:09 serving_chat.py:193]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/json/decoder.py", line 338, in decode
ERROR 02-08 11:19:09 serving_chat.py:193]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
ERROR 02-08 11:19:09 serving_chat.py:193]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-08 11:19:09 serving_chat.py:193]   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/json/decoder.py", line 356, in raw_decode
ERROR 02-08 11:19:09 serving_chat.py:193]     raise JSONDecodeError("Expecting value", s, err.value) from None
ERROR 02-08 11:19:09 serving_chat.py:193] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO:     100.67.5.15:1886 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
[The retry at 11:19:12 logs the identical warning and traceback, ending in another 400 Bad Request.]
INFO 02-08 11:19:14 loggers.py:72] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2.4 tokens/s, Running: 0 reqs, Waiting: 0 reqs GPU KV cache usage: 0.0%.
INFO 02-08 11:19:19 loggers.py:72] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs GPU KV cache usage: 0.0%.

P.S.: It works with JSON mode.
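
For anyone hitting the same error, the JSON-mode variant that works is a one-line change to the client construction above:

instructor_client = instructor.from_openai(
    client=vllm_client,
    mode=instructor.Mode.JSON,  # JSON mode instead of the default TOOLS mode
)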
