An error occurred when vLLM deployed the GGUF file #13

bingoohe · 2024-09-30T09:05:26Z

Hi! Great work on this. I tried deploying the model with vLLM. The vLLM service starts normally, but the following error occurs when the service is called:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}

The service startup command is as follows:
CUDA_VISIBLE_DEVICES=0 vllm serve ./xLAM-7b-fc/xLAM-7b-fc-r.Q5_K_M.gguf \
    --trust-remote-code \
    --served-model-name xlam-7b-fc \
    --port 4040 \
    --api-key agent-model \
    --gpu-memory-utilization 0.5

vllm=0.6.0
transformers=4.43.2

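from openai import OpenAI

# Point the client at the vLLM server started above (host assumed to be localhost).
client = OpenAI(base_url="http://localhost:4040/v1", api_key="agent-model")
model_name = "xlam-7b-fc"  # matches --served-model-name

messages = [{"role": "user", "content": "你好"}]  # "Hello"
result = client.chat.completions.create(messages=messages, model=model_name, temperature=0)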

jianguoz · 2024-10-01T06:43:44Z

@zuxin666 Could you take a look at this issue regarding xLAM-7b-fc-r?

liuzuxin · 2024-10-01T06:51:50Z

Hi @bingoohe, we haven't tried deploying the GGUF versions of the xLAM models with vLLM. According to the official vLLM documentation, GGUF support may still be an experimental feature. We suggest using our non-quantized versions when deploying with vLLM, or following the deployment instructions here for the quantized models.
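
For readers who hit the same 400 error: the message says the tokenizer loaded from the GGUF file does not define a chat template. A possible workaround, not confirmed anywhere in this thread and untested with xLAM, is to pass `vllm serve` a tokenizer that does carry a chat template via `--tokenizer`, or to supply a template explicitly via `--chat-template`. The sketch below only assembles such a command line. The helper `build_vllm_serve_command` is hypothetical, and the tokenizer repository name `Salesforce/xLAM-7b-fc-r` is an assumption about where the non-quantized model is hosted.

```python
import shlex


def build_vllm_serve_command(gguf_path, served_model_name, port, api_key,
                             tokenizer=None, chat_template=None):
    """Assemble an argv list for `vllm serve` (hypothetical helper).

    tokenizer:     optional name or path of a tokenizer that defines a chat template.
    chat_template: optional path to a Jinja chat template file.
    """
    cmd = [
        "vllm", "serve", gguf_path,
        "--served-model-name", served_model_name,
        "--port", str(port),
        "--api-key", api_key,
    ]
    if tokenizer is not None:
        cmd += ["--tokenizer", tokenizer]
    if chat_template is not None:
        cmd += ["--chat-template", chat_template]
    return cmd


# Values taken from the startup command in this issue; the tokenizer name is an assumption.
cmd = build_vllm_serve_command(
    "./xLAM-7b-fc/xLAM-7b-fc-r.Q5_K_M.gguf",
    "xlam-7b-fc",
    4040,
    "agent-model",
    tokenizer="Salesforce/xLAM-7b-fc-r",
)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell. Whether it resolves the error depends on the vLLM version and on the chosen tokenizer actually shipping a chat template.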

jianguoz assigned zuxin666 Oct 1, 2024
