Hi! Great work on this project. I tried deploying the model with vLLM. The vLLM service starts normally, but the following error occurs when the service is invoked:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.', 'type': 'BadRequestError', 'param': None, 'code': 400}
The service startup command is as follows:
CUDA_VISIBLE_DEVICES=0 vllm serve ./xLAM-7b-fc/xLAM-7b-fc-r.Q5_K_M.gguf \
    --trust-remote-code \
    --served-model-name xlam-7b-fc \
    --port 4040 \
    --api-key agent-model \
    --gpu-memory-utilization 0.5
The installed versions are:
vllm==0.6.0
transformers==4.43.2
The client code is as follows:
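from openai import OpenAI

# Assumption: the server started by the command above is reachable on localhost:4040.
client = OpenAI(base_url="http://localhost:4040/v1", api_key="agent-model")
model_name = "xlam-7b-fc"  # must match --served-model-name
messages = [{"role": "user", "content": "你好"}]  # "Hello"
result = client.chat.completions.create(messages=messages, model=model_name, temperature=0)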
Hi @bingoohe, we haven't tried deploying the GGUF versions of the xLAM models with vLLM. According to the official vLLM documentation, GGUF support may still be an experimental feature. We suggest using our non-quantized versions when deploying with vLLM, or following the deployment instructions here for the quantized models.
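For anyone who still wants to experiment with the GGUF file: the 400 error says the tokenizer loaded by the server does not define a chat template, so one has to be supplied explicitly. Below is a minimal sketch of doing that per request, assuming the vLLM OpenAI-compatible server accepts a `chat_template` field among its extra chat parameters (the `vllm serve --chat-template` flag is the server-side alternative). The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the template string is a generic placeholder, not the official xLAM prompt format. This has not been verified against the GGUF build.

```python
from typing import Any

# Placeholder Jinja template for illustration only.
# Assumption: this is NOT the official xLAM prompt format; replace it with
# the template that matches the model you are serving.
PLACEHOLDER_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "assistant:"
)


def build_chat_request(
    messages: list[dict[str, str]],
    model: str,
    chat_template: str,
) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    The template travels in extra_body, which the openai client merges into
    the JSON payload. Assumption: the server reads a `chat_template` field.
    """
    return {
        "model": model,
        "messages": messages,
        "temperature": 0,
        "extra_body": {"chat_template": chat_template},
    }


def send_chat_request(client: Any, request: dict[str, Any]) -> Any:
    """Send the request with an already constructed openai.OpenAI client."""
    return client.chat.completions.create(**request)
```

To use it, create the client with `OpenAI(base_url="http://localhost:4040/v1", api_key="agent-model")` (host assumed to be localhost), build the request with `build_chat_request([{"role": "user", "content": "Hello"}], "xlam-7b-fc", PLACEHOLDER_CHAT_TEMPLATE)`, and pass both to `send_chat_request`. Keeping the request construction separate from the network call makes it easy to inspect the payload before sending it.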