Tool calling with llm.chat #12557
According to this example, I create a model:

```python
tokenizer = AutoTokenizer.from_pretrained(model_dir)
sampling_params = SamplingParams(temperature=0.85, top_p=0.9, repetition_penalty=1.1, max_tokens=2048)
llm = LLM(model=model_dir, gpu_memory_utilization=0.9, max_model_len=16384)
```

Then I do inference:

```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
output = llm.chat([text], sampling_params, tools=tools)[0]
generated_text = output.outputs[0].text
```

Then I ask the model about the temperature in San Francisco and it fails. Any suggestions?
Answered by alexanderbrodko on Jan 29, 2025
Replies: 1 comment
My bad. I do not need to tokenize / apply the chat template myself when I use `llm.chat` instead of `llm.generate`. It works; the model answer is […]. In fact, the model is Qwen2.5-Coder-Instruct-0.5B.
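For reference, a minimal sketch of the corrected flow, assuming the Qwen2.5-Coder-Instruct-0.5B checkpoint mentioned in the answer and an illustrative `get_current_temperature` tool schema (the model ID and the schema are placeholders, not taken from the original post). The message list goes straight to `llm.chat`, which applies the chat template and tool definitions itself:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID for illustration; substitute your local model_dir.
model_dir = "Qwen/Qwen2.5-Coder-0.5B-Instruct"

llm = LLM(model=model_dir, gpu_memory_utilization=0.9, max_model_len=16384)
sampling_params = SamplingParams(temperature=0.85, top_p=0.9,
                                 repetition_penalty=1.1, max_tokens=2048)

# Illustrative tool schema in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the temperature in San Francisco?"}]

# No manual tokenizer.apply_chat_template here: llm.chat takes the message
# list directly and renders the chat template (including the tools) itself.
output = llm.chat(messages, sampling_params, tools=tools)[0]
generated_text = output.outputs[0].text
print(generated_text)
```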
0 replies
Answer selected by alexanderbrodko