Qwen2-14B generate_stream returns some garbled characters #606
Comments
Hi @kazyun, thanks for reporting. Could you please provide a reproducer, if possible?
Just sending streaming requests is enough to trigger it: the responses sometimes contain individual garbled characters. In the screenshot above, the input prompt is "第五项修炼"; that particular case can be worked around by setting bad_word=["1."], but other responses with garbled characters cannot be fixed this way. Simply turning on accumulate_tokens resolves the garbled characters, so I believe the problem is related to how the streamed tokens are decoded.
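A minimal sketch of why accumulate_tokens would matter (assuming the transformers package; the tokenizer name below is illustrative and not taken from this issue): Qwen's tokenizer operates on UTF-8 bytes, so a single Chinese character can be split across two tokens. Decoding each streamed token independently then yields replacement-character fragments, while decoding the accumulated token sequence reassembles the character correctly.

# Hypothetical illustration, not the backend's actual code path.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")  # illustrative model name

text = "第五项修炼"
ids = tokenizer.encode(text, add_special_tokens=False)

# Decoding token by token, as a naive streaming postprocessor might:
# a character split across tokens decodes to U+FFFD fragments.
print([tokenizer.decode([i]) for i in ids])   # may contain '�'

# Decoding the accumulated ids reassembles multibyte characters,
# which is effectively what accumulate_tokens enables.
print(tokenizer.decode(ids))                  # 第五项修炼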
Hi, I have the same problem. Is there any solution?
Description
Streaming requests return garbled characters.
Triton Information
tritonserver 24.08
Run a container with this image: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
To Reproduce
Steps to reproduce the behavior.
This issue only occurs when using a streaming request.
Send a request to v2/models/tensorrt_llm_bls/generate_stream (the ensemble model's generate_stream endpoint shows the same behavior):
payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
    "max_tokens": max_tokens,
    "stream": True,
}
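For reference, a minimal streaming client sketch (assuming Triton serves HTTP on localhost:8000; QWEN_PROMPT_TEMPLATE is not shown in the issue, so the template below is a hypothetical stand-in). The generate_stream endpoint returns server-sent events, one "data: {...}" line per chunk.

import json
import requests

URL = "http://localhost:8000/v2/models/tensorrt_llm_bls/generate_stream"
# Hypothetical template; adjust to the deployment's actual prompt format.
QWEN_PROMPT_TEMPLATE = "<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant\n"

payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text="第五项修炼"),
    "max_tokens": 256,
    "stream": True,
}

with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            chunk = json.loads(line[len("data:"):].strip())
            # "text_output" is the field the default configs return.
            print(chunk.get("text_output", ""), end="", flush=True)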
The screenshot below shows the results of non-streaming and streaming requests.
Expected behavior
The same result as a non-streaming request to v2/models/tensorrt_llm_bls/generate.