
Qwen2-14B generate_stream return some garbled code #606

Open
kazyun opened this issue Sep 24, 2024 · 4 comments

Comments


kazyun commented Sep 24, 2024

Description
Streaming requests return garbled characters in the output.

Triton Information
tritonserver 24.08
run container with this image: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3

To Reproduce
Steps to reproduce the behavior.

This issue only occurs with streaming requests to v2/models/tensorrt_llm_bls/generate_stream (the same happens with the ensemble model).
payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
    "max_tokens": max_tokens,
    "stream": True,
}
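For completeness, here is a minimal client sketch around that payload (the host/port and the exact SSE line format are assumptions based on Triton's HTTP generate extension, not details taken from this report):

```python
# Minimal streaming client sketch. URL and SSE parsing are assumptions;
# generate_stream responds with Server-Sent Events lines like b'data: {...}'.
import json
import urllib.request

URL = "http://localhost:8000/v2/models/tensorrt_llm_bls/generate_stream"  # assumed host/port


def parse_sse_line(line: bytes) -> str:
    """Extract text_output from one SSE line; return '' for non-data lines."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return ""
    chunk = json.loads(line[len(b"data:"):].strip())
    return chunk.get("text_output", "")


def stream_generate(prompt: str, max_tokens: int = 256) -> str:
    payload = {"text_input": prompt, "max_tokens": max_tokens, "stream": True}
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    pieces = []
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            pieces.append(parse_sse_line(raw))
    return "".join(pieces)
```

Concatenating the `text_output` fields client-side is one way to compare the streamed result against the non-streaming endpoint.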

The screenshot below shows the results of non-streaming and streaming requests.
[Screenshot: side-by-side comparison of the non-streaming and streaming responses]

Expected behavior
The streaming response should match the result of v2/models/tensorrt_llm_bls/generate.

@oandreeva-nv

Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

oandreeva-nv transferred this issue from triton-inference-server/server Sep 27, 2024

kazyun commented Sep 29, 2024

> Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

With streaming requests alone, the responses sometimes contain individual garbled characters. In the screenshot above, the input prompt was "第五项修炼" ("The Fifth Discipline"). That particular case can be worked around by setting bad_word=["1."], but other garbled characters in responses cannot be resolved this way.

Simply turning on accumulate_tokens resolves the garbled-character issue, so I believe it is related to how the streamed tokens are decoded into text.
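To illustrate why accumulating tokens before decoding could matter, here is a small sketch of the suspected failure mode (an assumption, not a confirmed diagnosis): a multi-byte UTF-8 character whose bytes are split across two streamed chunks decodes to replacement characters unless the decoder buffers state across chunks.

```python
# Sketch of the suspected failure mode (assumption, not a confirmed root cause):
# decoding each streamed chunk independently garbles a multi-byte UTF-8
# character that straddles a chunk boundary.
import codecs

text = "第五项修炼"
data = text.encode("utf-8")        # 15 bytes, 3 per character
chunks = [data[:4], data[4:]]      # split mid-character on purpose

# Naive per-chunk decoding produces replacement characters:
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
assert "\ufffd" in naive

# An incremental decoder (or accumulating tokens before decoding, as
# accumulate_tokens does) buffers the partial byte sequence and decodes
# the full text correctly:
dec = codecs.getincrementaldecoder("utf-8")()
fixed = "".join(dec.decode(c) for c in chunks)
assert fixed == text
```

If this is the cause, either accumulating tokens server-side or decoding the stream incrementally client-side would avoid the garbled output.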

@will-jay

> Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

Hi, I have the same problem. Is there any solution?


Labels: none · Projects: none · 3 participants