
Qwen2-14B generate_stream return some garbled code #606

Open
kazyun opened this issue Sep 24, 2024 · 4 comments

Comments


kazyun commented Sep 24, 2024

Description
Streaming requests return garbled characters in the output.

Triton Information
tritonserver 24.08
run container with this image: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3

To Reproduce
Steps to reproduce the behavior.

This issue only occurs with streaming requests to v2/models/tensorrt_llm_bls/generate_stream (the same happens with the ensemble model).
payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
    "max_tokens": max_tokens,
    "stream": True,
}
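For completeness, here is a minimal client sketch around that payload (the host/port and the exact SSE line format are assumptions based on Triton's HTTP generate extension, not details taken from this report):

```python
# Minimal streaming client sketch. URL and SSE parsing are assumptions;
# generate_stream responds with Server-Sent Events lines like b'data: {...}'.
import json
import urllib.request

URL = "http://localhost:8000/v2/models/tensorrt_llm_bls/generate_stream"  # assumed host/port


def parse_sse_line(line: bytes) -> str:
    """Extract text_output from one SSE line; return '' for non-data lines."""
    line = line.strip()
    if not line.startswith(b"data:"):
        return ""
    chunk = json.loads(line[len(b"data:"):].strip())
    return chunk.get("text_output", "")


def stream_generate(prompt: str, max_tokens: int = 256) -> str:
    payload = {"text_input": prompt, "max_tokens": max_tokens, "stream": True}
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    pieces = []
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTPResponse iterates line by line
            pieces.append(parse_sse_line(raw))
    return "".join(pieces)
```

Concatenating the `text_output` fields client-side is one way to compare the streamed result against the non-streaming endpoint.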

The screenshot below shows the results of non-streaming and streaming requests.
[Screenshot: side-by-side comparison of the non-streaming and streaming responses]

Expected behavior
The streaming response should match the result of v2/models/tensorrt_llm_bls/generate.

@oandreeva-nv

Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

oandreeva-nv transferred this issue from triton-inference-server/server Sep 27, 2024

kazyun commented Sep 29, 2024

> Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

With streaming requests alone, the responses sometimes contain individual garbled characters. In the screenshot above, the input prompt was "第五项修炼" ("The Fifth Discipline"). That particular case can be worked around by setting bad_word=["1."], but other garbled characters in responses cannot be resolved this way.

Simply turning on accumulate_tokens resolves the garbled-character issue, so I believe it is related to how the streamed tokens are decoded into text.
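To illustrate why accumulating tokens before decoding could matter, here is a small sketch of the suspected failure mode (an assumption, not a confirmed diagnosis): a multi-byte UTF-8 character whose bytes are split across two streamed chunks decodes to replacement characters unless the decoder buffers state across chunks.

```python
# Sketch of the suspected failure mode (assumption, not a confirmed root cause):
# decoding each streamed chunk independently garbles a multi-byte UTF-8
# character that straddles a chunk boundary.
import codecs

text = "第五项修炼"
data = text.encode("utf-8")        # 15 bytes, 3 per character
chunks = [data[:4], data[4:]]      # split mid-character on purpose

# Naive per-chunk decoding produces replacement characters:
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
assert "\ufffd" in naive

# An incremental decoder (or accumulating tokens before decoding, as
# accumulate_tokens does) buffers the partial byte sequence and decodes
# the full text correctly:
dec = codecs.getincrementaldecoder("utf-8")()
fixed = "".join(dec.decode(c) for c in chunks)
assert fixed == text
```

If this is the cause, either accumulating tokens server-side or decoding the stream incrementally client-side would avoid the garbled output.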

@will-jay

> Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?

Hi, I have the same problem. Is there any solution?


Labels: none · Projects: none · 3 participants