You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When using Qwen2, executing inference with the engine through the run.py script outputs normally. However, when using Triton for inference, some characters appear garbled, and the output is incomplete compared to the results obtained from using the script. What could be the cause of this issue?
maybe the config.pbtxt cause the problem
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
start triton server
Expected behavior
get the same results with run.py script
actual behavior
When using Qwen2, executing inference with the engine through the run.py script outputs normally. However, when using Triton for inference, some characters appear garbled, and the output is incomplete compared to the results obtained from using the script. What could be the cause of this issue?
additional notes
no
The text was updated successfully, but these errors were encountered:
This issue only occurs when using a streaming request.
payload = {
"text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
"max_tokens": max_tokens,
"stream": True,
}
This issue only occurs when using a streaming request. payload = { "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt), "max_tokens": max_tokens, "stream": True, }
System Info
When using Qwen2, executing inference with the engine through the run.py script outputs normally. However, when using Triton for inference, some characters appear garbled, and the output is incomplete compared to the results obtained from using the script. What could be the cause of this issue?
maybe the config.pbtxt cause the problem
Who can help?
No response
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
Expected behavior
get the same results with run.py script
actual behavior
When using Qwen2, executing inference with the engine through the run.py script outputs normally. However, when using Triton for inference, some characters appear garbled, and the output is incomplete compared to the results obtained from using the script. What could be the cause of this issue?
additional notes
no
The text was updated successfully, but these errors were encountered: