Bad quality answers (repetition, non-stop generation, ...) when using Llama3.1-8B-Instruct and Triton #603
Reproduction
python3 test/TensorRT-LLM-12/examples/llama/convert_checkpoint.py --model_dir test/Meta-Llama-3.1-8B-Instruct/ --output_dir test/meta-chkpt --dtype bfloat16
trtllm-build --checkpoint_dir test/meta-chkpt/ \
    --output_dir test/llama-3.1-engine/ \
    --use_fused_mlp \
    --gemm_plugin bfloat16 \
    --gpt_attention_plugin bfloat16 \
    --context_fmha enable \
    --max_seq_len 12288
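The report does not show how the engine was served or queried. A request equivalent to the payload shown under "actual behavior" would look roughly like the sketch below, assuming the Triton TensorRT-LLM backend's HTTP generate endpoint with the default ensemble model name and port 8000 (both are assumptions, not taken from the report):

# Hypothetical inference call; model name "ensemble" and port 8000 are assumptions.
curl -s -X POST localhost:8000/v2/models/ensemble/generate -d '{
  "text_input": "Q: What is the capital of France?. Answer:",
  "parameters": {
    "max_tokens": 50,
    "bad_words": [""],
    "stop_words": [""]
  }
}'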
Expected behavior
The model provides accurate answers to the questions.
Actual behavior
The model repeats the question inside its answer, keeps generating far more tokens than expected without stopping, and is repetitive. Example request:
{
  "text_input": "Q: What is the capital of France?. Answer:",
  "parameters": {
    "max_tokens": 50,
    "bad_words": [""],
    "stop_words": [""]
  }
}
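For contrast only (not part of the report): Llama-3.1-8B-Instruct is normally prompted through its chat template and stopped on the <|eot_id|> token, whereas the request above sends a raw completion-style prompt with empty stop_words. A chat-template-formatted request would look roughly like the following sketch (same assumed endpoint and model name as above; whether this changes the reported behaviour is not established here):

# Illustrative only: prompt wrapped in the Llama 3.1 Instruct chat template,
# with <|eot_id|> supplied as a stop word (endpoint and model name are assumptions).
curl -s -X POST localhost:8000/v2/models/ensemble/generate -d '{
  "text_input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWhat is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "parameters": {
    "max_tokens": 50,
    "stop_words": ["<|eot_id|>"]
  }
}'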
Additional notes
I have tried different dtypes when building the engine (bfloat16 and float16), but the same problem occurs.
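The commands for the non-bfloat16 attempt are not included in the report; assuming the other dtype was float16, they would presumably mirror the ones above, e.g.:

# Presumed float16 variant of the build above (output directory names are made up for illustration).
python3 test/TensorRT-LLM-12/examples/llama/convert_checkpoint.py --model_dir test/Meta-Llama-3.1-8B-Instruct/ --output_dir test/meta-chkpt-fp16 --dtype float16
trtllm-build --checkpoint_dir test/meta-chkpt-fp16/ \
    --output_dir test/llama-3.1-engine-fp16/ \
    --use_fused_mlp \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --context_fmha enable \
    --max_seq_len 12288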