
Inference with longer context (16k) outputs nonsensical numbers and symbols #1544

Closed
ignaceHelsen opened this issue Jan 15, 2025 · 7 comments


@ignaceHelsen

ignaceHelsen commented Jan 15, 2025

Hello,

I've been enjoying using Unsloth, and I've trained my first LoRA with a training context length of 32768.
I have been running inference tests with shorter context lengths, and the output is normal text that matches my fine-tuning.

However, once the input goes over ~12k tokens, the output becomes the following (capped at the first 24 tokens for this showcase, but it goes on):

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
 ( (�. 1. 1. 1. 1. 2. 1. 1

My code:

from unsloth import FastLanguageModel
from transformers import TextStreamer

max_seq_length = 32768
dtype = None
load_in_4bit = True
use_gradient_checkpointing = "unsloth"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/lora_model",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

FastLanguageModel.for_inference(model)

# `query` holds the long prompt (~21k tokens in this example).
messages = [
    {
        "role": "user",
        "content": query,
    },
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)
output = model.generate(
    input_ids,
    streamer=text_streamer,
    max_new_tokens=24,
    temperature=0.5,
    pad_token_id=tokenizer.eos_token_id,
)
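
Side note: the attention-mask warning itself can be silenced by having `apply_chat_template` also return the mask and passing it to `generate`. A minimal variant of the call above, with the same model, tokenizer, and query:

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,  # also returns attention_mask
).to("cuda")

output = model.generate(
    **inputs,  # passes input_ids and attention_mask
    streamer=text_streamer,
    max_new_tokens=24,
    temperature=0.5,
    pad_token_id=tokenizer.eos_token_id,
)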

I counted the number of tokens in `query`, which is 21090 for this example. I checked my prompt for any strange words that might cause this, but the query seems fine.

Could this be because I'm close to filling the context length of my trained LoRA (32k)?
I have been looking around for issues describing similar problems but couldn't find any, hence my post here :)

Any help is greatly appreciated!

@danielhanchen
Contributor

Would you happen to know which model it is? Does the original base model support long context? If not, then sadly this is expected - if your dataset does not contain long examples, then using very long sequence lengths won't work.

A trick is to mix your dataset with some very long context examples from Hugging Face public datasets
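
A rough sketch of that mixing with the `datasets` library (the dataset names and file paths here are placeholders, not specific recommendations):

from datasets import load_dataset, concatenate_datasets

# Placeholder names - substitute your own fine-tuning data and any public
# long-context dataset of your choice.
own_data = load_dataset("json", data_files="my_finetune_data.jsonl", split="train")
long_data = load_dataset("some-org/long-context-dataset", split="train[:1000]")

# Both datasets must share the same columns/format before concatenating.
mixed = concatenate_datasets([own_data, long_data]).shuffle(seed=42)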

@ignaceHelsen
Author

ignaceHelsen commented Jan 16, 2025

@danielhanchen Thank you for your response :)

The model used is unsloth/Llama-3.3-70B-Instruct-bnb-4bit, so this seems okay.
I saw this page, which states that Unsloth supports 89K context for this model.
The dataset I used for training also contains longer contexts.

Edit: I will add some much longer questions and see if it improves. I will update here.

@danielhanchen
Contributor

@ignaceHelsen Oh wait, another possibility is that our inference engine is somehow broken on very long sequences - would it be possible to run inference natively through Hugging Face and see if it works fine (vs. Unsloth's fast inference)?
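
A sketch of what that comparison could look like - loading the base model with plain Transformers and attaching the LoRA adapter via PEFT instead of Unsloth's fast inference path (model and adapter paths assumed from the snippets above, and the tokenizer assumed to be saved alongside the adapter):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Sketch only: load the pre-quantized base model natively, then attach the LoRA.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit",
    device_map="auto",
)
hf_model = PeftModel.from_pretrained(base, "outputs/lora_model")
tokenizer = AutoTokenizer.from_pretrained("outputs/lora_model")

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(hf_model.device)
output = hf_model.generate(**inputs, max_new_tokens=24, temperature=0.5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))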

@danielhanchen
Contributor

@Erland366 Could you check if inference on sequences longer than 16K works as expected - thanks - maybe by telling it to count from 1 to 1000000, etc.
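
A quick sketch of such a check, assuming the `model` and `tokenizer` from the original snippet - push the prompt itself well past 16K tokens and see whether the continuation stays coherent:

# Build a prompt that is comfortably longer than 16K tokens.
long_prompt = "Continue counting upwards from where this list stops: " + " ".join(
    str(i) for i in range(1, 10000)
)
messages = [{"role": "user", "content": long_prompt}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
print("prompt tokens:", inputs["input_ids"].shape[1])  # expect well over 16K

output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))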

@ignaceHelsen
Author

ignaceHelsen commented Jan 20, 2025

@danielhanchen I finally had the chance to test it out again, sorry for the delay. My previous dataset had samples that were at most ~5.4k tokens long, shorter than I thought. I added some samples that are around 25k and 40k tokens long, and testing it now, the model outputs normal text. I will test using vLLM (AWQ) and, if time permits, a GGUF soon. Will keep you updated.
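
For reference, the vLLM test would look roughly like this (assuming the LoRA is first merged and exported to an AWQ-quantized checkpoint at a placeholder path):

from vllm import LLM, SamplingParams

# Placeholder path - an AWQ-quantized export of the merged model, not something
# produced by the snippets above.
llm = LLM(model="outputs/merged_awq", quantization="awq", max_model_len=32768)
params = SamplingParams(temperature=0.5, max_tokens=24)

# `query` is passed as a raw prompt here; apply the chat template first if the
# model expects chat formatting.
result = llm.generate([query], params)
print(result[0].outputs[0].text)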

@ignaceHelsen
Author

ignaceHelsen commented Jan 23, 2025

Update

AWQ's output did not seem to make sense at first, so I switched to GGUF with very satisfying results. Everything seems fine.
For me, this issue can be closed.

Thanks once again for the responses :)

@danielhanchen
Contributor

Ok great! Glad you solved the issue!
