Inference with longer context (16k) outputs nonsensical numbers and symbols #1544
Comments
Would you happen to know which model? Does the original base model support long context? If not, then sadly this is expected: if your dataset does not contain long samples, training with very long sequence lengths won't work. A trick is to mix your dataset with some very long context examples from public Hugging Face datasets.
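Mixing in long samples usually just means concatenating a handful of long documents from a public corpus into the training set before formatting. A minimal sketch with the `datasets` library, assuming both datasets share a `text` column and using `deepmind/pg19` purely as an example of a long-document source:

```python
from datasets import load_dataset, concatenate_datasets

# Existing fine-tuning data; assumed to already have a single "text" column.
my_data = load_dataset("json", data_files="train.jsonl", split="train")

# A public corpus of long documents (pg19 is just one example of such a source).
long_corpus = load_dataset("deepmind/pg19", split="train[:200]")
long_corpus = long_corpus.remove_columns(
    [c for c in long_corpus.column_names if c != "text"]
)

# Keep a handful of genuinely long samples (rough character-length proxy
# for roughly 20k+ tokens) and mix them into the original dataset.
long_samples = long_corpus.filter(lambda ex: len(ex["text"]) > 80_000)
long_samples = long_samples.select(range(min(100, len(long_samples))))

mixed = concatenate_datasets([my_data, long_samples]).shuffle(seed=42)
```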
@danielhanchen Thank you for your response :) The model used is unsloth/Llama-3.3-70B-Instruct-bnb-4bit, so that part seems okay. Edit: I will add some much longer questions and see if it improves. I will update here.
@ignaceHelsen Oh wait, another possibility is that our inference engine is somehow broken on super long sequences. Would it be possible to do inference using Hugging Face natively and see if it works fine (versus Unsloth's fast inference)?
@Erland366 Could you check if inference on sequences longer than 16K works as expected? Thanks - maybe by telling it to count from 1 to 1000000, etc.
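A plain-Transformers sanity check along these lines (no Unsloth fast path) could look roughly like the sketch below; the model id, counting prompt, and generation settings are illustrative assumptions, not the exact test requested here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Llama-3.3-70B-Instruct-bnb-4bit"  # pre-quantized 4-bit checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Long-generation probe: ask the model to count, so degeneration into
# random numbers and symbols is easy to spot.
prompt = "Count from 1 to 1000, one number per line."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```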
@danielhanchen I finally had the chance to test it out again, sorry for the delay. My previous dataset had samples that were at most ~5.4k tokens long, shorter than I thought. I added some samples that were around 25k and 40k tokens long, and testing it now, it seems to be outputting normal text. I will test using vLLM (AWQ) and, if time permits, a GGUF soon. Will keep you updated.
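For reference, a vLLM check with an AWQ export could look roughly like this; the model path, `max_model_len`, and sampling settings are placeholders, not the exact setup used in this thread.

```python
from vllm import LLM, SamplingParams

# Placeholder path to a merged, AWQ-quantized export of the fine-tuned model.
llm = LLM(model="path/to/merged-awq-model", quantization="awq", max_model_len=32768)
params = SamplingParams(temperature=0.0, max_tokens=256)

long_prompt = "..."  # the same ~21k-token query used for the earlier tests
print(llm.generate([long_prompt], params)[0].outputs[0].text)
```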
Update: AWQ's output did not seem to make sense at first, so I switched to GGUF with very satisfying results. Everything seems fine. Thanks once again for the responses :)
Ok great! Glad you solved the issue!
Hello,
I've been enjoying using Unsloth, and I've trained my first LoRA with a training context length of 32768.
I have been doing inference tests with lower context lengths, and the output is normal text, as expected from my fine-tuning.
However, when I go over ~12k tokens, the output turns into nonsensical numbers and symbols (I capped it at the first 24 tokens for this showcase, but it goes on):
My code:
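(A minimal sketch of what such an Unsloth inference setup typically looks like - assuming the unsloth/Llama-3.3-70B-Instruct-bnb-4bit base, a trained LoRA checkpoint, and a 32768 max_seq_length; this is illustrative, not the exact code from this report.)

```python
from unsloth import FastLanguageModel

# Load the base model plus the trained LoRA adapters (checkpoint path is a placeholder).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/my-lora-checkpoint",
    max_seq_length=32768,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

messages = [{"role": "user", "content": query}]  # query is the ~21k-token prompt
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=inputs, max_new_tokens=512, use_cache=True)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```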
I counted the number of tokens in the query, which is 21090 for this example. I checked my prompt for any strange words that might cause this, but the query seems fine.
Could this be because I'm close to filling the context length of my trained LoRA (32k)?
I have been looking around for issues describing similar problems, but I couldn't find any, hence my post here :)
Any help is greatly appreciated!