[Question] Huge memory usage on iOS device using Qwen2.5-3B, is this normal performance? #3083
Comments
Thank you for the question. In addition to the model weights, the memory consumption also includes the KV cache and some other temporary buffers. The memory usage you showed here looks reasonable to me.
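To see why the footprint exceeds the weight size, the KV cache can be estimated from the model architecture. The sketch below uses assumed Qwen2.5-3B-like dimensions (layer count, KV heads, head size are illustrative assumptions, not values read from the actual mlc-chat-config.json):

```python
# Rough fp16 KV-cache size estimate for a Qwen2.5-3B-like model.
# All architecture numbers are assumptions for illustration only.
num_layers = 36      # assumed transformer layers
num_kv_heads = 2     # assumed grouped-query KV heads
head_dim = 128       # assumed per-head dimension
bytes_per_elem = 2   # fp16

def kv_cache_bytes(context_window: int) -> int:
    # Factor of 2 accounts for separate K and V tensors per layer.
    return 2 * num_layers * context_window * num_kv_heads * head_dim * bytes_per_elem

for ctx in (4096, 768):
    print(f"context_window={ctx}: {kv_cache_bytes(ctx) / 2**20:.0f} MiB")
```

This shows why shrinking the context window directly shrinks memory: the cache scales linearly with `context_window_size`.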
Thanks for your reply. Is there any way to reduce the memory usage? It is too high to use on an iPhone. I also tested other on-device LLM inference frameworks, and their memory usage is lower.
@ted1995 You can try to reduce the `context_window_size` and `prefill_chunk_size` in the model overrides, which lowers the KV cache and temporary buffer memory.
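As a sketch, the overrides can be set per model in mlc-package-config.json. The model id and values below (768 and 256, the numbers mentioned later in this thread) are illustrative assumptions, not a verified config:

```json
{
  "model_list": [
    {
      "model": "dist/bundle/qwen-2.5-3B-q4f16_1-MLC",
      "model_id": "qwen-2.5-3B-q4f16_1-MLC",
      "overrides": {
        "context_window_size": 768,
        "prefill_chunk_size": 256
      }
    }
  ]
}
```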
@MasterJH5574 I have tried your overrides config, but the memory usage is still huge after the compile phase, as the following picture shows. I also found that in dist/bundle/qwen-2.5-3B-q4f16_1-MLC/mlc-chat-config.json, the `context_window_size` and `prefill_chunk_size` are as shown below, not my override values 768 and 256. Is this the cause of the high memory usage?
@eaaajay No, our runtime logic will prioritize the override values over those in mlc-chat-config.json, so that is not the cause.
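The precedence described above can be pictured as a simple config merge. This is an assumed sketch of the behavior, not the actual MLC runtime code:

```python
# Illustrative sketch: override values win over mlc-chat-config.json
# defaults when the effective runtime config is built. Values are
# hypothetical examples, not read from a real config.
chat_config = {"context_window_size": 32768, "prefill_chunk_size": 2048}
overrides = {"context_window_size": 768, "prefill_chunk_size": 256}

# Later keys take precedence in a dict merge, so overrides replace
# the defaults from mlc-chat-config.json.
effective = {**chat_config, **overrides}
print(effective)
```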
❓ General Questions
I converted the Qwen2.5-3B model to MLC format. When I run the model on an iPhone 13 Pro (iOS 18), the memory usage is very high, larger than the model size, as the following picture shows:
mlc-package-config.json file content: