
[Question] Huge memory usage on iOS device using Qwen2.5-3B — is this normal performance? #3083

Open
eaaajay opened this issue Jan 6, 2025 · 5 comments
Labels
question Question about the usage

Comments


eaaajay commented Jan 6, 2025

❓ General Questions

I converted the Qwen2.5-3B model to MLC format. When running the model on an iPhone 13 Pro (iOS 18), the memory usage is very high, larger than the model size, as the following screenshot shows:
[screenshot: memory usage]

mlc-package-config.json file content:

{
    "device": "iphone",
    "model_list": [
        {
            "model": "/Users/wangyujie/mlc-llm/dist/qwen-2.5-0.5B-mlc",
            "model_id": "qwen2.5-0.5B-q4f16_1-MLC",
            "estimated_vram_bytes": 3316000000,
            "bundle_weight": true,
            "overrides": {
                           "context_window_size": 1024
                        }
        },
        {
            "model": "/Users/wangyujie/mlc-llm/dist/qwen-2.5-3B-mlc",
            "model_id": "qwen-2.5-3B-q4f16_1-MLC",
            "estimated_vram_bytes": 3316000000,
            "bundle_weight": true,
            "overrides": {
                           "context_window_size": 1024
                        }
        }
        
    ]
}
@eaaajay eaaajay added the question Question about the usage label Jan 6, 2025
@MasterJH5574
Member

Thank you for the question. In addition to the model weights, the memory consumption also includes the KV cache and some other temporary buffers. The memory usage you showed here looks reasonable to me.
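To get a feel for the numbers, here is a rough back-of-envelope KV-cache estimate. This is only a sketch, assuming an fp16 KV cache and the Qwen2.5-3B config values that appear later in this thread (36 layers, 2 KV heads, head_dim 128); the actual MLC runtime allocations will differ:

```python
# Rough KV-cache size estimate (a sketch; real MLC allocations differ).
# Config values taken from the mlc-chat-config.json shown in this thread.
NUM_LAYERS = 36
NUM_KV_HEADS = 2
HEAD_DIM = 128
BYTES_FP16 = 2  # q4f16_1 keeps the KV cache in fp16

def kv_cache_bytes(context_window_size: int) -> int:
    # 2x for the separate K and V tensors in every layer
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * context_window_size * BYTES_FP16

print(kv_cache_bytes(1024) / 2**20)   # ~36 MiB at the overridden 1024-token window
print(kv_cache_bytes(32768) / 2**20)  # ~1152 MiB at the default 32768-token window
```

With GQA (only 2 KV heads) the KV cache stays modest at a 1024-token window; most of the footprint is then the q4f16_1 weights (roughly 1.5–2 GB for a 3B model) plus temporary buffers.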


eaaajay commented Jan 6, 2025

> Thank you for the question. In addition to the model weights, the memory consumption also includes the KV cache and some other temporary buffers. The memory usage you showed here looks reasonable to me.

Thanks for your reply. Is there any way to reduce the memory usage? It is too high to use on an iPhone. I also tested other on-device LLM inference frameworks, and their memory usage is lower.

@MasterJH5574
Member

@ted1995 You can try reducing context_window_size and prefill_chunk_size in mlc-package-config.json. For example:

            "overrides": {
                           "context_window_size": 768,
                           "prefill_chunk_size": 256
                        }

@eaaajay eaaajay closed this as completed Jan 7, 2025

eaaajay commented Jan 7, 2025

> @ted1995 You can try reducing context_window_size and prefill_chunk_size in mlc-package-config.json. For example:

            "overrides": {
                           "context_window_size": 768,
                           "prefill_chunk_size": 256
                        }

@MasterJH5574 I have tried your overrides config, but the memory usage is still huge, as the following screenshot shows:
[screenshot: memory usage after overrides]

After the compile phase, I found that dist/bundle/qwen-2.5-3B-q4f16_1-MLC/mlc-chat-config.json contains:

{
  "version": "0.1.0",
  "model_type": "qwen2",
  "quantization": "q4f16_1",
  "model_config": {
    "hidden_act": "silu",
    "hidden_size": 2048,
    "intermediate_size": 11008,
    "num_attention_heads": 16,
    "num_hidden_layers": 36,
    "num_key_value_heads": 2,
    "rms_norm_eps": 1e-06,
    "rope_theta": 1000000.0,
    "vocab_size": 151936,
    "tie_word_embeddings": true,
    "context_window_size": 32768,
    "prefill_chunk_size": 8192,
    "tensor_parallel_shards": 1,
    "head_dim": 128,
    "dtype": "float32",
    "max_batch_size": 128
  },
  "vocab_size": 151936,
  "context_window_size": 32768,
  "sliding_window_size": -1,
  "prefill_chunk_size": 8192,
  "attention_sink_size": -1,
  "tensor_parallel_shards": 1,
  "pipeline_parallel_stages": 1,
  "temperature": 0.7,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "repetition_penalty": 1.05,
  "top_p": 0.8,
  "tokenizer_files": [
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "tokenizer_config.json"
  ],
  "tokenizer_info": {
    "token_postproc_method": "byte_level",
    "prepend_space_in_encode": false,
    "strip_space_in_decode": false
  },
  "conv_template": {
    "name": "qwen2",
    "system_template": "<|im_start|>system\n{system_message}<|im_end|>\n",
    "system_message": "You are a helpful assistant.",
    "system_prefix_token_ids": null,
    "add_role_after_system_message": true,
    "roles": {
      "user": "<|im_start|>user",
      "assistant": "<|im_start|>assistant"
    },
    "role_templates": {
      "user": "{user_message}",
      "assistant": "{assistant_message}",
      "tool": "{tool_message}"
    },
    "messages": [],
    "seps": [
      "<|im_end|>\n"
    ],
    "role_content_sep": "\n",
    "role_empty_sep": "\n",
    "stop_str": [
      "<|endoftext|>",
      "<|im_end|>"
    ],
    "stop_token_ids": [
      151643,
      151645
    ],
    "function_string": "",
    "use_function_calling": false
  },
  "pad_token_id": 151643,
  "bos_token_id": 151643,
  "eos_token_id": [
    151645,
    151643
  ]
}

The context_window_size and prefill_chunk_size in the JSON file are as follows, not my override values of 768 and 256. Is this causing the high memory usage?

"context_window_size": 32768,
"prefill_chunk_size": 8192,
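For intuition on why these two settings matter: the KV cache scales with context_window_size, and the temporary prefill buffers scale with prefill_chunk_size. The following is a hypothetical sizing comparison (not MLC's actual allocator), assuming fp16 activations and the config values above (36 layers, 2 KV heads, head_dim 128, hidden_size 2048, intermediate_size 11008):

```python
# Hypothetical sizing comparison (a sketch, not MLC's actual allocator).
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 36, 2, 128
HIDDEN, INTERMEDIATE = 2048, 11008
FP16 = 2  # bytes per element

def kv_cache(ctx: int) -> int:
    # 2x for the separate K and V tensors in every layer
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * ctx * FP16

def prefill_activations(chunk: int) -> int:
    # one hidden-state plus one MLP intermediate activation per chunk, fp16
    return chunk * (HIDDEN + INTERMEDIATE) * FP16

for ctx, chunk in [(32768, 8192), (768, 256)]:
    total_mib = (kv_cache(ctx) + prefill_activations(chunk)) / 2**20
    print(f"ctx={ctx:5d} chunk={chunk:4d} -> ~{total_mib:.0f} MiB beyond weights")
```

Under these assumptions the defaults (32768 / 8192) account for well over 1 GiB beyond the weights, while the overridden values (768 / 256) drop that to a few tens of MiB, which is why the overrides taking effect matters so much here.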

@MasterJH5574

@eaaajay eaaajay reopened this Jan 9, 2025
@MasterJH5574
Member

@eaaajay No, our runtime logic will prioritize the values in "overrides".
