[Bug] Broken for Intel Macs since v0.15 (or earlier) #3078
Comments
just tried mlc stable 0.18.1 + python3.12 in a fresh conda env, and the result is the same: fail.
Thank you @zxcat for bringing this up. The workflow runs smoothly on Apple Silicon Macs on our end. However, we don't have an Intel Mac available to test on at the moment.
@MasterJH5574, I can try to gather more info on my end. Are there options to do so? By the way, it seems #2995 is the same issue.
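A minimal sketch of one way to gather device-side info, assuming the installed `mlc_ai` wheel bundles TVM with the Metal runtime enabled; `tvm.metal()` and the device attributes below are standard TVM runtime APIs, though the available attributes can vary by TVM version:

```python
# Minimal sketch: probe the Metal device through the TVM runtime that ships
# inside the mlc_ai wheel. If the wheel was built without Metal support,
# dev.exist will simply report False.
import tvm

dev = tvm.metal(0)
print("Metal device detected:", dev.exist)
if dev.exist:
    print("Device name:          ", dev.device_name)
    print("Max threads per block:", dev.max_threads_per_block)
    print("Warp size:            ", dev.warp_size)
```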
Thanks for sharing the latest working build. It's a shame it's no longer functional, because there aren't many other backends that support Intel Macs. Maybe I'll check which commit is the cause later on.
I've got exactly the same issue on DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC. Traceback (most recent call last):
🐛 Bug
On macOS Ventura, `mlc_llm` fails in chat/rest mode.

I've tried every accessible mlc_ai/mlc_llm wheel pair: current nightly, v0.18.1, v0.17.2, v0.17.1 with the `_cpu` suffix, and nightly 0.15 without the suffix, but the error is the same. I've tried different models; sometimes there is a `fused_dequantize1_NT_matmul1_…` function instead of `fused_dequantize1_NT_matmul5_…`, but the error persists. There was another error on Catalina: something about unsupported Metal version 2.3.
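One way to check which GPU runtimes the installed TVM wheel was actually built with (relevant when comparing the `_cpu` and default wheels) is a minimal sketch using `tvm.support.libinfo()`, the same call the issue template asks for; the key names below are assumptions and may differ between TVM builds:

```python
# Minimal sketch: dump a few build flags of the TVM runtime bundled with the
# installed mlc_ai wheel. libinfo() returns the compile-time CMake options;
# the specific keys listed here are assumptions and may not all be present.
import tvm

info = dict(tvm.support.libinfo())
for key in ("GIT_COMMIT_HASH", "USE_METAL", "USE_VULKAN", "USE_OPENCL", "LLVM_VERSION"):
    print(f"{key}: {info.get(key, 'n/a')}")
```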
To Reproduce
Steps to reproduce the behavior:
`mlc_llm chat --overrides "prefill_chunk_size=4096" ./`

I set `prefill_chunk_size` because with the default value there is a "not enough GPU memory" error. The model compiles, but fails when chat starts.
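For completeness, the same failure should be reproducible through the Python engine API instead of the CLI; this is a minimal sketch, assuming `MLCEngine` accepts the local model directory and that `EngineConfig(prefill_chunk_size=4096)` is the Python-side counterpart of the CLI `--overrides` value:

```python
# Minimal sketch: trigger the same chat path through the Python API.
# Assumptions: "./" is the compiled model directory used in the CLI command
# above, and engine_config=EngineConfig(prefill_chunk_size=4096) mirrors the
# CLI --overrides setting; adjust if your mlc_llm version exposes these
# options differently.
from mlc_llm import MLCEngine
from mlc_llm.serve import EngineConfig

engine = MLCEngine(
    model="./",
    engine_config=EngineConfig(prefill_chunk_size=4096),
)
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
engine.terminate()
```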
Same with `serve` mode and other models.

Expected behavior
Chat works.
Environment
- How you installed MLC-LLM (`conda`, source): pip (tried every stable+nightly version)
- How you installed TVM-Unity (`pip`, source): pip (tried every stable+nightly version)
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models): -

Additional context
It seems that the problem is not new. The oldest mlc-ai/mlc-llm pair I was able to test is from September:
The stable version of `mlc_ai_cpu-0.15.1-cp311-cp311-macosx_10_15_x86_64` (from August) has no `mlc_llm_…` pair, so I cannot test it; it cannot work with 0.17+ `mlc_llm`.