Fail to convert llama model to mlir #900

Open
bhbruce opened this issue Jan 9, 2025 · 3 comments


@bhbruce
Contributor

bhbruce commented Jan 9, 2025

Environment setup

  1. Modify models/requirements.txt to change the transformers version:
-transformers==4.37.1
+transformers==4.40.0
  2. Install packages:
pip install --no-compile --pre --upgrade -e models -r models/requirements.txt

Instructions to reproduce the error

python3 models/turbine_models/custom_models/stateless_llama.py --hf_model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0" --compile_to=torch --external_weights="safetensors" --quantization="unquantized" --precision="f16" --external_weight_file=w.safetensors

Log

lib/python3.11/site-packages/iree/turbine/aot/support/procedural/primitives.py", line 209, in _to_meta_tensor
    assert not any(
AssertionError: Unsupported dynamic dims in meta tensor
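
For reference, the assertion in the traceback amounts to a static-shape check: a torch "meta" tensor can only be materialized when every dimension is a concrete integer. A minimal sketch of that failure mode follows; it is a hypothetical reconstruction based only on the traceback above, and the function name, arguments, and dynamic-dim representation are illustrative, not the actual iree-turbine source.

import torch

def to_meta_tensor_sketch(shape, dtype):
    # Hypothetical stand-in for the check that fires in
    # iree/turbine/aot/support/procedural/primitives.py (_to_meta_tensor).
    # A torch "meta" tensor needs every dimension to be a concrete int,
    # so a dynamic/symbolic dim (modeled here as None) trips the assert.
    assert not any(
        d is None for d in shape
    ), "Unsupported dynamic dims in meta tensor"
    return torch.empty(shape, dtype=dtype, device="meta")

to_meta_tensor_sketch((1, 32, 64), torch.float16)      # static shape: fine
# to_meta_tensor_sketch((1, None, 64), torch.float16)  # dynamic dim: AssertionError

Bumping transformers from 4.37.1 to 4.40.0 changes how the model is traced, so a dimension (e.g. sequence length) that used to export as a concrete size may now come through as dynamic, which is likely what triggers this check.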

pip freeze

accelerate==1.2.1
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiosignal==1.3.2
attrs==24.3.0
azure-core==1.32.0
azure-storage-blob==12.24.0
brevitas @ git+https://github.com/Xilinx/brevitas.git@6695e8df7f6a2c7715b9ed69c4b78157376bb60b
certifi==2024.12.14
cffi==1.17.1
charset-normalizer==3.4.1
cryptography==44.0.0
datasets==3.0.1
dependencies==2.0.1
diffusers @ git+https://github.com/nod-ai/diffusers@8fe5c93c70cd985dd589424d40a0116253300b4f
dill==0.3.8
einops==0.8.0
filelock==3.16.1
frozenlist==1.5.0
fsspec==2024.6.1
gguf==0.14.0
huggingface-hub==0.22.2
idna==3.10
importlib_metadata==8.5.0
iniconfig==2.0.0
iree-base-compiler==3.1.0
iree-base-runtime==3.1.0
iree-compiler==20241104.1068
iree-runtime==20241104.1068
iree-turbine @ git+https://github.com/iree-org/iree-turbine.git@e4550f37dcd8b9b691db93c30b478c1d67eee83b
isodate==0.7.2
Jinja2==3.1.5
MarkupSafe==3.0.2
ml_dtypes==0.5.1
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
networkx==3.4.2
numpy==2.2.1
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
packaging==24.2
pandas==2.2.3
peft==0.13.2
pillow==11.1.0
pluggy==1.5.0
propcache==0.2.1
protobuf==5.29.3
psutil==6.1.1
pyarrow==18.1.0
pycparser==2.22
pytest==8.3.4
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.2
scipy==1.15.0
sentencepiece==0.2.0
shark-turbine==2.4.1
-e git+https://github.com/nod-ai/sharktank.git@7849f8eb49c7519da48aa794322962211bc9b091#egg=sharktank
six==1.17.0
sympy==1.13.1
tokenizers==0.19.1
torch==2.5.1
torchsde==0.2.6
tqdm==4.67.1
trampoline==0.1.2
transformers==4.40.0
triton==3.1.0
-e git+ssh://[email protected]/nod-ai/SHARK-ModelDev.git@d551ab1d0656831f945af7bafccdf80912d50615#egg=turbine_models&subdirectory=models
typing_extensions==4.12.2
tzdata==2024.2
unfoldNd==0.2.3
urllib3==2.3.0
xxhash==3.5.0
yarl==1.18.3
zipp==3.21.0
@bhbruce
Contributor Author

bhbruce commented Jan 9, 2025

@ScottTodd Could you help with this issue?

@ScottTodd
Member

stateless_llama.py has been unmaintained for almost a year at this point. The path we are investing in now is documented at https://github.com/nod-ai/shark-ai/blob/main/docs/shortfin/llm/user/llama_serving.md. We'll be streamlining the workflows and documentation there soon, but the MLIR export step is here: https://github.com/nod-ai/shark-ai/blob/main/docs/shortfin/llm/user/llama_serving.md#export-to-mlir-using-sharktank, and I expect that should at least compile and run for CPU and ROCm/HIP. Might also work on Vulkan/CUDA/Metal/etc. but that is less tested.
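
For reference, the MLIR export step in that guide is driven by sharktank rather than stateless_llama.py. A rough sketch of the invocation is below; the module path, flag names, and file paths are assumptions taken from that workflow and may have changed, so treat the linked llama_serving.md doc as authoritative.

# Sketch only -- exact flags may differ; see the linked llama_serving.md guide.
python -m sharktank.examples.export_paged_llm_v1 \
  --gguf-file=/path/to/tinyllama.gguf \
  --output-mlir=model.mlir \
  --output-config=config.json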

@bhbruce
Contributor Author

bhbruce commented Jan 10, 2025

OK. I still see some updates in this repo for ONNX. Is this repo deprecated?
