
Could you give elaborated steps for how to run llm-foundry on AMD MI250 devices? #1242

Open · Alice1069 opened this issue May 27, 2024 · 1 comment · Label: bug (Something isn't working)

@Alice1069:
I could not get llm-foundry to run on an AMD 4×MI250 machine.

Steps to reproduce the behavior:

1. Build flash-attention for ROCm, following the latest instructions from https://github.com/ROCm/flash-attention/tree/flash_attention_for_rocm, starting from the docker image `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1`:

   ```sh
   export GPU_ARCHS="gfx90a"
   export PYTHON_SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
   patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch
   pip install .
   ```

   Verified with `PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py`; `pip list` shows `flash-attn 2.0.4`.

2. Get the llm-foundry v0.7 code and modify setup.py, relaxing the torch pin to match the torch 2.0.1 that ships in the ROCm image:

   ```diff
   -    'torch>=2.2.1,<2.3',
   +    'torch>=2.0,<2.0.2',
   ```
3. Upgrade pip and install llm-foundry in editable mode:

   ```sh
   pip3 install --upgrade pip
   pip install -e .
   ```

4. Convert the dataset:

   ```sh
   python data_prep/convert_dataset_hf.py \
     --dataset c4 --data_subset en \
     --out_root my-copy-c4 --splits train_small val_small \
     --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'
   ```

5. Train:

   ```sh
   composer train/train.py train/yamls/pretrain/mpt-1b.yaml \
     data_local=my-copy-c4 \
     train_loader.dataset.split=train_small \
     eval_loader.dataset.split=val_small \
     max_duration=10ba eval_interval=0 \
     loss_fn=torch_crossentropy \
     save_folder=mpt-1b
   ```

6. The training run failed with a missing `rotary_emb` module.
7. After `pip install rotary_emb` and re-running, it failed with a missing `libcudart.so.11.0`.
8. After extending `LD_LIBRARY_PATH` to include libcudart and re-running, it failed with a missing `libtorch_cuda.so`.
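The `libcudart` and `libtorch_cuda.so` errors in steps 7–8 usually mean a CUDA-linked wheel (of torch itself, or of an extension compiled against a CUDA torch) has ended up in the environment on top of the ROCm build. A quick check of which torch build is actually active (a minimal sketch; the exact version suffix depends on the image):

```python
import torch

print(torch.__version__)   # a ROCm build carries a suffix such as "2.0.1+rocm5.7"
print(torch.version.hip)   # None here means a CUDA build of torch got installed
print(torch.version.cuda)  # None on a ROCm build
```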

Could you give me a detailed walkthrough of how to run llm-foundry on AMD MI250? I read through the two blog posts about AMD but didn't get the hint from them. Any version of the code is OK. Thank you!

@nik-mosaic (Contributor):
Hi @Alice1069, the FlashAttention ROCm version is likely fairly old now, so the easiest thing to do would be to disable FlashAttention. The other thing to try would be to manually comment out all the rotary-embedding code paths and not use RoPE.
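For the first suggestion, llm-foundry selects the attention implementation through the model's `attn_config`, so FlashAttention can usually be turned off from the command line without code changes. A sketch, assuming the v0.7 MPT yaml key names (`model.attn_config.attn_impl` and `model.attn_config.rope`; verify them against your checkout):

```sh
composer train/train.py train/yamls/pretrain/mpt-1b.yaml \
  model.attn_config.attn_impl=torch \
  model.attn_config.rope=false \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba eval_interval=0 \
  loss_fn=torch_crossentropy \
  save_folder=mpt-1b
```

With `attn_impl=torch` and RoPE disabled, the flash-attn and `rotary_emb` imports should no longer be on the execution path.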
