
RuntimeError: nrt_load_collectives status=4 message="Allocation Failure" #96

Open
myang-tech42 opened this issue Jan 27, 2025 · 8 comments


myang-tech42 commented Jan 27, 2025

Running into "RuntimeError: nrt_load_collectives status=4 message="Allocation Failure"" when trying to run neuron_model.to_neuron() in this notebook.

Instead of 32k, I adjusted to 16k with a tp_degree of 8 on an ml.inf2.24xlarge, along the lines of the sketch below.
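
For reference, the adjusted setup amounts to something like this (a minimal sketch; the model path and batch size are assumed from the notebook and the discussion below):

from transformers_neuronx import LlamaForSampling

# Sketch of the adjusted configuration: context lowered from 32k to 16k,
# tensor parallelism across 8 NeuronCores (arguments assumed from the notebook).
neuron_model = LlamaForSampling.from_pretrained(
    "Meta-Llama-3.1-8B-Instruct",  # model path assumed from the notebook
    n_positions=16384,             # 16k instead of 32k
    batch_size=1,
    tp_degree=8,
    amp='bf16',
)
neuron_model.to_neuron()  # fails here with nrt_load_collectives status=4 "Allocation Failure"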

[Image: screenshot of the error]


aws-rishyraj commented Jan 28, 2025

Hi @myang-tech42,

Thanks for filing this issue, we'll take a look on our end.

Could you also provide the following info:

  1. Is the model still using the bf16 datatype?
  2. Was context_length_estimate adjusted to have a max of 16k?
  3. Is batch_size=1?

myang-tech42 commented:

@aws-rishyraj

  1. Yes
  2. Yes
  3. Yes

aws-rishyraj commented:

Hi @myang-tech42,

Thanks for clarifying!

Using the info you provided, I wrote the following script based on the notebook example you shared:

from transformers_neuronx import LlamaForSampling

model_path = "Meta-Llama-3.1-8B-Instruct"

# Context-length buckets: [64, 128, ..., 16384], i.e. a 16k maximum.
context_length_estimate = [2**(i+6) for i in range(9)]

neuron_model = LlamaForSampling.from_pretrained(
    model_path,
    n_positions=16384,
    context_length_estimate=context_length_estimate,
    batch_size=1,
    tp_degree=8,
    amp='bf16',
)

neuron_model.to_neuron()


import time
import torch
from transformers import AutoTokenizer
import requests, re  # only needed for the commented-out long-prompt variant below

# Construct a tokenizer and encode the prompt text.
# The original notebook fetched a recent publication (HTML), stripped the tags,
# and asked the model to summarize the paper (26k+ input tokens). A short
# prompt is used here instead.
tokenizer = AutoTokenizer.from_pretrained(model_path)
#prompt = re.sub('<[^<]+?>', '', requests.get("https://arxiv.org/html/2402.19427v1").text) # strip html tags
#prompt += "\n\n========================THE END======================\n"
#prompt += "A 10 point summary of the paper in simple words: "
prompt = "What is the capital of France, and could you give a detailed history of the capital"
# Put into the Llama 3.1 prompt format: https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/#prompt-format
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|> {prompt} <|eot_id|><|start_header_id|>assistant<|end_header_id|>"

input_ids = tokenizer.encode(prompt, return_tensors="pt")
num_input_tokens = len(input_ids[0])
print(f"num_input_tokens: {num_input_tokens}")

# Run inference with top-k sampling; sequence_length must not exceed n_positions (16384).
with torch.inference_mode():
    start = time.time()
    generated_sequences = neuron_model.sample(input_ids, sequence_length=16384, top_k=10)
    elapsed = time.time() - start

# display the new generated tokens
generated_sequences = [tokenizer.decode(seq[num_input_tokens:]) for seq in generated_sequences]
print(f'generated sequence {generated_sequences[0]} in {elapsed} seconds')

But I was unable to reproduce the issue on an inf2.24xlarge using the Neuron Release 2.21.1 dependencies (the latest public dependencies at this time). Could you verify that you're using the same dependencies (pip list | grep "torch\|neuron" && sudo apt list | grep "neuronx"), and that my script is representative of what you ran? Thanks!


myang-tech42 commented Jan 28, 2025

Here are my dependencies. I tried the command below, but it doesn't get me to 2.21.1.

!pip install --upgrade neuronx-cc==2.* torch-neuronx torchvision --extra-index-url https://pip.repos.neuron.amazonaws.com

aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.3396
neuronx-cc                    2.16.372.0+4a9b2326
torch                         1.13.1
torch-neuronx                 1.13.1.1.17.0
torch-xla                     1.13.1+torchneuronh
torchvision                   0.14.1
transformers-neuronx          0.13.380
sudo: apt: command not found

aws-rishyraj commented:

"I tried the command below, but it doesn't get me to 2.21.1."

To clarify, by release 2.21.1 I'm referring to the Neuron SDK, which is the entire collection of software associated with a release.

That said, thanks for listing the pip dependencies; they do appear to be part of the 2.21.1 Neuron SDK.

However, it looks like the SageMaker instance is not an Ubuntu instance, given the apt: command not found output. Could you try running sudo yum list | grep neuron instead? That will list the runtime dependencies, which are the part that's failing.


myang-tech42 commented Jan 28, 2025

[Image: output of sudo yum list | grep neuron]

aws-rishyraj commented:

Looks like the runtime versions are older than expected. We will try to reproduce with the specified runtime versions.

aws-rishyraj commented:

We found that the runtime packages might not be compatible with the compiler used. I suggest upgrading your runtime packages to the latest public ones; that should fix the issue.
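
On an Amazon Linux host, the upgrade would look something like this (a sketch; the package names follow the Neuron setup docs, and your repository configuration may differ):

# Update the Neuron driver and runtime packages to the latest public release.
sudo yum update aws-neuronx-dkms aws-neuronx-collectives aws-neuronx-runtime-lib aws-neuronx-tools -y

After upgrading, recompiling the model with neuron_model.to_neuron() should pick up the new runtime.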
