RuntimeError: nrt_load_collectives status=4 message="Allocation Failure" #96
Hi @myang-tech42, thanks for filing this issue; we'll take a look on our end. Could you also provide the following info:
Hi @myang-tech42, thanks for clarifying! Using the info I have, I wrote the script below based on the notebook example you shared:

from transformers_neuronx import LlamaForSampling
# Load the Llama 3.1 8B Instruct checkpoint and compile it for Neuron:
# n_positions=16384 sets the maximum sequence length, context_length_estimate covers
# bucketed context lengths from 64 to 16384, and tp_degree=8 shards the model across 8 NeuronCores.
model_path = "Meta-Llama-3.1-8B-Instruct"
context_length_estimate = [2**(i+6) for i in range(9)]
neuron_model = LlamaForSampling.from_pretrained(model_path, n_positions=16384, context_length_estimate=context_length_estimate, batch_size=1, tp_degree=8, amp='bf16')
neuron_model.to_neuron()
import time
import torch
from transformers import AutoTokenizer
import requests, re
# construct a tokenizer and encode prompt text
# The original notebook builds the prompt from a recent publication (HTML format), strips the HTML tags
# to convert it to text, and asks the model to summarize the paper (a 26k+ token input); a short test prompt is used here instead.
tokenizer = AutoTokenizer.from_pretrained(model_path)
#prompt = re.sub('<[^<]+?>', '', requests.get("https://arxiv.org/html/2402.19427v1").text) # strip html tags
#prompt += "\n\n========================THE END======================\n"
#prompt += "A 10 point summary of the paper in simple words: "
prompt = "What is the capital of France, and could you give a detailed history of teh capital"
# put in prompt format https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/#prompt-format
prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|> {prompt} <|eot_id|><|start_header_id|>assistant<|end_header_id|>"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
num_input_tokens = len(input_ids[0])  # token count of the encoded prompt
print(f"num_input_tokens: {num_input_tokens}")
# run inference with top-k sampling
with torch.inference_mode():
    start = time.time()
    generated_sequences = neuron_model.sample(input_ids, sequence_length=32768, top_k=10)
    elapsed = time.time() - start
# display the new generated tokens
generated_sequences = [tokenizer.decode(seq[num_input_tokens:]) for seq in generated_sequences]
print(f'generated sequence {generated_sequences[0]} in {elapsed} seconds')

But I was unable to reproduce the issue you ran into.
Here are my dependencies. I tried the below, but it doesn't get me to 2.21.1.
No, by release 2.21.1 I'm referring to the Neuron SDK, which is the entire collection of software associated with a release. That said, thanks for listing the pip dependencies; they do appear to be part of the 2.21.1 Neuron SDK. However, it looks like the SageMaker instance is not an Ubuntu instance.
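For reference, here is a minimal sketch of how one could print the installed Neuron pip package versions and the host OS to confirm this; the package names below are assumptions based on a typical transformers-neuronx setup, not taken from this thread.

import platform
from importlib import metadata

# Hypothetical list of Neuron-related pip packages to check; adjust to your environment.
packages = ["neuronx-cc", "torch-neuronx", "transformers-neuronx", "libneuronxla"]

print(f"OS: {platform.platform()}")  # helps confirm whether the instance is Ubuntu-based
for name in packages:
    try:
        print(f"{name}=={metadata.version(name)}")
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")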
Looks like the runtime versions are older than expected. We will try to reproduce with the specified runtime versions.
We found that the runtime packages might not be compatible with the compiler used. I suggest upgrading your runtime packages to the latest public ones; this should fix the issue.
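To see which Neuron runtime system packages are installed before and after the upgrade, a small sketch like the one below can help; it assumes a Debian/Ubuntu-based instance with dpkg available, and the package names mentioned in the comments come from the public Neuron install docs rather than this thread.

import subprocess

# Sketch: list installed Neuron system packages (e.g. aws-neuronx-runtime-lib,
# aws-neuronx-collectives) on a Debian/Ubuntu host. Assumes dpkg is available.
result = subprocess.run(["dpkg", "-l"], capture_output=True, text=True, check=True)
for line in result.stdout.splitlines():
    if "neuron" in line.lower():
        print(line)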
Running into RuntimeError: nrt_load_collectives status=4 message="Allocation Failure" when trying to run
neuron_model.to_neuron()
in this notebook. Instead of 32k I adjusted to 16k with tp_degree of 8 on an ml.inf2.24xlarge.