LLM pretraining encounters ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums'
#12000
Describe the bug
I am running scripts/llm/pretraining --slurm in the nvcr.io/nvidia/nemo:24.12 container and run into the following error (the output of several processes is interleaved):
`Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
values = [value for _, value in self._deserialize(serialized_items)]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 778, in _deserialize
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
return self._deserialize_pyref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 755, in _deserialize_pyref
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 776, in _deserialize
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
return import_symbol(self._pyref_policy, pyref[_MODULE_KEY],
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 293, in import_symbol
return self._deserialize_ref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 749, in _deserialize_ref
with reraised_exception.try_with_lazy_message(make_message):
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 82, in try_with_lazy_message
deserialized_object = self._deserialize(self._serialized_objects[key])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 798, in _deserialize
raise decorate_exception(exc, message) from None
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 74, in try_with_lazy_message
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
yield
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 294, in import_symbol
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
value = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
metadata = self._deserialize(serialized_object[_METADATA_KEY])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 776, in _deserialize
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "/nemo_run/code/nemo/collections/llm/__init__.py", line 47, in <module>
return self._deserialize_ref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 749, in _deserialize_ref
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
deserialized_object = self._deserialize(self._serialized_objects[key])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 797, in _deserialize
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
values = [value for _, value in self._deserialize(serialized_items)]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in <listcomp>
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 778, in _deserialize
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
return self._deserialize_pyref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 755, in _deserialize_pyref
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
return import_symbol(self._pyref_policy, pyref[_MODULE_KEY],
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 293, in import_symbol
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
with reraised_exception.try_with_lazy_message(make_message):
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 82, in try_with_lazy_message
raise decorate_exception(exc, message) from None
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 74, in try_with_lazy_message
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
yield
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 294, in import_symbol
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
value = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
from nemo.collections.llm.gpt.model import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/__init__.py", line 33, in <module>
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "/nemo_run/code/nemo/collections/llm/__init__.py", line 47, in <module>
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/nemo_run/code/nemo/collections/llm/__init__.py", line 47, in <module>
from nemo.collections.llm.gpt.model import (
from nemo.collections.llm.gpt.model.gemma import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/__init__.py", line 33, in <module>
File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in <module>
from nemo.collections.llm.gpt.model import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/__init__.py", line 33, in <module>
from megatron.core.transformer.enums import AttnBackend
from nemo.collections.llm.gpt.model.gemma import (
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in <module>
File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in <module>
from megatron.core.transformer.enums import AttnBackend
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
from megatron.core.transformer.enums import AttnBackend
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
Jan 30 15:52:36.206882 1668289 slurmstepd 0x155550ab8700: error: *** STEP 5046499.1 ON batch-block5-00140 CANCELLED AT 2025-01-30T15:52:36 ***`
Steps/Code to reproduce bug
container: nvcr.io/nvidia/nemo:24.12
command:
scripts/llm/pretraining --slurm
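The traceback shows NeMo being imported from /nemo_run/code while megatron.core is loaded from /opt/megatron-lm, so one possible cause (an assumption, not confirmed here) is that the two come from mismatched versions. A minimal sketch for checking this inside the container, independent of the Slurm launcher, is below. The helper `has_symbol` is hypothetical and not part of NeMo or Megatron-LM; the module and symbol names are taken from the traceback.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is missing or fails to import in this environment.
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Module and symbol names copied from the ImportError in the traceback.
    found = has_symbol("megatron.core.transformer.enums", "AttnBackend")
    print(f"AttnBackend available: {found}")
```

If this prints `False` inside nvcr.io/nvidia/nemo:24.12, the Megatron-LM under /opt/megatron-lm does not provide the enum that the NeMo code in /nemo_run/code expects.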