LLM pretraining encounter ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' #12000

Open
j40903272 opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working

Describe the bug

I am running scripts/llm/pretraining --slurm inside the nvcr.io/nvidia/nemo:24.12 container and run into the following error:

`Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
File "", line 992, in _find_and_load_unlocked
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
values = [value for _, value in self._deserialize(serialized_items)]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "", line 1050, in _gcd_import
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
File "", line 1027, in _find_and_load
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 778, in _deserialize
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
File "", line 1050, in _gcd_import
return self._deserialize_pyref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 755, in _deserialize_pyref
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 776, in _deserialize
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
return import_symbol(self._pyref_policy, pyref[_MODULE_KEY],
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 293, in import_symbol
return self._deserialize_ref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 749, in _deserialize_ref
with reraised_exception.try_with_lazy_message(make_message):
File "/usr/lib/python3.10/contextlib.py", line 153, in exit
File "", line 688, in _load_unlocked
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 82, in try_with_lazy_message
deserialized_object = self._deserialize(self._serialized_objects[key])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 798, in _deserialize
raise decorate_exception(exc, message) from None
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 74, in try_with_lazy_message
File "", line 883, in exec_module
yield
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 294, in import_symbol
File "", line 241, in _call_with_frames_removed
value = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/init.py", line 126, in import_module
metadata = self._deserialize(serialized_object[_METADATA_KEY])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 776, in _deserialize
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "/nemo_run/code/nemo/collections/llm/init.py", line 47, in
return self._deserialize_ref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 749, in _deserialize_ref
File "", line 1027, in _find_and_load
deserialized_object = self._deserialize(self._serialized_objects[key])
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 797, in _deserialize
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
values = [value for _, value in self._deserialize(serialized_items)]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "", line 1050, in _gcd_import
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in _deserialize
File "", line 241, in _call_with_frames_removed
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 766, in
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
return [self._deserialize(x) for x in serialized_object]
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 778, in _deserialize
File "", line 992, in _find_and_load_unlocked
return self._deserialize_pyref(serialized_object)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 755, in _deserialize_pyref
File "", line 241, in _call_with_frames_removed
return import_symbol(self._pyref_policy, pyref[_MODULE_KEY],
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 293, in import_symbol
File "", line 1050, in _gcd_import
with reraised_exception.try_with_lazy_message(make_message):
File "/usr/lib/python3.10/contextlib.py", line 153, in exit
File "", line 1027, in _find_and_load
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 82, in try_with_lazy_message
raise decorate_exception(exc, message) from None
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/reraised_exception.py", line 74, in try_with_lazy_message
File "", line 1006, in _find_and_load_unlocked
yield
File "/usr/local/lib/python3.10/dist-packages/fiddle/_src/experimental/serialization.py", line 294, in import_symbol
File "", line 688, in _load_unlocked
value = importlib.import_module(module)
File "/usr/lib/python3.10/importlib/init.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1050, in _gcd_import
File "", line 883, in exec_module
File "", line 1027, in _find_and_load
File "", line 241, in _call_with_frames_removed
from nemo.collections.llm.gpt.model import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/init.py", line 33, in
File "", line 992, in _find_and_load_unlocked
File "/nemo_run/code/nemo/collections/llm/init.py", line 47, in
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 992, in _find_and_load_unlocked
File "", line 241, in _call_with_frames_removed
File "", line 1050, in _gcd_import
File "", line 1027, in _find_and_load
File "", line 1006, in _find_and_load_unlocked
File "", line 688, in _load_unlocked
File "", line 883, in exec_module
File "", line 241, in _call_with_frames_removed
File "/nemo_run/code/nemo/collections/llm/init.py", line 47, in
from nemo.collections.llm.gpt.model import (
from nemo.collections.llm.gpt.model.gemma import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/init.py", line 33, in
File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in
from nemo.collections.llm.gpt.model import (
File "/nemo_run/code/nemo/collections/llm/gpt/model/init.py", line 33, in
from megatron.core.transformer.enums import AttnBackend
from nemo.collections.llm.gpt.model.gemma import (
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in

from nemo.collections.llm.gpt.model.gemma import (

File "/nemo_run/code/nemo/collections/llm/gpt/model/gemma.py", line 21, in
from megatron.core.transformer.enums import AttnBackend
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
from megatron.core.transformer.enums import AttnBackend
ImportError: cannot import name 'AttnBackend' from 'megatron.core.transformer.enums' (/opt/megatron-lm/megatron/core/transformer/enums.py)
Fiddle context: Error occurred while importing pyref to 'Llama3Config8B' from 'nemo.collections.llm.gpt.model.llama'.
Jan 30 15:52:36.206882 1668289 slurmstepd 0x155550ab8700: error: *** STEP 5046499.1 ON batch-block5-00140 CANCELLED AT 2025-01-30T15:52:36 ***`

Steps/Code to reproduce bug

container: nvcr.io/nvidia/nemo:24.12
command: scripts/llm/pretraining --slurm

@j40903272 j40903272 added the bug Something isn't working label Jan 31, 2025