SAT example sample_dit.py fails to run #13

Open

lxrmido opened this issue Oct 16, 2024 · 0 comments
lxrmido commented Oct 16, 2024

System Info

CUDA: 12.2
Torch: 2.4.1
Python: 3.10
Model: CogView-3Plus
Error message: [rank0]: RuntimeError: input must be a CUDA tensor

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Reproduction

python sample_dit.py --base configs/cogview3_plus.yaml

[2024-10-16 11:20:49,157] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/data/app/CogView3/sat/sample_dit.py", line 17, in <module>
    from diffusion import SATDiffusionEngine
  File "/data/app/CogView3/sat/diffusion.py", line 8, in <module>
    from sgm.modules import UNCONDITIONAL_CONFIG
  File "/data/app/CogView3/sat/sgm/__init__.py", line 1, in <module>
    from .models import AutoencodingEngine
  File "/data/app/CogView3/sat/sgm/models/__init__.py", line 1, in <module>
    from .autoencoder import AutoencodingEngine
  File "/data/app/CogView3/sat/sgm/models/autoencoder.py", line 8, in <module>
    import pytorch_lightning as pl
  File "/data/anaconda3/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 27, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/data/anaconda3/lib/python3.10/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.batch_size_finder import BatchSizeFinder
  File "/data/anaconda3/lib/python3.10/site-packages/pytorch_lightning/callbacks/batch_size_finder.py", line 26, in <module>
    from pytorch_lightning.callbacks.callback import Callback
  File "/data/anaconda3/lib/python3.10/site-packages/pytorch_lightning/callbacks/callback.py", line 22, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/data/anaconda3/lib/python3.10/site-packages/pytorch_lightning/utilities/types.py", line 42, in <module>
    from torchmetrics import Metric
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/__init__.py", line 23, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio._deprecated import _permutation_invariant_training as permutation_invariant_training
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/functional/audio/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/functional/audio/pit.py", line 22, in <module>
    from torchmetrics.utilities import rank_zero_warn
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/utilities/__init__.py", line 14, in <module>
    from torchmetrics.utilities.checks import check_forward_full_state_property
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/utilities/checks.py", line 25, in <module>
    from torchmetrics.metric import Metric
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/metric.py", line 41, in <module>
    from torchmetrics.utilities.plot import _AX_TYPE, _PLOT_OUT_TYPE, plot_single_or_multi_val
  File "/data/anaconda3/lib/python3.10/site-packages/torchmetrics/utilities/plot.py", line 24, in <module>
    if _MATPLOTLIB_AVAILABLE:
  File "/data/anaconda3/lib/python3.10/site-packages/lightning_utilities/core/imports.py", line 164, in __bool__
    self._check_available()
  File "/data/anaconda3/lib/python3.10/site-packages/lightning_utilities/core/imports.py", line 158, in _check_available
    self._check_requirement()
  File "/data/anaconda3/lib/python3.10/site-packages/lightning_utilities/core/imports.py", line 142, in _check_requirement
    self.available = module_available(module)
  File "/data/anaconda3/lib/python3.10/site-packages/lightning_utilities/core/imports.py", line 61, in module_available
    importlib.import_module(module_path)
  File "/data/anaconda3/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/__init__.py", line 161, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/rcsetup.py", line 27, in <module>
    from matplotlib.colors import Colormap, is_color_like
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/colors.py", line 57, in <module>
    from matplotlib import _api, _cm, cbook, scale
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/scale.py", line 22, in <module>
    from matplotlib.ticker import (
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/ticker.py", line 143, in <module>
    from matplotlib import transforms as mtransforms
  File "/data/anaconda3/lib/python3.10/site-packages/matplotlib/transforms.py", line 49, in <module>
    from matplotlib._path import (
Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.10/site-packages/lightning_utilities/core/imports.py", line 132, in _check_requirement
    pkg_resources.require(self.requirement)
  File "/data/anaconda3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 968, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/data/anaconda3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 829, in resolve
    dist = self._resolve_dist(
  File "/data/anaconda3/lib/python3.10/site-packages/pkg_resources/__init__.py", line 875, in _resolve_dist
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (numpy 2.1.2 (/data/anaconda3/lib/python3.10/site-packages), Requirement.parse('numpy<2,>=1.21'), {'matplotlib'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/anaconda3/lib/python3.10/site-packages/numpy/core/_multiarray_umath.py", line 44, in __getattr__
    raise ImportError(msg)
ImportError:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
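Note: this NumPy failure looks like an environment problem separate from the crash further below. The installed matplotlib was built against NumPy 1.x and pins numpy<2,>=1.21, while this environment has numpy 2.1.2. A minimal fix sketch, assuming pip manages this /data/anaconda3 environment, is the downgrade that the warning itself suggests:

```bash
# Pin NumPy below 2.0 so modules compiled against NumPy 1.x
# (matplotlib, via pytorch_lightning -> torchmetrics) import cleanly.
pip install "numpy<2"
```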

/data/anaconda3/lib/python3.10/site-packages/kornia/feature/lightglue.py:44: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
[2024-10-16 11:20:55,068] [WARNING] No training data specified
[2024-10-16 11:20:55,068] [WARNING] No train_iters (recommended) or epochs specified, use default 10k iters.
[2024-10-16 11:20:55,068] [INFO] using world size: 1
[2024-10-16 11:20:55,071] [INFO] [RANK 0] > initializing model parallel with size 1
[2024-10-16 11:20:55,072] [INFO] [RANK 0] You didn't pass in LOCAL_WORLD_SIZE environment variable. We use the guessed LOCAL_WORLD_SIZE=1. If this is wrong, please pass the LOCAL_WORLD_SIZE manually.
[2024-10-16 11:20:55,074] [INFO] [RANK 0] building SATDiffusionEngine model ...
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:25<00:00, 12.55s/it]
/data/app/CogView3/sat/sgm/models/autoencoder.py:164: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
sd = torch.load(path, map_location="cpu")['state_dict']
Missing keys: []
Unexpected keys: []
Restored from /data/models/cogview3/vae/imagekl_ch16.pt
[2024-10-16 11:21:52,183] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 8059191011
[2024-10-16 11:22:00,341] [INFO] [RANK 0] global rank 0 is loading checkpoint /data/models/cogview3/transformer/1/mp_rank_00_model_states.pt
/data/anaconda3/lib/python3.10/site-packages/sat/training/model_io.py:286: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
sd = torch.load(checkpoint_name, map_location='cpu')
[2024-10-16 11:22:22,980] [INFO] [RANK 0] Warning: Missing keys for inference: ['model.diffusion_model.mixins.pos_embed.image_pos_embedding', 'conditioner.embedders.0.transformer.shared.weight', 'conditioner.embedders.0.transformer.encoder.embed_tokens.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.0.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.1.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.2.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.3.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.0.SelfAttention.q.weight', 
'conditioner.embedders.0.transformer.encoder.block.4.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.4.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.5.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.6.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.7.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.1.DenseReluDense.wi_0.weight', 
'conditioner.embedders.0.transformer.encoder.block.8.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.8.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.9.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.10.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.11.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.12.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.0.SelfAttention.q.weight', 
'conditioner.embedders.0.transformer.encoder.block.13.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.13.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.14.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.15.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.16.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.0.layer_norm.weight', 
'conditioner.embedders.0.transformer.encoder.block.17.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.17.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.18.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.19.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.20.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.21.layer.1.layer_norm.weight', 
'conditioner.embedders.0.transformer.encoder.block.22.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.22.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.0.SelfAttention.q.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.0.SelfAttention.k.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.0.SelfAttention.v.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.0.SelfAttention.o.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.0.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.1.DenseReluDense.wi_0.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.1.DenseReluDense.wi_1.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.1.DenseReluDense.wo.weight', 'conditioner.embedders.0.transformer.encoder.block.23.layer.1.layer_norm.weight', 'conditioner.embedders.0.transformer.encoder.final_layer_norm.weight', 'first_stage_model.encoder.conv_in.weight', 'first_stage_model.encoder.conv_in.bias', 'first_stage_model.encoder.down.0.block.0.norm1.weight', 'first_stage_model.encoder.down.0.block.0.norm1.bias', 'first_stage_model.encoder.down.0.block.0.conv1.weight', 'first_stage_model.encoder.down.0.block.0.conv1.bias', 'first_stage_model.encoder.down.0.block.0.norm2.weight', 'first_stage_model.encoder.down.0.block.0.norm2.bias', 'first_stage_model.encoder.down.0.block.0.conv2.weight', 'first_stage_model.encoder.down.0.block.0.conv2.bias', 'first_stage_model.encoder.down.0.block.1.norm1.weight', 'first_stage_model.encoder.down.0.block.1.norm1.bias', 'first_stage_model.encoder.down.0.block.1.conv1.weight', 'first_stage_model.encoder.down.0.block.1.conv1.bias', 'first_stage_model.encoder.down.0.block.1.norm2.weight', 'first_stage_model.encoder.down.0.block.1.norm2.bias', 'first_stage_model.encoder.down.0.block.1.conv2.weight', 'first_stage_model.encoder.down.0.block.1.conv2.bias', 'first_stage_model.encoder.down.0.block.2.norm1.weight', 'first_stage_model.encoder.down.0.block.2.norm1.bias', 'first_stage_model.encoder.down.0.block.2.conv1.weight', 'first_stage_model.encoder.down.0.block.2.conv1.bias', 'first_stage_model.encoder.down.0.block.2.norm2.weight', 'first_stage_model.encoder.down.0.block.2.norm2.bias', 'first_stage_model.encoder.down.0.block.2.conv2.weight', 'first_stage_model.encoder.down.0.block.2.conv2.bias', 'first_stage_model.encoder.down.0.downsample.conv.weight', 'first_stage_model.encoder.down.0.downsample.conv.bias', 'first_stage_model.encoder.down.1.block.0.norm1.weight', 'first_stage_model.encoder.down.1.block.0.norm1.bias', 'first_stage_model.encoder.down.1.block.0.conv1.weight', 'first_stage_model.encoder.down.1.block.0.conv1.bias', 'first_stage_model.encoder.down.1.block.0.norm2.weight', 'first_stage_model.encoder.down.1.block.0.norm2.bias', 
'first_stage_model.encoder.down.1.block.0.conv2.weight', 'first_stage_model.encoder.down.1.block.0.conv2.bias', 'first_stage_model.encoder.down.1.block.0.nin_shortcut.weight', 'first_stage_model.encoder.down.1.block.0.nin_shortcut.bias', 'first_stage_model.encoder.down.1.block.1.norm1.weight', 'first_stage_model.encoder.down.1.block.1.norm1.bias', 'first_stage_model.encoder.down.1.block.1.conv1.weight', 'first_stage_model.encoder.down.1.block.1.conv1.bias', 'first_stage_model.encoder.down.1.block.1.norm2.weight', 'first_stage_model.encoder.down.1.block.1.norm2.bias', 'first_stage_model.encoder.down.1.block.1.conv2.weight', 'first_stage_model.encoder.down.1.block.1.conv2.bias', 'first_stage_model.encoder.down.1.block.2.norm1.weight', 'first_stage_model.encoder.down.1.block.2.norm1.bias', 'first_stage_model.encoder.down.1.block.2.conv1.weight', 'first_stage_model.encoder.down.1.block.2.conv1.bias', 'first_stage_model.encoder.down.1.block.2.norm2.weight', 'first_stage_model.encoder.down.1.block.2.norm2.bias', 'first_stage_model.encoder.down.1.block.2.conv2.weight', 'first_stage_model.encoder.down.1.block.2.conv2.bias', 'first_stage_model.encoder.down.1.downsample.conv.weight', 'first_stage_model.encoder.down.1.downsample.conv.bias', 'first_stage_model.encoder.down.2.block.0.norm1.weight', 'first_stage_model.encoder.down.2.block.0.norm1.bias', 'first_stage_model.encoder.down.2.block.0.conv1.weight', 'first_stage_model.encoder.down.2.block.0.conv1.bias', 'first_stage_model.encoder.down.2.block.0.norm2.weight', 'first_stage_model.encoder.down.2.block.0.norm2.bias', 'first_stage_model.encoder.down.2.block.0.conv2.weight', 'first_stage_model.encoder.down.2.block.0.conv2.bias', 'first_stage_model.encoder.down.2.block.0.nin_shortcut.weight', 'first_stage_model.encoder.down.2.block.0.nin_shortcut.bias', 'first_stage_model.encoder.down.2.block.1.norm1.weight', 'first_stage_model.encoder.down.2.block.1.norm1.bias', 'first_stage_model.encoder.down.2.block.1.conv1.weight', 'first_stage_model.encoder.down.2.block.1.conv1.bias', 'first_stage_model.encoder.down.2.block.1.norm2.weight', 'first_stage_model.encoder.down.2.block.1.norm2.bias', 'first_stage_model.encoder.down.2.block.1.conv2.weight', 'first_stage_model.encoder.down.2.block.1.conv2.bias', 'first_stage_model.encoder.down.2.block.2.norm1.weight', 'first_stage_model.encoder.down.2.block.2.norm1.bias', 'first_stage_model.encoder.down.2.block.2.conv1.weight', 'first_stage_model.encoder.down.2.block.2.conv1.bias', 'first_stage_model.encoder.down.2.block.2.norm2.weight', 'first_stage_model.encoder.down.2.block.2.norm2.bias', 'first_stage_model.encoder.down.2.block.2.conv2.weight', 'first_stage_model.encoder.down.2.block.2.conv2.bias', 'first_stage_model.encoder.down.2.downsample.conv.weight', 'first_stage_model.encoder.down.2.downsample.conv.bias', 'first_stage_model.encoder.down.3.block.0.norm1.weight', 'first_stage_model.encoder.down.3.block.0.norm1.bias', 'first_stage_model.encoder.down.3.block.0.conv1.weight', 'first_stage_model.encoder.down.3.block.0.conv1.bias', 'first_stage_model.encoder.down.3.block.0.norm2.weight', 'first_stage_model.encoder.down.3.block.0.norm2.bias', 'first_stage_model.encoder.down.3.block.0.conv2.weight', 'first_stage_model.encoder.down.3.block.0.conv2.bias', 'first_stage_model.encoder.down.3.block.1.norm1.weight', 'first_stage_model.encoder.down.3.block.1.norm1.bias', 'first_stage_model.encoder.down.3.block.1.conv1.weight', 'first_stage_model.encoder.down.3.block.1.conv1.bias', 
'first_stage_model.encoder.down.3.block.1.norm2.weight', 'first_stage_model.encoder.down.3.block.1.norm2.bias', 'first_stage_model.encoder.down.3.block.1.conv2.weight', 'first_stage_model.encoder.down.3.block.1.conv2.bias', 'first_stage_model.encoder.down.3.block.2.norm1.weight', 'first_stage_model.encoder.down.3.block.2.norm1.bias', 'first_stage_model.encoder.down.3.block.2.conv1.weight', 'first_stage_model.encoder.down.3.block.2.conv1.bias', 'first_stage_model.encoder.down.3.block.2.norm2.weight', 'first_stage_model.encoder.down.3.block.2.norm2.bias', 'first_stage_model.encoder.down.3.block.2.conv2.weight', 'first_stage_model.encoder.down.3.block.2.conv2.bias', 'first_stage_model.encoder.mid.block_1.norm1.weight', 'first_stage_model.encoder.mid.block_1.norm1.bias', 'first_stage_model.encoder.mid.block_1.conv1.weight', 'first_stage_model.encoder.mid.block_1.conv1.bias', 'first_stage_model.encoder.mid.block_1.norm2.weight', 'first_stage_model.encoder.mid.block_1.norm2.bias', 'first_stage_model.encoder.mid.block_1.conv2.weight', 'first_stage_model.encoder.mid.block_1.conv2.bias', 'first_stage_model.encoder.mid.block_2.norm1.weight', 'first_stage_model.encoder.mid.block_2.norm1.bias', 'first_stage_model.encoder.mid.block_2.conv1.weight', 'first_stage_model.encoder.mid.block_2.conv1.bias', 'first_stage_model.encoder.mid.block_2.norm2.weight', 'first_stage_model.encoder.mid.block_2.norm2.bias', 'first_stage_model.encoder.mid.block_2.conv2.weight', 'first_stage_model.encoder.mid.block_2.conv2.bias', 'first_stage_model.encoder.norm_out.weight', 'first_stage_model.encoder.norm_out.bias', 'first_stage_model.encoder.conv_out.weight', 'first_stage_model.encoder.conv_out.bias', 'first_stage_model.decoder.conv_in.weight', 'first_stage_model.decoder.conv_in.bias', 'first_stage_model.decoder.mid.block_1.norm1.weight', 'first_stage_model.decoder.mid.block_1.norm1.bias', 'first_stage_model.decoder.mid.block_1.conv1.weight', 'first_stage_model.decoder.mid.block_1.conv1.bias', 'first_stage_model.decoder.mid.block_1.norm2.weight', 'first_stage_model.decoder.mid.block_1.norm2.bias', 'first_stage_model.decoder.mid.block_1.conv2.weight', 'first_stage_model.decoder.mid.block_1.conv2.bias', 'first_stage_model.decoder.mid.block_2.norm1.weight', 'first_stage_model.decoder.mid.block_2.norm1.bias', 'first_stage_model.decoder.mid.block_2.conv1.weight', 'first_stage_model.decoder.mid.block_2.conv1.bias', 'first_stage_model.decoder.mid.block_2.norm2.weight', 'first_stage_model.decoder.mid.block_2.norm2.bias', 'first_stage_model.decoder.mid.block_2.conv2.weight', 'first_stage_model.decoder.mid.block_2.conv2.bias', 'first_stage_model.decoder.up.0.block.0.norm1.weight', 'first_stage_model.decoder.up.0.block.0.norm1.bias', 'first_stage_model.decoder.up.0.block.0.conv1.weight', 'first_stage_model.decoder.up.0.block.0.conv1.bias', 'first_stage_model.decoder.up.0.block.0.norm2.weight', 'first_stage_model.decoder.up.0.block.0.norm2.bias', 'first_stage_model.decoder.up.0.block.0.conv2.weight', 'first_stage_model.decoder.up.0.block.0.conv2.bias', 'first_stage_model.decoder.up.0.block.0.nin_shortcut.weight', 'first_stage_model.decoder.up.0.block.0.nin_shortcut.bias', 'first_stage_model.decoder.up.0.block.1.norm1.weight', 'first_stage_model.decoder.up.0.block.1.norm1.bias', 'first_stage_model.decoder.up.0.block.1.conv1.weight', 'first_stage_model.decoder.up.0.block.1.conv1.bias', 'first_stage_model.decoder.up.0.block.1.norm2.weight', 'first_stage_model.decoder.up.0.block.1.norm2.bias', 
'first_stage_model.decoder.up.0.block.1.conv2.weight', 'first_stage_model.decoder.up.0.block.1.conv2.bias', 'first_stage_model.decoder.up.0.block.2.norm1.weight', 'first_stage_model.decoder.up.0.block.2.norm1.bias', 'first_stage_model.decoder.up.0.block.2.conv1.weight', 'first_stage_model.decoder.up.0.block.2.conv1.bias', 'first_stage_model.decoder.up.0.block.2.norm2.weight', 'first_stage_model.decoder.up.0.block.2.norm2.bias', 'first_stage_model.decoder.up.0.block.2.conv2.weight', 'first_stage_model.decoder.up.0.block.2.conv2.bias', 'first_stage_model.decoder.up.0.block.3.norm1.weight', 'first_stage_model.decoder.up.0.block.3.norm1.bias', 'first_stage_model.decoder.up.0.block.3.conv1.weight', 'first_stage_model.decoder.up.0.block.3.conv1.bias', 'first_stage_model.decoder.up.0.block.3.norm2.weight', 'first_stage_model.decoder.up.0.block.3.norm2.bias', 'first_stage_model.decoder.up.0.block.3.conv2.weight', 'first_stage_model.decoder.up.0.block.3.conv2.bias', 'first_stage_model.decoder.up.1.block.0.norm1.weight', 'first_stage_model.decoder.up.1.block.0.norm1.bias', 'first_stage_model.decoder.up.1.block.0.conv1.weight', 'first_stage_model.decoder.up.1.block.0.conv1.bias', 'first_stage_model.decoder.up.1.block.0.norm2.weight', 'first_stage_model.decoder.up.1.block.0.norm2.bias', 'first_stage_model.decoder.up.1.block.0.conv2.weight', 'first_stage_model.decoder.up.1.block.0.conv2.bias', 'first_stage_model.decoder.up.1.block.0.nin_shortcut.weight', 'first_stage_model.decoder.up.1.block.0.nin_shortcut.bias', 'first_stage_model.decoder.up.1.block.1.norm1.weight', 'first_stage_model.decoder.up.1.block.1.norm1.bias', 'first_stage_model.decoder.up.1.block.1.conv1.weight', 'first_stage_model.decoder.up.1.block.1.conv1.bias', 'first_stage_model.decoder.up.1.block.1.norm2.weight', 'first_stage_model.decoder.up.1.block.1.norm2.bias', 'first_stage_model.decoder.up.1.block.1.conv2.weight', 'first_stage_model.decoder.up.1.block.1.conv2.bias', 'first_stage_model.decoder.up.1.block.2.norm1.weight', 'first_stage_model.decoder.up.1.block.2.norm1.bias', 'first_stage_model.decoder.up.1.block.2.conv1.weight', 'first_stage_model.decoder.up.1.block.2.conv1.bias', 'first_stage_model.decoder.up.1.block.2.norm2.weight', 'first_stage_model.decoder.up.1.block.2.norm2.bias', 'first_stage_model.decoder.up.1.block.2.conv2.weight', 'first_stage_model.decoder.up.1.block.2.conv2.bias', 'first_stage_model.decoder.up.1.block.3.norm1.weight', 'first_stage_model.decoder.up.1.block.3.norm1.bias', 'first_stage_model.decoder.up.1.block.3.conv1.weight', 'first_stage_model.decoder.up.1.block.3.conv1.bias', 'first_stage_model.decoder.up.1.block.3.norm2.weight', 'first_stage_model.decoder.up.1.block.3.norm2.bias', 'first_stage_model.decoder.up.1.block.3.conv2.weight', 'first_stage_model.decoder.up.1.block.3.conv2.bias', 'first_stage_model.decoder.up.1.upsample.conv.weight', 'first_stage_model.decoder.up.1.upsample.conv.bias', 'first_stage_model.decoder.up.2.block.0.norm1.weight', 'first_stage_model.decoder.up.2.block.0.norm1.bias', 'first_stage_model.decoder.up.2.block.0.conv1.weight', 'first_stage_model.decoder.up.2.block.0.conv1.bias', 'first_stage_model.decoder.up.2.block.0.norm2.weight', 'first_stage_model.decoder.up.2.block.0.norm2.bias', 'first_stage_model.decoder.up.2.block.0.conv2.weight', 'first_stage_model.decoder.up.2.block.0.conv2.bias', 'first_stage_model.decoder.up.2.block.1.norm1.weight', 'first_stage_model.decoder.up.2.block.1.norm1.bias', 'first_stage_model.decoder.up.2.block.1.conv1.weight', 
'first_stage_model.decoder.up.2.block.1.conv1.bias', 'first_stage_model.decoder.up.2.block.1.norm2.weight', 'first_stage_model.decoder.up.2.block.1.norm2.bias', 'first_stage_model.decoder.up.2.block.1.conv2.weight', 'first_stage_model.decoder.up.2.block.1.conv2.bias', 'first_stage_model.decoder.up.2.block.2.norm1.weight', 'first_stage_model.decoder.up.2.block.2.norm1.bias', 'first_stage_model.decoder.up.2.block.2.conv1.weight', 'first_stage_model.decoder.up.2.block.2.conv1.bias', 'first_stage_model.decoder.up.2.block.2.norm2.weight', 'first_stage_model.decoder.up.2.block.2.norm2.bias', 'first_stage_model.decoder.up.2.block.2.conv2.weight', 'first_stage_model.decoder.up.2.block.2.conv2.bias', 'first_stage_model.decoder.up.2.block.3.norm1.weight', 'first_stage_model.decoder.up.2.block.3.norm1.bias', 'first_stage_model.decoder.up.2.block.3.conv1.weight', 'first_stage_model.decoder.up.2.block.3.conv1.bias', 'first_stage_model.decoder.up.2.block.3.norm2.weight', 'first_stage_model.decoder.up.2.block.3.norm2.bias', 'first_stage_model.decoder.up.2.block.3.conv2.weight', 'first_stage_model.decoder.up.2.block.3.conv2.bias', 'first_stage_model.decoder.up.2.upsample.conv.weight', 'first_stage_model.decoder.up.2.upsample.conv.bias', 'first_stage_model.decoder.up.3.block.0.norm1.weight', 'first_stage_model.decoder.up.3.block.0.norm1.bias', 'first_stage_model.decoder.up.3.block.0.conv1.weight', 'first_stage_model.decoder.up.3.block.0.conv1.bias', 'first_stage_model.decoder.up.3.block.0.norm2.weight', 'first_stage_model.decoder.up.3.block.0.norm2.bias', 'first_stage_model.decoder.up.3.block.0.conv2.weight', 'first_stage_model.decoder.up.3.block.0.conv2.bias', 'first_stage_model.decoder.up.3.block.1.norm1.weight', 'first_stage_model.decoder.up.3.block.1.norm1.bias', 'first_stage_model.decoder.up.3.block.1.conv1.weight', 'first_stage_model.decoder.up.3.block.1.conv1.bias', 'first_stage_model.decoder.up.3.block.1.norm2.weight', 'first_stage_model.decoder.up.3.block.1.norm2.bias', 'first_stage_model.decoder.up.3.block.1.conv2.weight', 'first_stage_model.decoder.up.3.block.1.conv2.bias', 'first_stage_model.decoder.up.3.block.2.norm1.weight', 'first_stage_model.decoder.up.3.block.2.norm1.bias', 'first_stage_model.decoder.up.3.block.2.conv1.weight', 'first_stage_model.decoder.up.3.block.2.conv1.bias', 'first_stage_model.decoder.up.3.block.2.norm2.weight', 'first_stage_model.decoder.up.3.block.2.norm2.bias', 'first_stage_model.decoder.up.3.block.2.conv2.weight', 'first_stage_model.decoder.up.3.block.2.conv2.bias', 'first_stage_model.decoder.up.3.block.3.norm1.weight', 'first_stage_model.decoder.up.3.block.3.norm1.bias', 'first_stage_model.decoder.up.3.block.3.conv1.weight', 'first_stage_model.decoder.up.3.block.3.conv1.bias', 'first_stage_model.decoder.up.3.block.3.norm2.weight', 'first_stage_model.decoder.up.3.block.3.norm2.bias', 'first_stage_model.decoder.up.3.block.3.conv2.weight', 'first_stage_model.decoder.up.3.block.3.conv2.bias', 'first_stage_model.decoder.up.3.upsample.conv.weight', 'first_stage_model.decoder.up.3.upsample.conv.bias', 'first_stage_model.decoder.norm_out.weight', 'first_stage_model.decoder.norm_out.bias', 'first_stage_model.decoder.conv_out.weight', 'first_stage_model.decoder.conv_out.bias'].
[2024-10-16 11:22:22,984] [INFO] [RANK 0] > successfully loaded /data/models/cogview3/transformer/1/mp_rank_00_model_states.pt
0it [00:00, ?it/s]/data/anaconda3/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py:214: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
with torch.cuda.amp.autocast(enabled=False):
0it [00:00, ?it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/app/CogView3/sat/sample_dit.py", line 208, in <module>
[rank0]:     sampling_main(args, model_cls=SATDiffusionEngine)
[rank0]:   File "/data/app/CogView3/sat/sample_dit.py", line 169, in sampling_main
[rank0]:     c, uc = model.conditioner.get_unconditional_conditioning(
[rank0]:   File "/data/app/CogView3/sat/sgm/modules/encoders/modules.py", line 235, in get_unconditional_conditioning
[rank0]:     c = self(batch_c)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/app/CogView3/sat/sgm/modules/encoders/modules.py", line 218, in forward
[rank0]:     output = self.get_single_embedding(embedder, batch, output=output, force_zero_embeddings=force_zero_embeddings)
[rank0]:   File "/data/app/CogView3/sat/sgm/modules/encoders/modules.py", line 156, in get_single_embedding
[rank0]:     emb_out = embedder(batch[embedder.input_key])
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/app/CogView3/sat/sgm/modules/encoders/modules.py", line 356, in forward
[rank0]:     outputs = self.transformer(input_ids=tokens)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1972, in forward
[rank0]:     encoder_outputs = self.encoder(
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1107, in forward
[rank0]:     layer_outputs = layer_module(
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 687, in forward
[rank0]:     self_attention_outputs = self.layer[0](
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
[rank0]:     normed_hidden_states = self.layer_norm(hidden_states)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 416, in forward
[rank0]:     return fused_rms_norm_affine(
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 215, in fused_rms_norm_affine
[rank0]:     return FusedRMSNormAffineFunction.apply(*args)
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
[rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank0]:   File "/data/anaconda3/lib/python3.10/site-packages/apex/normalization/fused_layer_norm.py", line 75, in forward
[rank0]:     output, invvar = fused_layer_norm_cuda.rms_forward_affine(
[rank0]: RuntimeError: input must be a CUDA tensor
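
Note: the traceback suggests T5's layer_norm is being served by apex's FusedRMSNorm (transformers swaps it in for T5LayerNorm whenever apex is importable), and that fused CUDA kernel rejects the CPU tensors coming from the text encoder. An untested workaround, assuming apex is not otherwise required for inference here, is to remove it so transformers falls back to its pure-PyTorch layer norm:

```bash
# Without apex installed, transformers uses its pure-PyTorch T5LayerNorm,
# which works on CPU tensors (at the cost of the fused kernel's speed).
pip uninstall -y apex
```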

Expected behavior

Expected the script to run normally, as described in the README.
