fail to load XVLMRetrieval checkpoint #15

Open
oferidan1 opened this issue Dec 25, 2023 · 1 comment

oferidan1 commented Dec 25, 2023

Hi,
Thanks for sharing your paper and code.
I am trying to run Retrieval.py with the provided checkpoint "xvlm_beit_1b_large_stage2_coco_rerun.th", but it fails to load. It seems the model code is not consistent with the checkpoint.
I get the error below.
Can you please check it?
Thanks.
Ofer

File "/datasets1/ofer/X2-VLM/models/xvlm.py", line 612, in load_pretrained
msg = self.load_state_dict(state_dict, strict=False)
File "/datasets1/ofer/X2-VLM/test.py", line 303, in main
model.load_pretrained(args.checkpoint, config, is_eval=args.evaluate)
File "/datasets1/ofer/X2-VLM/test.py", line 388, in
main(args, config, test_type)
RuntimeError: Error(s) in loading state_dict for XVLMForRetrieval:
size mismatch for vision_encoder.blocks.0.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
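
The shapes in the log point to an input-resolution mismatch rather than corrupt weights: 577 tokens is a 24×24 patch grid plus a [CLS] token (384×384 input with 16×16 patches), while 197 tokens is 14×14 + 1 (224×224 input); likewise 2212 = (2·24−1)² + 3 and 732 = (2·14−1)² + 3 for the BEiT-style relative-position bias tables. Below is a minimal sketch, not the repo's API, to confirm what resolution the checkpoint expects; the checkpoint path, state-dict key, and patch size of 16 are assumptions taken from the error log.

```python
import math
import torch

# Hypothetical path and key, copied from the error log above; adjust to your setup.
ckpt = torch.load("xvlm_beit_1b_large_stage2_coco_rerun.th", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest the weights under a "model" key

idx = state["vision_encoder.blocks.0.attn.relative_position_index"]
num_tokens = idx.shape[0]           # 577 in the log
grid = math.isqrt(num_tokens - 1)   # 24 -> 24x24 patch grid plus one [CLS] token
patch_size = 16                     # assumed BEiT patch size
print(f"checkpoint expects {grid * patch_size}x{grid * patch_size} inputs")
# 24 * 16 = 384, whereas 197 tokens (14x14 + 1) corresponds to 224x224,
# so the model is being built at a smaller resolution than the checkpoint.
```

If the retrieval config exposes an image-resolution field (X-VLM-style configs usually call it image_res), setting it to 384 to match the checkpoint should make these shapes agree; evaluating at 224 instead would require interpolating the relative-position bias tables.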

lpf992 commented Mar 3, 2024

I have a similar problem. Have you solved it?
