fail to load XVLMRetrieval checkpoint #15

Open
oferidan1 opened this issue Dec 25, 2023 · 1 comment

oferidan1 commented Dec 25, 2023

Hi,
Thanks for sharing your paper and code.
I am trying to run Retrieval.py with the provided checkpoint "xvlm_beit_1b_large_stage2_coco_rerun.th", but it fails to load. It seems the model code is not consistent with the checkpoint.
I get the error below.
Can you please check it?
Thanks.
Ofer

File "/datasets1/ofer/X2-VLM/models/xvlm.py", line 612, in load_pretrained
msg = self.load_state_dict(state_dict, strict=False)
File "/datasets1/ofer/X2-VLM/test.py", line 303, in main
model.load_pretrained(args.checkpoint, config, is_eval=args.evaluate)
File "/datasets1/ofer/X2-VLM/test.py", line 388, in
main(args, config, test_type)
RuntimeError: Error(s) in loading state_dict for XVLMForRetrieval:
size mismatch for vision_encoder.blocks.0.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
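
The shapes in the log point to an input-resolution mismatch rather than corrupt weights: 577 tokens is a 24×24 patch grid plus a [CLS] token (384×384 input with 16×16 patches), while 197 tokens is 14×14 + 1 (224×224 input); likewise 2212 = (2·24−1)² + 3 and 732 = (2·14−1)² + 3 for the BEiT-style relative-position bias tables. Below is a minimal sketch, not the repo's API, to confirm what resolution the checkpoint expects; the checkpoint path, state-dict key, and patch size of 16 are assumptions taken from the error log.

```python
import math
import torch

# Hypothetical path and key, copied from the error log above; adjust to your setup.
ckpt = torch.load("xvlm_beit_1b_large_stage2_coco_rerun.th", map_location="cpu")
state = ckpt.get("model", ckpt)  # some checkpoints nest the weights under a "model" key

idx = state["vision_encoder.blocks.0.attn.relative_position_index"]
num_tokens = idx.shape[0]           # 577 in the log
grid = math.isqrt(num_tokens - 1)   # 24 -> 24x24 patch grid plus one [CLS] token
patch_size = 16                     # assumed BEiT patch size
print(f"checkpoint expects {grid * patch_size}x{grid * patch_size} inputs")
# 24 * 16 = 384, whereas 197 tokens (14x14 + 1) corresponds to 224x224,
# so the model is being built at a smaller resolution than the checkpoint.
```

If the retrieval config exposes an image-resolution field (X-VLM-style configs usually call it image_res), setting it to 384 to match the checkpoint should make these shapes agree; evaluating at 224 instead would require interpolating the relative-position bias tables.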

lpf992 commented Mar 3, 2024

I have a similar problem. Have you solved it?
