Hi,
Thanks for sharing your paper and code.
I am trying to run Retrieval.py with the provided checkpoint "xvlm_beit_1b_large_stage2_coco_rerun.th", but it fails to load; the model code does not seem to be consistent with the checkpoint.
I get the error below.
Could you please take a look?
Thanks,
Ofer
Traceback (most recent call last):
  File "/datasets1/ofer/X2-VLM/test.py", line 388, in <module>
    main(args, config, test_type)
  File "/datasets1/ofer/X2-VLM/test.py", line 303, in main
    model.load_pretrained(args.checkpoint, config, is_eval=args.evaluate)
  File "/datasets1/ofer/X2-VLM/models/xvlm.py", line 612, in load_pretrained
    msg = self.load_state_dict(state_dict, strict=False)
RuntimeError: Error(s) in loading state_dict for XVLMForRetrieval:
	size mismatch for vision_encoder.blocks.0.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
	size mismatch for vision_encoder.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
	... (the same two size mismatches repeat for vision_encoder.blocks.1 through vision_encoder.blocks.23)
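For what it's worth, the numbers look consistent with the checkpoint having been trained at 384x384 while the current config builds the vision encoder at 224x224. A quick sanity check of the arithmetic (assuming 16x16 patches and BEiT's usual relative-position-bias layout of (2W-1)^2 entries plus 3 extra slots for the cls-token interactions; both are assumptions on my part, not taken from the repo):

```python
def num_tokens(image_res: int, patch_size: int = 16) -> int:
    # A ViT/BEiT encoder produces (res / patch)^2 patch tokens plus one [CLS] token,
    # which is the side length of the relative_position_index matrix.
    return (image_res // patch_size) ** 2 + 1

def bias_table_size(window: int, patch_size: int = 16) -> int:
    # Assumed BEiT layout: (2W-1)^2 relative offsets plus 3 entries for
    # cls-to-token, token-to-cls, and cls-to-cls.
    return (2 * window - 1) ** 2 + 3

# Checkpoint: 577 x 577 index, 2212-row bias table -> 384x384 input (24x24 patches).
print(num_tokens(384), bias_table_size(384 // 16))   # 577 2212
# Current model: 197 x 197 index, 732-row bias table -> 224x224 input (14x14 patches).
print(num_tokens(224), bias_table_size(224 // 16))   # 197 732
```

So it may just be that the config passed to Retrieval.py has image_res set to 224 instead of 384, rather than a code/checkpoint inconsistency.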
The text was updated successfully, but these errors were encountered:
Hi,
Thanks for sharing your paper and code.
I am trying to run Retrieval.py with the provided checkpoint "xvlm_beit_1b_large_stage2_coco_rerun.th" but it fails to load. It seems the model code is not consistent with the checkpoint.
I get below error.
Can you please check it?
Thanks.
Ofer
Error(s) in loading state_dict for XVLMForRetrieval:
size mismatch for vision_encoder.blocks.0.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.4.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.5.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.6.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.7.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.8.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.9.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.10.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.11.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.12.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.13.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.14.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.15.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.16.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.17.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.18.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.19.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.20.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.21.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.22.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.23.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
File "/datasets1/ofer/X2-VLM/models/xvlm.py", line 612, in load_pretrained
msg = self.load_state_dict(state_dict, strict=False)
File "/datasets1/ofer/X2-VLM/test.py", line 303, in main
model.load_pretrained(args.checkpoint, config, is_eval=args.evaluate)
File "/datasets1/ofer/X2-VLM/test.py", line 388, in
main(args, config, test_type)
RuntimeError: Error(s) in loading state_dict for XVLMForRetrieval:
size mismatch for vision_encoder.blocks.0.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.0.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.1.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.2.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_bias_table: copying a param with shape torch.Size([2212, 16]) from checkpoint, the shape in current model is torch.Size([732, 16]).
size mismatch for vision_encoder.blocks.3.attn.relative_position_index: copying a param with shape torch.Size([577, 577]) from checkpoint, the shape in current model is torch.Size([197, 197]).
(…the same two size-mismatch errors repeat for vision_encoder.blocks.4 through vision_encoder.blocks.23, with identical shapes: bias table [2212, 16] vs. [732, 16], position index [577, 577] vs. [197, 197].)
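For what it's worth, the mismatched shapes look like a resolution mismatch rather than corrupted weights. Assuming a BEiT-style ViT with patch size 16, a model has `(res // 16)**2 + 1` tokens (patches plus the [CLS] token) and `(2 * (res // 16) - 1)**2 + 3` relative position bias entries (pairwise offsets plus three extra slots for [CLS] interactions). A quick sanity check under those assumptions:

```python
def beit_shapes(image_res, patch_size=16):
    """Token count and relative-position-bias table size for a
    BEiT-style ViT at a given square input resolution."""
    side = image_res // patch_size          # patches per side
    num_tokens = side * side + 1            # patches + [CLS]
    bias_entries = (2 * side - 1) ** 2 + 3  # relative offsets + CLS slots
    return num_tokens, bias_entries

# Checkpoint shapes (577 tokens, 2212 bias entries) -> 384x384 input
print(beit_shapes(384))
# Current model shapes (197 tokens, 732 bias entries) -> 224x224 input
print(beit_shapes(224))
```

If that arithmetic is right, the checkpoint was trained at 384x384 while the model is being built at 224x224, so setting the image resolution in the Retrieval config to 384 (or interpolating the bias tables to the new grid) should let the state dict load.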