ERNIEVIL2-pytorch

This project converts ERNIE-VIL2 from PaddlePaddle to PyTorch format.

Requirements

  • pip install -r requirements.txt
  • Follow the requirements in the official repo.
  • Download the Paddle and PyTorch checkpoints from BaiduYun.

Evaluate

To compute cross-modal similarity, run:

python test.py performance_check

You will get a similarity matrix between three images and three sentences, like this:

tensor([[0.3096, 0.1929, 0.1588],
        [0.2270, 0.2997, 0.1339],
        [0.0894, 0.1035, 0.3198]])
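
One way to read this matrix is to take the highest-scoring entry in each row. The sketch below does that in plain Python on the values printed above. It assumes rows index images and columns index sentences (the output does not say which axis is which), and `best_match` is a hypothetical helper name, not part of this repo.

```python
# Similarity values copied from the example output above.
similarity = [
    [0.3096, 0.1929, 0.1588],
    [0.2270, 0.2997, 0.1339],
    [0.0894, 0.1035, 0.3198],
]

def best_match(row):
    """Return the index of the largest score in one row."""
    return max(range(len(row)), key=lambda j: row[j])

# Assumption: row i is image i and column j is sentence j.
matches = [best_match(row) for row in similarity]
print(matches)  # [0, 1, 2]
```

Each row peaks on the diagonal, so under this assumption every image is paired with the sentence at the same index.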

To compare the outputs before and after model conversion, run:

python check.py logit_check

You will get the output:

### pytorch result
visual_output:  [[-0.25474143 -0.72782516  0.02674462  0.48610407  1.4485253   0.5175752  1.0823581   0.3140268   0.32782146  0.4190097 ]]
text_output:  [[ 0.10204837 -0.5075943  -0.05125085  0.22701152  0.5774069  -0.54781747  -0.1122973   0.46482086  0.2952882   0.1963322 ]]

### paddle result
visual_output: [[-0.25474280, -0.72782505,  0.02674395,  0.48610494,  1.44852710, 0.51757193,  1.08235860,  0.31402659,  0.32782283,  0.41901165]]
text_output: [[ 0.10174122, -0.50740963, -0.05135643,  0.22727829,  0.57717800, -0.54775286, -0.11231337,  0.46472329,  0.29527009,  0.19644395]]

The converted version's outputs closely match those of the official PaddlePaddle version: on the values shown, the visual outputs agree to within about 1e-5 and the text outputs to within about 1e-3.
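
The agreement can also be checked numerically. A minimal sketch in plain Python, using only the values printed above; `max_abs_diff` is a hypothetical helper, not part of this repo:

```python
# Values copied from the logit_check output above.
torch_visual = [-0.25474143, -0.72782516, 0.02674462, 0.48610407, 1.4485253,
                0.5175752, 1.0823581, 0.3140268, 0.32782146, 0.4190097]
paddle_visual = [-0.25474280, -0.72782505, 0.02674395, 0.48610494, 1.44852710,
                 0.51757193, 1.08235860, 0.31402659, 0.32782283, 0.41901165]
torch_text = [0.10204837, -0.5075943, -0.05125085, 0.22701152, 0.5774069,
              -0.54781747, -0.1122973, 0.46482086, 0.2952882, 0.1963322]
paddle_text = [0.10174122, -0.50740963, -0.05135643, 0.22727829, 0.57717800,
               -0.54775286, -0.11231337, 0.46472329, 0.29527009, 0.19644395]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))

visual_diff = max_abs_diff(torch_visual, paddle_visual)
text_diff = max_abs_diff(torch_text, paddle_text)

# The visual difference stays below 1e-5 and the text difference below 1e-3.
print(f"visual: {visual_diff:.1e}, text: {text_diff:.1e}")
```

The text outputs differ more than the visual ones, so a tolerance-based comparison is a better fit here than exact equality.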

Change Description

Text Encoder

This part mainly follows ERNIE-Pytorch's pipeline, which has also been merged into huggingface/transformers.

However, the key names in the model's state_dict and in ErnieEmbeddings are changed slightly to match the Paddle version of ERNIE-VIL2.
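
As an illustration of what renaming state_dict keys can look like, here is a minimal sketch in plain Python. The prefixes in `RENAME_RULES`, the example keys, and the `rename_keys` helper are all hypothetical; they are not the actual names used in this repo.

```python
# Hypothetical prefix mapping; the real key names in this repo may differ.
RENAME_RULES = {
    "ernie.embeddings.": "text_model.embeddings.",
    "ernie.encoder.": "text_model.encoder.",
}

def rename_keys(state_dict, rules):
    """Return a new dict with the first matching key prefix replaced."""
    renamed = {}
    for key, value in state_dict.items():
        for old, new in rules.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        renamed[key] = value
    return renamed

# Stand-in for a real state_dict; the values would normally be tensors.
example = {
    "ernie.embeddings.word_embeddings.weight": "tensor_a",
    "ernie.encoder.layer.0.attention.self.query.weight": "tensor_b",
    "pooler.dense.bias": "tensor_c",
}
renamed = rename_keys(example, RENAME_RULES)
print(sorted(renamed))
```

Keys that match no rule pass through unchanged, so the same helper can be applied to a whole checkpoint.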

Visual Encoder

I rewrote the Paddle version of ViT, as it is quite different from huggingface's. Some initialization functions are discarded, since we normally start from the pretrained checkpoint. (And I have no idea what their PyTorch equivalents would be.)

Do it yourself

If you want to convert the model on your own:

  1. Download the Paddle version checkpoint.
  2. Run python convert.py
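
For orientation, one step that conversions between the two frameworks commonly need is transposing linear-layer weights: Paddle's `nn.Linear` stores its weight as `[in_features, out_features]`, while PyTorch's `nn.Linear` uses `[out_features, in_features]`. The sketch below illustrates that step on nested lists in plain Python. Whether `convert.py` does exactly this is an assumption, and the key names and the `is_linear_weight` rule are hypothetical.

```python
def transpose(matrix):
    """Transpose a 2-D nested list."""
    return [list(col) for col in zip(*matrix)]

def is_linear_weight(name):
    # Hypothetical rule: treat keys ending in "fc.weight" as linear weights.
    return name.endswith("fc.weight")

def convert_state(paddle_state):
    """Copy a state mapping, transposing the entries flagged as linear weights."""
    torch_state = {}
    for name, value in paddle_state.items():
        torch_state[name] = transpose(value) if is_linear_weight(name) else value
    return torch_state

# Toy stand-in for a Paddle checkpoint: a weight of shape [in=2, out=3].
paddle_state = {
    "fc.weight": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "fc.bias": [0.1, 0.2, 0.3],
}
torch_state = convert_state(paddle_state)
print(torch_state["fc.weight"])  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

Biases and other one-dimensional parameters are left untouched, since their layout is the same in both frameworks.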

Citation

If you use this work in a scientific publication, I would appreciate it if you also cited the following BibTeX entry:

@misc{dong03@ERNIEVIL2-pytorch,
  title={ERNIEVIL2-pytorch},
  author={Chengbo Dong},
  howpublished={\url{https://github.com/dong03/ERNIEVIL2-pytorch}},
  year={2023}
}

(or at least \footnote{\url{https://github.com/dong03/ERNIEVIL2-pytorch}}) 😆
