Training dimension problems #13

Open
pyzone49 opened this issue Apr 17, 2024 · 1 comment
Comments

@pyzone49

Hello, I am having problems when I try to modify the image size. I tried changing it, but ran into several issues in the model's forward function. Have you tried different sizes before? If so, what arguments should I change other than --imsize?

@linhuixiao (Owner)

@pyzone49
Hi, for a model that uses CLIP, changing the image size may cause numerous issues and require modifications in multiple places. In CLIP-VG, I have incorporated the ability to adapt to different image sizes during the image transformations. However, since the CLIP model itself supports an image token length of 197, changing the image size requires re-initializing the position embeddings or upsampling them by interpolation. I provide the implementation of CLIP's position embedding interpolation below for your reference. Furthermore, my latest work, HiVG (https://github.com/linhuixiao/HiVG), implements this feature; however, its code is not fully open-sourced yet, so please watch that repository if you are interested.

        # Interpolate the pre-trained patch position embeddings when the input
        # grid (h, w) differs from the original square grid (size x size).
        if size != h or size != w:
            new_abs_pos = F.interpolate(
                # (1, size*size, dim) -> (1, dim, size, size) for spatial interpolation
                abs_pos.reshape(1, size, size, -1).permute(0, 3, 1, 2),
                size=(h, w),
                mode="bicubic",
                antialias=True,
                align_corners=False,
            )
            # Back to token layout: (1, dim, h, w) -> (1, h*w, dim)
            new_abs_pos = new_abs_pos.permute(0, 2, 3, 1).reshape(1, h * w, -1)
            # Re-attach the CLS token's position embedding at the front
            position_embedding = torch.cat([cls_pos.unsqueeze(0), new_abs_pos], dim=1)
            embeddings = embeddings + position_embedding.repeat(batch_size, 1, 1)
        else:  # original grid, e.g. 14 x 14 for a 224x224 input with patch size 16
            embeddings = embeddings + self.position_embedding(self.position_ids)
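
For completeness, below is a minimal self-contained sketch of the same interpolation applied to a standalone (CLS + patches) position-embedding tensor, which can be run outside the model. The function name interpolate_pos_embed and the shape conventions are illustrative assumptions on my part, not code from CLIP-VG:

    import torch
    import torch.nn.functional as F

    def interpolate_pos_embed(pos_embed, new_h, new_w):
        # pos_embed: (1 + size*size, dim), where row 0 is the CLS token's embedding.
        cls_pos, abs_pos = pos_embed[:1], pos_embed[1:]
        size = int(abs_pos.shape[0] ** 0.5)  # original (square) grid side, e.g. 14
        new_abs_pos = F.interpolate(
            abs_pos.reshape(1, size, size, -1).permute(0, 3, 1, 2),  # (1, dim, size, size)
            size=(new_h, new_w),
            mode="bicubic",
            antialias=True,
            align_corners=False,
        )
        new_abs_pos = new_abs_pos.permute(0, 2, 3, 1).reshape(new_h * new_w, -1)
        return torch.cat([cls_pos, new_abs_pos], dim=0)  # (1 + new_h*new_w, dim)

    # Example: CLIP ViT-B/16 at 224x224 has 197 tokens (14x14 patches + CLS);
    # a 448x448 input with patch size 16 needs a 28x28 grid (785 tokens).
    pos = torch.randn(197, 768)
    print(interpolate_pos_embed(pos, 28, 28).shape)  # torch.Size([785, 768])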
