
How the image crops are obtained in inference #35

Open
avermilov opened this issue Feb 18, 2025 · 2 comments

Comments

@avermilov

Hello, @zwx8981

Thank you for the paper and the code! I have a question about how the image crops are obtained during inference. As I understand it, the goal is to extract num_crops (a hyperparameter) evenly spaced crops from the original image. In the code, this is done with two consecutive unfold operations over the H and W dimensions, after which num_crops evenly spaced indices are taken from the resulting array of crops. However, when I visualize the crops, several of them often cover the same area while other areas are not covered by any crop (in the image, crops 1 and 3 cover the same area, and no crop captures the table). Is this behaviour expected? I understand that it can be partially mitigated by setting a larger num_crops; however, the crops will still not be evenly spaced.

[Image: visualization of the sampled crops]
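A minimal pure-Python sketch of the window bookkeeping may help show why this happens. It only tracks the top-left (y, x) corner of each window instead of slicing a tensor, and it assumes (illustrative values, not confirmed from the repository) a 224x224 window, a stride of 32 pixels, num_crops = 15, and row-major flattening of the unfolded windows (all W windows of one row before the next row), which is the order obtained by unfolding dim 2, then dim 3, and merging those two dimensions. The helper names are hypothetical.

```python
from typing import List, Tuple


def unfold_origins(size: int, kernel: int, step: int) -> List[int]:
    """Start offsets of the windows that Tensor.unfold(dim, kernel, step)
    would produce along one axis of length `size`."""
    count = (size - kernel) // step + 1
    return [i * step for i in range(count)]


def flat_strided_crops(h: int, w: int, kernel: int = 224, step: int = 32,
                       num_crops: int = 15) -> List[Tuple[int, int]]:
    """Top-left (y, x) corners of the crops chosen by taking evenly spaced
    indices into the row-major list of all unfolded windows.

    Hypothetical helper; the kernel/step/num_crops defaults are assumptions.
    """
    ys = unfold_origins(h, kernel, step)  # window rows (H unfold)
    xs = unfold_origins(w, kernel, step)  # window columns (W unfold)
    # Row-major flattening: every column of row 0, then row 1, and so on.
    windows = [(y, x) for y in ys for x in xs]
    # Evenly spaced in the flat list (needs len(windows) >= num_crops,
    # otherwise sel_step is 0 and every crop is the first window).
    sel_step = len(windows) // num_crops
    return [windows[i * sel_step] for i in range(num_crops)]


crops = flat_strided_crops(512, 768)
print(crops)
print(sorted({x for _, x in crops}))  # [0, 192, 384]
```

For a 512x768 input this gives 10 x 18 = 180 windows and a selection step of 12. Because the step of 12 and the row length of 18 share a factor of 6, the selected column indices cycle through only 0, 6 and 12 (x = 0, 192, 384), so the rightmost 160 pixels (x >= 608) are never covered. Evenly spaced positions in the flattened list are therefore not evenly spaced in 2D; choosing row and column offsets independently would be one way to spread them out.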
@zwx8981
Owner

zwx8981 commented Feb 19, 2025

@avermilov Hi, thanks for your insightful question. Yes, this is the expected behavior. Since CLIP requires a fixed 224x224 input, the sampling step size could, in theory, be determined adaptively based on the resolution of each image. However, this is more of an engineering problem. Moreover, our experiments have shown that the current setting is already sufficient for quality score prediction. When predicting the quality of high-resolution images (such as 4K images), the image can first be scaled down while keeping its aspect ratio.
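The aspect-ratio-preserving downscale mentioned above can be sketched as a size computation. This is a hypothetical helper, not part of the repository; max_side = 1024 is an assumed target, and the short side is kept at or above the 224-pixel crop size when possible so that at least one CLIP-sized window still fits.

```python
from typing import Tuple


def downscale_size(h: int, w: int, max_side: int = 1024,
                   min_side: int = 224) -> Tuple[int, int]:
    """Return a (height, width) that keeps the aspect ratio of (h, w).

    Hypothetical helper: the longer side is shrunk to at most `max_side`,
    but the shorter side is not shrunk below `min_side` (the CLIP crop size),
    and images are never upscaled.
    """
    scale = min(1.0, max_side / max(h, w))
    # Short-side floor takes priority over max_side for very elongated images.
    scale = max(scale, min(1.0, min_side / min(h, w)))
    return round(h * scale), round(w * scale)


print(downscale_size(2160, 3840))  # a 4K (UHD) frame -> (576, 1024)
```

The returned (height, width) can then be passed to an actual resize call; note that Pillow's Image.resize expects (width, height).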

@avermilov
Author

Got it, thank you for the quick response!
