Thank you for the paper and the code! I have a question about how the image crops are obtained during inference. As I understand it, the goal is to extract num_crops (a hyperparameter) evenly spaced crops from the original image. In the code, this is done with two consecutive unfold operations over the H and W dimensions, after which num_crops evenly spaced indices are taken from the resulting array of crops. However, when I visualize the crops, several of them often cover nearly the same area, while other areas are not represented by any crop at all (in my example, crops 1 and 3 cover the same area, and no crop captures the table). Is this behaviour expected? I understand that it can be partially mitigated by increasing num_crops, but the crops will still not be evenly spaced over the image.
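The effect described above can be reproduced from the index arithmetic alone. The sketch below is hypothetical: it assumes `Tensor.unfold(dim, size, step)` semantics for both unfolds, row-major flattening of the window grid (W index varying fastest), a step of 32, a crop size of 224, a 512×768 input, and a selection rule of `stride = n_windows // num_crops`. The repository's exact step, permute order, and index formula may differ.

```python
# Hypothetical sketch: reproduces where crops land after two consecutive
# unfolds (H, then W) followed by strided selection from the flattened grid.
# Step, crop size, image size and the selection rule are assumptions.

def unfold_starts(length: int, size: int, step: int) -> list[int]:
    """Window start offsets that Tensor.unfold(dim, size, step) yields on one axis."""
    return list(range(0, length - size + 1, step))


def sampled_crop_origins(h: int, w: int, size: int, step: int,
                         num_crops: int) -> list[tuple[int, int]]:
    rows = unfold_starts(h, size, step)  # windows after unfolding H
    cols = unfold_starts(w, size, step)  # windows after unfolding W
    n_windows = len(rows) * len(cols)
    if num_crops > n_windows:
        raise ValueError("num_crops exceeds the number of windows")
    stride = n_windows // num_crops      # "evenly spaced" in the flattened array
    picks = [i * stride for i in range(num_crops)]
    # Row-major flattening: index -> (top, left) of the corresponding window.
    return [(rows[i // len(cols)], cols[i % len(cols)]) for i in picks]


origins = sampled_crop_origins(h=512, w=768, size=224, step=32, num_crops=15)
print(sorted({x for _, x in origins}))   # [0, 192, 384]
print(max(x for _, x in origins) + 224)  # 608
```

Here the grid has 10 × 18 = 180 windows, so the stride is 12. Because 12 and the row length 18 share a factor of 6, the picks cycle through only 18 / 6 = 3 columns: the right-most 160 pixels (x from 608 to 768) are never covered, while crops such as (0, 0) and (64, 0) overlap almost entirely.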
@avermilov Hi, thanks for your insightful question. Yes, this is the expected behavior. Since CLIP requires a fixed input size (224×224), the sampling step could in theory be chosen adaptively based on the resolution of each image. However, this is more of an engineering problem. Moreover, our experiments show that the current setting is already sufficient for quality score prediction. For high-resolution images (such as 4K images), the image can first be downscaled while preserving its aspect ratio.
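One possible way to combine the two suggestions in the reply (downscaling while keeping the aspect ratio, and an adaptive sampling step) is sketched below. This is not the repository's code: `max_side=1024`, the 3 × 5 grid, and the helper names are hypothetical choices for illustration. Instead of striding through a flattened window array, it spreads crops evenly along each axis separately, so the step adapts to the image size.

```python
from itertools import product


def resize_keep_aspect(h: int, w: int, max_side: int) -> tuple[int, int]:
    """Downscale (h, w) so the longer side is at most max_side; aspect ratio kept."""
    longest = max(h, w)
    if longest <= max_side:
        return h, w
    return round(h * max_side / longest), round(w * max_side / longest)


def axis_starts(length: int, size: int, n: int) -> list[int]:
    """n crop offsets spread evenly from 0 to length - size (adaptive step)."""
    if length < size:
        raise ValueError("image side is smaller than the crop size")
    if n == 1:
        return [(length - size) // 2]  # single crop: centre it
    span = length - size
    return [i * span // (n - 1) for i in range(n)]


def grid_crop_origins(h: int, w: int, size: int,
                      grid_h: int, grid_w: int) -> list[tuple[int, int]]:
    """(top, left) of grid_h x grid_w crops spread evenly over both axes."""
    return list(product(axis_starts(h, size, grid_h), axis_starts(w, size, grid_w)))


h, w = resize_keep_aspect(2160, 3840, max_side=1024)  # 4K frame -> (576, 1024)
origins = grid_crop_origins(h, w, size=224, grid_h=3, grid_w=5)
print(sorted({x for _, x in origins}))  # [0, 200, 400, 600, 800]
```

Because each axis gets its own step, the last crop always ends at the image border (here 800 + 224 = 1024), and as long as the step stays below the crop size every pixel is covered by at least one crop. How to choose grid_h and grid_w for a given num_crops (for example, in proportion to the aspect ratio) is left open.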