CLIP resolution mechanics #218

Don-Chad · 2023-06-05T08:27:39Z

Hi all,

Can anyone explain how LAVIS is able to give accurate visual results when CLIP/BLIP only work at 256x256? That's really small; at this size you can hardly recognise any detail. Does it extract image sections, scale them up, and judge each one individually?

lucas-0liveira · 2023-10-09T19:06:00Z

It seems that LAVIS gets the most out of CLIP/BLIP by combining several techniques. While 256x256 may seem too small for detailed recognition, LAVIS likely extracts the relevant image sections and upscales them, which would let it make accurate visual assessments even with limited pixel data. The CLIP/BLIP integration probably also contributes contextual understanding that improves overall accuracy.
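
For reference, here is a minimal sketch of how a ViT-style image encoder (the family that CLIP and BLIP image encoders belong to) generally consumes a fixed-size input: the whole image is resized to the input resolution and cut into a grid of small square patches, each of which becomes one token. The patch size of 16 and the helper names `patch_grid` and `split_into_patches` are assumptions for illustration only; this is not LAVIS code and it does not confirm what LAVIS does internally.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square input.

    Hypothetical helper: assumes the image was already resized to
    image_size x image_size and that patch_size divides it evenly.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size
    return side, side * side


def split_into_patches(image: list[list[int]], patch_size: int) -> list[list[list[int]]]:
    """Cut a 2D image (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, which is the order in which
    a ViT-style encoder would turn them into a token sequence.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# A 256x256 input with an assumed 16x16 patch size gives a 16x16 grid.
print(patch_grid(256, 16))  # (16, 256)

# Tiny 4x4 "image" with 2x2 patches, just to show the ordering.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
print(split_into_patches(toy, 2)[0])  # [[0, 1], [4, 5]]
```

Under these assumptions the encoder sees a few hundred patch tokens covering the whole image at once, rather than separately upscaled crops.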

1 reply

Don-Chad Dec 28, 2023
Author

Ah, thanks for that detailed answer! Appreciated. Now this does make you wonder: how is it that LLaVA still scores better than LAVIS when it comes to object recognition?
