Issue with ViT in BioClip Visual Part: ViT Returns CLS Token Instead of Logits #52

Link7808 · 2024-09-26T08:21:11Z

We are using the visual part (ViT) of BioClip to process images. However, there is an issue with the forward method in BaseCAM.
In the following lines of code:
self.outputs = outputs = self.activations_and_grads(input_tensor)
target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
Here outputs is the CLS token embedding, a high-dimensional feature vector that represents the global semantic information of the input image. It is not a classification result or logits, so taking argmax over it does not yield a class index.


johnbradley · 2024-09-26T12:38:46Z

@Link7808 Do you have some example code that reproduces this issue? Is the BaseCAM class from https://github.com/jacobgil/pytorch-grad-cam ? If so I have a first attempt jupyter notebook that uses pytorch-grad-cam with pybioclip.

Link7808 · 2024-09-26T18:47:07Z

@johnbradley
Let me clarify the issue. In this notebook, we are using:
classifier = TreeOfLifeClassifier()
model = classifier.model.visual
targets = None
BaseCAM attempts to automatically get the index of the predicted class with:
if targets is None:
    target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1)
    targets = [ClassifierOutputTarget(category) for category in target_categories]
However, the issue arises because outputs is the CLS token embedding, which is a feature vector representing the global semantic information of the input image. It’s not logits, so it can’t be used to correctly get target_categories.
This causes Grad-CAM to not correctly select the target for visualization.
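
To illustrate the problem described above, here is a minimal NumPy sketch. The embedding width of 512 and the random values are assumptions made only for the illustration; nothing here calls BioClip or pytorch-grad-cam. It shows that argmax over an embedding returns an index into the embedding dimensions, not a class index.

```python
import numpy as np

rng = np.random.default_rng(0)

embed_dim = 512  # assumed embedding width, for illustration only

# Stand-in for the output of model.visual: one embedding per image.
embedding = rng.normal(size=(2, embed_dim))

# What BaseCAM does when targets is None: argmax over the last axis.
target_categories = np.argmax(embedding, axis=-1)

# Each value is the position of the largest embedding component, so it is
# always below embed_dim and has no relation to the list of class labels.
print(target_categories, "upper bound:", embed_dim - 1)
```

Whatever the image shows, the selected "category" can only ever be one of the embedding positions, which is why the target picked for visualization is not meaningful.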

johnbradley · 2024-09-27T17:15:27Z

@Link7808 After looking through the gradcam code I see what you mean about it expecting a different output.
I updated the notebook in the gradcam branch to use a custom model.
This follows a pattern I found on a gradcam PR.

The changes I made were:

import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from bioclip import TreeOfLifeClassifier


class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        classifier = TreeOfLifeClassifier()
        self.clip = classifier.model
        self.txt_features = classifier.txt_features
        self.txt_names = classifier.txt_names

    def forward(self, x):
        # Embed the image and normalize the embedding
        img_features = self.clip.visual(x)
        img_features = F.normalize(img_features, dim=-1)
        # Scaled similarity against the text features gives one logit per label
        logits = (self.clip.logit_scale.exp() * img_features @ self.txt_features)
        result = F.softmax(logits, dim=1)

        # Print out the target found
        for target in np.argmax(result.cpu().data.numpy(), axis=-1):
            print(target, self.txt_names[target])

        return result

The above code will print out the target number and label associated with it.
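
The arithmetic in forward can be checked without loading any model. The following is a minimal NumPy sketch, not pybioclip code: zero_shot_probs is a hypothetical helper name, the shapes and the scale of 100.0 are made up, and txt_features is assumed to be laid out as (dim, num_classes) with unit-norm columns, as the matrix product in forward implies.

```python
import numpy as np


def zero_shot_probs(img_features, txt_features, logit_scale):
    """Hypothetical helper: img_features (batch, dim), txt_features (dim, num_classes)."""
    # Normalize each image embedding to unit length.
    img = img_features / np.linalg.norm(img_features, axis=-1, keepdims=True)
    # Scaled cosine similarity against every text embedding.
    logits = logit_scale * img @ txt_features
    # Softmax over the class axis, shifted for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 5))
txt = txt / np.linalg.norm(txt, axis=0, keepdims=True)  # unit-norm columns

# Two fake image embeddings that point along text columns 3 and 1.
img = np.stack([2.0 * txt[:, 3], 0.5 * txt[:, 1]])

probs = zero_shot_probs(img, txt, logit_scale=100.0)
print(np.argmax(probs, axis=-1))
```

Because the output now has one entry per label, the argmax that BaseCAM takes when targets is None is an index into the label list rather than into the embedding.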

Outside of adding a couple imports the only other changes were to instantiate the new model and set target_layers.

model = ImageClassifier()
...

target_layers = [model.clip.visual.transformer.resblocks[-1].ln_1]

Link7808 · 2024-10-06T06:27:52Z

I noticed you’re using eigencam. Do you have any thoughts on why Grad-CAM is performing poorly?

johnbradley · 2024-10-07T14:35:08Z

@Link7808 I've found setting eigen_smooth=True when calling create_grad_cam_image() with method="gradcam" results in more reasonable outputs.

Help for eigen_smooth in vit_example.py says

Reduce noise by taking the first principle componet of cam_weights*activations

This pytorch_grad_cam code seems to be where the eigen_smooth flag takes effect.
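
As a rough illustration of what "taking the first principal component of cam_weights*activations" can mean, here is a NumPy sketch. It is written from the help text quoted above, not copied from pytorch_grad_cam, so the function name first_component_projection, the (channels, height, width) layout, and the mean-centering step are all assumptions.

```python
import numpy as np


def first_component_projection(weighted_activations):
    """Hypothetical helper: (channels, height, width) -> (height, width)."""
    channels, height, width = weighted_activations.shape
    # One row per spatial position, one column per channel.
    flat = weighted_activations.reshape(channels, -1).T
    # Center each channel before looking for the dominant direction.
    flat = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions in channel space.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    # Project every spatial position onto the first principal direction.
    projection = flat @ vt[0]
    return projection.reshape(height, width)


rng = np.random.default_rng(0)
weighted = rng.normal(size=(4, 3, 3))  # stand-in for cam_weights * activations
cam = first_component_projection(weighted)
print(cam.shape)
```

The idea is that a single dominant direction across channels replaces the plain sum over channels, which tends to suppress channel-specific noise. Note that the sign of a principal direction is arbitrary, so a real implementation has to decide how to orient and rescale the resulting map.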

johnbradley · 2024-10-07T14:37:31Z

I'm not sure why the results don't look very good without eigen_smooth=True.
