SSL Augmentations #2164
I think this is true for SSL as a whole, not just MoCo and SimCLR and not just for our TorchGeo implementations. We also noticed this during some of our experiments, especially with SimCLR.
Yes.
This is a big assumption. We got MoCo to work quite well. We never got SimCLR to work to a satisfactory degree. We believe this was due to computational limitations (we needed to use a much smaller batch size than was used in the paper), but it could also be due to augmentations. I would suggest focusing on MoCo first because we found it to be more forgiving. Also, unless your dataset has millions of images, it is probably useless for SSL. You should instead start with a model that has been pre-trained on millions of images and then later fine-tune it on your dataset.
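To see why batch size matters so much for SimCLR, here is a minimal NumPy sketch of the NT-Xent/InfoNCE loss (illustrative only — not the lightly or TorchGeo implementation; the function name and shapes are my own):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """NT-Xent / InfoNCE loss over a batch of paired embeddings.

    z1, z2: (N, D) L2-normalized embeddings of two augmented views.
    Each of the 2N embeddings is contrasted against 2N - 2 in-batch
    negatives, so the number of negatives shrinks linearly with batch
    size -- one reason small batches hurt SimCLR more than MoCo, which
    keeps a separate queue of negatives.
    """
    z = np.concatenate([z1, z2], axis=0)      # (2N, D)
    sim = z @ z.T / temperature               # cosine similarities (inputs are normalized)
    np.fill_diagonal(sim, -np.inf)            # mask self-similarity
    n = z1.shape[0]
    # the positive for row i is its other augmented view
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # numerically stable cross-entropy against the positive
    m = sim.max(axis=1, keepdims=True)
    logsumexp = m.squeeze(1) + np.log(np.exp(sim - m).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

With a batch of N pairs, each row's softmax is over one positive and 2N − 2 negatives, which is why the original paper relies on batch sizes in the thousands.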
As far as I know, there should be no augmentations that are not mentioned in the original papers. We did our best to recreate the augmentations as faithfully as possible. This may or may not be a good thing. We use lightly to help implement our trainers, and the lightly authors actually recommend different default augmentations than the ones we chose: #1828. So that is something worth playing around with.

Also, the augmentations used in the original MoCo/SimCLR papers were designed for ImageNet, not for satellite imagery. ImageNet contains things like cars that can be hundreds of colors but are still cars. Some of the default augmentations, like color jitter and random grayscale, may be too extreme for satellite imagery, where color is often more important than shape or texture.

Note that color jitter is simply a combination of adjusting brightness, contrast, saturation, and hue. Saturation and hue are only defined for RGB imagery, not multispectral imagery, so we skipped those. Likewise, random grayscale is only defined for RGB imagery, so we opted to average all bands into a single band after normalization. We also use seasonal contrast in our paper as an additional source of "natural" data augmentation for a scene.

Finally, the original papers were implemented in TensorFlow, JAX, and torchvision. There are many subtle differences between those implementations and the ones we use from Kornia, and hyperparameters may have different meanings. These should be the only modifications from the original papers, but let me know if I missed anything.

In our SSL4EO-L paper, @AABNassim did all of the MoCo/SimCLR pre-training and @nilsleh and @yichiac did all of the semantic segmentation fine-tuning, so they may also be able to explain any tricks they discovered along the way that made pre-training or fine-tuning more stable. Hope that helps!
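The brightness/contrast-only jitter and the band-averaging stand-in for grayscale described above can be sketched like this (a minimal NumPy illustration, not the actual Kornia-based trainer code; the function names and parameter defaults are my own):

```python
import numpy as np

rng = np.random.default_rng(0)  # module-level RNG for the sketch

def random_brightness_contrast(img, brightness=0.4, contrast=0.4):
    """Jitter brightness and contrast only -- the two ColorJitter
    components that are well-defined for any number of bands
    (saturation and hue assume RGB).  img has shape (C, H, W)."""
    b = 1.0 + rng.uniform(-brightness, brightness)
    c = 1.0 + rng.uniform(-contrast, contrast)
    mean = img.mean(axis=(-2, -1), keepdims=True)  # per-band mean
    return (img * b - mean) * c + mean

def random_band_average(img, p=0.2):
    """Multispectral stand-in for random grayscale: with probability p,
    replace every band with the across-band average (applied after
    normalization, as described above)."""
    if rng.random() < p:
        avg = img.mean(axis=0, keepdims=True)      # (1, H, W)
        return np.broadcast_to(avg, img.shape).copy()
    return img
```

Because both transforms operate per band (or across all bands symmetrically), they work for any channel count, which is the property the RGB-only saturation/hue and grayscale operations lack.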
I’m trying to use the MoCo and SimCLR trainers to help improve performance on a downstream task, but I’ve noticed it’s very easy for my representations to degrade and even collapse over time. I’m not doing anything weird, just using a plain Dataset and DataLoader implementation, and the trainers as-is, only playing around with the augmentations.
My understanding is that you guys used them in the SSL4EO paper, so I assume the trainers work and the problem is my augmentations. I basically have aerial images of rooftops, and I'm trying to get the model to distinguish different building materials. The issue is that the appearance of many of these materials can vary significantly from scene to scene, so they end up looking very similar to each other in both color and texture…
Anyway, I noticed that both trainers use augmentations that are not mentioned in the original papers, and there’s even a comment about color jitter not being “appropriate” for multispectral imagery. It’s clear that at least some thought went into designing these, so I was hoping for some insights into what worked for you and what didn’t.