
[request] Semantic segmentation documentation, training code and / or model weights #55

Open · patricklabatut opened this issue Apr 24, 2023 · 10 comments
Labels: documentation, enhancement

@patricklabatut (Contributor) commented Apr 24, 2023

Related issues:

@kanishkanarch

I would appreciate example code for semantic segmentation. I can't do much with the model's output embeddings yet.
Kindly point me to a relevant reference if I am overlooking one.
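
Not official example code, but here is a minimal sketch of how one might pull dense per-patch embeddings out of the backbone. The `torch.hub` entry point and the `forward_features` / `x_norm_patchtokens` names come from the public repo; the rest is an illustrative assumption:

```python
import torch

# Load a pretrained DINOv2 ViT-S/14 backbone from the official hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy batch; image sides must be multiples of the 14-pixel patch size.
x = torch.randn(1, 3, 518, 518)

with torch.no_grad():
    feats = model.forward_features(x)

# Per-patch embeddings: (batch, num_patches, embed_dim) = (1, 37*37, 384).
patch_tokens = feats["x_norm_patchtokens"]

# Reshape into a 2D feature map that a segmentation head could consume.
b, n, d = patch_tokens.shape
h = w = int(n ** 0.5)
feature_map = patch_tokens.permute(0, 2, 1).reshape(b, d, h, w)
print(feature_map.shape)  # torch.Size([1, 384, 37, 37])
```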

@innat-asj

STEGO, an unsupervised semantic segmentation model, used DINO v1.

cc. @mhamilton723

@itsprakhar

I have created this repo (https://github.com/itsprakhar/Downstream-Dinov2) where I am writing code for using Dinov2 for downstream tasks such as segmentation and classification. You can take a look, create an issue, or help improve it :)


@innat-asj

@itsprakhar
Ideally, there should be no need for a mask/label for downstream tasks, right? (for self-supervised learning)

@itsprakhar

@innat-asj, the pretraining does not require labels, but finetuning for downstream tasks does. However, the number of training samples required is much smaller. The finetuning is a kind of "few-shot finetuning": you need some examples because that's how you tell the model what you really want!
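
To make that concrete, here is a hedged sketch of the setup described above: freeze the pretrained backbone and optimize only a small task head on the few labeled examples. The head and class count below are placeholders for illustration, not code from this repo:

```python
import torch

# Load the pretrained backbone (same hub entry point as above) and freeze it.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
for p in backbone.parameters():
    p.requires_grad = False  # keep the self-supervised features frozen
backbone.eval()

# Hypothetical task head: a single linear layer over 384-dim ViT-S features.
num_classes = 21  # e.g. Pascal VOC; purely illustrative
head = torch.nn.Linear(384, num_classes)

# Only the head's (few) parameters are optimized, so few labels go a long way.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
```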

@innat-asj

> The finetuning is a kind of "few-shot finetuning": you need some examples because that's how you tell the model what you really want!

I may have missed whether this is also what the paper does for segmentation and depth estimation. Because even if only a few samples are needed, that approach would be understood as semi-supervised.

Now, as DINO is meant to be self-supervised, I was wondering whether we have to fine-tune for downstream tasks using a target signal, or whether a contrastive loss could be used instead.

@TimDarcet

Hi @innat-asj

DINO (and DINOv2) are self-supervised pretraining methods. Their goal is to create a pretrained vision encoder with only unlabeled data. This model can then output good embeddings that represent images.

They are not classification, segmentation or depth models. They are just pretrained encoders. You can, however, build a segmentation model using DINOv2, by adding a seg. / depth / classif. head and training the head. We show in the paper that the head can be extremely small (just a linear layer), be trained on very few samples (e.g. ~1k depth images for NYUv2) and still perform competitively, because the encoder outputs good representations.
These heads still need labeled samples to be trained.
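
For illustration, a minimal sketch of that "linear head on frozen features" recipe for segmentation; the class count and bilinear upsampling are assumptions for the example, not the paper's exact evaluation protocol:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    """Per-patch linear classifier, upsampled to pixel resolution."""

    def __init__(self, embed_dim: int = 384, num_classes: int = 21):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens: torch.Tensor, image_size) -> torch.Tensor:
        # patch_tokens: (B, N, D) from the frozen encoder, N = H/14 * W/14.
        b, n, d = patch_tokens.shape
        h = w = int(n ** 0.5)  # assumes a square patch grid
        logits = self.classifier(patch_tokens)             # (B, N, C)
        logits = logits.permute(0, 2, 1).reshape(b, -1, h, w)
        # Upsample patch-level logits to full image resolution.
        return F.interpolate(logits, size=image_size, mode="bilinear",
                             align_corners=False)

# Hypothetical usage with a frozen backbone (see the sketches above):
#   tokens = backbone.forward_features(images)["x_norm_patchtokens"]
#   logits = LinearSegHead()(tokens, images.shape[-2:])
#   loss = F.cross_entropy(logits, masks)  # trained on labeled masks
```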

If you are looking for unsupervised segmentation, [STEGO] is a method leveraging DINO to do that.

[STEGO] https://arxiv.org/abs/2203.08414

@innat-asj

@TimDarcet
Thanks for the clarification.

@levayz

levayz commented Aug 7, 2023

Has anyone managed to reproduce the segmentation results (82.5 mIoU) on the Pascal VOC 2012 dataset?

@APeiZou

APeiZou commented Sep 11, 2024

How can I get the semantic segmentation documentation and training code?
