[request] Semantic segmentation documentation, training code and / or model weights #55
Comments
I would appreciate example code for semantic segmentation. I can't do much with the model's output embeddings yet.
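For context, here is a minimal sketch of how dense patch embeddings can be pulled out of the released checkpoints (the image path and the 518px resolution are placeholders; it assumes the official torch.hub entry points and the `forward_features` output of the DINOv2 repo):

```python
# Sketch: extracting dense DINOv2 patch embeddings via torch.hub.
# "example.jpg" and the 518px resolution are placeholders.
import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Standard ImageNet normalization; the input side length should be a
# multiple of the 14-pixel patch size.
preprocess = transforms.Compose([
    transforms.Resize((518, 518)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    out = model.forward_features(image)
    patch_tokens = out["x_norm_patchtokens"]  # (1, 37*37, 384) for ViT-S/14 at 518px

print(patch_tokens.shape)  # these per-patch embeddings are what a segmentation head would consume
```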
STEGO, an unsupervised semantic segmentation model, used DINO v1. cc @mhamilton723
I have created this repo ( https://github.com/itsprakhar/Downstream-Dinov2 ) where I am writing code for using DINOv2 for downstream tasks such as segmentation and classification. You can take a look, create an issue, or help improve it :)
@itsprakhar
@innat-asj, the pretraining does not require labels, but finetuning for downstream tasks does. However, the number of training samples required is much smaller. The finetuning is a kind of "few-shot finetuning": you need some examples because that's how you tell the model what you really want!
I probably missed whether this is also how it is done in the paper for segmentation and depth estimation. Because even if only a few samples are needed, that approach would be understood as semi-supervised. Since DINO is meant to be self-supervised, I was wondering whether we have to fine-tune for downstream tasks using a target signal, or whether a contrastive loss could be used instead.
Hi @innat-asj, DINO (and DINOv2) are self-supervised pretraining methods. Their goal is to create a pretrained vision encoder using only unlabeled data. This model can then output good embeddings that represent images. They are not classification, segmentation, or depth models; they are just pretrained encoders. You can, however, build a segmentation model using DINOv2 by adding a segmentation / depth / classification head and training the head. We show in the paper that the head can be extremely small (just a linear layer) and be trained on very few samples (e.g. ~1k depth images for NYUv2) while still performing competitively, because the encoder outputs good representations. If you are looking for unsupervised segmentation, [STEGO] is a method that leverages DINO to do that. [STEGO]: https://arxiv.org/abs/2203.08414
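As a rough illustration of the "frozen encoder + small head" setup described above, here is a sketch of a linear segmentation head on top of frozen DINOv2 patch tokens. This is not the paper's evaluation code; the class name `LinearSegHead`, the hyperparameters, and the ViT-S/14 choice are assumptions.

```python
# Sketch: linear segmentation head on a frozen DINOv2 encoder (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearSegHead(nn.Module):
    """Per-patch linear classifier; logits are upsampled back to the input resolution."""

    def __init__(self, backbone, embed_dim=384, num_classes=21, patch_size=14):
        super().__init__()
        self.backbone = backbone
        self.patch_size = patch_size
        # A 1x1 conv over the patch grid is equivalent to a per-patch linear layer.
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        b, _, h, w = x.shape
        with torch.no_grad():  # encoder stays frozen; only the 1x1 conv is trained
            tokens = self.backbone.forward_features(x)["x_norm_patchtokens"]  # (B, N, C)
        gh, gw = h // self.patch_size, w // self.patch_size
        feat = tokens.permute(0, 2, 1).reshape(b, -1, gh, gw)  # (B, C, gh, gw)
        logits = self.classifier(feat)
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
head = LinearSegHead(backbone, num_classes=21)  # e.g. 21 classes for Pascal VOC
# Train only head.classifier with a pixel-wise cross-entropy loss against ground-truth
# masks, e.g. torch.optim.AdamW(head.classifier.parameters(), lr=1e-3).
```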
@TimDarcet
Has anyone managed to reproduce the segmentation results (82.5 mIoU) on the Pascal VOC 2012 dataset?
How can I get the semantic segmentation documentation and training code?