readme update and demo
lucabarsellotti committed Nov 28, 2024
1 parent f77227c commit 817fb64
Showing 61 changed files with 325 additions and 8 deletions.
72 changes: 68 additions & 4 deletions README.md
@@ -1,4 +1,23 @@
# Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

<div align="center">
<figure>
<img alt="" src="./assets/overview.png">
</figure>
</div>

Talk2DINO is an open-vocabulary segmentation architecture that combines the localized, semantically rich patch-level features of DINOv2 with the multimodal understanding capabilities of CLIP. It learns a projection from the CLIP text encoder to the DINOv2 embedding space using only image-caption pairs, and exploits the self-attention properties of DINOv2 to determine which parts of the image should be aligned with the corresponding caption.
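
As a rough illustration of the alignment idea described above, the sketch below projects a text embedding into a patch-feature space with a single linear map and scores each patch by cosine similarity. The dimensions, weights, and the names `project` and `cosine` are made up for the example; the actual projection in Talk2DINO is learned and is not reproduced here.

```python
import math

def project(text_emb, weight):
    """Apply a linear map (one row of `weight` per output dimension)."""
    return [sum(w * x for w, x in zip(row, text_emb)) for row in weight]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy sizes (assumptions for illustration): 3-d "text" space, 2-d "patch" space.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
text_emb = [2.0, 0.0, 5.0]
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

projected = project(text_emb, weight)
scores = [cosine(projected, p) for p in patches]
# Index of the patch most similar to the projected text embedding.
best_patch = max(range(len(patches)), key=scores.__getitem__)
```

In a segmentation setting, a per-patch score map like `scores` would be computed for every class name, and each patch would take the class with the highest score.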

## Results

| **Image** | **Ground Truth** | **FreeDA** | **ProxyCLIP** | **CLIP-DINOiser** | **Ours (Talk2DINO)** |
|-----------|------------------|------------|---------------|-------------------|------------------|
| ![Image](assets/qualitatives/voc/2_img.jpg) | ![Ground Truth](assets/qualitatives/voc/2_gt.png) | ![FreeDA](assets/qualitatives/voc/2_freeda.png) | ![ProxyCLIP](assets/qualitatives/voc/2_proxy.png) | ![CLIP-DINOiser](assets/qualitatives/voc/2_clipdinoiser.png) | ![Ours](assets/qualitatives/voc/2_talk2dino.png) |
| ![Image](assets/qualitatives/object/2r_img.png) | ![Ground Truth](assets/qualitatives/object/2r_gt.png) | ![FreeDA](assets/qualitatives/object/2r_freeda.png) | ![ProxyCLIP](assets/qualitatives/object/2r_proxy.png) | ![CLIP-DINOiser](assets/qualitatives/object/2r_clipdinoiser.png) | ![Ours](assets/qualitatives/object/2r_talk2dino.png) |
| ![Image](assets/qualitatives/cityscapes/1r_image.png) | ![Ground Truth](assets/qualitatives/cityscapes/1r_gt.png) | ![FreeDA](assets/qualitatives/cityscapes/1r_freeda.png) | ![ProxyCLIP](assets/qualitatives/cityscapes/1r_proxyclip.png) | ![CLIP-DINOiser](assets/qualitatives/cityscapes/1r_clipdinoiser.png) | ![Ours](assets/qualitatives/cityscapes/1r_talk2dino.png) |
| ![Image](assets/qualitatives/context/1r_img.png) | ![Ground Truth](assets/qualitatives/context/1r_gt.png) | ![FreeDA](assets/qualitatives/context/1r_freeda.png) | ![ProxyCLIP](assets/qualitatives/context/1r_proxy.png) | ![CLIP-DINOiser](assets/qualitatives/context/1r_clipdinoiser.png) | ![Ours](assets/qualitatives/context/1r_talk2dino.png) |


## Installation
```bash
conda create --name talk2dino python=3.9
@@ -87,14 +106,59 @@ COCO-Object dataset uses only object classes from COCO-Stuff164k dataset by coll
python convert_dataset/convert_coco.py data/coco_stuff164k/ -o data/coco_stuff164k/
```

To evaluate the model on open-vocabulary segmentation benchmarks, use the `src/open_vocabulary_segmentation/main.py` script. Select the appropriate configuration based on the model, benchmark, and PAMR settings. Below is an example to evaluate the ViT-Base model on Cityscapes without PAMR:
To evaluate the model on open-vocabulary segmentation benchmarks, use the `src/open_vocabulary_segmentation/main.py` script. Select the appropriate configuration based on the model, benchmark, and PAMR settings. The available models are ``[vitb, vitl]``, and the available benchmarks are ``[ade, cityscapes, voc, voc_bg, context, context_bg, coco_object, stuff]``. The commands below reproduce the results reported in the paper for the ViT-Base architecture:

```bash
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/cityscapes/dinotext_cityscapes_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/cityscapes/eval_cityscapes.yml
# ADE20K
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/ade/dinotext_ade_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/ade/eval_ade_pamr.yml
# Cityscapes
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/cityscapes/dinotext_cityscapes_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/cityscapes/eval_cityscapes_pamr.yml
# Pascal VOC (without background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/voc/dinotext_voc_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/voc/eval_voc_pamr.yml
# Pascal VOC (with background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/voc_bg/dinotext_voc_bg_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/voc_bg/eval_voc_bg_pamr.yml
# Pascal Context (without background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/context/dinotext_context_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/context/eval_context_pamr.yml
# Pascal Context (with background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/context_bg/dinotext_context_bg_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/context_bg/eval_context_bg_pamr.yml
# COCOStuff
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/stuff/dinotext_stuff_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/stuff/eval_stuff_pamr.yml
# COCO Object
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/coco_object/dinotext_coco_object_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/coco_object/eval_coco_object_pamr.yml
```

ViT-Base model on Cityscapes with PAMR:
The corresponding evaluations for the ViT-Large architecture are:

```bash
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/cityscapes/dinotext_cityscapes_vitb_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/cityscapes/eval_cityscapes_pamr.yml
# ADE20K
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/ade/dinotext_ade_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/ade/eval_ade_pamr.yml
# Cityscapes
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/cityscapes/dinotext_cityscapes_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/cityscapes/eval_cityscapes_pamr.yml
# Pascal VOC (without background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/voc/dinotext_voc_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/voc/eval_voc_pamr.yml
# Pascal VOC (with background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/voc_bg/dinotext_voc_bg_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/voc_bg/eval_voc_bg_vitl_pamr.yml
# Pascal Context (without background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/context/dinotext_context_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/context/eval_context_pamr.yml
# Pascal Context (with background)
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/context_bg/dinotext_context_bg_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/context_bg/eval_context_bg_vitl_pamr.yml
# COCOStuff
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/stuff/dinotext_stuff_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/stuff/eval_stuff_pamr.yml
# COCO Object
python -m torch.distributed.run src/open_vocabulary_segmentation/main.py --eval --eval_cfg src/open_vocabulary_segmentation/configs/coco_object/dinotext_coco_object_vitl_mlp_infonce.yml --eval_base src/open_vocabulary_segmentation/configs/coco_object/eval_coco_object_vitl_pamr.yml
```
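
The evaluation commands listed in this README follow a regular naming pattern, so they can be generated rather than copied. The helper below is a sketch that reproduces the command strings as they appear in this README; `eval_command` and the `VITL_SPECIFIC_EVAL_BASE` set are hypothetical names, and the set simply records which ViT-Large commands in the list use a `_vitl` evaluation file.

```python
CONFIG_ROOT = "src/open_vocabulary_segmentation/configs"

# Benchmarks whose ViT-Large command in the README uses a "_vitl" eval_base file.
VITL_SPECIFIC_EVAL_BASE = {"voc_bg", "context_bg", "coco_object"}

def eval_command(model, benchmark):
    """Build the evaluation command for a model ("vitb" or "vitl") and a benchmark."""
    eval_cfg = f"{CONFIG_ROOT}/{benchmark}/dinotext_{benchmark}_{model}_mlp_infonce.yml"
    use_vitl_base = model == "vitl" and benchmark in VITL_SPECIFIC_EVAL_BASE
    suffix = "_vitl" if use_vitl_base else ""
    eval_base = f"{CONFIG_ROOT}/{benchmark}/eval_{benchmark}{suffix}_pamr.yml"
    return (
        "python -m torch.distributed.run src/open_vocabulary_segmentation/main.py "
        f"--eval --eval_cfg {eval_cfg} --eval_base {eval_base}"
    )

# Print every ViT-Base command in the order used by the README.
for name in ["ade", "cityscapes", "voc", "voc_bg", "context", "context_bg", "stuff", "coco_object"]:
    print(eval_command("vitb", name))
```

Whether each generated path exists in the repository still has to be checked against the `configs` directory.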
Binary file added assets/overview.png
Binary file added assets/pikachu.png
Binary file added assets/qualitatives.png
Binary file added assets/qualitatives/cityscapes/1_freeda.png
Binary file added assets/qualitatives/cityscapes/1_gt.png
Binary file added assets/qualitatives/cityscapes/1_image.png
Binary file added assets/qualitatives/cityscapes/1_proxyclip.png
Binary file added assets/qualitatives/cityscapes/1_talk2dino.png
Binary file added assets/qualitatives/cityscapes/1r_freeda.png
Binary file added assets/qualitatives/cityscapes/1r_gt.png
Binary file added assets/qualitatives/cityscapes/1r_image.png
Binary file added assets/qualitatives/cityscapes/1r_proxyclip.png
Binary file added assets/qualitatives/cityscapes/1r_talk2dino.png
Binary file added assets/qualitatives/context/1r_clipdinoiser.png
Binary file added assets/qualitatives/context/1r_freeda.png
Binary file added assets/qualitatives/context/1r_gt.png
Binary file added assets/qualitatives/context/1r_img.png
Binary file added assets/qualitatives/context/1r_proxy.png
Binary file added assets/qualitatives/context/1r_talk2dino.png
Binary file added assets/qualitatives/object/2r_clipdinoiser.png
Binary file added assets/qualitatives/object/2r_freeda.png
Binary file added assets/qualitatives/object/2r_gt.png
Binary file added assets/qualitatives/object/2r_img.png
Binary file added assets/qualitatives/object/2r_proxy.png
Binary file added assets/qualitatives/object/2r_talk2dino.png
Binary file added assets/qualitatives/voc/1_clipdinoiser.png
Binary file added assets/qualitatives/voc/1_freeda.png
Binary file added assets/qualitatives/voc/1_gt.png
Binary file added assets/qualitatives/voc/1_img.jpg
Binary file added assets/qualitatives/voc/1_proxy.png
Binary file added assets/qualitatives/voc/1_talk2dino.png
Binary file added assets/qualitatives/voc/2_clipdinoiser.png
Binary file added assets/qualitatives/voc/2_freeda.png
Binary file added assets/qualitatives/voc/2_gt.png
Binary file added assets/qualitatives/voc/2_img.jpg
Binary file added assets/qualitatives/voc/2_proxy.png
Binary file added assets/qualitatives/voc/2_talk2dino.png
53 changes: 53 additions & 0 deletions demo.py
@@ -0,0 +1,53 @@
import torch
import numpy as np
from omegaconf import OmegaConf
from torchvision.io import read_image
import matplotlib.pyplot as plt
import sys

sys.path.insert(0, "src/open_vocabulary_segmentation")
from models.dinotext import DINOText
from models import build_model

def plot_qualitative(image, sim, output_path, palette):
    # Build an RGB overlay in which each pixel takes the color of its predicted class.
    qualitative_plot = np.zeros((sim.shape[0], sim.shape[1], 3), dtype=np.uint8)

    for j in np.unique(sim):
        qualitative_plot[sim == j] = np.array(palette[j])
    # Draw the image, blend the overlay on top, and save without axes or padding.
    plt.axis('off')
    plt.imshow(image)
    plt.imshow(qualitative_plot, alpha=0.6)
    plt.tight_layout()
    plt.savefig(output_path, bbox_inches='tight', pad_inches=0)


device = "cuda"
config_file = "src/open_vocabulary_segmentation/configs/cityscapes/dinotext_cityscapes_vitb_mlp_infonce.yml"
output_file = "pikachu_seg.png"
cfg = OmegaConf.load(config_file)

model = build_model(cfg.model)
model.to(device).eval()

img = read_image("assets/pikachu.png").to(device).float().unsqueeze(0)
text = ["pikachu", "traffic sign", "forest", "road"]
palette = [
    [255, 0, 0],
    [255, 255, 0],
    [0, 255, 0],
    [0, 255, 255],
    # [0, 0, 255],
    # [128, 128, 128]
]

with torch.no_grad():
    text_emb = model.build_dataset_class_tokens("sub_imagenet_template", text)
    text_emb = model.build_text_embedding(text_emb)

    mask, _ = model.generate_masks(img, img_metas=None, text_emb=text_emb, classnames=text, apply_pamr=True)
    # background = torch.ones_like(mask[:, :1]) * 0.55
    # mask = torch.cat([background, mask], dim=1)

    mask = mask.argmax(dim=1)

plot_qualitative(img.cpu()[0].permute(1,2,0).int().numpy(), mask.cpu()[0].numpy(), output_file, palette)
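
The commented-out lines in the demo hint at a background option: prepend a constant-score channel so that pixels where no class exceeds that score fall back to background. Below is a minimal pure-Python sketch of that idea. The nested-list layout and the helper name `label_map` are illustrative assumptions and are not part of `demo.py`, which operates on torch tensors.

```python
def label_map(class_scores, bg_score=None):
    """Pick the best-scoring channel per pixel.

    class_scores: list of C score maps, each H x W (nested lists).
    bg_score: if given, a constant-score background channel is prepended,
              so index 0 means "background" and class indices shift by one.
    """
    height, width = len(class_scores[0]), len(class_scores[0][0])
    channels = list(class_scores)
    if bg_score is not None:
        background = [[bg_score] * width for _ in range(height)]
        channels = [background] + channels

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = [channel[y][x] for channel in channels]
            row.append(pixel.index(max(pixel)))  # argmax over channels
        labels.append(row)
    return labels

# Two classes over a 1 x 2 image.
scores = [
    [[0.9, 0.2]],  # class 0
    [[0.1, 0.4]],  # class 1
]
plain = label_map(scores)                   # argmax over the two classes
with_bg = label_map(scores, bg_score=0.55)  # index 0 is now background
```

With the background channel enabled, the palette passed to `plot_qualitative` would need one extra leading color, which is presumably why the demo's palette carries two commented-out entries.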
@@ -10,3 +10,4 @@ model:
   resize_dim: 518
   type: DINOText
   use_avg_text_token: false
+  with_bg_clean: true
@@ -6,7 +6,8 @@ model:
   model_name: dinov2_vitl14_reg
   proj_class: vitl_mlp_infonce
   proj_model: ProjectionLayer
-  proj_name: vitb_mlp_infonce
+  proj_name: vitl_mlp_infonce
   resize_dim: 518
   type: DINOText
   use_avg_text_token: false
+  with_bg_clean: true
@@ -0,0 +1,33 @@
evaluate:
  pamr: false
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    # - voc
    # - voc20
    # - context
    # - context59
    # - coco_stuff
    - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
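
Each evaluation config lists the active benchmark under `task` and also carries a path for every benchmark name. One plausible reading, which is an assumption about the loader rather than something shown in this commit, is that each active task name is looked up as a key in the same `evaluate` block. A small sketch with a plain dict standing in for the parsed YAML (`dataset_configs` is a hypothetical helper):

```python
BASE = "src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets"

# Stand-in for the parsed `evaluate` block, trimmed to a few keys.
evaluate = {
    "pamr": False,
    "bg_thresh": 0.54,
    "task": ["coco_object"],
    "coco_stuff": f"{BASE}/stuff.py",
    "coco_object": f"{BASE}/coco.py",
}

def dataset_configs(evaluate_cfg):
    """Map each active task name to its dataset config path."""
    return {task: evaluate_cfg[task] for task in evaluate_cfg["task"]}

active = dataset_configs(evaluate)
```

Under this reading, switching benchmarks only requires changing which entry of `task` is uncommented, which matches how the config variants in this commit differ.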
@@ -0,0 +1,33 @@
evaluate:
  pamr: true
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    # - voc
    # - voc20
    # - context
    # - context59
    # - coco_stuff
    - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
@@ -6,7 +6,7 @@ model:
   model_name: dinov2_vitl14_reg
   proj_class: vitl_mlp_infonce
   proj_model: ProjectionLayer
-  proj_name: vitb_mlp_infonce
+  proj_name: vitl_mlp_infonce
   resize_dim: 518
   type: DINOText
-  use_avg_text_token: false
+  use_avg_text_token: false
@@ -0,0 +1,33 @@
evaluate:
  pamr: false
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    # - voc
    # - voc20
    - context
    # - context59
    # - coco_stuff
    # - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
@@ -0,0 +1,33 @@
evaluate:
  pamr: true
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    # - voc
    # - voc20
    - context
    # - context59
    # - coco_stuff
    # - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
@@ -6,7 +6,7 @@ model:
   model_name: dinov2_vitl14_reg
   proj_class: vitl_mlp_infonce
   proj_model: ProjectionLayer
-  proj_name: vitb_mlp_infonce
+  proj_name: vitl_mlp_infonce
   resize_dim: 518
   type: DINOText
   use_avg_text_token: false
@@ -0,0 +1,33 @@
evaluate:
  pamr: false
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    - voc
    # - voc20
    # - context
    # - context59
    # - coco_stuff
    # - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
@@ -0,0 +1,33 @@
evaluate:
  pamr: true
  bg_thresh: 0.54
  kp_w: 0.3

  pred_qual_path: null
  gt_qual_path: null

  eval_only: true
  template: sub_imagenet_template
  task:
    - voc
    # - voc20
    # - context
    # - context59
    # - coco_stuff
    # - coco_object
    # - cityscapes
    # - ade20k

  # training splits
  t_voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_voc12_20.py
  t_context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/t_pascal_context59.py

  # evaluation
  voc: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12.py
  voc20: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_voc12_20.py
  context: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context.py
  context59: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/pascal_context59.py
  coco_stuff: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/stuff.py
  coco_object: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/coco.py
  cityscapes: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/cityscapes.py
  ade20k: src/open_vocabulary_segmentation/segmentation/configs/_base_/datasets/ade20k.py
