
Train

COCONut-S and B

COCONut-S uses the images from COCO train2017, while COCONut-B uses both COCO train2017 and the COCO unlabeled set. Download the images from COCO and our COCONut annotations from Kaggle or Huggingface. We follow the detectron2 dataset definition, so the dataset should be organized as follows:

datasets
└── coco
    ├── annotations 
    │   └── panoptic_train2017.json # coconut-b.json / coconut-s.json
    ├── panoptic_train2017  # coconut-b / coconut-s
    ├── train2017 # original COCO train2017 images (plus the COCO unlabeled set images for COCONut-B)

Note that the folder names are fixed regardless of whether you use COCONut-S or COCONut-B. You can change them in the detectron2 dataset definition, but we do not support that so far.
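Because the folder names match the original COCO layout, detectron2's builtin panoptic dataset registration should pick up the COCONut annotations without any code change. Below is a minimal sanity-check sketch, assuming detectron2 is installed and its dataset root (the DETECTRON2_DATASETS environment variable, which defaults to ./datasets) points at the datasets/ folder above; the dataset name is detectron2's builtin one, not something defined by this repo.

```python
# Minimal sanity check: detectron2's builtin "coco_2017_train_panoptic" entry
# should now resolve to the COCONut files placed in the fixed folders above.
from detectron2.data import DatasetCatalog, MetadataCatalog

meta = MetadataCatalog.get("coco_2017_train_panoptic")
print(meta.panoptic_root)   # datasets/coco/panoptic_train2017 (COCONut masks)
print(meta.panoptic_json)   # datasets/coco/annotations/panoptic_train2017.json

# Loading the dataset dicts reads the COCONut panoptic JSON from disk.
records = DatasetCatalog.get("coco_2017_train_panoptic")
print(len(records), "training images")
```

If this loads without errors and reports the expected image count, training configs can keep using the standard coco_2017_train_panoptic dataset name.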

COCONut-L

COCONut-L consists of three subsets: COCO train2017, the COCO unlabeled set, and a subset of Objects365. To use the COCONut-L panoptic masks, follow the steps below:

  1. Download the panoptic masks and the annotation JSON file from Huggingface.
  2. Download the images from Objects365. The images are organized in patches; download the corresponding raw patches (patch32, patch35, patch40, and patch50) from the official website. We also provide a Google Drive link for downloading the image subsets.
  3. Follow the instructions above to set up COCONut-B, which is used to build COCONut-L. The folders should be organized as follows:
datasets
└── coco
    ├── annotations 
    │   └── panoptic_train2017.json # coconut-b.json
    ├── panoptic_train2017  # coconut-b
    ├── train2017 # original COCO dataset train and unlabeled set images
  4. Link the Objects365 images and panoptic masks into coco/train2017 and coco/panoptic_train2017, respectively, using the COCONut-B dataset paths:
objects365/images ----> coco/train2017
objects365/panoptic_masks ----> coco/panoptic_train2017
  5. Merge the Objects365 JSON file into the COCONut-B JSON file using the merged.py script. The dataset is then ready to use. A sketch of steps 4 and 5 follows the list.
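The sketch below illustrates steps 4 and 5 in plain Python. It is only an illustration: the objects365/ source paths and the Objects365 JSON filename are placeholders, and the repository's merged.py script is the reference for the actual merge (this version simply assumes both files share the same category list).

```python
import json
import os
from pathlib import Path

def link_into(src_dir: str, dst_dir: str) -> None:
    """Symlink every file from src_dir into dst_dir (step 4)."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for src in Path(src_dir).iterdir():
        target = dst / src.name
        if not target.exists():
            os.symlink(src.resolve(), target)

def merge_panoptic_jsons(base_json: str, extra_json: str, out_json: str) -> None:
    """Append the Objects365 images/annotations to the COCONut-B panoptic JSON (step 5)."""
    with open(base_json) as f:
        base = json.load(f)
    with open(extra_json) as f:
        extra = json.load(f)
    base["images"] += extra["images"]
    base["annotations"] += extra["annotations"]
    # The category lists are assumed to be identical in both files.
    with open(out_json, "w") as f:
        json.dump(base, f)

# Placeholder paths; adjust to where the Objects365 subset was downloaded.
link_into("objects365/images", "datasets/coco/train2017")
link_into("objects365/panoptic_masks", "datasets/coco/panoptic_train2017")
merge_panoptic_jsons(
    "datasets/coco/annotations/panoptic_train2017.json",   # coconut-b.json
    "objects365/coconut_l_objects365.json",                 # placeholder filename
    "datasets/coco/annotations/panoptic_train2017.json",
)
```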

Eval

relabeled COCO-val

Our relabeled COCO-val is used the same way as the original COCO val set; only the annotations are replaced with ours. As before, you can download it from Kaggle or Huggingface. The dataset should be organized as follows:

datasets
└── coco
    ├── annotations 
    │   └── panoptic_val2017.json # relabeled_coco_val.json
    ├── panoptic_val2017  # relabeled coco val
    ├── val2017 # original COCO val set
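A quick consistency check (not part of the repo) for the layout above: every entry in relabeled_coco_val.json should have a matching mask PNG in panoptic_val2017.

```python
import json
from pathlib import Path

root = Path("datasets/coco")
with open(root / "annotations" / "panoptic_val2017.json") as f:
    panoptic = json.load(f)

# Panoptic annotations reference their mask PNG via "file_name".
missing = [
    ann["file_name"]
    for ann in panoptic["annotations"]
    if not (root / "panoptic_val2017" / ann["file_name"]).exists()
]
print(f'{len(panoptic["annotations"])} annotations, {len(missing)} masks missing')
```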

COCONut-val

  1. As with COCONut-L, the images need to be downloaded from Objects365; we provide a link to download the selected val set images. Link the images into coco/val2017.
  2. Download the panoptic masks from Huggingface and link them into coco/panoptic_val2017.
  3. Merge the downloaded COCONut-val and relabeled COCO-val JSON files using merged.py, following the same linking-and-merging pattern sketched for COCONut-L above.

Instance Segmentation

Please refer to the instance segmentation tutorial for converting panoptic masks to instance masks.
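As a rough sketch of the idea behind that conversion (an illustration, not the tutorial's actual script): each "thing" segment in a panoptic PNG becomes one binary instance mask. The file paths below reuse the val layout above, and panopticapi is assumed to be installed.

```python
import json
import numpy as np
from PIL import Image
from panopticapi.utils import rgb2id  # pip install panopticapi

with open("datasets/coco/annotations/panoptic_val2017.json") as f:
    panoptic = json.load(f)
thing_ids = {c["id"] for c in panoptic["categories"] if c["isthing"] == 1}

ann = panoptic["annotations"][0]  # one image as an example
png = np.array(Image.open(f"datasets/coco/panoptic_val2017/{ann['file_name']}"))
seg_map = rgb2id(png)             # RGB color -> per-pixel segment id

# Keep only "thing" segments; each yields a binary mask plus its category id.
instance_masks = [
    (seg_map == s["id"], s["category_id"])
    for s in ann["segments_info"]
    if s["category_id"] in thing_ids
]
print(f"{len(instance_masks)} instance masks for image {ann['image_id']}")
```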

Object Detection

Please refer to the bounding box extraction script for extracting bounding boxes from panoptic masks.
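The core of that extraction can be sketched as follows (again an illustration, not the repository's script): the box of each instance mask is the tight bound of its foreground pixels, reported in COCO [x, y, width, height] format.

```python
import numpy as np

def mask_to_xywh(mask: np.ndarray) -> list[int]:
    """Tight bounding box of a non-empty binary mask as COCO-style [x, y, w, h]."""
    ys, xs = np.where(mask)
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    return [int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)]

# e.g. applied to the instance masks from the sketch above:
# boxes = [mask_to_xywh(m) for m, _ in instance_masks]
```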