Feature extraction

This folder contains the code that uses Faster R-CNN to extract the features our method requires.

Prerequisite data

Prior to extraction, the following files need to be prepared:

  1. MIMIC-CXR-JPG images converted into 1024x1024 PNGs and saved to the mimic-cxr-png folder. The conversion is done by mimic_jpg2png() in converter.py, which can be run as:
python converter.py -p <input_path_to_mimic_cxr_jpg> -o <output_path_to_mimic_cxr_png>

After running this, you will obtain three files:

  • mimic_shape_full.pkl: contains the shape of each image in the dataset.
  • mimic_shapeid_full.pkl: contains, for each image, the index of its shape in mimic_shape_full.pkl.
  • dicom2id.pkl: maps each DICOM ID to its feature index.
  2. Faster R-CNN checkpoints. Make sure these are located in the checkpoints folder.
    • checkpoints/model_final_for_anatomy_gold.pth (Download link. Used for anatomical structure detection; can be obtained by running train_anatomy.py)
    • checkpoints/model_final_for_vindr.pth (Download link. Used for disease detection; can be obtained by running train-vindr-online.py)
  3. Dictionary files. Make sure these are in the dictionary folder.
    • dictionary/category_ana.pkl (the set of anatomical structure categories)
    • dictionary/GT_counting_adj.pkl (a co-occurrence matrix of findings in MIMIC-CXR-JPG)
    • dictionary/mimic_ans2label_full.pkl (a dictionary that maps each answer to its label in MIMIC-CXR-JPG)
  4. (Optional) The GT_counting_adj.pkl in step 3 can be generated by running:
python dictionary/preparation.py -p <path_to_mimic_cxr_jpg>
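
Before starting extraction, it can help to confirm that the checkpoint and dictionary files listed above are in place. The following is a minimal sketch and not part of this repository: the helper name missing_prerequisites is hypothetical, and it assumes that the checkpoints and dictionary folders sit under a single root directory.

```python
from pathlib import Path

# Paths copied from the prerequisite list above. They are assumed to be
# relative to one root directory (an assumption; adjust to your layout).
REQUIRED_FILES = [
    "checkpoints/model_final_for_anatomy_gold.pth",
    "checkpoints/model_final_for_vindr.pth",
    "dictionary/category_ana.pkl",
    "dictionary/GT_counting_adj.pkl",
    "dictionary/mimic_ans2label_full.pkl",
]


def missing_prerequisites(root="."):
    """Return the required files that do not exist under root (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites(".")
    if missing:
        print("Missing prerequisite files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All checkpoint and dictionary files were found.")
```

The check only tests for file existence; it does not verify that a checkpoint or pickle can actually be loaded.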

Extraction

Working directory: ./feature_extraction

1, Anatomical structure feature extraction

python ana_bbox_generator.py

2, Disease feature extraction

The disease features are extracted by applying the trained disease detection model to the anatomical structure bounding boxes produced in the previous step.

python bbox_gen_by_coords.py

3, Feature Combination

python combine_datasets.py

cmb_bbox_features_full.hdf5 will be generated in the data/medical_cxr_vqa folder.
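
Because each step depends on the output of the one before it, the three commands above can be chained so that a failure stops the run early. This is a minimal sketch under stated assumptions, not a script shipped with the repository: the helper name run_extraction is hypothetical, and the default working directory assumes the caller is in the directory that contains feature_extraction.

```python
import subprocess
import sys

# The three extraction scripts, in the order given above.
STEPS = [
    "ana_bbox_generator.py",
    "bbox_gen_by_coords.py",
    "combine_datasets.py",
]


def run_extraction(workdir="./feature_extraction", runner=subprocess.run):
    """Run each step in order, stopping at the first failure (hypothetical helper).

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without launching the real scripts.
    """
    for script in STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a later step never runs on incomplete output.
        runner([sys.executable, script], cwd=workdir, check=True)
```

Calling run_extraction() launches the scripts with the same Python interpreter that runs the helper, which keeps all three steps in one environment.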