Initial upload.
The scripts available were originally developed within the binary label prediction repository.
mdcjansen committed May 21, 2024
1 parent 358fc26 commit 0d1b8e8
Showing 43 changed files with 7,493 additions and 0 deletions.
431 changes: 431 additions & 0 deletions CNN_testing/DenseNet121_testset.py


144 changes: 144 additions & 0 deletions CNN_testing/README.md
@@ -0,0 +1,144 @@
# CNN testing

The scripts in this folder are adapted versions of the training scripts. Here, trained models can be run on a
user-specified test set. Results will be logged to wandb and a summary of statistics will be given in csv format.

## Table of contents
* [Development](#development)
* [CNN models](#cnn-models)
* [CNN input](#cnn-input)
* [CNN output](#cnn-output)


## Development
This folder contains debugging versions of the CNN scripts. These are not fully functional and will most likely not
produce proper results.


## CNN models
Each script in this folder can be used to test the CNN it is named after.
All scripts require the same type of [input](#cnn-input) and will log their training measurements to wandb. A csv
file is also produced, which is explained in detail [below](#cnn-output).

The CNNs can all be run in the same manner, by activating the pytorch2 environment and running the desired script in
said environment. An example of running one of the CNNs is given below:
```
conda activate pytorch2
```

```
python AlexNet.py
```


## CNN input

### Input images
Input JPEG images are taken from the training and validation folders, which can be specified in the hyperparameter csv file.
At minimum, each image filename has to contain the corresponding study id in the following format: ```prefix_001_suffix.jpg```.
For data organization purposes it is recommended to also give the folders a similar name structure, though the CNNs do not
require this.
All models require the images to be 512px in size.

An example of the correct folder structure is shown below:

Structured folder:
```
.
├── ...
├── Training
│ ├── SID_001_characteristics
│ │ ├── SID_001_characteristics-A.jpg
│ │ ├── SID_001_characteristics-B.jpg
│ │ └── ...
│ ├── SID_002_characteristics
│ │ ├── SID_002_characteristics-A.jpg
│ │ ├── SID_002_characteristics-B.jpg
│ │ └── ...
├── Validation
│ ├── SID_003_characteristics
│ │ ├── SID_003_characteristics-A.jpg
│ │ ├── SID_003_characteristics-B.jpg
│ │ └── ...
│ └── ...
└── ...
```

Unstructured folder:
```
.
├── ...
├── Training
│ ├── SID_001_characteristics-A.jpg
│ ├── SID_001_characteristics-B.jpg
│ ├── SID_002_characteristics-A.jpg
│ ├── SID_002_characteristics-B.jpg
│ └── ...
├── Validation
│ ├── SID_003_characteristics-A.jpg
│ ├── SID_003_characteristics-B.jpg
│ └── ...
└── ...
```
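Both layouts can be handled the same way when collecting images. The snippet below is a minimal sketch, not the code used by the scripts in this folder: it assumes the study id is a three-digit number between underscores, as in the `prefix_001_suffix.jpg` example, and the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the study id is exactly three digits between underscores,
# as in "prefix_001_suffix.jpg". Adjust the pattern for other id formats.
STUDY_ID_PATTERN = re.compile(r"_(\d{3})_")


def extract_study_id(filename):
    """Return the study id embedded in a filename, or None if there is none."""
    match = STUDY_ID_PATTERN.search(filename)
    return match.group(1) if match else None


def group_images_by_study_id(folder):
    """Map each study id to the sorted JPEG paths found anywhere below `folder`."""
    groups = defaultdict(list)
    # rglob descends into subfolders, so structured and unstructured
    # folders are treated identically.
    for path in sorted(Path(folder).rglob("*.jpg")):
        study_id = extract_study_id(path.name)
        if study_id is not None:
            groups[study_id].append(path)
    return dict(groups)
```

Calling `group_images_by_study_id("Training")` on either example folder would return one entry per study id, each listing that study's images.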


### Hyperparameter csv
The path to the hyperparameter csv file should be specified by the user before running the CNNs. Paths can be specified
at the top of each of the CNN scripts. An example version of the csv file is shown below, along with a description of
each variable. The hyperparameter.csv file in this folder can be modified and used for training the CNNs.


| Variable | Value | Description |
|---------------------|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| root_dir | D:\path\to\root_dir | Path to local directory, containing a training and validation folder for training CNNs |
| xlsx_path | D:\path\to\binary.xlsx | Path to local [xlsx file](#binary-classification-xlsx), containing two columns. The first column must have all unique study IDs, the second column the binary classes for majority (1) and minority (0) |
| train_dirname | Training | The name of the folder in root_dir, containing the images for training the model |
| val_dirname | Validation | The name of the folder in root_dir, containing the images for validating the model |
| wandb_name | wandb_project_name | Name of the project that will be stored on wandb |
| wandb_save | D:\path\to\local_wandb_save_folder | Path to local directory where models and CNN data will be stored that is logged on wandb |
| model_param_csv | D:\path\to\CNN_model_hyperparameters.csv | Name of the csv file where the model will be stored along with the hyperparameters to create said model |
| dataload_workers | 3 | Number of multiprocessing workers to be used for the dataloaders (3 workers was determined to be optimal for an Intel(R) I9-13900K) |
| accumulation_steps | 5 | Accumulation step size to be taken during training and validation |
| num_epochs | 50 | Number of epochs a model should be trained for |
| num_trials | 100 | Number of trials the code should run for, where one trial equals one model |
| es_counter | 0 | Start value of the early stop counter |
| es_limit | 15 | Value at which early stop is triggered and a trial will be terminated |
| tl_learn_rate | 1e-4;5e-4;1e-3;2e-3;3e-3;4e-3;6e-3;8e-3;1e-2 | Learning rate values to be chosen at random by the CNN during training |
| tl_batch_norm | True;False | Allow for batch normalization for the entire trial. Statements chosen at random at the start of a trial |
| tl_batch_size_min | 64 | Minimum number of images to be processed simultaneously by a model during training and validation |
| tl_batch_size_max | 512 | Maximum number of images to be processed simultaneously |
| tl_batch_size_step | 64 | Step size from the minimum batch size to the maximum batch size. Here resulting in options 64, 128, 192, 256, 320, 384, 448, and 512 |
| tl_weight_decay_min | 1e-5 | Minimum weight decay value |
| tl_weight_decay_max | 1e-1 | Maximum weight decay value |
| tl_gamma_min | 0.1 | Minimum gamma |
| tl_gamma_max | 1.0 | Maximum gamma |
| tl_gamma_step | 0.1 | Step size used when determining gamma. Here resulting in possible values of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 |
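
The min/max/step and semicolon-separated entries in the table expand into discrete sets of candidate values. The snippet below is a minimal sketch of that expansion, using the example values from the table. The helper names `stepped_options` and `parse_choices` are hypothetical, and how the scripts actually sample from these sets is not shown here.

```python
def stepped_options(minimum, maximum, step, ndigits=None):
    """List minimum, minimum + step, ... up to and including maximum."""
    count = int(round((maximum - minimum) / step)) + 1
    values = [minimum + i * step for i in range(count)]
    if ndigits is not None:
        # Rounding removes floating point noise such as 0.30000000000000004.
        values = [round(value, ndigits) for value in values]
    return values


def parse_choices(raw):
    """Split a semicolon-separated cell into its individual entries."""
    return [item.strip() for item in raw.split(";") if item.strip()]


# Values taken from the example table above.
batch_sizes = stepped_options(64, 512, 64)
gammas = stepped_options(0.1, 1.0, 0.1, ndigits=1)
learning_rates = [
    float(item)
    for item in parse_choices("1e-4;5e-4;1e-3;2e-3;3e-3;4e-3;6e-3;8e-3;1e-2")
]
batch_norm_choices = [item == "True" for item in parse_choices("True;False")]

print(batch_sizes)  # [64, 128, 192, 256, 320, 384, 448, 512]
```

With the example values this gives 8 batch sizes, 10 gamma values, and 9 learning rates, matching the options listed in the table.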


### Binary classification xlsx
This Excel file contains two columns that specify the binary label of each study id, which the model is to predict.
An example is provided below:

| study_id | binary_label |
|----------|--------------|
| 001 | 1 |
| 078 | 0 |
| 376 | 1 |
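
Once the rows of this file have been read (for example with `pandas.read_excel`), they can be turned into a lookup from study id to label. The snippet below is a minimal sketch with a hypothetical `build_label_map` helper. It assumes ids are three digits wide and pads integer ids with zeros, since a spreadsheet may store `001` as the number 1.

```python
def build_label_map(rows, id_width=3):
    """Map zero-padded study ids to binary labels (1 = majority, 0 = minority)."""
    label_map = {}
    for study_id, label in rows:
        # Assumption: ids are id_width digits wide, so 1 becomes "001".
        key = str(study_id).strip().zfill(id_width)
        label = int(label)
        if label not in (0, 1):
            raise ValueError(f"study id {key}: label must be 0 or 1, got {label}")
        if key in label_map:
            raise ValueError(f"study id {key} appears more than once")
        label_map[key] = label
    return label_map


# Rows from the example table above; ids may arrive as text or as integers.
labels = build_label_map([("001", 1), (78, 0), (376, 1)])
print(labels)  # {'001': 1, '078': 0, '376': 1}
```

The duplicate check reflects the requirement that the first column contains only unique study ids.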

### Model folder
This folder should contain the models that have been produced during training and are to be tested on the test set.

### Model csv
The model csv file contains the names of the models, the CNN used to create the models, and their hyperparameters.
A script that produces the required file is available in the ```post_training_analysis``` folder.

## CNN output
The CNNs will upload most of their results to their wandb project as specified in the
[parameter file](#hyperparameter-csv). The results that are saved locally are the models they have produced,
along with a '.csv' file that contains the hyperparameter details of the produced models. On wandb, various graphs are
plotted that detail the progression of the models, alongside ROC-AUC plots of training and validation results and
tables detailing the hyperparameters of each model. Furthermore, a csv file containing the prediction results will
also be produced. This csv file will present the predicted values, probability values, and image names of each study id
that has been used during testing.
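
Because several images can belong to one study id, it can be useful to summarise the prediction csv per study. The snippet below is a minimal sketch of that idea. The column names `image_name`, `predicted`, and `probability`, the helper name, and the example rows are all assumptions made for illustration, so they need to be adapted to the actual header of the produced file.

```python
import csv
import io
import re
from collections import defaultdict
from statistics import mean


def mean_probability_per_study(csv_text):
    """Average the predicted probability over all images of each study id."""
    by_study = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: three-digit study id between underscores in the image name.
        match = re.search(r"_(\d{3})_", row["image_name"])
        if match:
            by_study[match.group(1)].append(float(row["probability"]))
    return {study_id: mean(values) for study_id, values in by_study.items()}


# Made-up rows, only to show the assumed layout of the file.
example_csv = """image_name,predicted,probability
SID_001_characteristics-A.jpg,1,0.75
SID_001_characteristics-B.jpg,1,0.25
SID_003_characteristics-A.jpg,0,0.5
"""

per_study = mean_probability_per_study(example_csv)
print(per_study)  # {'001': 0.5, '003': 0.5}
```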