HaileyTQuach/Medical-Image-Classification-using-CNN

Academic Project to study a Computer Vision task in Deep Learning using CNN

[ Duration ]

- September 22nd, 2023 to December 5th, 2023 (2023 FALL, Concordia University)

[ Team ]

  • Hyun Soo Kim
  • Matthew Armstrong
  • Phuong Thao Quach
  • Suin Kang
  • Zarren Ali

[ High level description/presentation of the project ]

Convolutional Neural Networks in Image Classification

This project explores the application of Convolutional Neural Networks (CNNs), a pivotal deep learning model in computer vision, particularly for image classification. CNNs excel at learning and extracting features from images, enabling accurate classification of new, similar images.

Objectives and Tasks

The project is divided into two primary tasks:

  1. CNN Encoder for Human Tissue Image Classification: A CNN encoder is trained on a dataset of human tissue images to classify colon cancer (dataset 1). The learned features are visualized using t-SNE (t-Distributed Stochastic Neighbor Embedding), a technique for visualizing high-dimensional data; a minimal sketch of this step follows the list.
  2. Feature Extraction and Model Evaluation Across Datasets: Using the CNN encoder trained in the first task and a CNN encoder pre-trained on ImageNet, features are extracted from two additional datasets: a prostate cancer dataset (dataset 2) and an animal faces dataset (dataset 3). Supervised machine learning models are then trained on these features to evaluate and compare the two encoders across diverse datasets.
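The sketch below is a minimal illustration of this feature-extraction and t-SNE pipeline, assuming a ResNet18 encoder with its classification head removed. The batch here is random stand-in data; in the notebooks, the images and labels come from the dataset loaders.

    import torch
    import matplotlib.pyplot as plt
    from torchvision import models
    from sklearn.manifold import TSNE

    # Strip the classification head so the network outputs 512-d feature vectors.
    encoder = models.resnet18(weights="IMAGENET1K_V1")
    encoder.fc = torch.nn.Identity()
    encoder.eval()

    # Stand-in batch; substitute the real dataset loaders from the notebooks.
    images = torch.randn(64, 3, 224, 224)
    labels = torch.randint(0, 3, (64,))

    with torch.no_grad():
        features = encoder(images).numpy()

    # Project the 512-d features to 2-D for visualization.
    embedded = TSNE(n_components=2, random_state=42).fit_transform(features)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=5, cmap="tab10")
    plt.title("t-SNE of CNN encoder features")
    plt.show()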

Challenges and Solutions

Training a CNN encoder presents specific challenges, such as the need for large, diverse image datasets and the computational demands of training complex models. To address these, the project employs strategies like data augmentation and hardware acceleration (GPUs) for faster training; an illustrative augmentation pipeline is shown below.
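As one illustration, a torchvision augmentation pipeline might look like the following; the specific transforms and parameters are assumptions, not necessarily the ones used in the notebooks.

    from torchvision import transforms

    # A plausible training-time augmentation pipeline; parameters are illustrative.
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),      # random scale/crop to 224x224
        transforms.RandomHorizontalFlip(),      # mirror images half the time
        transforms.RandomRotation(15),          # small random rotations
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                             std=[0.229, 0.224, 0.225]),
    ])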

Evaluation Metrics

The performance of the models for both tasks is assessed using key metrics: precision, recall, F1-score, support, and accuracy.
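All of these metrics are produced at once by Scikit-Learn's classification_report; a toy example with stand-in labels (not results from this project):

    from sklearn.metrics import classification_report, confusion_matrix

    y_true = [0, 1, 2, 2, 1, 0]   # toy ground-truth labels
    y_pred = [0, 1, 2, 1, 1, 0]   # toy predictions
    print(classification_report(y_true, y_pred, digits=3))  # precision/recall/F1/support/accuracy
    print(confusion_matrix(y_true, y_pred))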


[ Description on how to obtain the Dataset from an available download link ]

Download links for the datasets required for this assignment are provided below. The first three links lead to the unprocessed data required by the project. The datasets that follow were generated through feature extraction using both our ResNet18 model trained in task 1 and a ResNet18 model pre-trained with ImageNet weights. The final three links lead to sampled versions of datasets 1, 2, and 3, each comprising 100 images. Classes are distributed as evenly as possible within each sampled dataset (100 images cannot be split evenly across three classes, so one class receives one extra image).
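The notebooks fetch these files via gdown. For reference, a single download looks like the sketch below, where FILE_ID and the output name are placeholders to be taken from the actual links:

    import gdown

    # FILE_ID is a placeholder; take the real ID from the download link above.
    gdown.download("https://drive.google.com/uc?id=FILE_ID", "dataset1.zip", quiet=False)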


[ Requirements to run your Python code (libraries, etc) ]

To successfully run the Python code in this repository, several libraries and dependencies need to be installed. The code primarily relies on popular Python libraries such as NumPy, Matplotlib, Pandas, Seaborn, and Scikit-Learn for data manipulation, statistical analysis, and machine learning tasks.

For deep learning models, the code uses PyTorch, along with its submodules such as torchvision and torch.nn. Ensure that you have a recent version of PyTorch installed.

Additionally, the project uses the Orion library, an asynchronous hyperparameter optimization framework. This can be installed directly from its GitHub repository using the command !pip install git+https://github.com/epistimio/orion.git@develop and its related profet package with !pip install orion[profet].

Here is a comprehensive list of all the required libraries:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-Learn
  • PyTorch (along with torch.nn, torch.optim, torch.utils.data, etc.)
  • Torchvision (including datasets, models, transforms)
  • Orion (including the profet package)
  • Argparse (for parsing command-line options)
  • TSNE (from Scikit-Learn, for dimensionality reduction)
  • KNeighborsClassifier and GridSearchCV (from Scikit-Learn)
  • RandomForestClassifier (from Scikit-Learn)
  • Classification metrics from Scikit-Learn (confusion_matrix, classification_report, etc.)

For visualization and data analysis, Matplotlib and Seaborn are extensively used. Ensure all these libraries are installed in your environment to avoid any runtime errors.

To install these libraries, you can use pip (Python's package installer). For most libraries, the installation can be as simple as running pip install library-name. For specific versions or sources, refer to the respective library documentation.
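A quick way to verify the setup is to run all the imports the notebooks rely on; if this cell completes without errors, the environment is ready:

    # Sanity-check the environment by importing everything the notebooks use.
    import argparse
    import numpy, pandas, matplotlib, seaborn, sklearn
    import torch, torchvision
    from sklearn.manifold import TSNE
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, classification_report

    print("torch", torch.__version__, "| torchvision", torchvision.__version__)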


[ Instructions on how to train/validate your model ]

All notebooks were written in Google Colab and are intended for use in Google Colab only.

Task 1: Train the ResNet-18 Model from Scratch, Test, and Perform t-SNE on Dataset 1

Open the notebook - "task1_training_testing.ipynb".

CAUTION: Every dataset is available via gdown in the notebook. However, depending on which dataset you wish to use (the original with 6000 images vs. the sample with 100 images), read the instructions in the notebook carefully and adjust the code accordingly (comment/uncomment the relevant cells).

  • How To Train?
    • Run the cells that import the required libraries.
    • Run the cell section 1. Data Loading and Preprocessing - by default, the sample dataset (100 images) is loaded.
    • Run the cell section 2. Training to train and validate the model.

  • How To Test?
    • No need to upload anything; the test dataset is downloaded via gdown.
    • Make sure you have run the 1. Data Loading and Preprocessing section.
    • The pretrained model from Task 1, resnet18_model_98.pth, is downloaded via gdown in the 3. Testing block (a minimal loading sketch follows this list).
    • Move the .pth file to the same directory as the notebook.
    • Run the cell section 3. Testing.
    • Run the cell section 4. Feature extraction and t-SNE visualization.
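For reference, loading the checkpoint outside the notebook looks roughly like the sketch below. It assumes resnet18_model_98.pth stores a state_dict for a ResNet18 whose final layer was replaced for the three tissue classes; adjust if the checkpoint format differs.

    import torch
    from torchvision import models

    # Rebuild the architecture, then load the trained weights.
    model = models.resnet18(weights=None)                  # no ImageNet weights
    model.fc = torch.nn.Linear(model.fc.in_features, 3)    # 3 classes (assumed)
    state = torch.load("resnet18_model_98.pth", map_location="cpu")
    model.load_state_dict(state)
    model.eval()                                           # inference mode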

Task 2: Feature Extraction and Classification on Datasets 2 and 3

  • For Feature Extraction and t-SNE: Run the notebook titled "Task2_Feature_Extraction.ipynb". If you want to save the extracted datasets to CSV, run the code under "Save dataset to csv file"; otherwise, skip those cells.
  • For KNN classification: Run the notebook titled "Task2_KNN.ipynb" (a rough sketch of its pipeline follows this list).
  • For Random Forest (RF) classification: Run the notebook titled "Task2_RF.ipynb".
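The sketch below outlines what the KNN notebook does, assuming a CSV of extracted features with a single "label" column; the file name is hypothetical, so match it to the actual CSVs.

    import pandas as pd
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report

    # Hypothetical file name; assumes one "label" column plus feature columns.
    df = pd.read_csv("dataset2_features.csv")
    X = df.drop(columns=["label"]).values
    y = df["label"].values
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Grid-search the number of neighbors with 5-fold cross-validation.
    grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}, cv=5)
    grid.fit(X_tr, y_tr)
    print(grid.best_params_)
    print(classification_report(y_te, grid.predict(X_te)))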

[ Instructions on how to run the pre-trained model on the provided sample test dataset ]

All notebooks were written in Google Colab and are intended for use in Google Colab only.

To run the pre-trained models on the provided sample test datasets, follow the instructions below for each notebook:

  • For Task 1, open the notebook titled "task1_training_testing.ipynb" and follow the instructions in its code cells. If the instructions in the actual notebook differ from what is written here, follow the notebook.
    • Besides the sample dataset submitted in the .zip file, the data is already available via gdown, so you do not have to upload anything on your end.
    • The pretrained model from Task 1, resnet18_model_98.pth, is downloaded via gdown in the 3. Testing block.

  • For Task 2, open the notebook titled "Task2_Feature_Extraction.ipynb" and run the code cells one by one, following the instructions in the notebook. If the instructions in the actual notebook differ from what is written here, follow the notebook.
    • All the sample datasets are downloaded via gdown in the notebook, so you do not have to upload anything on your end.


