- Clone this git repo: `git clone https://github.com/beelabhmc/pollen_id`
- Install Anaconda
- Create the conda environment: `conda env create -f environment.yml` (this must be done in the `pollen_id` directory)
- Install PyTorch (see here). This is not included in the `environment.yml` file because it is platform dependent.
- Activate the conda environment: `conda activate pollen_id`
- Download all the pollen slide images from the Google Drive folder (they may download as multiple .zip files; you will need to unzip and combine them into one folder manually)
- Move those files into the `pollen_id` folder.
  - Note: `pollen_slides` should be the parent folder, with each of the individual species folders directly inside of it:

    ```
    ├── pollen_slides
    │   ├── Acmispon glaber
    │   ├── Amsinckia intermedia
    │   ⋮
    │   ├── Sambucus nigra
    │   └── Solanum umbelliferum
    ```
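If you want to confirm the layout before running the pipeline, a small standard-library sketch like the following lists the species folders (`list_species` is a hypothetical helper for illustration, not part of the repo):

```python
from pathlib import Path

def list_species(root="pollen_slides"):
    """Return the sorted species folder names directly under the slides root."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} not found -- unzip the images here first")
    # Each immediate subdirectory is expected to be one species.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Run it from the `pollen_id` directory; each returned entry should be one species name, matching the tree above.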
- Download the `model.yml.gz` file (this is used for edge detection during pollen segmentation)
  - After downloading, unzip it, and make sure it is in the root `pollen_id` folder and called `model.yml`
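If you do not have an unarchiver handy, the `.gz` file can be decompressed with Python's standard library (`gunzip` here is just an illustrative helper name):

```python
import gzip
import shutil

def gunzip(src_path, dst_path):
    """Decompress a .gz file, e.g. model.yml.gz -> model.yml."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, `gunzip("model.yml.gz", "model.yml")` run from the repo root leaves `model.yml` where the segmentation code expects it.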
- Run the intake data script: `python intake_data.py`
  - This will go through all the images in the `pollen_slides` folder and create a database that categorizes them based on their folder and file names
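The actual schema is defined in intake_data.py; purely as an illustration of the folder-name-to-label idea, a minimal version could look like this (`build_index` and the table layout are assumptions, not the repo's code):

```python
import sqlite3
from pathlib import Path

def build_index(slides_root, db_path=":memory:"):
    """Index every image under <slides_root>/<species>/ into a small database.

    Illustrative sketch only -- not the schema intake_data.py actually uses.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS slides (path TEXT, species TEXT)")
    for img in Path(slides_root).glob("*/*"):
        if img.suffix.lower() in {".jpg", ".jpeg", ".png", ".tif", ".tiff"}:
            # The parent folder name serves as the species label.
            con.execute("INSERT INTO slides VALUES (?, ?)",
                        (str(img), img.parent.name))
    con.commit()
    return con
```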
- Run the pollen extraction script: `python extract_pollen.py`
  - This extracts the individual pollen grains from each pollen slide image and stores them in a new `pollen_grains` folder
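The grain detection itself relies on the `model.yml` edge detector; shown here in isolation is only the final cropping step, as a NumPy sketch (`crop_grains` and the box format are assumptions for illustration, not the repo's actual code):

```python
import numpy as np

def crop_grains(image, boxes):
    """Crop detected pollen grains out of a slide image.

    `boxes` are (x, y, w, h) rectangles, e.g. from contour detection.
    Returns one sub-image per box.
    """
    return [image[y:y + h, x:x + w] for x, y, w, h in boxes]
```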
Now you are ready to run the machine learning code.
This repository contains code for training and testing two different networks: a CNN that uses transfer learning with ResNet-50 for feature extraction, and an SNN (which also uses ResNet-50 for feature extraction).
The CNN is implemented in `simple_cnn_classifier.py` and the SNN in `meta_learning.py`. Although both are plain Python files, they contain cells similar to a Jupyter notebook, as described here. They can be run as normal Python scripts, but for the best developer experience they should be opened in VS Code with the Python and Jupyter extensions installed. Both files contain comments and documentation in the form of Markdown cells, so they are not described here.
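The cell format mentioned above uses `# %%` markers, which the VS Code Python and Jupyter extensions render as runnable notebook cells while the file remains an ordinary script. A minimal example (the values are placeholders, not settings from the repo):

```python
# %% [markdown]
# # Training setup
# Markdown cells like this one document the file inline.

# %%
# Code cells can be run one at a time in VS Code's interactive window,
# but the file still executes top-to-bottom as a normal script.
learning_rate = 1e-3
batch_size = 32
print(f"lr={learning_rate}, batch={batch_size}")
```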
- Copy the `model.yml` file into `server/api/models` and rename it to `edge_detection_model.yml`
- Copy your trained network `.pth` file into `server/api/models` and give it a useful name.
  - You will need to update the filename that the server reads at the bottom of `classify_pollen.py`. If you change the network structure, you will also need to update the network architecture in `classify_pollen.py`.
  - If you change the number of classes, you will also need to update the index-to-class mapping in `classify_pollen.py`
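The mapping is just an index-to-name lookup that must match the class order used at training time. A hypothetical example (these class names and the `label_for` helper are illustrative, not code from `classify_pollen.py`):

```python
# Must match the order of the network's output logits from training.
idx_to_class = {
    0: "Acmispon glaber",
    1: "Amsinckia intermedia",
    2: "Sambucus nigra",
    3: "Solanum umbelliferum",
}

def label_for(logit_index):
    """Translate the network's predicted index into a species name."""
    return idx_to_class[logit_index]
```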
The front end of the server is hosted on GitHub Pages and will automatically be updated on each commit.

The backend ML API is contained in the `server` folder. To run the server, activate the conda environment and then run `python server.py` from the `server` folder. The required libraries for the server are the same as in `requirements.txt`. PyTorch should also be installed on the server following the directions above.