This is Team Epoch's top 8% solution to the HMS - Harmful Brain Activity Classification competition.
A technical report is included in this repository.
This section contains the steps needed to get started with our project and to fully reproduce our best submission on the private leaderboard. The project was developed on Windows 10/11 with Python 3.10.13 and pip 23.2.1.
Models were trained on machines with the following specifications:
- CPU: AMD Ryzen Threadripper Pro 3945WX 12-Core Processor / AMD Ryzen 9 7950X 16-Core Processor
- GPU: NVIDIA RTX A5000 / NVIDIA Quadro RTX 6000 / NVIDIA RTX A6000
- RAM: 96GB / 128GB
- OS: Windows 10/11
- Python: 3.10.13
- Estimated training time: 3-6 hours per model on these machines.
For running inference, a machine with at least 32GB of RAM is recommended. We have not tried running inference on a machine with less RAM on the full test set provided by DrivenData.
Make sure to clone the repository with your favourite git client or using the following command:
git clone TODO: UPDATE(...)
You can install the required Python version here: Python 3.10.13
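To check which interpreter is active, you can run the following; we would expect the output to report version 3.10.13:
python --version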
Install the required packages (using a virtual environment is recommended) with the following command. Note that a `.venv` takes around 7GB of disk space.
pip install -r requirements.txt
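If you want to isolate the install, a virtual environment can be created first. This is only a sketch, assuming the standard venv module and a PowerShell terminal; adjust the activation step for your shell:
# Create and activate a virtual environment, then install the dependencies
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt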
TODO: Explanation
- `train.py`: This file is used to train a model. `train.py` reads a configuration file from `conf/train.yaml`. This configuration file contains the model configuration to train, along with additional training parameters such as `test_size` and the scorer to use. The model selected in `conf/train.yaml` can be found in the `conf/model` folder, where the whole model configuration is stored (from preprocessing to postprocessing). When training is finished, the model is saved in the `tm` directory with a hash that depends on the specific preprocessing and pretraining steps plus the model configuration. See the example command after this list.
  - Command line arguments:
    - `CUDA_VISIBLE_DEVICES`: The GPU to use for training. If not specified, DataParallel is used to train on multiple GPUs. If you have multiple GPUs, you can specify which one to use.
- `submit.py`: This file runs inference on the test data from the competition given a trained model or an ensemble of trained models. It reads a configuration file from `conf/submit.yaml`, which contains the model/ensemble configuration to use for inference. Model configs can be found in the `conf/model` folder and ensemble configs in the `conf/ensemble` folder. An ensemble config in `conf/ensemble` specifies which models (`conf/model`) to use for the ensemble and the weight to use for each model; `submit.py` then runs inference with that configuration.
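As an illustration, a single training run on one GPU could be started as follows. This is a sketch, assuming a PowerShell terminal and that `conf/train.yaml` already points at the model configuration you want to train:
# Restrict training to GPU 0; omit this line to let train.py use DataParallel on all GPUs
$env:CUDA_VISIBLE_DEVICES = "0"
python train.py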
(For DrivenData) Any additional supplied trained models/scalers (`.pt` / `.gbdt` / `.scaler`) should be placed in the `tm` directory. When these models were trained, they were saved with a hash that depends on the specific preprocessing and pretraining steps plus the model configuration. This ensures that the correct saved model is loaded automatically when running `submit.py`.
To reproduce our best submission, run `submit.py`. This will load the already configured `submit.yaml` file and run inference on the test data from the competition. `submit.yaml` is configured to what we think is our best and most robust solution.
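A minimal invocation looks like this, assuming the trained models are present in the `tm` directory and your environment is activated:
# Run inference with the ensemble configured in conf/submit.yaml
python submit.py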
If you get an error that a model path was not found, please ensure that you have the correct trained model in the `tm` directory. If you don't have the trained models, you can train them one by one using `train.py` and the `conf/train.yaml` file.
Quality checks are performed using pre-commit hooks. To install these hooks, run:
pre-commit install
To run the pre-commit hooks locally, do:
pre-commit run --all-files
Documentation is generated using Sphinx.
To make the documentation, run `make html` with `docs` as the working directory. The documentation can then be found in `docs/_build/html/index.html`.
Here's a short command to make the documentation and open it in the browser:
cd ./docs/;
./make.bat html; start chrome file://$PWD/_build/html/index.html
cd ../