This repository hosts the ApisTox dataset, intended for data analysis and machine learning applications in ecotoxicology and agrochemistry.
The paper is freely available (open access) in Scientific Data, and a preprint is available on arXiv.
The dataset and code are released under the CC-BY-NC-4.0 license.
The final dataset file is outputs/dataset_final.csv. For dataset splits, see the outputs/splits directory.
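As a quick way to inspect the final file, the sketch below reads a CSV into a list of dictionaries using only the Python standard library. The helper names `load_rows` and `column_names` are hypothetical and not part of this repository, and no particular column names are assumed; the only detail taken from this README is the path outputs/dataset_final.csv.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row.

    Hypothetical helper for illustration; all values are kept as strings.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def column_names(csv_path):
    """Return the header row of a CSV file as a list of column names."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

From the repository root, `load_rows("outputs/dataset_final.csv")` should return one dictionary per row of the dataset, and `column_names("outputs/dataset_final.csv")` lists the available columns before you decide which ones to use.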
Raw input data is in the raw_data directory. Other datasets from this area are in the other_sources directory (we do not recommend using them).
Set up a virtual environment:
- Poetry (recommended): run make install or poetry install --no-root
- venv: run pip install -r requirements.txt
Scripts:
- recreate dataset: python create_dataset.py
- split dataset: python create_dataset_splits.py
- create analyses and plots: python analyze_dataset.py
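If you want to run all three scripts in one go, the following minimal sketch chains them with `subprocess`. It assumes the scripts are meant to run in the order listed above (create, split, analyze) and that they are invoked from the repository root with no arguments; `build_commands` and `run_pipeline` are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys

# Script names are taken from the list above; running them in this
# order is an assumption, not something the repository guarantees.
PIPELINE = [
    "create_dataset.py",
    "create_dataset_splits.py",
    "analyze_dataset.py",
]


def build_commands(scripts=PIPELINE, python=sys.executable):
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run each script in order; raise on the first non-zero exit code."""
    for command in build_commands(scripts):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` inside the activated virtual environment uses the same interpreter that runs the sketch, so the installed dependencies are picked up. Because of `check=True`, a failure in one script stops the later ones from running.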