Plankton ML

This repository contains code and configuration for processing and analysing images of plankton samples. It is experimental, serving as much as a proposed template for new projects as a project in itself.

It is a companion project to an R Shiny-based image annotation app that has not yet been released. It was written by researchers and data scientists at the UK Centre for Ecology and Hydrology in the early stages of a collaboration that was placed on hold.

Installation

Environment and package installation

Using pip

Create a fresh virtual environment in the repository root using Python 3.12 or later and (for example) venv, then activate it:

python -m venv venv
source venv/bin/activate

Next, install the package using pip:

python -m pip install .

You are most likely interested in developing or experimenting, so you will probably want to install the package in editable mode (-e), along with the dev tools and Jupyter notebook support:

python -m pip install -e ".[all]"

Using conda

Use Anaconda or Miniconda to create a Python environment from the included environment.yml, then activate it:

conda env create -f environment.yml
conda activate cyto_ml

Next, install this package in editable mode without its dependencies, since conda has already installed them:

python -m pip install --no-deps -e .
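If you want a quick way to confirm the environment is ready, the sketch below checks the Python version and whether the package can be found. It assumes the import name is cyto_ml (taken from the src/cyto_ml path used by the Streamlit app) and that 3.12 is the minimum version; check_environment is a hypothetical helper, not part of the package.

```python
"""Sketch: sanity-check a freshly created plankton_ml environment."""
import importlib.util
import sys


def check_environment(
    package: str = "cyto_ml", min_python: tuple[int, int] = (3, 12)
) -> dict[str, bool]:
    """Return a mapping of check name to pass/fail (hypothetical helper)."""
    return {
        # The installation notes ask for Python 3.12 or later
        "python_version": sys.version_info[:2] >= min_python,
        # find_spec returns None when the top-level package cannot be found
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it from inside the activated environment; a MISSING package usually means the pip install step ran against a different interpreter.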

exiftool

We use exiftool to write basic metadata (latitude/longitude of observation, plus timestamp) into individual plankton images extracted from the larger "collage" format that the FlowCam microscope exports them in.

Guidance for installing exiftool

Ubuntu: sudo apt install libimage-exiftool-perl
CentOS: sudo yum install perl-Image-ExifTool (from EPEL)
Or, in an environment without root access:

git clone https://github.com/exiftool/exiftool.git
export PATH=$PATH:$PWD/exiftool
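The metadata step can be sketched in Python by shelling out to exiftool. This is an illustration under assumptions, not the project's own implementation: exiftool_command and write_metadata are hypothetical helpers, and the tag choice (GPSLatitude/GPSLongitude with their Ref tags, plus DateTimeOriginal) is one reasonable way to store position and time in EXIF.

```python
"""Sketch: tag one extracted plankton image with position and time via exiftool."""
import shutil
import subprocess
from datetime import datetime


def exiftool_command(image_path: str, lat: float, lon: float, taken: datetime) -> list[str]:
    """Build an exiftool argument list that writes GPS position and capture time."""
    return [
        "exiftool",
        # EXIF stores unsigned coordinates; the hemisphere goes in the *Ref tags
        f"-GPSLatitude={abs(lat)}",
        f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}",
        f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        # EXIF date/time format is "YYYY:MM:DD HH:MM:SS"
        f"-DateTimeOriginal={taken.strftime('%Y:%m:%d %H:%M:%S')}",
        # Edit in place rather than keeping an <name>_original backup file
        "-overwrite_original",
        image_path,
    ]


def write_metadata(image_path: str, lat: float, lon: float, taken: datetime) -> None:
    """Run exiftool, failing early with a clear message if it is not on PATH."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool not found on PATH; see the installation notes above")
    subprocess.run(exiftool_command(image_path, lat, lon, taken), check=True)
```

Building the argument list separately from running it keeps the tag logic testable without exiftool installed.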

Object store connection

The .env file lists the environment variable names needed to connect to the JASMIN object store over S3. Fill in the values with your own credentials. If you are not sure what AWS_URL_ENDPOINT should be, please contact one of the project contributors listed below.
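A minimal sketch of reading those settings in Python. Only AWS_URL_ENDPOINT is named here; the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys are assumptions (they are the names boto3 conventionally uses), and load_env_file and s3_client_kwargs are hypothetical helpers. In practice, python-dotenv's load_dotenv does the parsing more robustly.

```python
"""Sketch: read S3 settings from .env and turn them into boto3 client arguments."""
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate an optional "export " prefix and surrounding quotes
        key = key.removeprefix("export ").strip()
        values[key] = value.strip().strip("'\"")
    return values


def s3_client_kwargs(env: dict[str, str]) -> dict[str, str]:
    """Map .env entries to keyword arguments for boto3.client("s3", ...)."""
    # AWS_URL_ENDPOINT comes from the project's .env; the key names are assumptions
    return {
        "endpoint_url": env["AWS_URL_ENDPOINT"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

With boto3 installed, the result plugs straight into a client: boto3.client("s3", **s3_client_kwargs(load_env_file())).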

Running tests

Run pytest (or py.test) from the repository root:

pytest

Visualisation

A Streamlit app, adapted from the one built for the text embeddings of the EIDC catalogue metadata:

streamlit run src/cyto_ml/visualisation/app.py

Streamlit should open the demo in your browser automatically. If it does not, browse to http://localhost:8501.

Object Store API

See the Object Store API project, a RESTful interface for managing a data collection held in S3 object storage.

Data Version Control

DVC with S3: a condensed walkthrough written for the LLM evaluation project. Complete it up to the dvc remote modify... step to set up the S3 connection.
DVC tutorial, Versioning Data and Models: What's next?
Importing External Data: Avoiding duplication (is this the pattern we need?)
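The remote setup from the walkthrough comes down to a few dvc remote commands. The sketch below wraps them in Python; the remote name, bucket URL and helper names are hypothetical, and the endpoint is expected to be the AWS_URL_ENDPOINT value from .env. Credentials are written with --local so they land in .dvc/config.local, which DVC keeps out of Git, rather than in the shared .dvc/config.

```python
"""Sketch: configure an S3-compatible DVC remote such as the JASMIN object store."""
import subprocess


def dvc_remote_commands(name: str, bucket_url: str, endpoint: str) -> list[list[str]]:
    """Return dvc CLI calls that register a default S3 remote with a custom endpoint."""
    return [
        # -d makes this the default remote for dvc push / dvc pull
        ["dvc", "remote", "add", "-d", name, bucket_url],
        # Non-AWS S3-compatible stores need an explicit endpoint URL
        ["dvc", "remote", "modify", name, "endpointurl", endpoint],
    ]


def dvc_credential_commands(name: str, key_id: str, secret: str) -> list[list[str]]:
    """Return dvc CLI calls that store credentials in the untracked local config."""
    return [
        ["dvc", "remote", "modify", "--local", name, "access_key_id", key_id],
        ["dvc", "remote", "modify", "--local", name, "secret_access_key", secret],
    ]


def configure_remote(name: str, bucket_url: str, endpoint: str, key_id: str, secret: str) -> None:
    """Run all the commands in order; stops at the first dvc error."""
    commands = dvc_remote_commands(name, bucket_url, endpoint)
    commands += dvc_credential_commands(name, key_id, secret)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Run configure_remote once from the repository root after dvc init, with your own bucket path in place of the placeholder.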

DAG / pipeline elements

Contributors

Jo Walsh, Alba Gomez Segura, Ezra Kitson

Contributing

Please see CONTRIBUTING.md.
