
Utilizing cutting-edge optimization methods to detect anomalies in medical and biological datasets


alecruces/MEBMedBio


Minimum Enclosing Ball for Anomaly Detection on Biological Data

Analyzing biological data for anomaly detection using Minimum Enclosing Ball (MEB) and variations of the Frank-Wolfe algorithm.


Keywords

Anomaly detection, biological data


Table of Contents

  1. About the Project
  2. Key Features
  3. Key Results
  4. Data Overview
  5. Methodology
  6. Screenshots and Graphs
  7. Technologies Used
  8. Setup & Installation
  9. Usage
  10. Contributing
  11. License
  12. Contact

About the Project

This project explores the Minimum Enclosing Ball (MEB) problem and its applications in anomaly detection across biological datasets. We implement three Frank-Wolfe algorithm variants: Pairwise Frank-Wolfe, Blended Pairwise Conditional Gradient (BPCG), and a (1+ϵ)-approximation using Away-Steps. These methods are evaluated for computational efficiency, convergence rates, and anomaly detection performance in biological contexts.
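For reference, one standard way to state the MEB problem for points x₁, …, xₙ ∈ ℝᵈ is shown below, together with the dual over the unit simplex that Frank-Wolfe-type methods such as Yildirim (2008) operate on. This is a textbook formulation with our own notation, not an excerpt from the notebook or report.

```latex
\begin{aligned}
\text{(primal)}\quad & \min_{c \in \mathbb{R}^d,\ \gamma \ge 0}\ \gamma
  \quad \text{s.t.}\quad \lVert x_i - c \rVert^2 \le \gamma,\qquad i = 1,\dots,n,\\[4pt]
\text{(dual)}\quad & \max_{u \in \Delta_n}\ \Phi(u) = \sum_{i=1}^{n} u_i \lVert x_i \rVert^2
  - \Big\lVert \sum_{i=1}^{n} u_i x_i \Big\rVert^2,
  \qquad \Delta_n = \Big\{ u \in \mathbb{R}^n : u \ge 0,\ \textstyle\sum_{i} u_i = 1 \Big\}.
\end{aligned}
```

At a dual optimum u*, the center is c* = Σᵢ uᵢ* xᵢ and the squared radius is r*² = Φ(u*). The points with uᵢ* > 0 are the ones that lie on the boundary of the ball.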

Key Features

  • Anomaly Detection: Identification of anomalies in biological datasets.
  • Efficient Algorithms: Implementation of three Frank-Wolfe-based algorithms, compared on convergence behavior and computational cost.
  • Biological Data Application: Anomaly detection on four datasets: breast cancer, gene expression, vertebral pathology, and maternity risk.
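Once a ball with center c and radius r is available, the natural geometric rule is that points outside it are candidate anomalies. The sketch below assumes the ball has already been fitted and that r > 0. The helper names and the relative slack `eps` are hypothetical, and the notebook may label anomalies differently (for example, by fitting the ball on normal samples only).

```python
import numpy as np


def anomaly_scores(X, center, radius):
    """Distance of each point from the center, in units of the ball radius (assumes radius > 0)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X - np.asarray(center, dtype=float), axis=1) / radius


def flag_outside_ball(X, center, radius, eps=0.0):
    """True where a point lies outside the ball enlarged by the hypothetical relative slack eps."""
    return anomaly_scores(X, center, radius) > 1.0 + eps
```

For example, `flag_outside_ball([[0, 0], [3, 0], [0.5, 0.5]], [0, 0], 1.0)` marks only the second point. Returning a continuous score as well as a boolean flag makes it easy to rank borderline cases or tune `eps`.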

Key Results

  • Convergence: All three algorithms demonstrated linear convergence on the MEB problem.
  • Best Performance: The (1+ϵ)-approximation (Yildirim 2008) showed the best computational performance (lowest time and fewest iterations in the reported benchmarks), especially on high-dimensional datasets.
  • Recall Metrics: Recall was used as the primary benchmark for anomaly detection, and optimal recall values were achieved on most datasets.
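Recall is the fraction of true anomalies that the detector actually flags, TP / (TP + FN). Here is a minimal sketch, assuming labels are encoded as 1 = anomaly and 0 = normal (the encoding is an assumption). scikit-learn's `sklearn.metrics.recall_score` computes the same quantity.

```python
import numpy as np


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), treating label 1 (True) as 'anomaly'."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # anomalies that were flagged
    fn = np.sum(y_true & ~y_pred)   # anomalies that were missed
    if tp + fn == 0:
        return float("nan")         # recall is undefined when there are no true anomalies
    return float(tp / (tp + fn))
```

With `y_true = [1, 1, 0, 1]` and `y_pred = [1, 0, 0, 1]`, two of the three anomalies are caught, so recall is 2/3. Recall ignores false positives, which is why it suits settings where missing an anomaly is costlier than raising a false alarm.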

Data Overview

This study uses four biological datasets for anomaly detection: breast cancer, gene expression, vertebral pathology, and maternity risk. Links to each dataset:

Methodology

We apply three variants of the Frank-Wolfe algorithm to solve the MEB problem:

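  • Pairwise Frank-Wolfe: A projection-free method that, at each step, moves weight directly from an away vertex (a point in the current active set) to the Frank-Wolfe vertex.
  • Blended Pairwise Conditional Gradients (BPCG): A variant that blends pairwise steps within the current active set with standard Frank-Wolfe steps, avoiding the swap steps that can slow down Pairwise Frank-Wolfe.
  • (1+ϵ)-Approximation with Away-Steps: A Frank-Wolfe method with away steps (Yildirim 2008) that computes a (1+ϵ)-approximate MEB and is intended to scale to large datasets.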
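The Pairwise Frank-Wolfe idea can be sketched on the MEB dual: maximize Φ(u) = Σᵢ uᵢ‖xᵢ‖² − ‖Σᵢ uᵢxᵢ‖² over the unit simplex, where c = Σᵢ uᵢxᵢ is the current center. The Frank-Wolfe vertex is the point farthest from c, and the away vertex is the active point closest to c. Because Φ is quadratic, the step that moves weight between them has a closed-form line search. The name `meb_pairwise_fw`, the initialization, and the relative stopping rule are assumptions made for illustration. This is not the code from MEB_BPCG.ipynb, and BPCG and the away-step (1+ϵ) method differ in how they choose between step types.

```python
import numpy as np


def meb_pairwise_fw(X, tol=1e-6, max_iter=10_000):
    """Pairwise Frank-Wolfe sketch for the MEB dual (hypothetical helper).

    Maximizes Phi(u) = sum_i u_i ||x_i||^2 - ||X^T u||^2 over the unit simplex.
    Returns (center, radius, u, steps).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq_norms = np.einsum("ij,ij->i", X, X)       # ||x_i||^2 for every point
    # Assumed initialization: all weight on the point farthest from x_0.
    u = np.zeros(n)
    u[np.argmax(np.sum((X - X[0]) ** 2, axis=1))] = 1.0
    steps = 0
    for _ in range(max_iter):
        c = u @ X                                 # center c = sum_i u_i x_i
        grad = sq_norms - 2.0 * (X @ c)           # dPhi/du_i = ||x_i||^2 - 2 x_i . c
        phi = u @ sq_norms - c @ c                # dual value = current squared-radius estimate
        j = int(np.argmax(grad))                  # FW vertex: point farthest from c
        gap = grad[j] - u @ grad                  # FW gap = max_i ||x_i - c||^2 - phi
        if gap <= tol * phi:                      # assumed relative stopping rule
            break
        active = np.flatnonzero(u > 0)
        k = int(active[np.argmin(grad[active])])  # away vertex: active point closest to c
        diff = X[j] - X[k]
        denom = 2.0 * (diff @ diff)
        if denom == 0.0:                          # j == k or duplicate points: no move possible
            break
        # Exact line search along e_j - e_k, clipped so that u_k stays >= 0.
        step = min(u[k], (grad[j] - grad[k]) / denom)
        u[j] += step
        u[k] -= step
        steps += 1
    c = u @ X
    radius = float(np.sqrt(np.max(np.sum((X - c) ** 2, axis=1))))
    return c, radius, u, steps
```

On the four corners of a square, `meb_pairwise_fw([[1, 1], [1, -1], [-1, 1], [-1, -1]])` returns center (0, 0) and radius √2 after a single pairwise step. The final radius is recomputed as the largest distance from the returned center, so the reported ball always encloses every point even if the loop stops early.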

Each algorithm is evaluated on convergence rates, computational time, and accuracy metrics (primarily recall) in detecting anomalies.
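Linear convergence means the duality gap shrinks geometrically, gap_k ≈ C·ρᵏ with ρ < 1. One generic way to read a rate off a recorded gap history is a least-squares fit of log(gap_k) against k. The sketch below assumes such a history is available. The helper name is hypothetical, and this is not necessarily how the report measures convergence.

```python
import numpy as np


def estimate_linear_rate(gaps):
    """Estimate rho in gap_k ~ C * rho**k by least squares on log(gap_k)."""
    gaps = np.asarray(gaps, dtype=float)
    k = np.flatnonzero(gaps > 0)            # log is undefined for zero gaps, so skip them
    if k.size < 2:
        raise ValueError("need at least two positive gap values")
    slope, _intercept = np.polyfit(k, np.log(gaps[k]), 1)
    return float(np.exp(slope))             # rho = exp(slope of log-gap line)
```

A fitted ρ clearly below 1 that stays stable when the fit is restricted to later iterations is consistent with linear convergence. A sublinear O(1/k) decay would instead show up as a log-gap curve that keeps flattening.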

Screenshots and Graphs

  1. MEB for Maternity Risk dataset


  2. Computational Time and Iterations (Table)
    Summary table comparing computational time and iterations for each algorithm across datasets.

    Dataset          Algorithm  Time (ms)  Iterations
    Breast Cancer    PFW           664.50       1,484
    Breast Cancer    BPCG          647.54       2,998
    Breast Cancer    MEB(A)         49.15          44
    Gene Expression  PFW       369,959.76     100,000
    Gene Expression  BPCG      397,067.57     100,000
    Gene Expression  MEB(A)      2,116.30          44

    PFW = Pairwise Frank-Wolfe; BPCG = Blended Pairwise Conditional Gradients; MEB(A) = (1+ϵ)-approximation with Away-Steps.
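The reported timings are easier to compare as ratios. The snippet below only re-derives numbers from the table (the dictionary is copied from it, and the helper names are ours). PFW and BPCG take roughly 13x longer than MEB(A) on Breast Cancer and roughly 175x to 188x longer on Gene Expression.

```python
# Figures copied from the table: (time in ms, iterations) per (dataset, algorithm).
RESULTS = {
    ("Breast Cancer", "PFW"): (664.50, 1484),
    ("Breast Cancer", "BPCG"): (647.54, 2998),
    ("Breast Cancer", "MEB(A)"): (49.15, 44),
    ("Gene Expression", "PFW"): (369959.76, 100000),
    ("Gene Expression", "BPCG"): (397067.57, 100000),
    ("Gene Expression", "MEB(A)"): (2116.30, 44),
}


def time_ratios(results, baseline="MEB(A)"):
    """Time of each algorithm divided by the baseline's time on the same dataset."""
    ratios = {}
    for (dataset, alg), (time_ms, _iters) in results.items():
        if alg != baseline:
            ratios[(dataset, alg)] = time_ms / results[(dataset, baseline)][0]
    return ratios


def ms_per_iteration(results):
    """Average wall-clock cost of a single iteration."""
    return {key: time_ms / iters for key, (time_ms, iters) in results.items()}


if __name__ == "__main__":
    for (dataset, alg), ratio in time_ratios(RESULTS).items():
        print(f"{dataset:<16} {alg:<5} {ratio:6.1f}x the time of MEB(A)")
```

The per-iteration view shows where the speedup comes from. On Breast Cancer, an MEB(A) iteration costs about 1.1 ms versus about 0.45 ms for PFW, so MEB(A) wins by needing far fewer iterations rather than by having cheaper ones.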

Technologies Used


  • Python: Main programming language.
  • Optimization Algorithms: Implemented variations of the Frank-Wolfe algorithm.
  • nbviewer Link: View the Jupyter Notebook

Setup & Installation

Clone the repository and install dependencies:

# Clone the repository
git clone https://github.com/alecruces/MEBMedBio.git

# Navigate to the project directory
cd MEBMedBio

# Install dependencies
pip install -r requirements.txt

Files

  • Code: MEB_BPCG.ipynb
  • Report: Report.pdf
  • Presentation: Presentation.pdf

Usage

The repository includes the following files:

  • MEB_BPCG.ipynb: Jupyter notebook with the full workflow, from data preprocessing to algorithm implementation and evaluation.
  • Report.pdf: Detailed report on methodology, algorithmic analysis, and findings.
  • Presentation.pdf: Summary presentation of key findings and insights.

To run the project, open MEB_BPCG.ipynb in Jupyter Notebook or view it on nbviewer.

Contributing

Contributions are welcome! Please see the contributing guidelines for more details.
