
GRPM System 2.0

The GRPM system is an advanced tool for integrating and analyzing genetic polymorphism data associated with specific biomedical domains. It consists of five modular components that handle data retrieval, merging, analysis, and the integration of GWAS data.

medRxiv Manuscript | DOI | Open In Colab

Introduction

The GRPM system is a Python-based framework for building a comprehensive dataset of human genetic polymorphisms associated with nutrition. By integrating data from multiple sources and using the MeSH ontology as a semantic retrieval tool, the workflow lets researchers investigate genetic variants with significant associations to specified biomedical subjects. The primary goal of this resource is to support nutritionists in exploring gene-diet interactions and implementing personalized nutrition strategies.

Graphical Abstract

Installation

You can visualize and query the developed datasets by installing our package via:

pip install git+https://github.com/johndef64/GRPM_system.git

Example queries are available in the tests directory and in test.ipynb (Open In Colab).

Workflow Description

The workflow consists of five modules, each performing a distinct step in the integration and analysis of genetic polymorphism data associated with nutrition:

| No. | Module | Description | Notebook |
|---|---|---|---|
| 1 | Dataset Builder | Retrieves data from the LitVar and PubMed databases and integrates it in a structured format. | Open In Colab |
| 2 | MeSH Term Selection | Extracts a coherent list of MeSH terms for querying the GRPM Dataset, starting from simple collections of biomedical terms (NLP-based). | Open In Colab |
| 3 | Dataset Querying | Executes a MeSH query on the GRPM Dataset, extracts the subset of matching entities, and generates a data report. | Open In Colab |
| 4 | Gene Prioritization | Analyzes the retrieved data and computes a gene interest index to filter significant results. | Open In Colab |
| 5 | GWAS Data Integration | Merges GWAS data with the GRPM data, associating phenotypes and potential risk/effect alleles (BioBERT-based). | Open In Colab |
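The querying step (module 3) can be pictured with a minimal, standard-library-only sketch. It assumes each GRPM row links one gene, one rsID, one PMID and one MeSH term, and that a query is a plain list of MeSH terms matched exactly (case-insensitively). The `GrpmRecord` class, the `query_by_mesh` and `report` functions, and the toy identifiers are hypothetical illustrations, not the package's API; the report here only counts distinct rsIDs and PMIDs per gene.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class GrpmRecord:
    """Hypothetical GRPM row: a gene variant mentioned in a PubMed article, with one of its MeSH terms."""
    gene: str
    rsid: str
    pmid: str
    mesh: str


def query_by_mesh(records: Iterable[GrpmRecord], mesh_terms: Iterable[str]) -> List[GrpmRecord]:
    """Keep the records whose MeSH term appears in the query list (case-insensitive exact match)."""
    wanted = {term.casefold() for term in mesh_terms}
    return [r for r in records if r.mesh.casefold() in wanted]


def report(matches: Iterable[GrpmRecord]) -> Dict[str, Dict[str, int]]:
    """Summarize the matching subset per gene: number of distinct rsIDs and distinct PMIDs."""
    rsids = defaultdict(set)
    pmids = defaultdict(set)
    for r in matches:
        rsids[r.gene].add(r.rsid)
        pmids[r.gene].add(r.pmid)
    return {gene: {"rsids": len(rsids[gene]), "pmids": len(pmids[gene])} for gene in sorted(rsids)}


# Toy data with placeholder identifiers (not real GRPM content).
records = [
    GrpmRecord("GENE1", "rs1", "101", "Obesity"),
    GrpmRecord("GENE1", "rs1", "102", "Diet"),
    GrpmRecord("GENE1", "rs2", "103", "Obesity"),
    GrpmRecord("GENE2", "rs3", "104", "Neoplasms"),
]
matches = query_by_mesh(records, ["obesity", "Diet"])
print(report(matches))  # {'GENE1': {'rsids': 2, 'pmids': 3}}
```

In the actual workflow the MeSH list would come from module 2, and the per-gene counts are the kind of input a prioritization step such as module 4 could build on.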

To reproduce our pipeline, run each module individually through its "Open In Colab" link, making sure all required dependencies and input files are available. Google Drive synchronization is supported.

Each Jupyter notebook includes commands to download and install the necessary dependencies for execution.

Figure. GRPM system: integrating genetic polymorphism data with PMIDs and MeSH terms to retrieve genes and rsIDs for biomedical research fields. GRPM Dataset: pcg, protein-coding genes; rna, RNA genes; pseudo, pseudogenes; dataset shape in parentheses.
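The integration summarized in the caption can be illustrated as a join on PMID: variant-to-article links (LitVar-like) are combined with article-to-MeSH annotations (PubMed-like), yielding (gene, rsID, PMID, MeSH) rows that are then partitioned by gene type. This is a minimal sketch under assumed in-memory dictionaries; the function names, the row layout and the toy identifiers are hypothetical, and the real Dataset Builder retrieves these mappings from LitVar and PubMed.

```python
from collections import defaultdict

# Hypothetical toy inputs standing in for LitVar and PubMed exports.
variant_pmids = {"rs1": ["101", "102"], "rs2": ["103"], "rs3": ["104"]}  # rsID -> PMIDs mentioning it
variant_gene = {"rs1": "GENE1", "rs2": "GENE1", "rs3": "GENE2"}         # rsID -> gene
gene_type = {"GENE1": "pcg", "GENE2": "rna"}                            # gene -> pcg / rna / pseudo
pmid_mesh = {                                                           # PMID -> MeSH terms
    "101": ["Obesity", "Diet"],
    "102": ["Obesity"],
    "103": ["Diet"],
    "104": ["Neoplasms"],
}


def build_dataset(variant_pmids, variant_gene, pmid_mesh):
    """Join rsID->PMID links with PMID->MeSH annotations into (gene, rsID, PMID, MeSH) rows."""
    rows = []
    for rsid, pmids in variant_pmids.items():
        gene = variant_gene.get(rsid)
        if gene is None:
            continue  # skip variants that are not mapped to a gene
        for pmid in pmids:
            for mesh in pmid_mesh.get(pmid, []):
                rows.append((gene, rsid, pmid, mesh))
    return rows


def split_by_gene_type(rows, gene_type):
    """Partition rows into sub-datasets keyed by gene type (e.g. pcg, rna, pseudo)."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[gene_type.get(row[0], "unknown")].append(row)
    return dict(subsets)


rows = build_dataset(variant_pmids, variant_gene, pmid_mesh)
subsets = split_by_gene_type(rows, gene_type)
for kind, subset in subsets.items():
    # Shape as (rows, columns); each row has 4 fields.
    print(kind, (len(subset), 4))  # prints "pcg (4, 4)" then "rna (1, 4)"
```

Keeping one row per (gene, rsID, PMID, MeSH) combination makes later MeSH filtering a simple row selection, at the cost of repeating gene and variant identifiers across rows.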

Usage

Detailed usage instructions for each module are provided in its Jupyter notebook. Follow these instructions closely and install the Python packages specified for each module.

Updates

The GRPM Dataset available on Zenodo was built from LitVar1, which has since been deprecated and replaced by LitVar2. Module 1 (Dataset Builder) has been updated for compatibility with LitVar2. The other modules in the pipeline remain operational using the original GRPM Dataset available on Zenodo.

Requirements

All requirements are listed in requirements.txt and setup.py.