aa9gj/Bone_proteogenomics_manuscript

Code accompanying the manuscript "Long read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors in disease"

The full text can be found in Abood et al. 2024, AJHG.

Purpose

We present a novel, generalizable approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the protein isoform products they ultimately encode.

Data availability

  1. Processed and input data are available at DOI
  2. Raw long-read sequencing data are available under GEO accession GSE224588

How to use this repository

  1. Use setup_r_env.R to set up the R environment with all the needed packages.
  2. The repo is broken down into three major sections:
  • sQTL_colocalization_analysis: This directory contains the code needed to replicate the Bayesian colocalization analysis with Coloc. Please refer to the README.md within the directory for further information.
    • Step 0: Perform Bayesian colocalization analysis between summary statistics from the latest BMD GWAS and sQTL summary statistics for all 49 GTEx tissues.
  • Reference_transcriptome_generation: This directory contains the code to generate the reference transcriptome from long-read RNA-seq data. Please refer to the README.md within the directory for further information.
    • Isoseq analysis: from raw reads to isoform classification
    • Step 1: Perform analyses on outputs from SQANTI and cDNA_cupcake
  • sQTL_to_isoform_mapping
    • Step 2: Characterize full-length isoforms (known and novel) containing the colocalized junctions
    • Step 3: Add effect size and direction of effect to colocalized junctions
    • Step 4: Annotate lead sQTLs and their proxies, followed by positional and enrichment analyses
    • Step 5: Differential analyses (DE and DIU) using tappAS
    • Step 6: Integrating multiple datasets from the literature and within our analyses to prioritize the isoforms for experimental validation
    • Step 7: ORF analyses, including NMD and truncation analysis, performed using a beta version of Biosurfer
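
For readers unfamiliar with the colocalization in Step 0, the sketch below illustrates the approximate Bayes factor calculation that underlies this kind of analysis. It is not the code in this repository, which is written in R and uses Coloc; it is a minimal Python illustration under stated assumptions. The function names (`log_abf`, `coloc_posteriors`) and the toy effect sizes are hypothetical. The prior values (`p1`, `p2`, `p12`, and a prior standard deviation of 0.15 for quantitative traits) are assumed to match commonly used defaults and may differ from the settings used in the manuscript. The sketch also assumes at most one causal variant per trait in the region.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x))) over a list of log values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta, var_beta, prior_sd=0.15):
    """Log approximate Bayes factor for one SNP from its effect size and variance.

    prior_sd is the assumed prior standard deviation of the true effect
    (0.15 is an assumption suited to quantitative traits).
    """
    z = beta / math.sqrt(var_beta)
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of hypotheses H0..H4 for one region.

    H0: no association; H1: trait 1 only; H2: trait 2 only;
    H3: both traits, distinct causal variants; H4: both traits, shared variant.
    labf1 and labf2 are per-SNP log ABFs for the same SNPs in the same order.
    """
    joint = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(joint)
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        # All SNP pairs minus the pairs where both traits share the same SNP.
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    denom = log_sum(log_h)
    return [math.exp(h - denom) for h in log_h]


# Toy region of three SNPs (made-up numbers): both traits peak at the first SNP.
labf_gwas = [log_abf(b, 0.01) for b in (0.5, 0.05, 0.02)]
labf_sqtl = [log_abf(b, 0.01) for b in (0.4, 0.01, 0.03)]
posteriors = coloc_posteriors(labf_gwas, labf_sqtl)
for name, pp in zip(("H0", "H1", "H2", "H3", "H4"), posteriors):
    print(f"PP.{name} = {pp:.3f}")
```

In this toy region the two traits share their strongest signal, so most of the posterior mass falls on H4. In practice the per-SNP inputs would come from the BMD GWAS and the GTEx sQTL summary statistics for each junction and tissue.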