flexynesis_manuscript

All publication material for the manuscript describing the flexynesis software package.

Project Folder

Accessible from Hulk/Beast/Max: /data/local/buyar/arcas/multiomics_integration/flexynesis_manuscript_work/

The ./raw folder contains the original datasets downloaded from sources such as cBioPortal, TCGA, PharmacoGx and DepMap. The ./prepared folder contains the data prepared as input to flexynesis.

Environment

Install flexynesis

mamba create -n flexynesisenv python==3.11 snakemake
mamba activate flexynesisenv
pip install flexynesis

Install other packages

guix package --manifest=guix.scm --profile=./manuscript

Activate environment

source ./manuscript/etc/profile
mamba activate flexynesisenv
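Before running the analyses, it can help to confirm that both environments are active. The following is a minimal sketch that is not part of the repository. The helper names check_tools and check_python_module are hypothetical, and the sketch assumes the pip package is importable as `flexynesis`.

```shell
#!/bin/sh
# Sketch (hypothetical helpers, not part of the repository): report tools
# used by this README that are not on PATH, and test a Python import.

# check_tools TOOL...
# Prints "missing: TOOL" for every TOOL not found on PATH and returns the
# number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# check_python_module MODULE
# Returns 0 if the active `python` can import MODULE (assumes the mamba
# environment provides a `python` executable).
check_python_module() {
    python -c "import $1" >/dev/null 2>&1
}
```

After sourcing these functions, a typical check would be `check_tools Rscript snakemake python && check_python_module flexynesis && echo "environment OK"`.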

Datasets used in the manuscript

The datasets used in the manuscript, and how to prepare them for analysis with flexynesis, are described below.

Downloaded Datasets

Go to /data/local/buyar/arcas/multiomics_integration/flexynesis_manuscript_work/datasets:

The ./raw folder contains:

  • CCLE.rds: downloaded from Zenodo.
  • GDSC2.rds: downloaded from Zenodo.
  • lgggbm_tcga_pub.tar.gz: downloaded from cBioPortal.
  • brca_metabric.tar.gz: downloaded from cBioPortal.
  • depmap: downloaded from the DepMap portal.
  • nbl_target_2018_pub.tar.gz: downloaded from cBioPortal.
  • GDCData: TCGA cohort datasets for 33 cancer types, downloaded using the TCGAbiolinks package (see GitHub).
  • prot-trans: protein sequence embeddings obtained from the prot-trans-xl-uniref50 model on UniProt sequences.
  • describeProt: protein-level sequence, structure and function features from the DescribePROT database (Download here).
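Before running the preparation scripts, a quick check that the raw inputs listed above are in place can catch an incomplete download early. This is a minimal sketch that is not part of the repository. The function name check_raw_inputs is hypothetical, and the entry names are copied from the list above. Adjust them if the folders on disk are named differently: the preparation commands, for example, refer to ./raw/describePROT and ./raw/TCGA.

```shell
#!/bin/sh
# Sketch (hypothetical helper): verify that the raw inputs listed in this
# README exist before preparing the data. Entry names follow the README
# list and may need adjusting to the actual folder names on disk.

# check_raw_inputs RAW_DIR
# Prints "missing: PATH" for each expected entry absent under RAW_DIR;
# returns 1 if anything is missing, 0 otherwise.
check_raw_inputs() {
    raw_dir=$1
    status=0
    for entry in CCLE.rds GDSC2.rds lgggbm_tcga_pub.tar.gz \
                 brca_metabric.tar.gz nbl_target_2018_pub.tar.gz \
                 depmap GDCData prot-trans describeProt; do
        if [ ! -e "$raw_dir/$entry" ]; then
            echo "missing: $raw_dir/$entry"
            status=1
        fi
    done
    return "$status"
}
```

From the datasets folder, `check_raw_inputs ./raw` prints nothing and returns 0 when every entry is present.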

Prepared datasets used as input to flexynesis

cd datasets
SRC='../flexynesis_manuscript/src/'

The ./prepared folder contains:

  • ccle_vs_gdsc: Drug response data for cell lines from the CCLE and GDSC2 datasets. Command:
Rscript ${SRC}/prepare_data.gdsc_vs_ccle.R raw/
  • lgggbm_tcga_pub_processed: Merged cohorts of LGG + GBM samples. Command:
Rscript ${SRC}/prepare_data.LGG_GBM.R ${SRC}/get_cbioportal_data.R
  • brca_metabric_processed: Processed METABRIC dataset. Command:
Rscript ${SRC}/prepare_data.metabric.R ${SRC}/get_cbioportal_data.R
  • single_cell_bonemarrow: CITE-Seq dataset from Seurat. Command:
Rscript ${SRC}/prepare_data.cite_seq.R
  • tcga_vs_ccle: TCGA tumors and CCLE cell lines from three cancer types: lung cancer, glioma and breast cancer. Command:
Rscript ${SRC}/prepare_data.tcga_vs_ccle_finetuning.R ${SRC}/
  • neuroblastoma_target_vs_depmap: neuroblastoma patient samples (TARGET study) and cell lines (depmap). Command:
Rscript ${SRC}/prepare_data.neuroblastoma_finetuning.R ${SRC}/get_cbioportal_data.R ./raw/depmap/ ${SRC}/utils.R
  • tcga_cancertype: TCGA cohorts for ~21 cancer types, with 100 samples per cohort. Command:
Rscript ${SRC}/prepare_data.tcga_cancertype.R ${SRC}/utils.R ./raw/TCGA
  • depmap_gene_dependency: Dataset for gene dependency prediction in cell lines. Consists of DepMap gene expression, ProtTrans embeddings and DescribePROT features. Command:
Rscript ${SRC}/prepare_data.depmap.R ${SRC}/utils.R ./raw/depmap/ ./raw/prot-trans/embeddings.protein_level.csv ./raw/uniprot2hgnc.RDS ./raw/describePROT/9606_value.csv
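The preparation commands above can also be run in one go from the datasets folder. This is a minimal sketch, not a script from the repository. The run_step and prepare_all functions and the DRY_RUN switch are hypothetical conveniences, while the script names and arguments are copied from the list above (using ${SRC}/get_cbioportal_data.R as the helper path throughout). The chain stops at the first failing step.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): run all data preparation steps from the
# datasets folder, stopping at the first failure. DRY_RUN=1 only prints.

SRC=${SRC:-../flexynesis_manuscript/src}

# run_step CMD ARG...: print the command, then run it unless DRY_RUN=1.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# prepare_all: the preparation commands from this README, chained with &&
# so that a failing step prevents the later ones from running.
prepare_all() {
    run_step Rscript "$SRC/prepare_data.gdsc_vs_ccle.R" raw/ &&
    run_step Rscript "$SRC/prepare_data.LGG_GBM.R" "$SRC/get_cbioportal_data.R" &&
    run_step Rscript "$SRC/prepare_data.metabric.R" "$SRC/get_cbioportal_data.R" &&
    run_step Rscript "$SRC/prepare_data.cite_seq.R" &&
    run_step Rscript "$SRC/prepare_data.tcga_vs_ccle_finetuning.R" "$SRC/" &&
    run_step Rscript "$SRC/prepare_data.neuroblastoma_finetuning.R" "$SRC/get_cbioportal_data.R" ./raw/depmap/ "$SRC/utils.R" &&
    run_step Rscript "$SRC/prepare_data.tcga_cancertype.R" "$SRC/utils.R" ./raw/TCGA &&
    run_step Rscript "$SRC/prepare_data.depmap.R" "$SRC/utils.R" ./raw/depmap/ \
        ./raw/prot-trans/embeddings.protein_level.csv ./raw/uniprot2hgnc.RDS \
        ./raw/describePROT/9606_value.csv
}
```

Running `DRY_RUN=1 prepare_all` first lists the eight commands without executing them. This makes it easy to check the paths before starting the long-running steps.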

Figures

How to reproduce the figures:

Go to /data/local/buyar/arcas/multiomics_integration/flexynesis_manuscript_work/analyses:

Activate the Guix environment:

source ../flexynesis_manuscript/manuscript/etc/profile

Figure 1: single-task figures

Rscript ../flexynesis_manuscript/src/figures_single_task.R ../flexynesis_manuscript/src/utils.R single_multi_experiments

Figures 2 and 3: multi-task figures

Rscript ../flexynesis_manuscript/src/figures_multitask.R ../flexynesis_manuscript/src/utils.R single_multi_experiments

Figure 4: unsupervised clustering (TCGA cancer types)

Rscript ../flexynesis_manuscript/src/figures_tcga_unsupervised.R ../flexynesis_manuscript/src/utils.R ./unsupervised_cancertype/

Figure 5: cross-modality prediction of cell line dependency probabilities

Rscript ../flexynesis_manuscript/src/figures_depmap.R ../datasets/prepared/depmap_gene_dependency/ depmap_analysis/output/

Figure 6: demonstration of fine-tuning

Rscript ../flexynesis_manuscript/src/figures_finetuning.R ../flexynesis_manuscript/src/utils.R finetuning/

Figure 7: marker analysis

Rscript ../flexynesis_manuscript/src/figures_marker_analysis.R ../flexynesis_manuscript/src/utils.R marker_analysis/output/

Figure 8: benchmark summary

Rscript ../flexynesis_manuscript/src/figures_benchmarks.R benchmarks/output
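To regenerate every figure in one pass, the commands above can be wrapped so that a failure in one figure does not block the others. This is a minimal sketch, not part of the repository. The run_figure and render_figures functions, the FIG_SRC variable and the DRY_RUN switch are hypothetical, and the script names and arguments are the ones listed above. It is meant to be run from the analyses folder after activating the Guix environment.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): regenerate all manuscript figures from the
# analyses folder, continuing past failures and reporting them at the end.

FIG_SRC=${FIG_SRC:-../flexynesis_manuscript/src}

# run_figure LABEL CMD ARG...: print the command; unless DRY_RUN=1, run it
# and append LABEL to $failed if it exits with a non-zero status.
run_figure() {
    label=$1
    shift
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] && return 0
    "$@" || failed="$failed $label"
}

# render_figures: run all figure scripts; return 1 if any of them failed.
render_figures() {
    failed=""
    run_figure fig1 Rscript "$FIG_SRC/figures_single_task.R" "$FIG_SRC/utils.R" single_multi_experiments
    run_figure fig2_3 Rscript "$FIG_SRC/figures_multitask.R" "$FIG_SRC/utils.R" single_multi_experiments
    run_figure fig4 Rscript "$FIG_SRC/figures_tcga_unsupervised.R" "$FIG_SRC/utils.R" ./unsupervised_cancertype/
    run_figure fig5 Rscript "$FIG_SRC/figures_depmap.R" ../datasets/prepared/depmap_gene_dependency/ depmap_analysis/output/
    run_figure fig6 Rscript "$FIG_SRC/figures_finetuning.R" "$FIG_SRC/utils.R" finetuning/
    run_figure fig7 Rscript "$FIG_SRC/figures_marker_analysis.R" "$FIG_SRC/utils.R" marker_analysis/output/
    run_figure fig8 Rscript "$FIG_SRC/figures_benchmarks.R" benchmarks/output
    if [ -n "$failed" ]; then
        echo "failed:$failed"
        return 1
    fi
}
```

Unlike the data preparation steps, the figure scripts are independent of one another. That is why this sketch keeps going after a failure instead of stopping.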

Documentation

Flexynesis documentation is built and served on bimsbstatic.

  1. Navigate to /data/bimsbstatic/public/akalin/buyar/flexynesis
  2. Run mkdocs build; this generates the website in ./site
  3. The documentation is served at https://bimsbstatic.mdc-berlin.de/akalin/buyar/flexynesis/site/
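The first two steps can be combined into a single function that also confirms the build produced a site. This is a minimal sketch, not part of the repository. The build_docs name and the DOCS_DIR and MKDOCS variables are hypothetical, and it relies on `mkdocs build` writing to ./site, which is the MkDocs default.

```shell
#!/bin/sh
# Sketch (hypothetical helper): build the flexynesis documentation and
# check that site/index.html was generated.

DOCS_DIR=${DOCS_DIR:-/data/bimsbstatic/public/akalin/buyar/flexynesis}

# build_docs: run "$MKDOCS build" (default: mkdocs) inside DOCS_DIR in a
# subshell, so the caller's working directory is unchanged; fail if the
# build fails or site/index.html is missing afterwards.
build_docs() {
    (
        cd "$DOCS_DIR" || exit 1
        ${MKDOCS:-mkdocs} build || exit 1
        [ -f site/index.html ] || { echo "site/index.html not found" >&2; exit 1; }
    )
}
```

Running the build in a subshell keeps `cd` local to the function. This means the function can be called from any directory.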

Manuscript