This repository contains data and processing code for use in a project examining gene expression patterns in autoimmune/rheumatic diseases.
The processing code for each dataset (or compendium, in the case of sle-wb) is contained within each subdirectory (if applicable).
For more information on our data processing strategy, see sle-wb/README.md.
Within this repository, we obtain recount2 data through the recount Bioconductor package, further process it, and apply PLIER.
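For orientation, PLIER decomposes an expression matrix into latent variables (LVs) while encouraging the gene loadings to align with prior-knowledge gene sets. As we read Mao & Chikina (2017), the objective is roughly the one below, where Y is a genes × samples expression matrix, C is a genes × gene-sets prior-knowledge matrix, Z holds gene loadings for each LV, B holds LV values for each sample, and U maps gene sets to LVs. This is a summary under that reading, not a substitute for the paper; consult it for the exact constraints and how the penalty weights are chosen.

```latex
\min_{U,\, Z,\, B} \;
  \lVert Y - ZB \rVert_F^2
  + \lambda_1 \lVert Z - CU \rVert_F^2
  + \lambda_2 \lVert B \rVert_F^2
  + \lambda_3 \lVert U \rVert_{L_1}
\quad \text{subject to } U \ge 0,\; Z \ge 0
```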
The recount2 data and results are too large to be stored with Git LFS, so we have placed them on figshare. DOI: 10.6084/m9.figshare.5716033.v4.
This version is current as of 978c379.
Citations:
Collado-Torres L, Nellore A, Kammers K, et al. Reproducible RNA-seq analysis using recount2. Nature Biotechnology, 2017. doi: 10.1038/nbt.3838.
Mao W, Chikina M. Pathway-Level Information ExtractoR (PLIER): a generative model for gene expression data. bioRxiv, 2017. doi: 10.1101/116061.
Two GPA (Wegener's) datasets are included in this repository:
- NARES -- a dataset that consists of nasal brushings from patients with GPA with or without a history of nasal disease.
- GSE18885 -- a blood (fractions) dataset; we use submitter-processed data from GEO.
Citations:
Grayson PC, Steiling K, Platt M, et al. Defining the Nasal Transcriptome in Granulomatosis with Polyangiitis. Arthritis & Rheumatology, 2015. doi: 10.1002/art.39185.
Cheadle C, Berger AE, Andrade F, et al. Transcription of PR3 and Related Myelopoiesis Genes in Peripheral Blood Mononuclear Cells in Active Wegener’s Granulomatosis. Arthritis & Rheumatism, 2010. doi: 10.1002/art.27398.
See sle-wb for more information (including citations).
GSE26975 is a dataset that includes the following isolated cell type populations: healthy neutrophils, normal density neutrophils from patients with lupus, and low density granulocytes (LDGs) from patients with lupus.
Citation:
Villanueva E, Yalavarthi S, Berthier CC, Hodgin JB, et al. Netting neutrophils induce endothelial damage, infiltrate tissues, and expose immunostimulatory molecules in systemic lupus erythematosus. J Immunol. 2011. doi: 10.4049/jimmunol.1100450.
Two datasets:
Citations:
Paugh BS, Broniscer A, Qu C, et al. Genome-wide analyses identify recurrent amplifications of receptor tyrosine kinases and cell-cycle regulatory genes in diffuse intrinsic pontine glioma. J Clin Oncol. 2011;29(30):3999-4006.
Buczkowicz P, Hoeman C, Rakopoulos P, et al. Genomic analysis of diffuse intrinsic pontine gliomas identifies three molecular subgroups and recurrent activating ACVR1 mutations. Nat Genet. 2014;46(5):451-6.
GSE37382 and GSE37418 are medulloblastoma datasets that were processed via refine.bio (using SCANfast).
Citation:
Northcott PA, Shih DJ, Peacock J, et al. Subgroup-specific structural variation across 1,000 medulloblastoma genomes. Nature. 2012;488(7409):49-56.
Robinson G, Parker M, Kranenburg TA, Lu C, et al. Novel mutations target distinct subgroups of medulloblastoma. Nature. 2012 Aug 2;488(7409):43-8. (GSE37418)
All the dependencies for this processing pipeline are included in Docker images. These can be obtained by installing Docker and pulling the appropriate tagged images from Docker Hub:
The Docker image used for microarray data processing is tagged v1.
docker pull jtaroni/multi-plier:v1
For the Dockerfile and a list of user-installed R packages, see docker/v1.
The R scripts in isolated-cell-pop, NARES, and the sle-wb pipeline were run in the jtaroni/multi-plier:v1 container as of 28a1249.
The Docker image used for recount2 data processing is tagged recount.
docker pull jtaroni/multi-plier:recount
For the Dockerfile and a list of user-installed R packages, see docker/recount.
The R scripts in recount2/ were run in the jtaroni/multi-plier:recount container as of 978c379.
We use Salmon and tximport for our RNA-seq processing pipeline.
The Docker image used to build a Salmon index and to quantify transcript abundance with Salmon:
docker pull combinelab/salmon:0.9.1
Following quantification with Salmon, we summarize to the gene level using tximport in the following Docker image (docker/summarize_tx/Dockerfile):
docker pull jtaroni/summarize_tx:3.4.3
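To make "summarize to the gene level" concrete, here is a minimal single-sample Python sketch of what we understand tximport's default behavior (countsFromAbundance = "no") to be: transcript TPMs and estimated counts are summed per gene, and gene length is the TPM-weighted mean of transcript effective lengths, falling back to a simple mean when the gene's total TPM is zero. The actual pipeline runs tximport in R inside the image above. The function names `read_quant_sf` and `summarize_to_gene` and the `tx2gene` dictionary are hypothetical illustrations, not part of this repository; the column names are those of Salmon's quant.sf output.

```python
import csv
from collections import defaultdict


def read_quant_sf(lines):
    """Parse Salmon quant.sf rows (tab-separated, with header) into
    transcript ID -> (TPM, NumReads, EffectiveLength)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["Name"]: (
            float(row["TPM"]),
            float(row["NumReads"]),
            float(row["EffectiveLength"]),
        )
        for row in reader
    }


def summarize_to_gene(tx_quant, tx2gene):
    """Collapse transcript-level estimates for one sample to gene level.

    tx_quant: transcript ID -> (tpm, num_reads, effective_length)
    tx2gene:  transcript ID -> gene ID (hypothetical mapping; a missing
              transcript raises KeyError rather than being silently dropped)
    Returns gene ID -> {"abundance", "counts", "length"}.
    """
    tpm_sum = defaultdict(float)
    count_sum = defaultdict(float)
    weighted_len = defaultdict(float)
    tx_lengths = defaultdict(list)

    for tx, (tpm, reads, eff_len) in tx_quant.items():
        gene = tx2gene[tx]
        tpm_sum[gene] += tpm               # gene abundance = sum of TPMs
        count_sum[gene] += reads           # gene counts = sum of estimated counts
        weighted_len[gene] += tpm * eff_len
        tx_lengths[gene].append(eff_len)

    genes = {}
    for gene, total_tpm in tpm_sum.items():
        if total_tpm > 0:
            # abundance-weighted average transcript length
            length = weighted_len[gene] / total_tpm
        else:
            # no expression: use the unweighted mean transcript length
            length = sum(tx_lengths[gene]) / len(tx_lengths[gene])
        genes[gene] = {
            "abundance": total_tpm,
            "counts": count_sum[gene],
            "length": length,
        }
    return genes
```

Usage, with a hypothetical file path: `with open("quant.sf", newline="") as fh: quant = read_quant_sf(fh)`, then `summarize_to_gene(quant, tx2gene)`. Keeping the length as a weighted average, rather than discarding it, is what allows downstream tools to correct summed counts for differences in transcript usage between samples.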
This repository is dual licensed as BSD 3-Clause (source code) and CC0 1.0 (figures, documentation, and our arrangement of the facts contained in the underlying data).