A code pool for RNA expression analysis in the Woodman Lab.

Installation

Without SSH key:

devtools::install_git(
    url='http://gitlab.mdanderson.edu/XLi23/expr.git'
)

With SSH key set up:

devtools::install_git(
    url='git@gitlab.mdanderson.edu:XLi23/expr.git',
    quiet=FALSE
)

How to set up SSH key?

Usage

View the package vignette with browseVignettes("expr") in the RStudio console, or here

Run pipeline

The expression analysis pipeline saves all outputs in your current working directory. The initial input should be an RData file containing two objects:

object named "gene_expressions" - the expression matrix or data frame, with genes as row names and sample names as column names
object named "sampleAttr" - the sample metadata table, one sample per row. ID columns are required for both sample and patient.
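The input requirements above can be checked before running the pipeline. The following is a minimal sketch in base R, not part of expr: the helper check_pipeline_input() and the column names sample_id and patient_id are hypothetical, and the toy values are made up for illustration.

```r
# Toy stand-ins for the two required objects (values are made up).
gene_expressions <- data.frame(
  S1 = c(10, 0, 5),
  S2 = c(12, 3, 7),
  row.names = c("GENE_A", "GENE_B", "GENE_C")
)
sampleAttr <- data.frame(
  sample_id  = c("S1", "S2"),   # hypothetical sample ID column name
  patient_id = c("P1", "P1"),   # hypothetical patient ID column name
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when the expression table is a matrix or data
# frame and every one of its columns has a matching row in the metadata.
check_pipeline_input <- function(expr_tbl, attr_tbl, sample_col = "sample_id") {
  is_table <- is.matrix(expr_tbl) || is.data.frame(expr_tbl)
  has_ids <- sample_col %in% colnames(attr_tbl)
  is_table && has_ids && all(colnames(expr_tbl) %in% attr_tbl[[sample_col]])
}

check_pipeline_input(gene_expressions, sampleAttr)
```

Replace sample_id with whatever ID column your own metadata table uses.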

Use the code below to generate an example RData file.

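# Read the example expression table (gene IDs in the first column become row names)
gene_expressions <- read.csv(gzfile(system.file("extdata/expressions.csv.gz", package = "expr"), "rt"), row.names = 1, check.names = FALSE)
# Read the example sample and clinical annotation tables
RNA_sample_info <- read.csv(system.file("extdata/RNA_sample_info.csv", package = "expr"), header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
RNA_clinic_info <- read.csv(system.file("extdata/RNA_clinic_info.csv", package = "expr"), header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
# Build sampleAttr from the two annotation tables
sampleAttr <- expr::table_org(list(RNA_sample_info, RNA_clinic_info))
# Save both objects under the names the pipeline expects
save(gene_expressions, sampleAttr, file = "meso.RData")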

To run the pipeline, execute:

runExprPPL()

Development

This package is documented with roxygen2. Before pushing a commit, press 'Cmd + Shift + D' to generate the documentation, then 'Cmd + Shift + B' to build the package. Keep new functions coming!
Package tests are only as good as you write them. Please report any error you encounter under the "Issues" tab.

Notes 11/13/2023

functions

prepare_clean_RNA_sample_info_and_protein_expressions
prepare_unsupervised_data
unsupervised_analysis

Added a check in prepare_clean_RNA_sample_info_and_protein_expressions to ensure that expression column names are identical to the sample IDs in the sample info; the function errors out otherwise.
Renamed the parameter tolerent_library_size_factor in prepare_clean_RNA_sample_info_and_protein_expressions to tolerant_library_size_factor.
match.arg raises an error when arg has a length of 1 (when running the function line by line). Changed to select the first element of the vector.
gg3D (needed for unsupervised_analysis) seems to be difficult to install (XQuartz is needed on macOS); used plotly instead.
The guide argument in scale_*() cannot be FALSE. This was deprecated in ggplot2 3.3.4. Fixed by adding legend.position="none".
Fixed the error where setting any of the analyses to FALSE caused the unsupervised_analysis function to fail.
1. The grid.arrange line for organizing figures was removed. See the function manual for arranging plots.
2. Results are now separated into plots and analysis responses in the function output.
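The match.arg change in the notes above can be illustrated with a small sketch. When called with one argument, match.arg() reads the allowed choices from the formals of the enclosing function, so it only works inside a function call. Taking the first element of the vector does not have that dependency. The function choose_method() and its choices are hypothetical, not taken from expr.

```r
# Hypothetical function; the name and choices are for illustration only.
choose_method <- function(method = c("pca", "tsne", "umap")) {
  choices <- c("pca", "tsne", "umap")
  method <- method[1]                 # take the first element of the vector
  if (!method %in% choices) {
    stop("method must be one of: ", paste(choices, collapse = ", "))
  }
  method
}

choose_method()          # the default vector collapses to its first element
choose_method("umap")    # an explicit choice is passed through
```

The trade-off is that partial matching, which match.arg() provides, is lost; the explicit %in% check keeps the validation.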

vignette

The package vignette showcases how expr turns RNA analysis into a pipeline. Added a k-means clustering measure to auto-detect batch effects as an alternative to manual visualization. Manual confirmation is still recommended.
The package vignette can be easily customized to generate a one-click HTML or PDF report for user datasets.

Notes 2/1/2023

functions

Fixed error: in prepare_clean_RNA_sample_info_and_protein_expressions(), if gene_id_col="rowname", sequenced_RNA_samples returned NA at line 53.

Notes 2/15/2023

Added exported functions

NbClust
consensus_immunedeconvolute
estimate_bestNumberofClusters
map_clusters
consensusCluster
gsea
MRNsurr
getHeatMapAnnotation
reassignNA
table_org
zscoreData
changeColNames
getFill
t_test2

Edits

getHeatmapAnnotations: added param track - for selecting meta data of interest as track input.
Moved “protein coding ensemble to symbol” file and “hg19 ncbi protein coding gene info” to data/ as RData to reduce package size.

Fixed

In function estimate_bestNumberofClusters(): in index_for_NbClust <- c(…, "rubbin", …), "rubbin" should be "rubin".
In function map_clusters(): used apply(cluster_table,1,max) instead of rowMaxs(cluster_table) to reduce package dependencies.
In function multiCluster(): fixed the error below by individual indices (adding param “allow1=T” in the sourced mcl_clusters()).
1. Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + : missing value where TRUE/FALSE needed
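The map_clusters() change above swaps a package helper for base R. A minimal sketch of the row-wise maximum with apply(), using a made-up cluster_table (the real table in expr may be shaped differently):

```r
# Made-up contingency table: rows are clusters from one method,
# columns are clusters from another.
cluster_table <- matrix(
  c(3, 0, 1,
    0, 4, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A", "B"), c("1", "2", "3"))
)

# Base R row-wise maximum; no extra package is required.
row_max <- apply(cluster_table, 1, max)
row_max
```

apply() keeps the row names on the result.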

Notes 3/8/2023

Added functions

plotBoxPlot
future_consensusCluster (unexported; multicore)

Added pipeline components

exprCleanUp.Rmd
exprMain.Rmd
exprlongitudinal.Rmd
runPPL.R
settingsUI/app.R

Edits

In function consensusCluster(), changed cutFun inputs to character vector.
In function NbClust(), added plotOP option to control plot output.
Reduced sizes of external files via gz or RData.

Fixed

In function estimate_bestNumberofClusters(), fixed the error below by adding res=res[!sapply(res,is.null)]:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 0
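The filter in this fix drops NULL entries from a list before the results are combined. A minimal sketch, assuming res is a list of per-run result tables in which a failed run left NULL (the column names k and score are made up):

```r
# Assumed shape of res: one data frame per run, NULL where a run failed.
res <- list(
  data.frame(k = 2, score = 0.41),
  NULL,
  data.frame(k = 3, score = 0.55)
)

# Keep only the non-NULL entries, as in the fix above.
res <- res[!sapply(res, is.null)]

# The remaining tables can then be stacked.
combined <- do.call(rbind, res)
combined
```

vapply(res, is.null, logical(1)) is a type-stable alternative to sapply() that also behaves well when res is an empty list.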

In function consensusCluster(), fixed the error below by using tryCatch:

Error in rep(1:k, times = sapply(apclust_clusters@clusters, length)) : invalid 'times' argument.

Notes 3/28/2023

Functions

Added unexported partial correlation graph functions (ridge and/or glasso).
Added functions for “zoomed-in pathway view” app
Changes in zscoreData functions for data.frame objects

Edits

Removed example dataset.

Pipeline

Added forceCorrection option. Introduced changes in settingsUI/app.R, exprCleanUp.Rmd, exprMain.Rmd and runPPL.R.
Added steps to save GSEA and DGE results in main module and longitudinal module.
Changed default number of clusters to NA in main module.
Fixed "clean_batchexamined_logRNA.RData" dataset output for main module.
Added text annotation to specify splitting threshold.
Added “zoomed-in pathway view” app.

Fixed

Fixed error in apclusterCluster: when k=4 but apcluster separated all samples into 5 clusters, the times argument does not match the length of x. Fixed by using x=1:length(apclust_clusters@clusters).
Fixed error in multiCluster: unable to find an inherited method for function ‘affinMult’ for signature ‘"rbfkernel", "numeric"’. Fixed by: 1) line 505: tryCatch; 2) line 516: !is.na(tempCut[samp1]==tempCut[samp2])
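The apclusterCluster fix above comes down to a length mismatch in rep(). A minimal sketch in base R, assuming the clusters slot is a list of sample-index vectors (the list below is made up and stands in for apclust_clusters@clusters):

```r
# Made-up stand-in for apclust_clusters@clusters: 5 clusters were found
# although k = 4 was requested.
clusters <- list(c(1, 4), c(2), c(3, 5, 6), c(7), c(8, 9))
k <- 4

sizes <- sapply(clusters, length)   # one count per cluster found

# rep(1:k, times = sizes) fails here: 4 values but 5 counts.
# Indexing by the clusters actually found keeps the lengths equal.
labels <- rep(seq_along(clusters), times = sizes)
labels
```

seq_along(clusters) is equivalent to 1:length(clusters) for a non-empty list and is also safe when the list is empty.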
