A code pool for RNA expression analysis in the Woodman Lab.
Without SSH key:
devtools::install_git(
url='http://gitlab.mdanderson.edu/XLi23/expr.git'
)
With SSH key set up:
devtools::install_git(
url='git@gitlab.mdanderson.edu:XLi23/expr.git',
quiet=FALSE
)
View the package vignette by running browseVignettes("expr") in the RStudio console.
The expression analysis pipeline saves all outputs in your current working directory. The initial input should be an RData file containing two objects:
- object named "gene_expressions" - the expression matrix or data frame with genes as row names and sample names as column names
- object named "sampleAttr" - the sample metadata table, one sample per row. ID columns are required for both sample and patient.
Use the code below to generate an example RData file.
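# Expression table: genes as row names, samples as column names
gene_expressions <- read.csv(gzfile(system.file("extdata/expressions.csv.gz", package = "expr"), "rt"), row.names = 1, check.names = FALSE)
# Sample-level and clinical metadata tables
RNA_sample_info <- read.csv(system.file("extdata/RNA_sample_info.csv", package = "expr"), header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
RNA_clinic_info <- read.csv(system.file("extdata/RNA_clinic_info.csv", package = "expr"), header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
# Combine the metadata tables into one sample attribute table
sampleAttr <- expr::table_org(list(RNA_sample_info, RNA_clinic_info))
# Save both objects under the names the pipeline expects
save(gene_expressions, sampleAttr, file = "meso.RData")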
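Before running the pipeline, it can help to confirm that the two objects line up. The sketch below is a minimal check and is not part of expr: the helper name check_expr_inputs and the ID column name "sample_id" are assumptions made for illustration, so substitute the ID column that your sampleAttr actually uses.

```r
# Hypothetical helper (not part of expr): compares expression column names
# with the sample IDs in the metadata table.
check_expr_inputs <- function(gene_expressions, sampleAttr,
                              sample_id_col = "sample_id") {  # assumed column name
  stopifnot(sample_id_col %in% colnames(sampleAttr))
  expr_ids <- colnames(gene_expressions)
  meta_ids <- as.character(sampleAttr[[sample_id_col]])
  list(
    missing_in_meta = setdiff(expr_ids, meta_ids),  # expression columns without metadata
    missing_in_expr = setdiff(meta_ids, expr_ids),  # metadata rows without expression data
    identical_order = identical(expr_ids, meta_ids) # same IDs in the same order
  )
}

# Toy data standing in for the real objects
toy_expr <- data.frame(S1 = c(5, 0), S2 = c(3, 8),
                       row.names = c("geneA", "geneB"))
toy_attr <- data.frame(sample_id = c("S1", "S2"),
                       patient_id = c("P1", "P1"),
                       stringsAsFactors = FALSE)
res_check <- check_expr_inputs(toy_expr, toy_attr)
```

An empty missing_in_meta and missing_in_expr, together with identical_order being TRUE, suggests the two objects describe the same samples.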
To run the pipeline, execute:
runExprPPL()
- This package is documented with roxygen2. Before pushing a commit, press 'Cmd + Shift + D' to generate the documentation, then 'Cmd + Shift + B' to build the package. Keep new functions going!
- Package tests are only as good as you write them. Please report every error you encounter under the "Issues" tab.
functions
- prepare_clean_RNA_sample_info_and_protein_expressions
- prepare_unsupervised_data
- unsupervised_analysis
- Added a line in prepare_clean_RNA_sample_info_and_protein_expressions to ensure that expression column names are identical to the sample IDs in the sample info; otherwise the function errors out.
- Param name in prepare_clean_RNA_sample_info_and_protein_expressions changed from tolerent_library_size_factor to tolerant_library_size_factor.
- match.arg renders an error when arg has a length of 1 (when running the function line by line). Changed to select the first element of the vector.
- gg3D (needed for unsupervised_analysis) seems difficult to install (XQuartz is needed on Mac); plotly is used instead.
- The guide argument in scale_*() cannot be FALSE. This was deprecated in ggplot2 3.3.4. Fixed by adding legend.position="none".
- Fixed the error where setting any of the analyses to FALSE made the unsupervised_analysis function fail.
- The grid.arrange line for organizing figures was removed. See the function manual for arranging plots.
- Results are separated into plots and analysis responses in the function output.
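The match.arg change noted above can be pictured with a small sketch. The function name choose_scaling and its choices are hypothetical and do not come from expr; the point is only the pattern of taking the first element of the default vector while keeping an explicit validity check, since that check is otherwise lost when match.arg is dropped.

```r
# Hypothetical function: the name and the choices are illustrative only.
choose_scaling <- function(scaling = c("zscore", "log2", "none")) {
  choices <- c("zscore", "log2", "none")
  scaling <- scaling[1]            # take the first element instead of match.arg()
  if (!scaling %in% choices) {     # keep the validation match.arg() would have done
    stop("'scaling' must be one of: ", paste(choices, collapse = ", "))
  }
  scaling
}

default_choice  <- choose_scaling()        # default vector, first element is used
explicit_choice <- choose_scaling("log2")  # a length-1 argument works the same way
```

The same two lines inside the function body also behave identically when run one at a time in the console on a plain character vector, which is the situation the changelog entry describes.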
vignette
- The package vignette showcases how expr turns RNA analysis into a pipeline. Added a k-means clustering measure to auto-detect batch effects as an alternative to manual visualization. Manual confirmation is still recommended.
- The package vignette can be easily customized to generate one-click HTML or PDF reports for user datasets.
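The vignette's actual clustering measure is not reproduced here. As a rough sketch of the general idea, and assuming simulated data with an artificial batch shift, one could cluster samples with k-means on the leading principal components and check how well the clusters line up with the known batch labels. The variable names and the purity score below are illustrative choices, not the package's implementation.

```r
set.seed(1)

# Simulated log-expression: 200 genes x 20 samples, two batches of 10 samples
n_genes <- 200
batch <- rep(c("A", "B"), each = 10)
mat <- matrix(rnorm(n_genes * 20), nrow = n_genes)
mat[, batch == "B"] <- mat[, batch == "B"] + 3  # strong artificial batch shift

# PCA on samples, then k-means with k equal to the number of batches
pcs <- prcomp(t(mat), center = TRUE, scale. = FALSE)$x[, 1:2]
km <- kmeans(pcs, centers = length(unique(batch)), nstart = 25)

# Cross-tabulate clusters against batch labels
tab <- table(cluster = km$cluster, batch = batch)

# Purity: fraction of samples that fall in the majority batch of their cluster
purity <- sum(apply(tab, 1, max)) / sum(tab)
```

A purity close to 1 means the clusters largely reproduce the batches, which would hint at a batch effect; a value near the share of the largest batch suggests no obvious batch structure. Either way, a visual check of the PCA plot remains worthwhile.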
functions
- Fixed error: in prepare_clean_RNA_sample_info_and_protein_expressions(), if gene_id_col="rowname", sequenced_RNA_samples returns NA at line 53.
Added exported functions
- NbClust
- consensus_immunedeconvolute
- estimate_bestNumberofClusters
- map_clusters
- consensusCluster
- gsea
- MRNsurr
- getHeatMapAnnotation
- reassignNA
- table_org
- zscoreData
- changeColNames
- getFill
- t_test2
Edits
- getHeatmapAnnotations: added param track - for selecting meta data of interest as track input.
- Moved “protein coding ensemble to symbol” file and “hg19 ncbi protein coding gene info” to data/ as RData to reduce package size.
Fixed
- In function estimate_bestNumberofClusters(), index_for_NbClust <- c(…, “rubbin”, ..) should be “rubin”.
- In function map_clusters(): used apply(cluster_table,1,max) instead of rowMaxs(cluster_table) to reduce package dependencies.
- In function multiCluster(): fixed the error below by individual indices (adding param “allow1=T” in the sourced mcl_clusters()).
  Error in if ((res[ncP - min_nc + 1, 15] <= resCritical[ncP - min_nc + : missing value where TRUE/FALSE needed
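The map_clusters() change swaps a row-maximum helper from an add-on package (matrixStats provides a rowMaxs(), although the source does not say which package was used) for base R. A minimal sketch on a made-up contingency table shows the base R form; the toy labels below are assumptions and not output from the package.

```r
# Toy contingency table: rows are clusters from one method,
# columns are clusters from another method.
cluster_table <- table(
  method1 = c(1, 1, 2, 2, 2, 3),
  method2 = c("a", "a", "b", "b", "a", "c")
)

# Largest overlap per row, using base R only
row_max <- apply(cluster_table, 1, max)

# Column label holding that largest overlap for each row
best_match <- colnames(cluster_table)[apply(cluster_table, 1, which.max)]
```

apply() is slower than a compiled row-maximum on large matrices, but for small cluster tables the difference is negligible and the package loses one dependency.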
Added functions
- plotBoxPlot
- future_consensusCluster (unexported; multicore)
Added pipeline components
- exprCleanUp.Rmd
- exprMain.Rmd
- exprlongitudinal.Rmd
- runPPL.R
- settingsUI/app.R
Edits
- In function consensusCluster(), changed cutFun inputs to character vector.
- In function NbClust(), added plotOP option to control plot output.
- Reduced sizes of external files via gz or RData.
Fixed
- In function estimate_bestNumberofClusters(): fixed the error below by res=res[!sapply(res,is.null)]
  Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 2, 0
- In function consensusCluster(): fixed the error below by using tryCatch:
  Error in rep(1:k, times = sapply(apclust_clusters@clusters, length)) : invalid 'times' argument.
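The two fixes above can work together: tryCatch turns a failing step into NULL, and the NULL entries are dropped before the results are combined. The sketch below assumes a hypothetical per-index function score_index and made-up index names; it illustrates the pattern and is not the package's code.

```r
indices <- c("kl", "fails", "ch")

# Hypothetical stand-in for a per-index computation that can fail
score_index <- function(ix) {
  if (ix == "fails") stop("index could not be computed")
  data.frame(index = ix, best_k = nchar(ix) + 1, stringsAsFactors = FALSE)
}

# tryCatch converts an error into NULL instead of aborting the whole loop
res <- lapply(indices, function(ix) {
  tryCatch(score_index(ix), error = function(e) NULL)
})

# Drop the NULL entries before combining, as in the fix above
res <- res[!sapply(res, is.null)]
combined <- do.call(rbind, res)
```

Leaving the NULL entries in place is what tends to trigger errors about differing numbers of rows once the pieces are assembled into a data frame.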
Functions
- Added unexported partial correlation graph functions (ridge and/or glasso).
- Added functions for “zoomed-in pathway view” app
- Changes in zscoreData functions for data.frame objects
Edits
- Removed example dataset.
Pipeline
- Added forceCorrection option. Introduced changes in settingsUI/app.R, exprCleanUp.Rmd, exprMain.Rmd and runPPL.R.
- Added steps to save GSEA and DGE results in main module and longitudinal module.
- Changed default number of clusters to NA in main module.
- Fixed "clean_batchexamined_logRNA.RData" dataset output for main module.
- Added text annotation to specify splitting threshold.
- Added “zoomed-in pathway view” app.
Fixed
- Fixed error in apclusterCluster: when k=4 but apcluster separated all samples into 5 clusters, parameter times does not match the length of x. Fixed by x=1:length(apclust_clusters@clusters).
- Fixed error in multiCluster: unable to find an inherited method for function ‘affinMult’ for signature ‘"rbfkernel", "numeric"’. Fixed by: 1) line 505: tryCatch; 2) line 516: !is.na(tempCut[samp1]==tempCut[samp2])
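The apclusterCluster fix comes down to how rep() treats a vector-valued times argument: it must have the same length as x. A small sketch with a plain list standing in for apclust_clusters@clusters (the toy memberships are made up) reproduces the mismatch and the fix.

```r
# Toy membership list standing in for apclust_clusters@clusters:
# k = 4 was requested, but 5 clusters came back.
k <- 4
clusters <- list(c(1, 2), c(3), c(4, 5, 6), c(7), c(8, 9))
sizes <- sapply(clusters, length)

# rep(1:k, times = sizes) fails because length(1:k) differs from length(sizes)
failed <- inherits(try(rep(1:k, times = sizes), silent = TRUE), "try-error")

# Using the actual number of clusters keeps x and times the same length
labels <- rep(1:length(clusters), times = sizes)
```

seq_along(clusters) is an equivalent and slightly safer way to write 1:length(clusters), because it also behaves sensibly when the list is empty.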