NanoCMSer is an R package for subtyping fresh frozen (FF) and FFPE patient samples profiled on the NanoString platform. It can also perform CMS classification on RNA-seq and microarray datasets.
An example dataset is provided to show how to work with the `NanoCMSer` R package and the `NanoCMSer` function.
To install the `NanoCMSer` package, use the following code:
devtools::install_github("atorang/NanoCMSer")
To run the demo below, you need to first load the package and the dataset:
library("NanoCMSer")
data("exprs.test")
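Before classifying, it can help to look at the shape and value range of an expression matrix. The sketch below uses a hypothetical helper, `describe_expression()`, which is not part of NanoCMSer, and a toy matrix with made-up gene IDs; it relies only on base R.

```r
# Hypothetical helper (not part of NanoCMSer): summarise an expression matrix.
describe_expression <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x))
  list(
    n_genes      = nrow(x),                 # rows are genes
    n_samples    = ncol(x),                 # columns are samples
    has_gene_ids = !is.null(rownames(x)),   # gene IDs are expected as row names
    value_range  = range(x, na.rm = TRUE),
    n_missing    = sum(is.na(x))
  )
}

# Toy count matrix with made-up Ensembl-style IDs: 4 genes x 3 samples.
toy <- matrix(
  c(12, 0, 530, 48,
    7, 3, 610, 51,
    15, 1, 498, 60),
  nrow = 4,
  dimnames = list(paste0("ENSG0000000000", 1:4), c("S1", "S2", "S3"))
)

info <- describe_expression(toy)
str(info)

# With the package and dataset loaded, the same helper can be applied to the
# bundled example data:
# describe_expression(exprs.test)
```

As a rule of thumb only, a maximum in the hundreds or thousands suggests raw counts, while a maximum below roughly 20 suggests values that are already log2-transformed.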
The `NanoCMSer` function classifies the data with an elastic-net model. It takes five arguments:
- `data`: A numeric matrix or data frame containing expression levels. Raw count data are recommended. Rows represent genes, and columns represent samples.
- `sample_type`: A character string specifying the type of samples. Accepted values are `tumorFF` for fresh frozen patient samples, `tumorFFPE` for formalin-fixed paraffin-embedded patient samples, and `models` for human in vitro models including cell lines, primary cultures, and organoids.
- `gene_names`: A character string specifying the gene annotation used in the data. Accepted values are `ensembl`, `symbol`, and `entrez`. The default value is `ensembl`, which is recommended to mitigate the risk of missing values due to gene symbol updates. The current version encompasses all previous versions of gene symbols up to 2024.
- `perform_log2`: A logical value determining whether the data need log2 transformation (`TRUE`) or are already log2-transformed (`FALSE`). The default is `perform_log2 = TRUE`.
- `impute`: A logical value. If `impute = TRUE`, missing genes (up to 10% of the genes used in the classifiers) will be imputed using a trained linear regression model. If `impute = FALSE`, the classifier will generate an error message in the event of missing genes. The default is `impute = TRUE`.
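To make the 10% limit for `impute` concrete, here is a minimal sketch of how the missing fraction could be computed. The helper `check_missing_genes()`, the gene IDs, and the `classifier_genes` vector are all hypothetical. This README does not describe how to retrieve the classifier gene list from the package, and whether the package treats exactly 10% as acceptable is an assumption here.

```r
# Hypothetical helper (not part of NanoCMSer): find classifier genes that are
# absent from the data and compare the missing fraction with a limit.
check_missing_genes <- function(data, classifier_genes, max_missing = 0.10) {
  missing  <- setdiff(classifier_genes, rownames(data))
  fraction <- length(missing) / length(classifier_genes)
  list(
    missing      = missing,
    fraction     = fraction,
    within_limit = fraction <= max_missing  # assumption: the limit is inclusive
  )
}

# Toy example: 20 made-up gene IDs, one of which is absent from the data.
classifier_genes <- sprintf("GENE%02d", 1:20)
toy <- matrix(
  rpois(19 * 3, lambda = 50),
  nrow = 19,
  dimnames = list(classifier_genes[-20], paste0("S", 1:3))
)

chk <- check_missing_genes(toy, classifier_genes)
chk$missing       # "GENE20"
chk$fraction      # 0.05
chk$within_limit  # TRUE
```

In this toy case 1 of 20 genes (5%) is missing, so the data would fall within the stated 10% limit.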
res <- NanoCMSer(data=exprs.test,
                 sample_type="tumorFFPE",
                 perform_log2=TRUE,
                 gene_names="ensembl",
                 impute=TRUE)
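The structure of the returned object is not described here, so a generic way to explore it is base R's `str()`. The sketch below repeats the call above and makes no assumption about what `res` contains; it runs the classification only if the package is installed.

```r
# Run the example only when NanoCMSer is installed; otherwise res stays NULL.
res <- NULL
if (requireNamespace("NanoCMSer", quietly = TRUE)) {
  library("NanoCMSer")
  data("exprs.test")
  res <- NanoCMSer(data = exprs.test,
                   sample_type = "tumorFFPE",
                   perform_log2 = TRUE,
                   gene_names = "ensembl",
                   impute = TRUE)
  str(res)  # compact overview of whatever object the function returns
}
```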