title

tags

authors

affiliations

date

bibliography

Varistran: Anscombe's variance stabilizing transformation for RNA-seq gene expression data

RNA-seq

gene expression

variance stabilizing transformation

R package

name	orcid	affiliation
Paul Francis Harrison	0000-0002-3980-268X	1

name	index
Monash Bioinformatics Platform, Monash University	1

8 March 2017

paper.bib

Summary

RNA-seq measures RNA expression levels in a biological sample using high-throughput cDNA sequencing, producing counts of the number of reads aligning to each gene. Noise in RNA-seq read count data is commonly modelled as following a negative binomial distribution, where the variance is a quadratic function of the expression level. However many statistical, machine learning, and visualization methods work best when the noise in a data set has equal variance. Varistran is an R package that uses Anscombe's [-@Anscombe1948] variance stabilizing transformation for the negative binomial distribution to transform RNA-seq count data, so that the noise has equal variance across all measured gene expression levels. The transformed data may be treated as log2 transformed gene expression levels, but with variability reduced at low read counts. Varistran also includes a function to open a Shiny report with simple diagnostic visualizations, including a plot to assess how effective the variance stabilization was, a biplot of samples and genes, and a heatmap. This allows defective samples, sample mislabling, and batch effects to be easily identified.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

paper.md

paper.md

Summary

References

Files

paper.md

Latest commit

History

paper.md

File metadata and controls

Summary

References