forked from microbiome/tutorials
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCompositionAmplicondata2.Rmd
executable file
·213 lines (152 loc) · 6.83 KB
/
CompositionAmplicondata2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
---
title: "Microbiome composition for Amplicon data"
author: "Leo Lahti, Sudarshan Shetty et al. `r Sys.Date()`"
bibliography:
- bibliography.bib
output:
BiocStyle::html_document:
number_sections: no
toc: yes
toc_depth: 4
toc_float: true
self_contained: true
thumbnails: true
lightbox: true
gallery: true
use_bookdown: false
highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - composition}
%\usepackage[utf8]{inputenc}
%\VignetteEncoding{UTF-8}
-->
Also see [phyloseq barplot examples](http://joey711.github.io/phyloseq/plot_bar-examples.html).
Check the [core microbiome page](http://microbiome.github.io/microbiome/CoremicrobiotaAmplicon.html) which shows how to read the your files into R and make a phyloseq object.
Here, an example data is used which is not available in the package currently due to size limitations. We will directly strat from a ready phyloseq object. Please test your own data and give your feedback on the [issues page](https://github.com/microbiome/microbiome/issues).
```{r composition-amplicon-example1, warning=FALSE, message=FALSE, eval=FALSE}
# Example data
library(microbiome)
# Try another theme
# from https://github.com/hrbrmstr/hrbrthemes
# you can install these if you don't have it already.
# devtools::install_github("hrbrmstr/hrbrthemes")
library(hrbrthemes)
library(gcookbook)
library(tidyverse)
library(dplyr)
data("DynamicsIBD") #This data is currently unavialable due to size limitations
ps1 <- DynamicsIBD
colnames(tax_table(ps1))
```
As you can see the taxonomic classification is just lablled as "Rank1" ... "Rank7". We need to change this to proper designation and also do some formatting of the data. This can be a useful example for understanding simple file processing in R.
```{r, composition-amplicon-example1a, eval=FALSE}
# First change the column names of the taxonomy table in phyloseq to following:
colnames(tax_table(ps1)) <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species" )
tax_table(ps1)[tax_table(ps1)[,"Kingdom"]== "NA", "Kingdom" ] <- "Unidentified_Kingdom"
tax_table(ps1)[tax_table(ps1)[,"Phylum"]== "p__", "Phylum" ] <- "p__Unidentified_Phylum"
# make a dataframe for taxonomy information.
taxic <- as.data.frame(ps1@tax_table)
otu.df <- abundances(ps1)
# make a dataframe for OTU information.
otu.df <- as.data.frame(otu.df)
# check the rows and columns
# head(otu.df)
# Add the OTU ids from OTU table into the taxa table at the end.
taxic$OTU <- row.names.data.frame(otu.df)
# You can see that we now have extra taxonomy levels.
colnames(taxic)
# convert it into a matrix.
taxmat <- as.matrix(taxic)
# convert into phyloseq compaitble file.
new.tax <- tax_table(taxmat)
# incroporate into phyloseq Object
tax_table(ps1) <- new.tax
```
# Composition barplots
The compositon plots can be shown either as barplots or heatmaps. Both examples are show below.
## Plot relative abundance
Now we can improve the plot further.
Let's try at Family level.
```{r composition-amplicon-example3, fig.width=12, fig.height=5, out.width="400px", fig.show="hold", warning=FALSE, message=FALSE, eval=FALSE}
library(phyloseq)
# for example purpose we will remove samples with less than 10000
ps2 = prune_samples(sample_sums(ps1)>=2000, ps1)
# To speed up the example we will use only those OTUs that are detected 100 times and present in 50% of the samples.
pseq2 <- microbiome::core(ps2, detection = 100, prevalence = .5)
# Improve the plotting
tax_table(pseq2)[tax_table(pseq2)[,"Family"]== "f__", "Family" ] <- "f__Unclassified Family"
# We will also remove the "f__" patterns for cleaner labels
tax_table(pseq2)[,colnames(tax_table(pseq2))] <- gsub(tax_table(pseq2)[,colnames(tax_table(pseq2))],pattern="[a-z]__",replacement="")
# merge at family level.
pseq.fam <- aggregate_taxa(pseq2, "Family")
p.fam <- plot_composition(pseq.fam, sample.sort = NULL, otu.sort = NULL,
x.label = "ibd_subtype", plot.type = "barplot", verbose = FALSE)
print(p.fam)
```
## Plot relative abundance
```{r composition-amplicon-example4, fig.width=12, fig.height=5, out.width="400px", fig.show="hold", warning=FALSE, message=FALSE, eval=FALSE}
pseq.famrel <- transform(pseq.fam, "compositional")
p.famrel <- plot_composition(pseq.famrel, sample.sort = NULL, otu.sort = NULL,
x.label = "ibd_subtype", plot.type = "barplot", verbose = FALSE)
print(p.famrel)
# further improvements can be done as follows
p.famrel <- plot_composition(pseq.famrel,
sample.sort = NULL,
otu.sort = NULL,
x.label = "ibd_subtype",
plot.type = "barplot",
verbose = FALSE) +
guides(fill = guide_legend(ncol = 1)) +
scale_y_percent() +
labs(x = "Samples",
y = "Relative abundance (%)",
title = "Relative abundance data",
subtitle = "Subtitle",
caption = "Caption text.") +
theme_ipsum(grid="Y")
print(p.famrel)
```
Average by group
```{r composition-amplicon-example5, fig.width=12, fig.height=5, out.width="400px", fig.show="hold", warning=FALSE, message=FALSE, eval=FALSE}
# Averaged by group
p <- plot_composition(pseq.famrel,
average_by = "ibd_subtype") +
guides(fill = guide_legend(ncol = 1)) +
scale_y_percent() +
labs(x = "Samples",
y = "Relative abundance (%)",
title = "Relative abundance data",
subtitle = "Subtitle",
caption = "Caption text.") +
theme_ipsum(grid="Y")
print(p)
```
# Heatmap composition
```{r composition-amplicon-example6, fig.width=12, fig.height=5, out.width="400px", fig.show="hold", warning=FALSE, message=FALSE, eval=FALSE}
pseq.famlog <- transform(pseq.fam, "log10")
p.famrel.heatmap <- plot_composition(pseq.famlog,
sample.sort = NULL,
otu.sort = NULL,
x.label = "ibd_subtype",
plot.type = "heatmap",
verbose = FALSE)
print(p.famrel.heatmap)
```
# Plot taxa prevalence
We use the Dynamics IBD data set from [Halfvarson J., et al. Nature
Microbiology, 2017](http://www.nature.com/articles/nmicrobiol20174) as
downloaded from [Qiita ID
1629](https://qiita.ucsd.edu/study/description/1629). This function
allows you to have an overview of OTU prevalences alongwith their
taxonomic affiliations. This will aid in checking if you filter OTUs
based on prevalence, then what taxonomic affliations will be lost.
```{r plot_prev1, fig.height=6, fig.width=8, eval=FALSE}
# We will use the ps1 object we created previously.
print(ps1)
# Use sample and taxa subset to speed up example
p0 <- subset_samples(ps1, sex == "male" & timepoint == 1)
# For the available taxonomic levels
plot_taxa_prevalence(p0, "Phylum")
```