forked from microbiome/tutorials
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcore_venn.rmd
executable file
·159 lines (114 loc) · 4.8 KB
/
core_venn.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
---
title: "Shared core with venn diagram"
author: "Sudarshan Shetty, Leo Lahti, et al."
bibliography:
- bibliography.bib
output:
BiocStyle::html_document:
number_sections: no
toc: yes
toc_depth: 4
toc_float: true
self_contained: true
thumbnails: true
lightbox: true
gallery: true
use_bookdown: false
highlight: haddock
---
<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{microbiome tutorial - atlas}
%\usepackage[utf8]{inputenc}
-->
In may instances one would wish to compare core taxa that are shared between multiple groups. Here, we demonstrate how this can be achieved by `microbiome` and [eulerr](https://cran.r-project.org/web/packages/eulerr/vignettes/introduction.html). The test data is stored in the [microbiomeutilities](https://microsud.github.io/microbiomeutilities/articles/microbiomeutilities.html) R package and the original source of data is [Zackular et al., 2014: The Gut Microbiome Modulates Colon Tumorigenesis](http://mbio.asm.org/content/4/6/e00692-13).
Load packages.
```{r pkg}
#install.packages("eulerr") # If not installed
library(eulerr)
library(microbiome)
#devtools::install_github('microsud/microbiomeutilities')
library(microbiomeutilities)
```
Get data
```{r data1, warning=FALSE, message=FALSE}
data("zackular2014")
pseq <- zackular2014
```
We will aim to identify shared core taxa between nationalities.
```{r check-var}
# simple way to count number of samples in each group
table(meta(pseq)$DiseaseState, useNA = "always")
```
There are 30 CRC and Healthy samples while 28 CRC samples.
```{r fix-pseq}
# convert to relative abundances
pseq.rel <- microbiome::transform(pseq, "compositional")
```
Make a list of DiseaseStates.
```{r get-grps}
disease_states <- unique(as.character(meta(pseq.rel)$DiseaseState))
print(disease_states)
```
Write a `for loop` to go through each of the `disease_states` one by one and combine identified core taxa into a list.
```{r get-core}
list_core <- c() # an empty object to store information
for (n in disease_states){ # for each variable n in DiseaseState
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(pseq.rel, DiseaseState == n) # Choose sample from DiseaseState by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.75)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
```
```{r printlist, eval=FALSE}
print(list_core)
```
```{r plot-venn, fig.width=6, fig.height=6, fig.align='center'}
# Specify colors and plot venn
# supplying colors in the order they appear in list_core
mycols <- c(nonCRC="#d6e2e9", CRC="#cbf3f0", H="#fcf5c7")
plot(venn(list_core),
fills = mycols)
```
```{r printlist-2, eval=TRUE}
print(list_core)
```
Here, we see that the ids are only names without taxonomic information. This is because the rownames of `otu_table` and `tax_table` are usually IDs or sequences.
The `format_to_besthit()` function in `microbiomeutilites` can be useful which add best taxonomic information that is available as demonstrated below.
```{r}
# use the pseq.rel object created at the begening of this tutorial.
taxa_names(pseq.rel)[1:5]
```
The taxa names are ids.
```{r}
# first remove "d__"
taxa_names(pseq.rel) <- gsub( "d__","", taxa_names(pseq.rel))
taxa_names(pseq.rel)[1:5]
# format names
pseq.rel.f <- format_to_besthit(pseq.rel)
# check names
taxa_names(pseq.rel.f)[1:5]
```
Run the `for loop` to go through each of the `disease_states` one by one and combine identified core taxa into a list.
```{r get-core-2}
list_core <- c() # an empty object to store information
for (n in disease_states){ # for each variable n in DiseaseState
#print(paste0("Identifying Core Taxa for ", n))
ps.sub <- subset_samples(pseq.rel.f, DiseaseState == n) # Choose sample from DiseaseState by n
core_m <- core_members(ps.sub, # ps.sub is phyloseq selected with only samples from g
detection = 0.001, # 0.001 in atleast 90% samples
prevalence = 0.75)
print(paste0("No. of core taxa in ", n, " : ", length(core_m))) # print core taxa identified in each DiseaseState.
list_core[[n]] <- core_m # add to a list core taxa for each group.
#print(list_core)
}
```
This is more useful information.
```{r}
print(list_core)
```
Note: There are limitations to the number of groups that can be plotted with the `eulerr::venn()`. Check [eulerr](https://cran.r-project.org/web/packages/eulerr/vignettes/introduction.html) documentation for more detailed information.