Stability.Rmd

---
title: "Microbiome stability analysis"
author: "Leo Lahti, Sudarshan Shetty et al."
bibliography: 
- bibliography.bib
output:
  BiocStyle::html_document:
    number_sections: no
    toc: yes
    toc_depth: 4
    toc_float: true
    self_contained: true
    thumbnails: true
    lightbox: true
    gallery: true
    use_bookdown: false
    highlight: haddock
---
<!--
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{microbiome tutorial - stability}
  %\usepackage[utf8]{inputenc}
  %\VignetteEncoding{UTF-8}  
-->


Get example data - [HITChip Atlas of 130 genus-like taxa across 1006 healthy western adults](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html). A subset of 76 subjects have also short time series available for temporal stability analysis:

```{r bistability, message=FALSE, warning=FALSE, error=FALSE}
# Load the example data
set.seed(134)
library(microbiome)
library(dplyr)
data(atlas1006)

# Rename the example data
pseq <- atlas1006

# Focus on specific subset
pseq <- pseq %>% subset_samples(DNA_extraction_method == "r")

# Use relative abundances
pseq <- microbiome::transform(pseq, "compositional")

# Merge rare taxa to speed up examples
pseq <- aggregate_rare(pseq, level = "Genus", detection = .1/100, prevalence = 10/100)

# For cross-sectional analysis, use only the baseline time point:
pseq0 <- baseline(pseq)
```


## Intermediate stability quantification

It has been reported that certain microbial groups exhibit bi-stable
abundance distributions with distinct peaks at low and high
abundances, and an instable intermediate abundance range. Instability
at the intermediate abundance range is hence one indicator of
bi-stability. [Lahti et al. 2014](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html) used straightforward correlation analysis to quantify how the distance from the intermediate abundance region (50% quantile) is associated with the observed shifts between consecutive time points. 

```{r bistability2, warning=FALSE, error=FALSE, message=FALSE}
# Here we use the (non-baseline) phyloseq object that contains time series.
intermediate.stability <- intermediate_stability(pseq, output = "scores")
```


## Bimodality quantification

Check the [bimodality page](Bimodality.html) for more examples on bimodality indicators.

Bimodality of the abundance distribution provides another (indirect)
indicator of bistability, although other explanations such as sampling
biases etc. should be controlled. Multiple bimodality scores are
available.

Multimodality score using [potential analysis with bootstrap](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html)


```{r bimodality2, message=FALSE, warning=FALSE}
# Bimodality is better estimated from abundances at log scale (such as CLR)
pseq0.clr <- microbiome::transform(pseq0, "clr")

set.seed(4433)
# In practice, it is recommended to use more bootstrap iterations than in this example
bimodality.score <- bimodality(pseq0.clr, method = "potential_analysis",
                               bs.iter = 20, peak.threshold = 10,
			       min.density = 10)
```


## Comparing bimodality and intermediate stability

The analysis suggests that bimodal population distribution across individuals is often associated with instable intermediate abundances within individuals. The specific bi-stable groups in the upper left corner were suggested to constitute bistable tipping elements of the human intestinal microbiota in [Lahti et al. Nat. Comm. 5:4344, 2014](http://www.nature.com/ncomms/2014/140708/ncomms5344/full/ncomms5344.html):

```{r bimodalitybistability, message=FALSE, warning=FALSE}

taxa <- taxa(pseq0)
df <- data.frame(group = taxa,
intermediate.stability = intermediate.stability[taxa],
bimodality = bimodality.score[taxa])
theme_set(theme_bw(20))

library(ggrepel)

p <- ggplot(df,
  aes(x = intermediate.stability, y = bimodality, label = '')) + #Doesn't add labels to points
  geom_text_repel() +
  geom_point()

#Labels only those that have bimodality > 0.4 and intermediate stability < 0, adjusts the placement of labels
p <- p + geom_text(aes(label = ifelse(df$bimodality > 0.6 | df$intermediate.stability < -0.25, group, '')), vjust = "inward", hjust = "inward")

print(p)
```


## Tipping point detection

Identify potential minima in cross-sectional population data as
tipping point candidates. 

```{r stability-tipping, message=FALSE, warning=FALSE}
# Log10 abundance for a selected taxonomic group
# Pick the most bimodal taxa as an example
tax  <- names(which.max(bimodality.score))

# Detect tipping points at log10 abundances
x <- abundances(microbiome::transform(pseq, "clr"))[tax,]

# Bootstrapped potential analysis to identify potential minima
# in practice, use more bootstrap iterations
potential.minima <- potential_analysis(x, bs.iter = 10)$minima

# Same with earlywarnings package (without bootstrap ie. less robust)
# library(earlywarnings)
# res <- livpotential_ews(x)$min.points

# Identify the potential minimum locations as tipping point candidates
tipping.samples <- sapply(potential.minima, function (m) {names(which.min(abs(sort(x) - m)))})
tipping.point <- abundances(pseq)[tax, tipping.samples]
print(tipping.point)

# Illustrate the detected tipping point
plot(density(x), main = tax)
abline(v = x[tipping.samples])
```