ENVS2018 kioloa results markdown 2024.Rmd

---
title: "ENVS2018 Environmental Science Field School: Kioloa analyses"
author: "Shoshana Rapley, Saul Cunningham"
date: "17 September 2024"
output:
  html_document:
    toc: true
    number_sections: true
    toc_depth: 3
    toc_float:
      collapsed: true
    theme: cerulean
    highlight: pygments
    code_folding: hide
editor_options: 
  chunk_output_type: console
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, output_file = file.path(dirname(inputFile), 'ENVS2018 Kioloa Report Results 2024.html')) })
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```

# Background

The purpose of the Kioloa week report is to report on the patterns of biodiversity we observed. In particular, you should examine the composition and structure of different plant communities at Kioloa, whether animal (bird) communities also differ, and consider how these patterns relate to the underlying soils and geology.

We are excited to share the whole-class survey results with you all, because there are some really interesting patterns here. Keep in mind when writing the reports that we want you to use this data summary as source material. We are not expecting that you do further, more complex analyses. You can focus on the summary data that we have provided. You may also use the figures/graphs from this markdown in your report.

Also you are not required to report on every piece of data. Given that we have more than 100 species in the list and 14 habitat measures, this would dilute the message of the report. Keep your focus on the data that show patterns of interest. Try to pick a narrative from these data that you find interesting and that you can explain within the constraints of the word count. 

# Markdown

This document is called a 'markdown'. It's a way to share code, annotations and results all in one place. 

If you are interested in the code you can unfold the chunks by clicking the 'Show' icon. There is also an option on the top right of the document called 'Code' that allows you to show or hide all code chunks. Notes in the code itself that start with a hash explain what the next line of code achieves.  

```{r}
# Install and load required packages
pacman::p_load(dplyr, gdata, ggplot2, ggrepel, gt, gtExtras, janitor, lme4, lubridate, 
               rstudioapi, readxl, sjmisc, tidyr, tidyverse, vegan, viridis)
```

# Vegetation communities

We collected vegetation data using the biometric tool methodology. This included compositional elements (e.g. overstory trees) and functional elements (e.g. woody debris). 

## Floral composition

Let's start by looking at a matrix of the floral composition. 

Before analysing the data we made a few corrections to what we believe were data entry errors (details can be provided on request). We then organised the data in a way that allowed us to identify the some different plant communities. 

Here is how to read the table: Each row is a plant species and each column is a site.  In the cells we have marked o (overstorey) m (mid-storey) and g (ground layer) to show where the species occurs.

The vertical bands of colour should help you to see the six plant communities. Species that help define particular communities are highlighted with blocks of colour. The ordering from top to bottom is arranged to make the associations clearer (to arrange the blocks) so it is not perfectly arranged into overstory, mid-storey understorey. The functional group in the first column helps you see what kind of plant it is, and the “observation” in the last column explains how the species relate to the plant communities that we identified.

When you report on the analysis remember to look at where these sites occur in relation to each other and how this relates to the underlying soil, the topography and the distance from the ocean. Also examine how the species composition relates to the structural characteristics that are described by the biometric survey.

```{r}
# read in the data 
comp_data <- read_excel("data/ENVS2018 kioloa flora composition 2024.xlsx", skip = 1)

# create a pretty table using the package 'gt'
comp_table <- gt(comp_data) %>%
  tab_options(data_row.padding = px(1)) %>%
  # add header
  tab_header(title = "Floral composition", 
             subtitle = "ENVS2018 Student data Kioloa 2024") %>%
  # replace NAs with blanks
  tab_style(cell_text(style = "italic"),locations = cells_body(columns = 2)) %>%
  sub_missing(missing_text = " ") %>%
   # add column colours for community type (using a 6 colour viridis scale with transparency)
  tab_style(style = list(cell_fill(color = "#44015433")),
    locations = cells_body(columns = 3:5)) %>%
  tab_style(style = list(cell_fill(color = "#41448733")),
    locations = cells_body(columns = 6:8)) %>%  
  tab_style(style = list(cell_fill(color = "#2A788E33")),
    locations = cells_body(columns = 9:12)) %>%
  tab_style(style = list(cell_fill(color = "#22A88433")),
    locations = cells_body(columns = 13:18)) %>%
  tab_style(style = list(cell_fill(color = "#7AD15133")),
    locations = cells_body(columns = 19:22)) %>%
  tab_style(style = list(cell_fill(color = "#FDE72533")),
    locations = cells_body(columns = 23:26)) %>%
  # add row/column colours for clusters (using 6 colour viridis scale)
  tab_style(style = list(cell_fill(color = "#440154FF"), cell_text(color = "white")),
    locations = cells_body(columns = 3:5, rows = c(1, 44:49))) %>%
  tab_style(style = list(cell_fill(color = "#414487FF"), cell_text(color = "white")),
    locations = cells_body(columns = 6:8, rows = c(2:4, 26, 38:43))) %>%  
  tab_style(style = list(cell_fill(color = "#2A788EFF"), cell_text(color = "white")),
    locations = cells_body(columns = 9:12, rows = 50:54)) %>%
  tab_style(style = list(cell_fill(color = "#22A884FF"), cell_text(color = "white")),
    locations = cells_body(columns = 13:18, rows = 3:14)) %>%
  tab_style(style = list(cell_fill(color = "#7AD151FF"), cell_text(color = "white")),
    locations = cells_body(columns = 19:22, rows = c(3:4, 15:30))) %>%
  tab_style(style = list(cell_fill(color = "#FDE725FF"), cell_text(color = "black")),
    locations = cells_body(columns = 23:26, rows = c(3:4, 32:36))) %>%
  # add new community grouping names
  tab_spanner(label = "Casuarina forest", columns = 3:5) %>%
  tab_spanner(label = "Coastal Eucalypt forest", columns = 6:8) %>%
  tab_spanner(label = "Grassland/paddock", columns = 9:12) %>%
  tab_spanner(label = "Lower slope forest", columns = 13:18) %>%
  tab_spanner(label = "Moist gully rainforest", columns = 19:22) %>%
  tab_spanner(label = "Upper slope forest", columns = 23:26) %>%
  # change tally colour
  tab_style(style = list(cell_fill(color = "lightgrey")),
    locations = cells_body(columns = 27)) %>%
 # change font size
  tab_style(style = cell_text(size = px(12)), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(size = px(10)), locations = cells_body()) %>%
  tab_style(style = cell_text(size = px(11)), locations = cells_column_spanners())

comp_table

```


**CAS: Casuarina forest**
This community has a *Casuarina glauca* overstory. It is without any other canopy tree species and is missing most of the shrubs that are common elsewhere (e.g. *Acacia* species). There are also some shrub and forb species found here that were not seen in other sites.

**CEF: Coastal Eucalypt forest**
This is the only community where *Eucalyptus botryoides* is common in the overstory (though *Corymbia maculata* and *Eucalytptus pilularis* occur also). The mid-storey is characterised by *Acmena smithii* and a mix of other species including *Acacia* species. The ground layer has abundant sedges.

**P: Grassland/paddock**
This community is characterised by the absence of overstorey and mid-storey. The ground layer has grasses, sedges, rushes and some forbs. The species that were abundant here and not elsewhere were exotic forbs.

**LSF: Lower slope forest**
This community has a canopy of *Corymbia maculata*, *Eucalyptus pilularis* and in one site also *Eucalyptus tereticornis*. The mid-storey has many shrubs and small trees including *Acacia* species. *Acacia maidenii* was found in this community but not elsewhere. There are a range of mid and ground layer species that are characteristic of rainforest and upper slope forest sites that are missing from this forest type (such as *Acmena smithii*, *Livistona australis*, and the rainforest-y species).

**RF: Moist gully rainforest**
This community has an emergent overstorey of *Corymbia maculata* and *Eucalyptus pilularis* (like much of the forest in the area) but it also has a number of shorter overstorey and mid-storey species not common in the other communities. The ground layer has climbers (*Hibbertia* species, *Kennedia rubicunda*, *Cissus*, *Cynanchum elegans*) and ferns. There are also a number of species found here that also occur in upper slope forest such as *Acmena smithii*, *Livistona australis*, *Prostanthera sp.*.

**USF: Upper slope forest**
This community has an overstory of *Corymbia maculata* and *Eucalyptus pilularis* and a well-developed mid-storey that includes many shrubs and short trees that are also found in the rainforest. There are also a few species that occur here but not in the other communities.

## Biometric values

Now that we have our floristic community groupings, let's take a look at the biometric data you collected. We will highlight cells for a few key variables based on their values to make it easier to see what's going on. 

```{r}
# read in data
biometric_data <- read_excel("data/ENVS2018 kioloa biometric 2024.xlsx") %>%
  # drop team column
  select(-1)

# make table in gt
biometric_table <- gt(biometric_data) %>%
  tab_options(data_row.padding = px(1)) %>%
  # change font size
  tab_style(style = cell_text(size = px(12)), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(size = px(10)), locations = cells_body()) %>%
  # add header
  tab_header(title = "Biometric data", 
             subtitle = "ENVS2018 Student data Kioloa 2023") %>%
  # format percentages to display %
  fmt_percent(columns = 3:13, decimals = 1) %>%
  # add data grouping labels
  tab_spanner(label = "Percent cover (transects)", columns = 3:10) %>%
  tab_spanner(label = "Tree cover", columns = 11:13) %>%
  tab_spanner(label = "Feature counts", columns = 14:16) %>%
  # add row colours for community type (using a 7 colour viridis scale with transparency)
  tab_style(style = list(cell_fill(color = "#44015433")),
    locations = cells_body(rows = 1:3, columns = 1:2)) %>%
  tab_style(style = list(cell_fill(color = "#41448733")),
    locations = cells_body(rows = 4:6, columns = 1:2)) %>%  
  tab_style(style = list(cell_fill(color = "#2A788E33")),
    locations = cells_body(rows = 7:10, columns = 1:2)) %>%
  tab_style(style = list(cell_fill(color = "#22A88433")),
    locations = cells_body(rows = 11:16, columns = 1:2)) %>%
  tab_style(style = list(cell_fill(color = "#7AD15133")),
    locations = cells_body(rows = 17:20, columns = 1:2)) %>%
  tab_style(style = list(cell_fill(color = "#FDE72533")),
    locations = cells_body(rows = 21:24, columns = 1:2)) %>%
  # colour columns according to values
  # Grass > 40%
   tab_style(cell_fill(color = "#F9E3D6"),
    locations = cells_body(columns = Grass, rows = Grass > .4)) %>%
   tab_style(cell_fill(color = "#F2F2F2"),
    locations = cells_body(columns = Grass, rows = Grass < .4)) %>%
  # Leaf litter > 30%
   tab_style(cell_fill(color = "#F9E3D6"),
    locations = cells_body(columns = Litter, rows = Litter > .3)) %>%
   tab_style(cell_fill(color = "#F2F2F2"),
    locations = cells_body(columns = Litter, rows = Litter < .3)) %>%
  # Midstory > 15%
   tab_style(cell_fill(color = "#F9E3D6"),
    locations = cells_body(columns = Midstory, rows = Midstory > .15)) %>%
   tab_style(cell_fill(color = "#F2F2F2"),
    locations = cells_body(columns = Midstory, rows = Midstory < .15)) %>%
  # Midstory + over > 60%
   tab_style(cell_fill(color = "#F9E3D6"),
    locations = cells_body(columns = Cover, rows = Cover > .6)) %>%
   tab_style(cell_fill(color = "#F2F2F2"),
    locations = cells_body(columns = Cover, rows = Cover < .6)) %>%
  # CWD length > 100m 
   tab_style(cell_fill(color = "#F9E3D6"),
    locations = cells_body(columns = CWD, rows = CWD > 100)) %>%
   tab_style(cell_fill(color = "#F2F2F2"),
    locations = cells_body(columns = CWD, rows = CWD < 100)) %>%
  # add footnotes for the above
    tab_footnote(footnote = "Highlighted grass cover > 40%",
    locations = cells_column_labels(columns = Grass)) %>%
    tab_footnote(footnote = "Highlighted leaf litter > 30%",
    locations = cells_column_labels(columns = Litter)) %>%
    tab_footnote(footnote = "Highlighted midstory cover > 15%",
    locations = cells_column_labels(columns = Midstory)) %>%
    tab_footnote(footnote = "Highlighted combined mid and overstory cover > 60%",
    locations = cells_column_labels(columns = Cover)) %>%
    tab_footnote(footnote = "Highlighted coarse woody debris where the total length > 100m",
    locations = cells_column_labels(columns = CWD))

biometric_table

```

## Visualisation

That's given us a good overview of how the biometric features vary by community type. Let's now summarise and visualise the data as boxplots. A boxplot shows the mean (middle line of the box), quartiles (the box and whiskers) and outliers (dots) and are useful for comparing quantitative data among groups. 

```{r}
biometric_data2 <- biometric_data %>%
  # clean names
  clean_names() %>%
  # convert percentages to 0-100
  mutate(across(3:13, ~ .x * 100)) %>%
  # order the factors (by appearance) to match the table colour palette
  mutate(community = as_factor(community))

# break into ground cover data and convert to long format for graphing
cover <- select(biometric_data2, -c(11:16)) %>%
  pivot_longer(cols = 3:10, names_to = "feature", values_to = "percent") %>%
  mutate(feature = as_factor(feature))

# create box plot for cover
ggplot(cover) +
  geom_boxplot(aes(community, percent, fill = community)) +
  scale_fill_viridis_d() +
  facet_wrap(~feature) +
  theme_classic() +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.x=element_blank())

```

Good. We can see there are generally low values for cryptogam, rock and bare ground, so let's remove those to make the others easier to read. 

```{r}
cover2 <- cover %>% 
  filter(feature %in% c("cover", "forb", "grass", "litter", "shrub"))

# create box plot for cover
ggplot(cover2) +
  geom_boxplot(aes(community, percent, fill = community)) +
  scale_fill_viridis_d() +
  facet_wrap(~feature) +
  theme_classic() +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.x=element_blank())

```

Interesting!

Now let's look at the mid and overstory cover, plus the combined mid and overstory (called in the plot 'cover'). The combined mid and overstory metric gives us an indication of how much light will get through to the floor (which in turn affects which plants can grow there).

```{r}
# break into tree cover data and convert to long format for graphing
tree <- select(biometric_data2, -c(3:10, 14:16)) %>%
  pivot_longer(cols = 3:5, names_to = "feature", values_to = "percent") %>%
  mutate(feature = as_factor(feature))

# create box plot for cover
ggplot(tree) +
  geom_boxplot(aes(community, percent, fill = community)) +
  scale_fill_viridis_d() +
  facet_wrap(~feature) +
  theme_classic() +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.x=element_blank())

```

Now let's look at the count biometric data: trees, trees with hollows and length of CWD.

```{r}
# break into cover data and convert to long format for graphing
count <- select(biometric_data2, -c(3:13)) %>%
  pivot_longer(cols = 3:5, names_to = "feature", values_to = "count") %>%
  mutate(feature = as_factor(feature))

# create box plot for count
ggplot(count) +
  geom_boxplot(aes(community, count, fill = community)) +
  scale_fill_viridis_d() +
  facet_wrap(~feature) +
  theme_classic() +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.x=element_blank())


```

Great! You can use these graphs in your report. 

## Summary Statistics

And we'll calculate the following summary statistics for all features: mean and standard deviation. You can use these numbers in the report.

```{r}
summary_stats <- biometric_data %>% clean_names() %>%
  pivot_longer(cols = 3:16, names_to = "feature", values_to = "value") %>%
  group_by(community, feature) %>%
  summarise(mean = mean(value),
            sd = sd(value)) %>%
  pivot_wider(names_from = 2, values_from = 3:4) %>%
  sjmisc::rotate_df(cn=TRUE) %>% slice(-1) %>%
  rownames_to_column() %>%
  rename("Stat" = 1) %>%
  separate_wider_delim(Stat, "_", names = c("Stat", "Variable"),
                       too_many = "merge") %>%
  arrange(Variable) %>%
  mutate_at(c(3:8), as.numeric)

summary_stats_table <- gt(summary_stats) %>%
  tab_options(data_row.padding = px(1)) %>%
  # change font size
  tab_style(style = cell_text(size = px(12)), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(size = px(10)), locations = cells_body()) %>%
  # add header
  tab_header(title = "Biometric data summary statistics", 
             subtitle = "ENVS2018 Student data Kioloa 2023") %>%
  # drop trailing zeros
  fmt_number(drop_trailing_zeros = TRUE) %>%
  # format percentages to display %
  fmt_percent(rows = c(1,3,5,9,11,15,17,19,21,23,25), decimals = 1) %>%
  # add column colours
  tab_style(style = cell_fill(color = "#44015433"),
    locations = cells_body(columns = 3)) %>%
  tab_style(style = cell_fill(color = "#41448733"),
    locations = cells_body(columns = 4)) %>%
  tab_style(style = cell_fill(color = "#2A788E33"),
    locations = cells_body(columns = 6)) %>%  
  tab_style(style = cell_fill(color = "#22A88433"),
    locations = cells_body(columns = 5)) %>%  
  tab_style(style = cell_fill(color = "#7AD15133"),
    locations = cells_body(columns = 7)) %>%
  tab_style(style = cell_fill(color = "#FDE72533"),
    locations = cells_body(columns = 8)) %>%
  # arrange columsn per other tables
  cols_move(columns = 5, after = 6)

summary_stats_table

```

# Bird surveys

We conducted expert bird surveys of all the plots. The observer (Sho, Julian or Belinda) recorded all birds seen and heard in a five-minute period in the following distance categories: <25m, 25-50m, 50-100m, >100m and overhead. We also asked you to estimate how many species you could identify during the survey. 

## Site visits

Let's start by visualising and summarising the data to get our heads around it. Each site was surveyed twice, so let's look at each visit for each site. 

```{r}
# read in the data and clean names 
birds_expert <- read_excel("data/ENVS2018 kioloa bird data 2024.xlsx", sheet = 1) %>%
  clean_names() %>%
  # replace NAs with zeros
  replace(is.na(.), 0) %>%
  # sum counts by species for inside (0-50m) and outside (50+m) the site, and total
  rowwise() %>%
  mutate(count_in = sum(x25 + x25_50),
         count_out = sum(x50_100 + x100 + overhead),
         count_total = sum(count_in + count_out)) %>%
  select(!c(8:12)) %>%
  mutate(present_in = ifelse(count_in>0, 1, 0),
         present_out = ifelse(count_out>0, 1, 0),
         present_total = ifelse(count_total>0, 1, 0))

# summarise by site for richness
birds_richness <- birds_expert %>% group_by(site, visit) %>%
  mutate(visit = as.factor(visit)) %>%
  summarise(richness_in = sum(present_in>0),
            richness_out = sum(present_out>0),
            richness_total = sum(present_total>0)) %>%
  arrange(site)

# plot site visits
ggplot(birds_richness)+
  geom_point(aes(factor(site), richness_total, colour = visit)) +
  geom_line(aes(factor(site), richness_total)) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle=45)) +
  xlab("site")

# calculate differences
birds_alpha <- birds_richness %>% group_by(site) %>%
  mutate(diff = abs(richness_total - lag(richness_total))) %>%
  na.omit()

alpha_diff <- data.frame(mean = mean(birds_alpha$diff),
                        min = min(birds_alpha$diff),
                        max = max(birds_alpha$diff),
                        stdev = sd(birds_alpha$diff))

print(alpha_diff)

```

The difference between visits was on average 4 bird species.

How about the difference in diversity detected between each visit? I.e. the beta diversity. 

```{r}
birds_beta <- data.frame()
sites <- unique(birds_expert$site)

for (i in 1:24) {
visit1 <- filter(birds_expert, site == sites[i] & visit == 1)
visit2 <- filter(birds_expert, site == sites[i] & visit == 2)

diff_a <- setdiff(visit1$common_name, visit2$common_name)
diff_b <- setdiff(visit2$common_name, visit1$common_name)

out <- data.frame(site = sites[i], beta = length(diff_a) + length(diff_b))
birds_beta <- rbind(birds_beta, out)
}

beta_diff <- data.frame(mean = mean(birds_beta$beta),
                        min = min(birds_beta$beta),
                        max = max(birds_beta$beta),
                        stdev = sd(birds_beta$beta))

differences <- rbind(alpha_diff, beta_diff) %>%
  add_column(type = c("alpha", "beta"), .before = "mean")

print(differences)

```

Although the total difference between the visits was small, the beta diversity was greater! There was always differences in the diversity between visits (the minimum is 7) and the mean beta diversity was 14. 

This is part of why we do repeat visits to site- to capture the diversity. Visits may differ by time of day (e.g. 6am or 7am) or weather. 

Interestingly, this is the same as last year - where the mean difference between observers was also 4, and the mean beta diversity was also 14. 

## Species richness

From here on we will use the total richness detected (i.e. total number of unique species across both visits).

Let's summarise the species richness by sites. We will just do this for birds 'in' the site (i.e. within 50m, since then we can compare these data with the vegetation). Let's colour these by floristic community. 


```{r}
# filter to birds within 50m
birds_insite <- birds_expert %>% 
  filter(present_in>0) %>%
  group_by(site) %>%
  summarise(richness = length(unique(common_name)))

# format for us in the package 'vegan'
birds_vegan <- birds_expert %>%
  select(c(4,6,8)) %>%
  pivot_wider(names_from = common_name, values_from = count_in, values_fn = sum, values_fill = 0) %>%
  column_to_rownames(var = "site")

# create table of sites and communities
communities <- data.frame(site = c(1, 2, 4,
                                   3, 5, 7,
                                   8, 9, 10, 12,
                                   6, 11, 13, 14, 16, 18,
                                   15, 17, 19, 20,
                                   21, 22, 23, 24),
                          community = c(rep("CAS", 3),
                                        rep("CEF", 3),
                                        rep("P", 4),
                                        rep("LSF", 6),
                                        rep("RF", 4),
                                        rep("USF", 4))) %>%
  mutate(community = as_factor(community))

# assign communities to bird data
birds_insite2 <- left_join(communities, birds_insite) %>%
  mutate(community = as_factor(community)) %>%
  arrange(community)
# set the plots in the right order for plotting
birds_insite2$site <- factor(birds_insite2$site, levels = birds_insite2$site)

# plot by community
ggplot(birds_insite2) +
  geom_boxplot(aes(community, richness, fill = community))+
  scale_fill_viridis_d()+
  theme_minimal() +
  theme(axis.text.x = element_blank())

# summary statistics
birds_summary <- birds_insite2 %>% group_by(community) %>%
  summarise(mean_richness = mean(richness),
            stdev = sd(richness))

print(birds_summary)

# test if there is a difference in richness by community
m1 <- aov(richness ~ community,
         data = birds_insite2)
summary (m1)

TukeyHSD(m1)

plot(TukeyHSD(m1), las = 2, cex.axis=0.75)
```

Interesting! The moist gully rainforest had the highest species richness, with an average of 16 species. The paddock (unsuprisingly) had the lowest. 

There was a significant difference in richness between the community types. Based on the post-hoc Tukey's test (which tests for pairwise difference) we found that the differences were between rainforest and casuarina, rainforest and paddock, and lower slope forest and paddock. The Tukey test plot shows us the direction and strength of the pairwise differences; e.g. in the 'RF-P' pairing, rainforest has many more species than paddock, and since this crosses the zero, this is a significant difference. 

**Please note**, you do not have to use the statistical analyses in your report unless you feel confident doing so. If not, stick to the summary data and interpretation of the tables and figures. 

## Beta diversity

But as we know, richness doesn't tell us everything. What about beta diversity between the sites and communities?

```{r}
# select all birds in sites
species <- birds_expert %>%
  filter(present_in==1) %>%
  # add community
  left_join(communities)

# for loop of birds only found in that site
sites_beta <- data.frame()
sites <- unique(birds_expert$site)

for (i in 1:24) {
site_test <- species %>% filter(site == sites[i])
site_other <- species %>% filter(!site == sites[i])

diff_site <- setdiff(site_test$common_name, site_other$common_name)

if(length(diff_site) == 0){ 
  next 
  }
out <- data.frame(site = sites[i], unique = diff_site)
sites_beta <- rbind(sites_beta, out)
}

print(sites_beta)

```

Cool. Half of the sites had at least one unique bird species found no where else. The sites with two unique birds were site 9 (the paddock, with magpie lark and nankeen kestrel) and site 14 (with eastern rosella and satin bowerbird). Satin bowerbird was recorded at other sites during the bird survey but outside the 50m area. I suspect the satin bowerbird was found inside site 14 because it was foraging on fruiting native cherry *Exocarpos cupressiformis*.

Let's now look at the unique diversity across the floristic communities. 

```{r}
# for loop of birds only found in that community
comm_beta <- data.frame()
comm <- as.character(unique(communities$community))

for (i in 1:6) {
comm_test <- species %>% filter(community == comm[i])
comm_other <- species %>% filter(!community == comm[i])

diff_comm <- setdiff(comm_test$common_name, comm_other$common_name)

if(length(diff_comm) == 0){ 
  next 
  }
out <- data.frame(community = comm[i], unique = diff_comm)
comm_beta <- rbind(comm_beta, out)
}

# summarise unique birds
comm_beta_sum <- comm_beta %>% 
  mutate(community = as_factor(community)) %>%
  group_by(community) %>% 
  summarise(unique_species = length(unique))

# plot unique birds by community
ggplot(comm_beta_sum)+
  geom_point(aes(unique_species, community, colour = community), size = 5) +
  scale_colour_viridis_d()+
  xlim(0,6)+
  theme_minimal()

# print the unique birds
print(comm_beta)

```

Very cool! There were a total of 20 bird species only found in one community type. The lower slope forest and the paddock had the greatest number of unique species (n=5). 

We only detected Mistletoebird in the lower slope eucalypt forest. As the name suggests, Mistletoebirds specalise on mistletoe fruit. Mistletoes are hemiparasitic plants that grow on the branches of other plants. There are approximately 100 species of mistletoe in Australia and these often have only one or two host trees. Mistletoes are an important element of eucalypt forests and woodlands because they provide fruit (for frugivorous birds) and, more importantly, they provide pockets of cover that are good for nesting in. Of the Australian birds that construct nests in trees, roughly two-thirds of them will preferentially nest in mistletoe. Mistletoe can have a bad reputation in popular culture because of the misconception that they kill trees. In some cases, mistletoe loads become high and this places additional water stress on the host tree; but the underlying cause of this issue is habitat fragmentation. It is possible that we detected Mistletoebird in site 13 because of the presence of Mistletoe. Interestingly, this happened last year too - where mistletoebird was only detected in site 13. 

Despite the apparent 'lack' of 'habitat' in the paddocks, there are a number of birds that are found in the paddock and nowhere else. Different birds have different requirements, some of which include open grassland areas and some are resilient to degraded landscapes. For example, the Australasian Pipit is a grassland insectivore that is mostly usually on the ground in open pasture or native grassland. It nests on the ground in tussocks. 

We only found Topknot piegon in the rainforest, which is unsurprising since they are a rainforest species (as well as wet sclerophyll) especially found along creeks and rivers. They are largely frugivores (eating fruit and berries, but also seeds), and can eat (and disperse) hard to digest fruits. They are an important fruit disperser for rainforest plants. 

These are just a couple of examples of how unique species relate to the vegetation and landscape. You could look into more of the unique species' ecology for your report.

## Diversity indices

We can also calculate indices for diversity aside from richness. For instance, Simpsons's diversity index takes into account abundance as well as richness. Why do we care about abundance? Some species are not evenly distributed in the landscape, even if they are found in more than one place. This metric helps us take that into account.

```{r}
# calculate simpson's diversity index for each site
communities <- communities %>%
  mutate(site = as.character(site))

simpson <- as.data.frame(diversity(birds_vegan, index = "simpson")) %>%
  rownames_to_column() %>%
  rename(site = 1,
         simpsons = 2) %>%
  left_join(communities)

simpson2 <- diversity(birds_vegan, index = "simpson")

# plot simpsons by community
ggplot(simpson) +
  geom_boxplot(aes(community, simpsons, fill = community))+
  scale_fill_viridis_d()+
  theme_minimal() +
  theme(axis.text.x = element_blank())+
  ylim(c(0.6, 1))

# test difference
simpson_aov <- aov(simpson2 ~ community, data = communities)
anova(simpson_aov)

```

Note the y-axis is limited to a minimum of 0.6 to make this easier to read. 

The moist gully rainforest had the highest average Simpson's diversity index, which means it is more biodiverse in terms of both species richness and abundance. There was no significant in Simpson's diversity index among the communities.  

## Bird communities

**(This section is interesting but you don't have to include it in your report)**

So far we have looked at the bird richness by the floristic communities. But what about the bird communities? That is, which species tend to be together due to shared habitat features they prefer (or avoid). 

We'll use a constrained ordination to determine bird communities. An ordination examines communities across space and a contrained ordination examines communities across specific environmental variables (in our case, biometric data). The type we are doing is called 'Canonical Correspondence Analysis' (CCA). 

For more information on how this works you can use the same tutorial I did to produce this: https://rpubs.com/an-bui/vegan-cheat-sheet

```{r}
# exclude paddocks because way out on their own
birds_vegan2 <- birds_vegan %>% 
  rownames_to_column() %>%
  filter(!rowname %in% c("P1","P2","P3")) %>%
  column_to_rownames()

# CCA
birdCCA <- cca(birds_vegan2 ~ Forb + Grass + Litter + Midstory + Overstory + 
                 Cover + Trees + Hollows + CWD, data = biometric_data)

# extracting the bird community data
CCA_bird <- as.data.frame(birdCCA$CCA$v) %>%
  rownames_to_column()%>%
  rename(bird = rowname)

# plot
ggplot() +
  geom_point(data = CCA_bird, aes(x = CCA1, y = CCA2), color = "darkblue")+
  geom_text_repel(data = CCA_bird, aes(x = CCA1, y = CCA2, label = bird),
                  nudge_y = -0.05) +
  theme_minimal()

```

Awesome! Let's read this figure. You can essentially ignore the axes here (arbitrary numbers in ordination space). What we are interested in is the grouping of birds. What this figure shows us is how birds cluster in the landscape according to the vegetation features (at least the ones we measured). And it looks pretty good! Not all the birds are labelled because the ones in the middle are so overlapping (read: likely to co-occur) that they didn't get shown. 

There are plenty of groupings I would expect to see here, for example Magpie, Willie Wagtail, Jacky Winter, Pipit, Nankeen Kestrel and Magpie-lark are all grassland birds, which were detected together in the paddock/grassland sites. 

Likewise, Topknot Pigeon, Eastern Yellow Robin, Rose Robin and Superb Lyrebird all prefer rainforest. The Grey Butcherbird is an oddity here, as they aren't a rainforest specalist. 

It is very interesting that the two fairywren species are similarly distributed in ordination space - this indicates we found them in similar vegetation. 

## Student estimates of bird richness

**(This section is interesting but you don't have to include it in your report)**

And out of interest, how well did you do at estimating the bird richness at each site? Let's compare average expert and student counts. These are total (i.e. all within 50m and >50m). 

```{r}
# read in data
birds_student <- read_excel("data/ENVS2018 kioloa bird data 2024.xlsx", sheet = 2) %>%
  clean_names() %>%
  mutate(estimate = as.numeric(estimate)) %>%
  select(3:4) %>%
  rename(student = 2) %>%
  pivot_longer(2, names_to = "observer", values_to = "birds")

# combine expert and student
birds_compare <- birds_richness %>% 
  select(c(1,5)) %>% 
  rename(expert = 2) %>%
  pivot_longer(2, names_to = "observer", values_to = "birds") %>%
  rbind(birds_student)
  
# plot expert and student
ggplot(birds_compare)+
  geom_boxplot(aes(site, birds, fill = observer), alpha = .8,
               position=position_dodge()) +
  theme_minimal()+
  theme(axis.text.x = element_text(angle=45))
  

```

In general, you underestimated the total number of species- which is to be expected! It takes a lot of practice and patience to learn bird calls. I hope you enjoyed coming out on the bird surveys with us and I appreciated your enthusiasm and quick learning. 

## Acoustic data

**(This section is interesting but you don't have to include it in your report)**

You also surveyed the sites using acoustic meters. We collected data for two hours around dusk and dawn for one night. One of the benefits of acoustic meters is they can help sample diversity that otherwise would be missed in the time constraints of a five-minute count. The other benefit was to teach you how to use this kit and read sonograms! Let's see what we detected. First we'll look at the effect of time of day.

```{r}
# read in 
acoustic <- read_excel("data/ENVS2018 kioloa acoustic data 2024.xlsx", sheet = 1) %>%
  clean_names() %>%
  mutate(site = as.character(site))

acoustic_rich <- acoustic %>% group_by(site,time) %>%
  summarise(richness = length(unique(common_name))) %>%
  left_join(communities) %>%
  mutate(type = "acoustic")

# plot by site and dusk/dawn
ggplot(acoustic_rich)+
  geom_boxplot(aes(time, richness, fill = community))+
  facet_wrap(~community)+
  scale_fill_viridis_d()+
  theme_minimal()

# test difference dusk and dawn
tod <- aov(richness ~ time, data = acoustic_rich)
anova(tod)

```

Interestingly, some sites recorded higher richness at dusk than dawn. Note, this also includes frog diversity. 

There was no difference in the richness recorded between dusk and dawn. 

How did the richness detected on the acoustic meters compare with the bird surveys?

```{r}
# combine acoustic and expert bird counts
acoustic_compare <- select(birds_richness, c(1,3)) %>%
  mutate(site = as.character(site)) %>%
  rename(richness = 2) %>%
  mutate(type = "expert") %>%
  left_join(communities) %>%
  full_join(select(acoustic_rich, !2))

# plot comparison
ggplot(acoustic_compare) +
  geom_boxplot(aes(type, richness, fill = community))+
  scale_fill_viridis_d()+
  facet_wrap(~community)+
  theme_minimal()

# test difference
aco <- aov(richness ~ type, data = acoustic_compare)
anova(aco)

```

Wow! We detected significantly more bird species on the acoustic meters than in the 5-minute expert counts (only considering the species within the 50m radius). 

A caveat here is that we simply used 'loudness' of the acoustic data as a proxy for distance, so the two aren't directly comparable. For example, the Masked Lapwing heard on the acoustic meter might have been further away than 50m because their call carries. We could have tested this by playing a known volume tone at 50m from the meters to figure out how 'loud' something needs to be to be in the site.

You can imagine the addition of time to a bird survey as a curve of diminishing returns- you add more time, you get more species, but at a decaying rate of increase over time. So we could achieve higher richness counts by doing say, a 15-minute or even 1-hour bird survey (imagine that!), but that would not be a good use of people-time. This is where acoustic meters can come in handy. 

Another important use of acoustic meters is to detect rare things that don't call often. If you were tasked with finding a Night Parrot, you would have to spend 1000s of hours in the remote spinifex desert to potentially hear a short call; which is why agencies like Parks and NGOs such as AWC instead deploy acoustic meters and then use AI to scan through in search of the target species. 

How about the beta diversity between the acoustic and expert counts?

```{r}
# for loop of birds only found in that community
aco_beta <- data.frame()
species2 <- species %>% filter(present_in==1)

for (i in 1:24) {
uni_exp <- species2 %>% filter(site == sites[i])
uni_aco <- acoustic %>% filter(site == sites[i])

diff1 <- setdiff(uni_exp$common_name, uni_aco$common_name) %>%
  as.data.frame() %>%
  rename(unique = 1) %>%
  mutate(source = "expert",
         site = sites[i])
diff2 <- setdiff(uni_aco$common_name, uni_exp$common_name)%>%
  as.data.frame() %>%
  rename(unique = 1) %>%
  mutate(source = "acoustic",
         site = sites[i])

aco_beta <- rbind(aco_beta, diff1, diff2)
}

aco_beta2 <- aco_beta %>%
  mutate(site = as.character(site)) %>%
  group_by(site, source) %>%
  summarise(beta = length(unique(unique))) %>%
  left_join(communities)

# plot difference
ggplot(aco_beta2)+
  geom_boxplot(aes(source, beta, fill=community))+
  scale_fill_viridis_d()+
  facet_wrap(~community)+
  theme_minimal()

# test difference
aco_b <- aov(beta ~ source, data = aco_beta2)
anova(aco_b)

```

While we found unique species between both methods, there were significantly more unique species found on the acoustic meters that we didn't detect during the expert counts (if we only consider expert counts within the site- this is important because if we include birds outside of the sites we actually detect more birds with the expert surveys. The people can hear further than the units). 

Aside from the study sites, were there any unique species found only on the expert counts or acoustic meters across the whole campus?

```{r}
b_exp <- setdiff(species2$common_name, acoustic$common_name) %>%
  as.data.frame() %>%
  rename(unique = 1) %>%
  mutate(source = "expert")

b_aco <- setdiff(acoustic$common_name, species2$common_name)%>%
  as.data.frame() %>%
  rename(unique = 1) %>%
  mutate(source = "acoustic")

aco_beta3 <- rbind(b_exp, b_aco)

print(aco_beta3)
  
```

Cool! Some of these species were detected during the bird surveys but not within the site itself (i.e. heard at a distance instead of within 50m). For example, we heard Red Wattlebird and Brown Cuckoo-dove during the expert counts but not in the sites. This may be an indication that the acoustic meters are hearing "too far". An excellent example of this is the sea-eagle, which nest just outside site 11. We heard them on the expert counts but did not count them within the sites. 

What about out of the sites? Let's sum up the total richness of birds recorded across the whole campus from both methods

```{r}
# all bird species 
species_expert <- data.frame(species = unique(birds_expert$common_name),
                             type = "expert")
species_acoustic <- data.frame(species = unique(acoustic$common_name),
                             type = "acoustic")

species_all <- rbind(species_expert, species_acoustic) %>%
  mutate(type = ifelse(duplicated2(species, bothWays=TRUE)==TRUE,"both",type)) %>%
  #remove the frog
  filter(!species == "Common Eastern Froglet") %>%
  distinct()

# make gt table
species_all %>% 
  group_by(type) %>% 
  gt() %>%
  tab_options(data_row.padding = px(1)) %>%
  # change font size
  tab_style(style = cell_text(size = px(12)), locations = cells_column_labels()) %>%
  tab_style(style = cell_text(size = px(10)), locations = cells_body())
  
species_all2 <- species_all %>%
  group_by(type) %>%
  summarise(count = length(species))

# plot
ggplot(species_all2,aes(1,count,fill=type))+
  geom_bar(stat="identity")+
  scale_fill_viridis_d()+
  theme_minimal()+
    theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

```

We detected a total of 84 bird species (up from 76 in 2023). Of these, 54 were detected on both the acoustic meters and in the bird surveys, where 18 were only detected on the expert counts and 12 in the acoustics. 

Species that were uniquely recorded on the acoustic meters (and not in the total expert counts including out of sites) were: yellow-rumped thornbill, white-eared honeyeater, bassian thrush and swamphen. These 5 species were added to the total site richness thanks to the acoustic meter! Note: brush cuckoo and pacific koel were not verified by expert ears and instead were included by AI - I retained these due to likelihood of occurrence, but I'm still not 100% if these are "real" records. 

The acoustic meter was our only record of *Litoria quiritatus*, the screaming tree frog - which was only recently split from the bleating tree frog based on citizen science data submitted to the Australian Museum's Frog ID app. 

Excitingly, we got our first ever record of Green Catbird during the expert counts! Green Catbird is a rainforest specialist related to bowerbirds, which have a distinct cat-like "yowling" call. Like the bowerbirds, they are frugivores and important seed dispersers. 

# Other wildlife results

**(This section is interesting but you don't have to include it in your report)**

You don't have to report on this section, but we wanted to show you some interesting findings. 

## Cameras

Let's have a look at how our camera results compared with previous years. 

```{r}
# groups
mammals <- c("Eastern Grey Kangaroo", "Swamp Wallaby", "Red-necked Wallaby", 
             "Short-beaked Echidna", "Common Wombat",
             "Brush-tailed Possum", "Krefft's Glider", 
             "Black Rat", "Bush Rat", "Rat sp", "Swamp Rat", "House Mouse",
             "Long-nosed Bandicoot", "White-footed Dunnart", "Brown Antechinus",
             "Cat", "Red Fox", "Pig" , "Sheep", "Cow")

# read in data
camera <- read_excel("data/ENVS2018 kioloa data 2017-2024.xlsx", sheet = 2) %>%
  clean_names() %>%
  distinct() %>%
  #remove skinks
  filter(!common_name == "Delicate Sun-skink") %>%
  mutate(taxa = ifelse(common_name %in% mammals, "mammal", "bird")) %>%
  filter(taxa == "mammal") %>%
  mutate(common_name = factor(common_name, levels = mammals))

# plot mammals
ggplot(camera)+
  geom_point(aes(year, common_name, colour = common_name), size = 4)+
  scale_colour_viridis_d()+
  theme_minimal()+
  theme(legend.position= "none")+
  theme(axis.title.y=element_blank())+
  scale_y_discrete(limits=rev)+
  scale_x_continuous(n.breaks = 7)+
  coord_fixed(ratio = 0.6)

```

Here we can see which species were seen in which years. The occurrence of these species might relate to environmental conditions (e.g. drought). Note: the cameras were not placed in rainforest and coastal locations in 2017-2020, only upper and lower slope eucalypt forest and paddock. Also note there was a bushfire in the 2019/2020 Summer. No data was collected in 2021 (covid-19). 

## Herpetofauna

Let's do the same thing with the reptiles and frogs (from tin/tile and opportunistic records).

```{r}
# read in data
herpsp <- c("Eastern Small-eyed Snake", "Red-bellied Black Snake", "Diamond Python",
            "Jacky Dragon", "Blue-tongued Lizard", "Delicate Sun-skink", 
            "Weasel Skink", "Skink sp", "Lace Monitor",
            "Whistling Tree Frog", "Jervis Bay Tree Frog",
            "Peron's Tree Frog", "Blue Mountains Tree Frog", 
            "Eastern Dwarf Tree Frog", "Southern Green Stream Frog",
            "Screaming Tree Frog", "Striped Marsh Frog", "Tyler's Toadlet", 
            "Common Eastern Froglet", "Haswell's Froglet")

herps <- read_excel("data/ENVS2018 kioloa data 2017-2024.xlsx", sheet = 1) %>%
  clean_names() %>%
  select(2:3) %>%
  # remove mice
  filter(!common_name == "House Mouse") %>%
  distinct() %>%
  mutate(common_name = factor(common_name, levels = herpsp))

# plot
ggplot(herps)+ 
  geom_point(aes(year, common_name, colour = common_name), size = 4)+
  scale_colour_viridis_d()+
  theme_minimal()+
  theme(legend.position= "none")+
  theme(axis.title.y=element_blank())+
  scale_y_discrete(limits=rev)+
  scale_x_continuous(n.breaks = 7)+
  coord_fixed(ratio = 0.6)

```

This is really an indication of the survey effort for herpetofauna! In 2018-2020 we only had tin/tile (AKA substrate) checks (which were done by students instead of expert observers) whereas in 2022 we started recording incidental sightings and doing the pond frog surveys. 

We recorded two new species for ENVS2018 in 2024: lace monitor (*Varanus varius*) and the Blue Mountains Tree Frog (*Litoria citropa*).

I wonder how these results compare with the frog diversity in the pre-readings?

## Birds over time

Lastly, let's look at how the bird diversity has changed over the years (remember you do not need to include this in your report). 

Note: the casuarina forest and rainforest sites were added in 2022. 

```{r}
# read in name changes
site_names <- read_excel("data/KCC new and old gps.xlsx") %>%
  select(1:2) %>%
  rename(old_name = 1,
         new_name = 2)

# vector of old site names
old <- unique(site_names$old_name)

# vector of new site names
new <- 1:24

# read in bird data since 2017
birds_years <- read_excel("data/ENVS2018 kioloa data 2017-2024.xlsx", sheet = 4) %>%
  clean_names() %>%
  rename(old_name = site) %>%
  left_join(site_names) %>%
  # rename sites to be consistent with new names
  mutate(site = ifelse(old_name %in% old, new_name, old_name)) %>%
  # remove birds outside of survey points
  filter(radius == "0-50") %>%
  select(c(site, year, species)) %>%
  # filter to remove sites no longer surveyed
  filter(site %in% new)

# summarise by year and site
birds_year_sum <- birds_years %>% group_by(year, site) %>%
  summarise(count = length(unique(species))) %>%
  mutate(year = as.factor(year),
         site = as.factor(site)) %>%
  arrange(site)

# plot by year
ggplot(birds_year_sum)+
  geom_boxplot(aes(year, count, group=year, fill = year))+
  scale_fill_viridis_d()+
  theme_minimal()

```

And over the years between sites.

```{r}
# plot by site
ggplot(birds_year_sum)+
  geom_point(aes(year, count, colour = site))+
  geom_line(aes(year, count, group = site, colour = site))+
  scale_colour_viridis_d()+
  theme_minimal()
```

And lastly, some summary data

```{r}
#summary
birds_year_sum %>%
  group_by(year) %>%
  summarise(mean = mean(count),
            max = max(count),
            min = min(count),
            sd = sd(count)) %>% 
  gt() %>%
  tab_style(style = cell_text(size = px(12)), locations = cells_body())
```

I hope you enjoyed reading this data summary. Good luck with the report writing!

# Session information

```{r}
# Display version information for R, OS, and packages
sessionInfo()
```