Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review add more covariates #32

Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -88,28 +88,29 @@ Instead of fitting these one by one manually, we tried stepwise model selection:
form_richness <- formula(
observed ~
log(total_count)
+ Landgebruik_MBAG
+ Diepte
+ SWCvol
+ Landgebruik_MBAG:Diepte
+ SWCvol:Diepte
+ (1 | Cmon_PlotID)
+ landgebruik_mbag
+ diepte
+ swc_vol
+ landgebruik_mbag:diepte
+ swc_vol:diepte
+ (1 | cmon_plot_id)
)

variables_no_collinearity <- c(
"total_count",
"Landgebruik_MBAG",
"Diepte",
"Cmon_PlotID",
"Cdensity",
"C_N_stockbased",
"SWCvol",
"pH_KCl",
"BD",
"mv_mTAW"
"landgebruik_mbag",
"diepte",
"cmon_plot_id",
"c_density",
"cn_stockbased",
"swc_vol",
"ph_kcl",
"bd",
"mv_m_taw"
)

data_subset <- data_subset[complete.cases(data_subset[ , variables_no_collinearity]), ]
data_subset <- data_subset[
complete.cases(data_subset[ , variables_no_collinearity]), ]

cat("Fitting Poisson model")

Expand All @@ -135,7 +136,7 @@ extend the model with additional predictors and fit the full model
```{r extend-model-additional-predictors-{{group}}-{{primerset}}-{{unit}}}
form_richness_full <- update(
form_richness,
. ~ . + Cdensity + C_N_stockbased + SWCvol + pH_KCl + BD + mv_mTAW
. ~ . + c_density + cn_stockbased + swc_vol + ph_kcl + bd + mv_m_taw
)

m_richness_full <- glmmTMB(
Expand Down
73 changes: 69 additions & 4 deletions source/rmarkdown/analyses_diversity/analyses_diversity.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,14 @@ library(purrr)
library(glmmTMB)
library(marginaleffects)
library(performance)
library(MuMIn)

mbag_bodem_folder <- "G:/Gedeelde drives/PRJ_MBAG/4c_bodembiodiversiteit" # nolint
```

# Inlezen data


```{r inlezen}
diversiteit <- readr::read_csv(
file.path(
Expand All @@ -48,6 +50,14 @@ metadata <- readr::read_csv(
"MBAG_stratfile_v2_cleaned_12.csv")
) %>%
janitor::clean_names() %>%
rename(
ph_kcl = p_h_k_cl,
swc_grav = sw_cgrav,
swc_vol = sw_cvol,
cn_stockbased = c_n_stockbased,
c_density = cdensity,
n_density = ndensity
) %>%
mutate(
landgebruik = factor(
landgebruik_mbag,
Expand All @@ -58,10 +68,40 @@ metadata <- readr::read_csv(
diepte = gsub("_|/", "-", diepte) |> factor()
)

combined <- diversiteit %>%
inner_join(metadata, by = "sample")
```

In `mbag_combined_dataframe.csv` zitten de samples zonder taxa niet in (na te vragen)!

```{r echo=TRUE}
sum(diversiteit$observed == 0)
```

Deze nulwaarnemingen moeten we terug toevoegen (observed wordt dan 0, maar Shannon en Simpson zijn dan niet gedefinieerd).
Copy link
Collaborator

@slambrechts slambrechts Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hansvancalster this is not the case for the Nematoda - 18s - asv data from ILVO, since that is an unusual case see #32 (comment)

and in general, I realize this can differ per primerset (primers that target more than one group vs group-specific), and sample (for some samples total read count across everything a primer captures is zero, in which case I think we should assume the eDNA pipeline in the lab failed). I will open an issue for this and investigate per primerset

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this in an issue. One quick question: could a total read count for a sample be zero because it is completely denuded of everything the primer targets? I'm thinking of pesticide misuse and other forms of pollutions. If that could be the case, we should be careful in attributing it to a failed eDNA pipeline. Are there ways to know this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I know that many people (e.g. ILVO and the LUCAS project) resequence a sample if it has less than 50K total reads across everything a primer captures, but I have often wondered whether total read count is really random in general? I will continue the discussion in https://github.com/slambrechts/INBO_eDNA_metabarcoding_BODEM/issues/238


```{r}
groups <- diversiteit %>%
distinct(primerset, group, unit)

samples <- metadata %>%
distinct(sample)

#nrow(groups) * nrow(samples)
samples_groups <- expand_grid(
samples, groups)

combined <- samples_groups %>%
full_join(
diversiteit,
by = c("sample", "primerset", "group", "unit")) %>%
mutate(
observed = ifelse(is.na(observed), 0, observed),
total_count = ifelse(is.na(total_count), 0, total_count)
) %>%
inner_join(metadata, by = join_by(sample))
```



```{r}
glimpse(combined)
```
Expand Down Expand Up @@ -102,6 +142,21 @@ combined %>%
)
```

Aantal samples met nul OTUs:

```{r}
combined %>%
group_by(group, primerset, unit) %>%
mutate(
observed0 =
ifelse(observed == 0, "geen OTU", "minstens 1 OTU")
) %>%
count(observed0) %>%
pivot_wider(names_from = observed0, values_from = n) %>%
kable()
```


# Analyses

## Geobserveerde soortenrijkdom
Expand Down Expand Up @@ -137,8 +192,18 @@ pmap(

#clipr::write_clip(rmd)

knit_child(text = rmd, quiet = TRUE) %>%
cat()
execute_code <- function(rmd) {
# Extract R code from the rmd content
r_code <- knitr::purl(text = rmd, quiet = TRUE)
eval(parse(text = r_code), envir = .GlobalEnv)
}

if (interactive()) {
execute_code(rmd = rmd)
} else {
knit_child(text = rmd, quiet = TRUE) %>%
cat()
}
```

## Shannon index
Expand Down