Review add more covariates #32
The eDNA pipeline succeeded for all
@hansvancalster is this because potential collinearity between land-use category and the physicochemical variables could mask the effects of these variables in the model? I assume we can test this in #28? Or should we not model this across all land-use categories, regardless of potential collinearity, because relationships between physicochemical variables and biodiversity may vary across land-use categories, making stratified analysis more ecologically meaningful?
Updated combined dataframe: v2
Silke updated the combined dataframe with data from the new primers (inseKP) we tested, which includes
Updated metadata: v2_cleaned_13
Also, we updated the metadata to
- exclude Moeras
I see the role of these variables more as "controlling for their effects". There is in most cases substantial overlap of ranges of observed values across land-use types, but the mean / bulk of the distribution may differ. For instance, the "natuurgraslanden" are mainly at low pH whereas other types are at somewhat higher pH. When adding pH, we can make predictions for land-use type and depth conditional on a specific value of pH, making the comparison between land use types and depth categories more reliable in the sense of being closer to the analogue of a designed experiment where we could have excluded such not-of-interest factors by design.
It certainly should be explored whether there could be important collinearity issues, but also the output of the models can signal this through diagnostic checks.
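A minimal sketch of what "predictions conditional on a specific value of pH" and a first collinearity check could look like. Everything here is an assumption for illustration: the data are simulated, the column names (`landuse`, `depth`, `pH`, `observed`), the land-use labels and the Poisson GLM are placeholders, and this is not the model formulation used in this PR.

```r
# Simulated data: all names and values are hypothetical, for illustration only.
set.seed(1)
n <- 120
dat <- data.frame(
  landuse = factor(rep(c("akker", "grasland", "natuurgrasland"), each = n / 3)),
  depth   = factor(rep(c("0-10", "10-30"), times = n / 2))
)
# One land-use type sits at lower pH on average, but the ranges overlap
dat$pH <- rnorm(n, mean = ifelse(dat$landuse == "natuurgrasland", 5.2, 6.3), sd = 0.6)
dat$observed <- rpois(n, lambda = exp(3 + 0.15 * dat$pH))

# Placeholder model: land use and depth are of interest, pH is controlled for
fit <- glm(observed ~ landuse + depth + pH, family = poisson, data = dat)

# Predictions for every land-use x depth combination at one common pH value
newdat <- expand.grid(
  landuse = levels(dat$landuse),
  depth   = levels(dat$depth),
  pH      = 6
)
newdat$predicted <- predict(fit, newdata = newdat, type = "response")

# Rough collinearity check: share of the variance in pH explained by the
# other predictors, expressed as a variance inflation factor
r2 <- summary(lm(pH ~ landuse + depth, data = dat))$r.squared
vif_pH <- 1 / (1 - r2)
```

Because `pH` is fixed in `newdat`, differences in `predicted` between rows reflect land use and depth only. A `vif_pH` close to 1 would suggest little collinearity; how large a value is acceptable is a judgment call.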
That was indeed the line of thought I had when proposing this. But on second thought, I think this is of low priority and I prefer sticking to the current model formulation for which I wrote the rationale in my first answer in this comment.
I will knit the Rmd file first to check if everything works as expected before I merge.
OK, thanks. Feel free to let me know if it is too heavy to run on a laptop; then we can run it on the HPC.
Merged commit fd07eb5 into add_SWCvol_Cdensity_etc_to_model_observed_richness
This is not only the case for the
But this is not the case for
```
sum(diversiteit$observed == 0)
```
We need to add these zero observations back (observed then becomes 0, but Shannon and Simpson are then undefined).
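A minimal sketch of adding the zero observations back, assuming a complete list of samples is available to join against. The `samples` table, the `sample_id` key and the `shannon` and `simpson` column names are hypothetical; only `diversiteit` and `observed` appear in the Rmd.

```r
# Hypothetical sample list and diversity table: sample "S3" had no detections,
# so it is missing from the diversity table.
samples <- data.frame(sample_id = c("S1", "S2", "S3"))
diversiteit <- data.frame(
  sample_id = c("S1", "S2"),
  observed  = c(12, 7),
  shannon   = c(2.1, 1.6),
  simpson   = c(0.85, 0.74)
)

# Left join on the full sample list so that the missing samples reappear as rows
diversiteit_full <- merge(samples, diversiteit, by = "sample_id", all.x = TRUE)

# Richness of a sample without detections is 0; Shannon and Simpson stay NA
# because they are not defined for an empty community
diversiteit_full$observed[is.na(diversiteit_full$observed)] <- 0

sum(diversiteit_full$observed == 0)
```

Leaving the two indices as `NA` instead of filling them with 0 keeps the undefined values out of any model fitted on Shannon or Simpson.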
@hansvancalster this is not the case for the Nematoda - 18s - asv data from ILVO, since that is an unusual case; see #32 (comment). In general, I realize this can differ per primer set (primers that target more than one group vs. group-specific primers) and per sample (for some samples the total read count across everything a primer captures is zero, in which case I think we should assume the eDNA pipeline in the lab failed). I will open an issue for this and investigate per primer set.
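A minimal sketch of how the two kinds of zeros could be told apart per sample for a primer that targets more than one group. The table and its columns (`total_reads`, `target_reads`) are hypothetical, and treating a zero total as a failed pipeline is the assumption stated above, not an established rule.

```r
# Hypothetical read counts per sample for one primer set
reads <- data.frame(
  sample_id    = c("S1", "S2", "S3"),
  total_reads  = c(84000, 61000, 0),  # everything the primer captures
  target_reads = c(1500, 0, 0)        # reads assigned to the group of interest
)

# total == 0: no reads at all (assumed lab failure, richness unknown)
# total > 0 but target == 0: the primer worked, the group was not detected
reads$status <- ifelse(
  reads$total_reads == 0, "no_reads",
  ifelse(reads$target_reads == 0, "true_zero", "detected")
)
reads
```

Under this assumption, a `true_zero` sample would get `observed = 0`, while a `no_reads` sample would stay `NA`.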
Thanks for documenting this in an issue. One quick question: could the total read count for a sample be zero because the sample is completely devoid of everything the primer targets? I'm thinking of pesticide misuse and other forms of pollution. If that could be the case, we should be careful in attributing a zero count to a failed eDNA pipeline. Are there ways to know this?
Good question. I know that many people (e.g. ILVO and the LUCAS project) resequence a sample if it has fewer than 50K total reads across everything a primer captures, but I have often wondered whether total read count is really random in general. I will continue the discussion in https://github.com/slambrechts/INBO_eDNA_metabarcoding_BODEM/issues/238
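A minimal sketch of one way to look into this: flag samples below the 50K threshold and check whether read depth differs between land-use types. The data are simulated, and the column names and land-use labels are hypothetical. A Kruskal-Wallis test is only one possible check and is not something the thread settled on.

```r
# Simulated read depths, for illustration only
set.seed(42)
depth <- data.frame(
  sample_id   = sprintf("S%02d", 1:30),
  landuse     = factor(rep(c("akker", "grasland", "natuurgrasland"), each = 10)),
  total_reads = round(rlnorm(30, meanlog = log(80000), sdlog = 0.5))
)

# Threshold mentioned in the comment above (fewer than 50K total reads)
min_reads <- 50000
depth$resequence <- depth$total_reads < min_reads

# Are low-depth samples spread evenly over land-use types?
table(depth$landuse, depth$resequence)

# Rank-based test for a difference in read depth between land-use types
kt <- kruskal.test(total_reads ~ landuse, data = depth)
kt$p.value
```

A small p-value on real data would suggest that read depth is related to land use, which would argue against treating low counts as purely random lab variation.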
Some things need further consideration: