From b36d98736713d2663f905c06ba1df1279fcc4f4e Mon Sep 17 00:00:00 2001 From: JustGag <158193589+JustGag@users.noreply.github.com> Date: Thu, 19 Sep 2024 22:10:30 -0400 Subject: [PATCH] Update main.tex Adjustments --- papers/Gagnon_Kebe_Tahiri/main.tex | 32 +++++++++++++----------------- 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/papers/Gagnon_Kebe_Tahiri/main.tex b/papers/Gagnon_Kebe_Tahiri/main.tex index 3781e769b1..1ad4a1fc9b 100644 --- a/papers/Gagnon_Kebe_Tahiri/main.tex +++ b/papers/Gagnon_Kebe_Tahiri/main.tex @@ -1,5 +1,5 @@ \begin{abstract} -Cumacea (crustaceans: Peracarida) are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters. We analyzed mitochondrial sequences of the 16S rRNA gene from 62 Cumacea specimens. Using the \textit{aPhyloGeo} software, we correlated these sequences with relevant variables such as latitude (decimal degree) at the start of sampling, wind speed (m/s) at the start of sampling, O\textsubscript{2} concentration (mg/L), and depth (m) at the start of sampling. +Cumacea (crustaceans: Peracarida) are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters. We analyzed mitochondrial sequences of the 16S rRNA gene from 62 Cumacea specimens. 
Using the \textit{aPhyloGeo} software, we compared these sequences with relevant variables such as latitude (decimal degree) at the start of sampling, wind speed (m/s) at the start of sampling, O\textsubscript{2} concentration (mg/L), and depth (m) at the start of sampling. Our analyses revealed variability in most spatial and ecosystem attributes, reflecting the diversity of ecological requirements and benthic habitats. The most common Cumacea families, Diastylidae and Leuconidae, suggest adaptations to various marine environments. Phylogeographic analysis showed a divergence between specific genetic sequences and both wind speed (m/s) at the start of sampling and O\textsubscript{2} concentration (mg/L). This indicates potential local adaptation to these fluctuating conditions. @@ -13,7 +13,7 @@ \section{Introduction}\label{introduction} Cumacea, a crustacean taxon within Peracarida, provide major indicators of marine ecosystem health due to their sensitivity to environmental fluctuations \citep{stransky_diversity_2010} and their contribution to benthic food webs \citep{rehm2009cumacea}. Despite their ecological importance, deep-sea benthic invertebrates’ evolutionary history and dynamics remain uncharted, notably in the North Atlantic \citep{jennings_phylogeographic_2014}. Interpreting these deep-sea organisms' genetic distribution and demography is central for predicting their response to climate change \citep{jennings_phylogeographic_2014}. -Considering the current climate emergency, this study aims to analyze the influence of ecological (climatic and environmental) and geographic attributes on the genetics of Cumacea in the Northern North Atlantic. Specifically, we will examine whether there is an adaptation between the genetic structure of the 16S rRNA mitochondrial gene region of Cumacea species sampled and their habitat attributes in the Northern North Atlantic. 
If so, we will determine which attributes diverge most with a specific genetic sequence (i.e., a window) and identify the potential associated protein. Our approach includes confirming different {phylogeographic models}\footnote{Phylogeographic models are computational tools that analyze relationships between the genetic structures of populations and their geographic distributions. In our case, by incorporating regional, biological, and atmospheric characteristics, we can interpret their impact on the genetic distribution of Cumacea species,} and updating a Python package (currently in beta), \textit{aPhyloGeo}, to simplify these analyses. +Considering the current climate emergency, this study aims to analyze the influence of ecological (climatic and environmental) and geographic attributes on the genetics of Cumacea in the Northern North Atlantic. Specifically, we will examine whether there is an adaptation between the genetic structure of the 16S rRNA mitochondrial gene region of Cumacea species sampled and their habitat attributes in the Northern North Atlantic. If so, we will determine which attributes diverge most from a specific genetic sequence (i.e., a window) and further explore the potentially associated protein using bioinformatics tools to interpret its biological significance. Our approach includes confirming different {phylogeographic models}\footnote{Phylogeographic models are computational tools that analyze relationships between the genetic structures of populations and their geographic distributions. In our case, by incorporating regional, biological, and atmospheric characteristics, we can interpret their impact on the genetic distribution of Cumacea species.} and updating a Python package (currently in beta), \textit{aPhyloGeo}, to simplify these analyses. 
This paper is organized as follows: Section \autoref{related-works} reviews pertinent studies on the biodiversity and biogeography of deep-sea benthic invertebrates; Section \autoref{contribution} summarizes the aims and contributions of this study, highlighting aspects relating to the conservation and adaptation of marine invertebrates to climate change; Section \autoref{materials-methods} describes the data collection, sampling procedures, and genetic analyses; Section \autoref{metrics} describes the metrics used to evaluate the phylogeographic models; Section \autoref{results} presents the results; finally, Section \autoref{conclusion} discusses their implications for future research and conservation efforts. @@ -23,7 +23,7 @@ \section{Related Works}\label{related-works} However, the relationship between genetics and the environment is complex, involving gene-environment interactions and natural selection factors, which makes it difficult to identify clear causal relationships \citep{balkenhol_identifying_2009}. In addition, the distinction between the direct and indirect effects of the environment on genetics poses other challenges \citep{manel_perspectives_2010, balkenhol_landscape_2019}. The restrictions of available methods for measuring genetic and ecological attributes, combined with logistical constraints, often limit the scope of such studies \citep{manel_perspectives_2010, shafer_widespread_2013}. This complexity may explain why the environment and genetics of Cumacea have been less studied, even though they are essential for interpreting how these deep-sea invertebrates adapt to fluctuating environmental conditions. \section{Our Contribution}\label{contribution} -Our study focuses on the genetic fluctuation of the 16S rRNA mitochondrial gene in cumacean populations in response to variations in their habitat, which has been little explored in previous studies \citep{grassle1992deep, rex2000latitudinal}. 
We aim to refine the natural selection hypothesis by identifying the specific genetic region of our gene with the highest mutation rate and the potentially associated protein. By linking mitochondrial sequences of the 16S rRNA gene to habitat parameters using phylogeographic analyses, we can better interpret the selection effect on this genetic sequence of cumaceans that confers survival advantages in the extreme environments of the northern North Atlantic. This approach enables us to detect adaptive mutations and their functional outcomes using bioinformatics tools, to gain a better insight into how natural selection proceeds at the molecular level in these challenging habitats. This represents a major advance over previous research, which has often struggled to integrate genetic and biological data in the context of deep-sea invertebrates \citep{etter1990population, vrijenhoek2009cryptic}. +Our study focuses on the genetic fluctuation of the 16S rRNA mitochondrial gene in cumacean populations in response to variations in their habitat, which has been little explored in previous studies \citep{grassle1992deep, rex2000latitudinal}. We aim to refine the natural selection hypothesis by identifying the specific genetic region with the highest mutation rate and the potentially associated protein using bioinformatics tools, such as protein structure modeling and functional annotation databases, to uncover the role that this protein might play in the adaptation of cumaceans to habitat fluctuations. By linking mitochondrial sequences of the 16S rRNA gene to habitat parameters using phylogeographic analyses, we can better interpret the selection effect on this genetic sequence of cumaceans that confers survival advantages in the extreme environments of the northern North Atlantic. 
This approach enables us to detect adaptive mutations and their functional outcomes using bioinformatics tools, to gain a better insight into how natural selection proceeds at the molecular level in these challenging habitats. This represents a major advance over previous research, which has often struggled to integrate genetic and biological data in the context of deep-sea invertebrates \citep{etter1990population, vrijenhoek2009cryptic}. Using robust analytical methods such as dissimilarity calculations and phylogenetic reconstructions, we provide new insights into the genetic adaptation of marine Cumacea. Unlike previous studies, which have encountered difficulties establishing a link between genetics and the environment \citep{manel2003landscape, balkenhol2009statistical}, our results provide a better understanding of evolutionary dynamics in aquatic ecosystems. @@ -35,7 +35,7 @@ \section{Materials and Methods}\label{materials-methods} \begin{figure}[htbp] \centering \includegraphics[width=0.7\textwidth]{diagram.drawio.png} - \caption{Flow chart summarizing the Materials and Methods section workflow. Six different colors highlight the blocks. The first block (blue) represents our database. The second block (red) is data pre-processing, where we remove attributes. The third and fourth blocks (orange) implement the \textit{aPhyloGeo} software and its parameters for our phylogeographic analyses (see in the second step of the section \autoref{aPhyloGeo-software}). The fifth block (grey) calculates phylogenetic tree comparison distances. The sixth block (yellow) compares the distances between the phylogenetic trees produced. The seventh block (purple) identifies regions with high mutation rates based on the results of the tree comparisons. *See YAML files on \href{https://github.com/tahiri-lab/aPhyloGeo}{GitHub} for more details on these parameters. \label{fig:fig1}} + \caption{Flow chart summarizing the Materials and Methods section workflow. 
Six different colors highlight the blocks. The first block (blue) represents our database. The second block (red) is data pre-processing, where we remove attributes. The third and fourth blocks (orange) implement the \textit{aPhyloGeo} software and its parameters for our phylogeographic analyses (see in the second step of the section \autoref{aPhyloGeo-software}). The fifth block (grey) calculates phylogenetic tree comparison distances. The sixth block (yellow) compares the distances between the phylogenetic trees produced. The seventh block (purple) identifies regions with high mutation rates based on the results of the tree comparisons. *See YAML file on \href{https://github.com/tahiri-lab/aPhyloGeo}{GitHub} for more details on these parameters. \label{fig:fig1}} \end{figure} \subsection{Description of the data} @@ -215,7 +215,7 @@ \subsection{Metrics}\label{metrics} \subsubsection{Robinson-Foulds distance}\label{RF} The Robinson-Foulds (RF) distance calculates the distance between phylogenetic trees built in each sliding window ($T_1$) and the attributes (see the list in the first step of the section \autoref{aPhyloGeo-software}) trees ($T_2$) \citep{koshkarov_phylogeography_2022}. This measurement is used to evaluate the topological differences between the two sets of trees (see Equation \eqref{eq:rf} and \autoref{lst:robinsonFoulds}). -For example, it evaluates the topological differences between a phylogenetic trees built within certain sliding windows defined by the user (see the second step of section \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling. A high numerical distance between a specific window and other windows considered in the RFD analysis implies that the habitat feature impacts this particular DNA sequence and, consequently, the phylogenetic relationships of cumacean species. 
+For example, it evaluates the topological differences between phylogenetic trees built within certain sliding windows defined by the user (see the second step of section \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling. A high distance between a specific window and other windows considered in the RFD analysis implies that the habitat feature impacts this particular DNA sequence and, consequently, the phylogenetic relationships of cumacean species. \begin{equation}\label{eq:rf} \text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) | @@ -333,15 +333,13 @@ \subsubsection{Euclidean distance}\label{euclidean} \end{lstlisting} \subsubsection{Least-Squares distance}\label{LS} -The Least squares (LS) distance measures the topological dissimilarity between the two sets of trees (($T_1$ and $T_2$) by comparing the sum of squared differences between the phylogenetic distances of each pair of branch terminal leaves (i.e. the “tips” or cumacean species) in the two sets of trees (see Equation \eqref{eq:ls} and \autoref{lst:LeastSquare}). - -This metric allows us to understand how these different habitat attributes influence the topological structure of the phylogenetic trees. Thus, the LS distance reveals the impact, high or low, of a habitat attribute on changes in branch position and length. +The Least-Squares (LS) distance measures the topological dissimilarity between the two sets of trees ($T_1$ and $T_2$) by computing the sum of squared differences between the phylogenetic distances of each pair of terminal leaves (i.e., the “tips” or cumacean species) in the two sets of trees (see Equation \eqref{eq:ls} and \autoref{lst:LeastSquare}). This metric allows us to understand how these different habitat attributes influence the topological structure of the phylogenetic trees. A high value indicates a structural discrepancy between the genetic tree and the tree built from an attribute. 
\begin{equation}\label{eq:ls} - d_{\text{LS}}(T_1, T_2) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(d_T1(i,j) - d_T2(i,j)^2 + d_{\text{LS}}(T_1, T_2) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left(d_{T_1}(i,j) - d_{T_2}(i,j)\right)^2 \end{equation} -where $d_{\text{LS}}(T_1, T_2)$ is the Least-Square distance between the two sets of trees ($T_1$ and $T_2$), and $d_T1(i,j)$ and $\d_T1(i,j)$, the distance between leaves $i$ and $j$ in $T_1$ and $T_2$, respectively. +where $d_{\text{LS}}(T_1, T_2)$ is the Least-Squares distance between the two sets of trees ($T_1$ and $T_2$), and $d_{T_1}(i,j)$ and $d_{T_2}(i,j)$ are the distances between leaves $i$ and $j$ in $T_1$ and $T_2$, respectively. %\autoref{lst:LeastSquare}. \begin{lstlisting}[label=lst:LeastSquare, language=Python, caption=Python script for calculating the LSD using the ete3 package in the aPhyloGeo package] @@ -433,27 +431,25 @@ \section{Results}\label{results} \begin{figure}[] \centering \includegraphics[width=0.7\textwidth]{figure5.png} - \caption{Analysis of fluctuations in four distance metrics using multiple sequence alignment (MSA): a) Least-Squares distance, b) Robinson-Foulds distance, c) normalized Robinson-Foulds distance, and d) Euclidean distance. These distance variations are studied to establish their correlation with the variation in wind speed (m/s) at the start of sampling. \label{fig:fig6}} + \caption{Analysis of fluctuations in four distance metrics using multiple sequence alignment (MSA): a) Least-Squares distance, b) Robinson-Foulds distance, c) normalized Robinson-Foulds distance, and d) Euclidean distance. These distance variations are studied to establish their dissimilarity with the variation in wind speed (m/s) at the start of sampling. 
\label{fig:fig6}} \end{figure} \begin{figure}[] \centering \includegraphics[width=0.7\textwidth]{figure6.png} - \caption{Analysis of fluctuations in four distance metrics using multiple sequence alignment (MSA): a) Least-Squares distance, b) Robinson-Foulds distance, c) normalized Robinson-Foulds distance, and d) Euclidean distance. These distance variations are studied to establish their correlation with variation of O\textsubscript{2} concentration (mg/L) at the sampling sites. \label{fig:fig7}} + \caption{Analysis of fluctuations in four distance metrics using multiple sequence alignment (MSA): a) Least-Squares distance, b) Robinson-Foulds distance, c) normalized Robinson-Foulds distance, and d) Euclidean distance. These variations in distance are studied to establish their dissimilarity with the variation in O\textsubscript{2} concentration (mg/L) at the sampling sites. \label{fig:fig7}} \end{figure} -The correlation between the genetic sequences and two attributes, one climatic (wind speed (m/s) at the start of sampling) and the other environmental (O\textsubscript{2} concentration (mg/L)) is presented in Figure \ref{fig:fig6} and Figure \ref{fig:fig7}. All the attributes given in the first step of the \autoref{aPhyloGeo-software} section were analyzed and their script and figure will be soon available in the $img$ and $script$ python file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}. However, only these two attributes showed the most interesting mutation rate. Using the four metrics mentioned in section \autoref{metrics}, we noticed that the Euclidean distance is particularly sensitive to our data, manifesting considerable sequence variation at the position in MSA 520-529 amino acids (aa) (Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and 1190-199 aa (Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). 
This implies that these genetic sites are subject to selection pressures or evolutionary changes, due to biological (O\textsubscript{2} concentration (mg/L)) and meteorological conditions (wind speed (m/s) at the start of sampling). These results align with our study's aim to identify the genetic region of cumaceans with the highest mutation rate linked to a specific habitat attribute. +The divergence between the genetic sequences and two attributes, one climatic (wind speed (m/s) at the start of sampling) and the other environmental (O\textsubscript{2} concentration (mg/L)), is presented in Figure \ref{fig:fig6} and Figure \ref{fig:fig7}. All the attributes given in the first step of the \autoref{aPhyloGeo-software} section were analyzed, and their scripts and figures will soon be available in the $img$ and $script$ python file on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub}. However, these two attributes showed the most interesting mutation rates. Using the four metrics mentioned in section \autoref{metrics}, we noticed that the Euclidean distance is particularly sensitive to our data, manifesting considerable sequence variation at the position in MSA 520-529 amino acids (aa) (Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and 1190-1199 aa (Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). This implies that these genetic sites are subject to selection pressures or evolutionary changes, due to biological (O\textsubscript{2} concentration (mg/L)) and meteorological conditions (wind speed (m/s) at the start of sampling). These results align with our study's aim to identify the genetic region of cumaceans with the highest mutation rate linked to a specific habitat attribute. These results provide important insight into the genetic adaptation of cumaceans to their environment. 
These results need to be analyzed in greater depth to certify their involvement, especially in contrast with \citep{uhlir_adding_2021}, which investigated similar topics of environmental and climatic effects on cumaceans distribution and genetics. A more in-depth analysis of the results is available on \href{https://github.com/tahiri-lab/Cumacea_aPhyloGeo}{GitHub} in the supplementary file. \section{Conclusion}\label{conclusion} -This study examines the effects of meteorological, regional, and ecosystemic attributes on the genetics of cumaceans in the waters surrounding Iceland. Our main objective is to determine whether there is a correlation between precise genetic information of the 16S rRNA mitochondrial gene region (i.e., window) of cumacean species and their habitat attributes. In particular, we aim to identify the attribute most correlated with a specific genetic sequence and the potentially associated protein. - -We meticulously curated relevant attributes from the IceAGE project data, the bold systems database, and the \citep{uhlir_adding_2021} study. Some attributes have been excluded for lack of relevance, low variance (threshold ≤ 0.1; e.g., salinity, $S^2 = 0.02146629$), abundant missing data (> 95\%), or high inter-correlation (threshold > 0.9; Lat_start_end: $r = 0.99966582$; Long_start_end: $r = 0.99999794$; Depth_start_end: $r = 0.99985791$) to guarantee the robustness of our analysis. Utilizing this refined dataset, we integrated phylogeographic studies using \textit{aPhyloGeo} software (\autoref{lst:main}), allowing a comprehensive analysis of potential correlations between the genetics of cumacean species and their habitat characteristics. 
In addition to data distribution representations, DNA sequence analyses have identified specific genetic windows with high mutation rates in response to atmospheric and biological attributes such as wind speed (m/s) at the start of sampling (Position in MSA: 520-529 aa; Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and O\textsubscript{2} concentration (mg/L) (Position in MSA: 1190-1199 aa; Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). These results imply variable genetic sites that could contribute to the evolutionary acclimatization of cumaceans to their fluctuating environments. +This study examines the effects of meteorological, regional, and ecosystemic attributes on the genetics of cumaceans in the waters surrounding Iceland. Our main objective is to determine whether there is a divergence between precise genetic information of the 16S rRNA mitochondrial gene region (i.e., window) of cumacean species and their habitat attributes. In addition to data distribution representations (see Figure \ref{fig:fig2}, Figure \ref{fig:fig3}, Figure \ref{fig:fig4} and Figure \ref{fig:fig5}), DNA sequence analyses have identified specific genetic windows with high mutation rates in response to atmospheric and biological attributes such as wind speed (m/s) at the start of sampling (Position in MSA: 520-529 aa; Euclidean distance: 0.8 < x < 0.9; Figure \ref{fig:fig6}d) and O\textsubscript{2} concentration (mg/L) (Position in MSA: 1190-1199 aa; Euclidean distance: 1.2 < x < 1.3; Figure \ref{fig:fig7}d). These results imply variable genetic sites that could contribute to the evolutionary acclimatization of cumaceans to their fluctuating environments and that some cumacean species have diverged from other populations and have genetically adapted to both attributes. 
-The novelty in our research lies in the exhaustive correlation between habitat attributes and genetic mutability in cumaceans, particularly in identifying genetic windows associated with habitat fluctuations, which has not been widely investigated in previous studies \citep{manel2003landscape, vrijenhoek2009cryptic}. In this case, our integrated method identifies specific genetic regions sensitive to ecosystemic and atmospheric variations. Thus, by seeking to determine which of these two attributes is most closely correlated with their genetic sequence, the potential evidence of proteins related to one of these variable DNA sequences will enable us to depict the functional effects of this genetic adaptation. +The novelty of our research lies in the exhaustive analysis of the divergence between habitat attributes and genetic mutability in cumaceans, particularly in identifying genetic windows associated with habitat fluctuations, which has not been widely investigated in previous studies \citep{manel2003landscape, vrijenhoek2009cryptic}. In this case, our integrated method identifies specific genetic regions sensitive to ecosystemic and atmospheric variations. Thus, once we determine which of these two attributes diverges most from the DNA sequences, the eventual identification of proteins linked to one of these variable DNA sequences will make it possible to characterize their functional effects in response to habitat changes. Our future research will focus on verifying the prediction of this protein and assessing its role in the physiological adaptation of cumaceans to fluctuating conditions, thereby adding a link between genetic data and ecological function. -Interpreting how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. 
Identifying the specific attributes that influence genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, assuring they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. +Interpreting how marine invertebrates genetically adapt to variations in their habitat can help us better predict their responses to climate change and advance conservation plans to protect them. Identifying the specific attributes that influence genetic variability of Cumacea can contribute to the designation and supervision of marine protected areas, assuring they include habitats crucial to the survival and acclimatization of these species. Thus, our results can inform the management of fishing and seabed mining companies by revealing ecologically vulnerable areas where these disturbances can seriously affect benthic biodiversity. Furthermore, our results provide essential knowledge to guide future studies on the genetic adaptation of Cumacea and other invertebrates to ecological and regional variability. Based on these findings, future research should focus on additional ecosystemic and meteorological attributes, such as nutrient accessibility, water pH, ocean currents, and the degree of human disturbance, to further improve the interpretation of the complex interactions between genetics and the environment. Broadening the scope of application to other marine species, not just marine invertebrates, and diverse geographic regions would allow us to generalize the results more effectively. With this in mind, longitudinal study models on these species could reflect long-term climatic and biological fluctuations and improve our knowledge of the dynamics of genetic acclimatization.