Commit

Update main.tex
Metrics corrections
JustGag authored Sep 22, 2024
1 parent febb034 commit 2523dc4
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions papers/Gagnon_Kebe_Tahiri/main.tex
@@ -215,7 +215,7 @@ \subsection{Metrics}\label{metrics}
\subsubsection{Robinson-Foulds distance}\label{RF}
The Robinson-Foulds (RF) distance calculates the distance between phylogenetic trees built in each sliding window ($T_1$) and the attributes trees ($T_2$) (see the list in the first step of the section \autoref{aPhyloGeo-software}) \citep{tahiri2018new, koshkarov_phylogeography_2022}. This measurement is used to evaluate the topological differences between the two sets of trees (see Equation \eqref{eq:rf} and \autoref{lst:robinsonFoulds}).

- For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of section \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling. A high distance between a specific window and other windows considered in the RF distance analysis implies that the habitat feature has little to no impact on this particular DNA sequence and that this attribute cannot explain the genetic divergences observed at this DNA site.
+ For example, it evaluates the number of division differences between phylogenetic trees built within certain user-defined sliding windows (see the second step of section \autoref{aPhyloGeo-software}) and geographic trees built with latitude data (DD) at the start of sampling \citep{robinson_comparison_1981}. A high distance between a specific window and other windows considered in the RF distance analysis implies that the habitat feature has little to no impact on this particular DNA sequence and that this attribute cannot explain the genetic divergences observed at this DNA sequence \citep{briand2020generalized}.

\begin{equation}\label{eq:rf}
\text{RF}(T_1, T_2) = | \Sigma(T_1) \Delta \Sigma(T_2) |
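
Reviewer note: Equation \eqref{eq:rf} can be illustrated with a minimal Python sketch. It is not the code from \autoref{lst:robinsonFoulds}; the nested-tuple tree encoding, the function names, and the five-taxon trees are assumptions made only for this illustration. Each tree is reduced to its set of non-trivial bipartitions, and the RF distance is the size of the symmetric difference of the two sets.

```python
from itertools import chain


def leaf_set(tree):
    """Return the leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of the leaf set induced by internal edges."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaf_set(child)
            rest = all_leaves - side
            # Skip trivial splits that isolate a single leaf.
            if len(side) > 1 and len(rest) > 1:
                # Store the split as an unordered pair so rooting does not matter.
                splits.add(frozenset([side, rest]))
            visit(child)

    visit(tree)
    return splits


def robinson_foulds(t1, t2):
    """RF(T1, T2) = size of the symmetric difference of the two split sets."""
    return len(bipartitions(t1) ^ bipartitions(t2))


# Hypothetical five-taxon trees (placeholder labels, not data from the paper).
window_tree = (("A", "B"), ("C", "D"), "E")
attribute_tree = (("A", "B"), ("C", "E"), "D")
rf = robinson_foulds(window_tree, attribute_tree)  # 2
```

The two trees share the split AB|CDE and differ in one split each, so the distance is 2. Storing each split as an unordered pair means a rooted and an unrooted encoding of the same topology compare as identical.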
@@ -264,7 +264,7 @@ \subsubsection{Robinson-Foulds distance}\label{RF}
\subsubsection{Normalized Robinson-Foulds distance}\label{RFnorm}
The normalized Robinson-Foulds (nRF) distance scales the RF distance to account for the size variations in the trees (number of clades; i.e., a group of species with a common origin), allowing a more equitable comparison. It scales the distance to a range between 0 and 1. In our context, the distance has been normalized by $2n-6$, where $n$ represents the number of taxa (see Equation \eqref{eq:rf_norm} and the last line of code in \autoref{lst:robinsonFoulds}).

- Since the size of the environmental trees constructed with O\textsubscript{2} concentration (mg/L) data differs from those of the other attributes due to missing data, this normalized metric allows us to compare its dissimilarity with the phylogenetic trees in a fairer way. It reveals the relative influence of O\textsubscript{2} concentration (mg/L) on cumacean phylogenetic relationships, independent of tree size. A high value of this metric between a specific window and other windows considered in the nRF distance analysis does not allow us to conclude that there is a correlation between this DNA sequence and the attribute. It may indicate a topological dissimilarity between the habitat attribute tree and the gene trees at that position in the DNA sequence alignments. \citep{tahiri2018new)
+ Since the size of environmental trees constructed with O\textsubscript{2} concentration data (mg/L) differs from that of other attributes due to missing data, this nRF distance allows us to compare its dissimilarity with the phylogenetic trees in a fairer way \citep{tahiri2018new, koshkarov_phylogeography_2022}. It reveals the relative influence of O\textsubscript{2} concentration (mg/L) on cumacean phylogenetic relationships, independent of tree size \citep{tahiri2018new, koshkarov_phylogeography_2022}. A high value of this metric between a specific window and other windows considered in the nRF distance analysis does not allow us to conclude that there is a correlation between this DNA sequence and the attribute. It may indicate a topological dissimilarity between the habitat attribute tree and the gene trees at that position in the DNA sequence alignments.

\begin{equation}\label{eq:rf_norm}
\text{RF}_{\text{norm}}(T_1, T_2) = \frac{| \Sigma(T_1) \Delta \Sigma(T_2) |}{| \Sigma(T_1) | + | \Sigma(T_2) |}
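
Reviewer note: a small Python sketch of Equation \eqref{eq:rf_norm}, assuming each tree has already been reduced to its set of non-trivial bipartitions. The helper names and the five-taxon split sets are hypothetical. For two fully resolved unrooted trees on $n$ taxa, each tree has $n-3$ such splits, so the denominator equals the $2n-6$ mentioned in the text.

```python
def make_split(side, taxa):
    """Encode a bipartition as an unordered pair of leaf sets."""
    side = frozenset(side)
    return frozenset([side, frozenset(taxa) - side])


def normalized_rf(splits_t1, splits_t2):
    """Symmetric difference of the split sets divided by their total size."""
    total = len(splits_t1) + len(splits_t2)
    if total == 0:
        # Two star trees have no internal splits; treat them as identical.
        return 0.0
    return len(splits_t1 ^ splits_t2) / total


# Hypothetical split sets for two resolved trees on n = 5 taxa.
taxa = "ABCDE"
splits_t1 = {make_split("AB", taxa), make_split("CD", taxa)}
splits_t2 = {make_split("AB", taxa), make_split("CE", taxa)}
nrf = normalized_rf(splits_t1, splits_t2)  # 2 / 4 = 0.5
```

Here the denominator is $2 \cdot 5 - 6 = 4$ and two splits differ, so the normalized distance is 0.5 on the 0 to 1 scale.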
@@ -333,7 +333,7 @@ \subsubsection{Euclidean distance}\label{euclidean}
\end{lstlisting}

\subsubsection{Least-Squares distance}\label{LS}
- The Least-Squares (LS) distance measures the sum of the squares of the differences between the phylogenetic distances of the leaf pairs between the two sets of trees ($T_1$ and $T_2$) (see Equation \eqref{eq:ls} and \autoref{lst:LeastSquare}). As with Euclidean distance, the distance between each pair of leaves in the genetic trees is compared with that of the habitat attribute trees \citep{czarna2006topology, balaban2020apples}. This metric allows us to measure the topological dissimilarity between the two sets of trees and to understand how these different habitat attributes influence the topological structure of phylogenetic trees. A high value between a specific window and other windows considered in the LS distance analysis indicates a structural discrepancy between this DNA sequence and the tree built from a habitat attribute \citep{balaban2020apples}. Furthermore, we cannot conclude with certainty that there is a correlation between them since genetic variations in this window are inconsistent with variations in this habitat parameter \citep{czarna2006topology, balaban2020apples}.
+ The Least-Squares (LS) distance measures the sum of the squares of the differences between the phylogenetic distances of the leaf pairs between the two sets of trees ($T_1$ and $T_2$) (see Equation \eqref{eq:ls} and \autoref{lst:LeastSquare}). As with Euclidean distance, the distance between each pair of leaves in the genetic trees is compared with that of the habitat attribute trees \citep{czarna2006topology, balaban2020apples}. This metric allows us to measure the topological dissimilarity between the two sets of trees and to understand how these different habitat attributes influence the topological structure of phylogenetic trees. A high value between a specific window and other windows considered in the LS distance analysis indicates a structural discrepancy between this DNA sequence and the tree built from a habitat attribute. Furthermore, we cannot conclude with certainty that there is a correlation between them since genetic variations in this window are inconsistent with variations in this habitat parameter.

\begin{equation}\label{eq:ls}
d_{\text{LS}}(T_1, T_2) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(d_{T_1}(i,j) - d_{T_2}(i,j))^2
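
Reviewer note: a minimal Python sketch of Equation \eqref{eq:ls}, assuming the leaf-to-leaf distances of each tree are already available as a lookup keyed by unordered leaf pair. It is not the code from \autoref{lst:LeastSquare}; the function name, the dictionary layout, and the three-leaf distances are hypothetical values chosen for illustration.

```python
from itertools import combinations


def least_squares_distance(dist_t1, dist_t2, leaves):
    """Sum of squared differences of pairwise leaf distances over all pairs i < j."""
    return sum(
        (dist_t1[frozenset(pair)] - dist_t2[frozenset(pair)]) ** 2
        for pair in combinations(leaves, 2)
    )


# Hypothetical pairwise leaf distances for a gene tree and an attribute tree.
leaves = ["A", "B", "C"]
dist_t1 = {frozenset("AB"): 2.0, frozenset("AC"): 4.0, frozenset("BC"): 4.0}
dist_t2 = {frozenset("AB"): 3.0, frozenset("AC"): 4.0, frozenset("BC"): 6.0}
ls = least_squares_distance(dist_t1, dist_t2, leaves)  # 1 + 0 + 4 = 5.0
```

Each unordered pair is visited once, which matches the double sum over $i < j$ in the equation.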