DecentTree is a cross-platform command-line program and also a C++ library for inferring phylogenetic trees (in newick format) from distance matrices (supplied in phylip distance matrix formats), or alignments (supplied in fasta format), using well-known algorithms, such as Neighbour Joining, BIONJ and UPGMA.
If you use DecentTree in a publication, please cite:
W. Wang, J. Barbetti, T.K.F Wong, B. Thornlow, R. Corbett-Detig, Y. Turakhia, R. Lanfear, B.Q. Minh (2023) DecentTree: scalable Neighbour-Joining for the genomic era. Bioinformatics, 39, btad536. https://doi.org/10.1093/bioinformatics/btad536
The distance-based tree reconstruction in DecentTree was originally derived from IQ-TREE2 and and then optimized and parallelized for large datasets with more than 10,000 sequences.
DecentTree implements several distance-based phylogenetic algorithms:
- UPGMA (Sokal and Michener, 1958),
- Neighbor Joining (or NJ) (Saitou and Nei 1987),
- BioNJ (Gascuel, 1997), and
- Unweighted NJ (or UNJ) (Gascuel 2002).
It includes vectorized versions of NJ, BIONJ, and UNJ. It also includes optimized versions of UPGMA, NJ and BIONJ, all of which use the same techniques as RapidNJ (Simonsen, et al., 2008). All the algorithms are implemented in header files, as template classes, allowing the caller to choose between single or double precision, all implement a standard interface, accepting either Phylip distance matrix files or in-memory matrices, and all are parallelized using OpenMP.
The vectorized versions make use of the VectorClass library. The command-line application allows users to select the algorithm and thread count. DecentTree also makes use of custom vectorized sequence comparison functions (which also available via headers), that are called in parallel, to calculate distance matrices.
The (header only) template classes can be used in other projects. (For full functionality it will be necessary to link with OpenMP - for parallelization - and compile with Agner Fog's VectorClass library - for vectorization).
For additional documentation, see:
- UPGMA_Matrix, used for Unweighted Neighbor Joining is an implementation of the UPGMA algorithm of "Evaluating Systematic Relationships", by Rober R. Sokal, and Charles D. Michener (1958), (in the University of Kansas Science Bulletin).
- NJMatrix, used for Neighbor Joining (NJ), is an implementation based on the Neighbor Joining (NJ) algorithm outlined in "The neighbor-joining method: a new method for reconstructing phylogenetic trees", by Naurya Saitou and Masatoshi Nei (1987).
- UNJMatrix, is an implementation based on the Unweighted NJ (or UNJ) algorithm, described in "Concerning the NJ algorithm and its unweighted version, UNJ", by Olivier Gascuel (1997).
- BIONJMatrix, is an implementation based on the BIONJ algorithm described in "BIONJ: An Improved Version of the NJ Algorithm Based on a Simple Model of Sequence Data", by Olivier Gascuel (2009), in Molecular Biology and Evolution. 1997 14:685-695. See: http://www.lirmm.fr/~w3ifa/MAAS/BIONJ/BIONJ.html
- BIONJ2009, is based on a program, written by Olivier Gascuel and Hoa Sien Cuong, that was obtained from http://www.lirmm.fr/~w3ifa/MAAS/BIONJ/BIONJ.html
- BoundingMatrix, used for BIONJ-R and NJ-R is based on "Inference of Large Phylogenies using Neighbour-Joining", by Martin Simonsen, Thomas Mailund, Christian N. S. Pedersen. From: Communications in Computer and Information Science (Biomedical Engineering Systems and Technologies: 3rd International Joint Conference, BIOSTEC 2010,2 Revised Selected Papers), volume 127, pages 334-344, Springer Verlag, 2011. See: https://birc.au.dk/software/rapidnj/.
- LLVM's OpenMP library is used for parallelization (if DecentTree is built with a compiler than has Open MP support). (The omp.h header, based on that of LLVM, is included to deal with compilation issues that sometims arise when building on Macintosh)
- Agner Fog's vectorclass library is used for vectorization (in the NJ-V, BIONJ-V, UPGMA-V algorithms, implemented by VectorNJ, VectorBIONJ and VectorizedUPGMA_Matrix template classes). See: https://github.com/vectorclass.
- zlib (https://zlib.net) is a free library of compression algorithms. (DecentTree is built with version 1.2.7).
- gzstream, C++ iostream classes wrapping the zlib compression library. Copyright (C) 2001 Deepak Bandyopadhyay, Lutz Kettner
- The heapsort algorithm implemented in parallel_heapsort.h, is loosely based on Floyd's TreeSort3 (Algorithm 245, Communications of the ACM, Volume 7, Number 12, December 1964) which in turn was based on J.W.J.William's Heapsort (Algorithm 232, Communications of the aCM, Volume 7, Number 6, June 1964).
- The mergesort algorithm implemented in parallel_mergesort.h, is loosely based on Algorithm S (Straight Mergesort), from volume 3 of "The Art of Computer Programming", by Donald E. Knuth (only the rational arithmetic used to split partitions evenly is original).
- The timing functions in timeutil.h were originally developed for IQ-TREE2. See: https://github.com/iqtree/iqtree2.
- The progress bar functionality in progress.h and progess.cpp was originally developed for IQ-TREE. See: https://github.com/iqtree/iqtree2.