trajectory-likelihood

This is a reference implementation of several methods to calculate alignment gap probabilities using several different approximations to continuous-time Markov chains models of sequence evolution with insertions and deletions of geometrically distributed lengths (the "general geometric indel" model of De Maio, 2020; a special case of the "long indel" model of Miklós, Lunter & Holmes, 2004).

The methods implemented by this code include

trajectory enumeration, as described by Miklós, Lunter & Holmes in the paper A "Long Indel" model for evolutionary sequence alignment (2004);
manual moment-matching, as described by De Maio in the paper The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment (2020);
renormalized moment-matching, as described by Holmes in the paper A Model of Indel Evolution by Finite-State, Continuous-Time Machines (2020).

Notes on the implementation of trajectory enumeration from MLH 2004

The code includes an implementation of Algorithm 1 from the MLH 2004 paper, which computes the likelihood of a state trajectory in an arbitrary continuous-time discrete-state Markov chain, with event times integrated out.

For clarity, this code currently makes the following simplifications relative to the full "Long Indel" model described in the paper:

Currently it only calculates the gap likelihoods for internal chop zones (i.e. not the regions at the ends of the sequence, nor the probability of deleting/inserting the entire sequence). This is equivalent to considering only infinitely long sequences.
Currently it does not use a mixture of geometric distributions for the indel lengths, just a single geometric distribution.

Generalizing the code to cover these cases would be relatively straightforward, and only involve changing the calculation of zone exit and transition rates. The "work" of the implementation is largely in counting trajectories and calculating their probabilities.

Installation

The Boost library is required for the command-line tool (though not if you just want to link to the C++ code).

Type make to compile the command-line tool.

Running

./trajeclike --mu .049 --y .543 --time 1 --maxlen 20 --maxevents 3 -v1

The values of the mu and y parameters here are (from the MLH 2004 paper) the maximum likelihood fit of a single-component geometric indel distribution to protein alignments in the HOMSTRAD database.

To compare to simulation results, run with the --simulate switch (note that this simulates directly from the underlying continuous-time Markov chain)

./trajeclike --mu .049 --y .543 --time 1 --maxlen 20 --simulate -v1

For more verbose logging, increase the verbosity (-v). For example, -v1 will display a legend for the output probability matrix. Higher verbosity levels will display more information, such as progress messages.

For help, use -h.

Analysis

If you turn on verbose logging (-v1), the program will report summary statistics including the moments of the gap length distribution.

The node script relent.js can be used to compute the relative entropy between gap length distributions computed using different methods.

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
benchmark		benchmark
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
cim.cpp		cim.cpp
cim.h		cim.h
mid.cpp		mid.cpp
mid.h		mid.h
moments.cpp		moments.cpp
moments.h		moments.h
package.json		package.json
relent.js		relent.js
rk4.cpp		rk4.cpp
rk4.h		rk4.h
simulate.cpp		simulate.cpp
simulate.h		simulate.h
stacktrace.h		stacktrace.h
tkf.cpp		tkf.cpp
tkf.h		tkf.h
trajec.cpp		trajec.cpp
trajec.h		trajec.h
trajeclike.cpp		trajeclike.cpp
util.cpp		util.cpp
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

trajectory-likelihood

Notes on the implementation of trajectory enumeration from MLH 2004

Installation

Running

Analysis

About

Releases

Packages

Contributors 2

Languages

License

ihh/trajectory-likelihood

Folders and files

Latest commit

History

Repository files navigation

trajectory-likelihood

Notes on the implementation of trajectory enumeration from MLH 2004

Installation

Running

Analysis

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages