DISCONTINUED in favor of DCAUtils.jl
Simple package that:
- Reads FASTA files and translates them into a numerical matrix
- Computes the reweighting score as described in "Fast and accurate multivariate Gaussian modeling of protein families: Predicting residue contacts and protein-interaction partners" by Carlo Baldassi, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt and Andrea Pagnani, (2014) PLoS ONE 9(3): e92721. doi:10.1371/journal.pone.0092721
- Computes empirical frequency counts (with and without pseudocount)
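To make the last two items concrete, here is a minimal sketch in plain Julia. It is not this package's code or API: the function names `sequence_weights` and `site_frequencies`, the layout of `Z` (positions as rows, sequences as columns), and the pseudocount convention are all assumptions made for illustration. Each sequence is weighted by the inverse of the number of sequences (itself included) whose Hamming distance from it is below `theta` times the alignment length, and the weighted single-site frequencies are then optionally mixed with a uniform pseudocount.

```julia
# Hypothetical helper, not part of the package API.
# Z is an N×M integer matrix: N alignment positions (rows), M sequences (columns).
function sequence_weights(Z::AbstractMatrix{<:Integer}, theta::Real)
    N, M = size(Z)
    thresh = theta * N
    counts = ones(Int, M)                 # every sequence counts itself
    for i in 1:M-1, j in i+1:M
        dist = count(k -> Z[k, i] != Z[k, j], 1:N)   # Hamming distance
        if dist < thresh
            counts[i] += 1
            counts[j] += 1
        end
    end
    W = 1 ./ counts                       # weight = 1 / number of neighbours
    return W, sum(W)                      # weights and effective sequence count
end

# Hypothetical helper: weighted single-site frequencies over q states (1..q).
# Assumed convention: (1 - pseudocount) * empirical + pseudocount / q.
function site_frequencies(Z::AbstractMatrix{<:Integer}, W::AbstractVector, q::Integer;
                          pseudocount::Real = 0.0)
    N, M = size(Z)
    Pi = zeros(q, N)
    for m in 1:M, i in 1:N
        Pi[Z[i, m], i] += W[m]
    end
    Pi ./= sum(W)
    return (1 - pseudocount) .* Pi .+ pseudocount / q
end

# Toy alignment: 4 positions, 3 sequences, 2 states.
Z = [1 1 2;
     1 1 2;
     1 2 2;
     1 1 2]
W, Meff = sequence_weights(Z, 0.5)        # W == [0.5, 0.5, 1.0], Meff == 2.0
Pi = site_frequencies(Z, W, 2)
Pi_pc = site_frequencies(Z, W, 2; pseudocount = 0.5)
```

The first two sequences differ at a single position, so they share their weight, while the third stands alone. The pairwise loop is quadratic in the number of sequences, which is why this step dominates the running time on large alignments.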
Computing the sequence weights is typically very expensive. A considerable speed-up can be achieved by exploiting parallel computation. To do so, start julia with the -p nprocs argument, where nprocs is the number of workers available on your machine. Alternatively, from the Julia REPL, run:
julia> using Distributed;
julia> addprocs(8) # put here the number of cores available
julia> @everywhere using CorrDCA
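As an illustration of why extra workers help, the sketch below spreads the neighbour counting over processes with the standard Distributed library. It is not how this package parallelizes its computation: `neighbor_count` is a hypothetical helper, the alignment is random data, and the worker count of 2 is an arbitrary assumption.

```julia
using Distributed
addprocs(2)   # assumption: two local workers; adjust to your machine

# Hypothetical helper, defined on every worker: number of sequences (columns of Z)
# whose Hamming distance from sequence i is below thresh, sequence i included.
@everywhere function neighbor_count(Z, i, thresh)
    N, M = size(Z)
    c = 0
    for j in 1:M
        d = 0
        for k in 1:N
            d += Z[k, i] != Z[k, j]
        end
        c += d < thresh
    end
    return c
end

Z = rand(1:21, 50, 200)          # random toy alignment: 50 positions, 200 sequences
thresh = 0.2 * size(Z, 1)

# Each sequence is handled independently, so the work distributes across workers.
counts = pmap(i -> neighbor_count(Z, i, thresh), 1:size(Z, 2))
W = 1 ./ counts                  # one weight per sequence
```

Because every sequence's count is independent of the others, the work divides cleanly across workers. For real alignments, sending one sequence index per task can be dominated by communication overhead, so batching the indices is usually worthwhile.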
All methods available here are also present in the (so far) unregistered "GaussDCA" package. The remove_duplicate_sequences function was essentially copied from it.