Skip to content

Latest commit

 

History

History
219 lines (175 loc) · 10.2 KB

README.org

File metadata and controls

219 lines (175 loc) · 10.2 KB

RAFFT is a folding tool that builds fast-folding paths for a given sequence. Starting from the completely unfolded structure, it quickly identifies stems with an FFT-based technique. Then, it forms the stem that improves the overall stability. Multiple folding paths can be explored and displayed. Therefore, given a sequence, the user will obtain several structures or folding paths.

Requirements

  • Vienna RNA package (version >= 2.4.17)
  • Numpy, scipy with python 3

Installation

To install RAFFT in your local python library

pip install . --user --upgrade

Then two command line tools will be available: rafft for the folding and rafft_kin for the kinetic analysis.

Alternative Implementation

Another more performant implementation of the core algorithm is available here: rafft-rs This implementation is able to generate output in a compatible format that can be used for the same kinetic analysis using rafft_kin. Please refer to this repository for documentation about differences in usage.

rafft-rs is also added as a submodule to this repository. Use git submodule init after cloning to pull its code.

Usage

For the examples, we used the Coronavirus frameshifting stimulation element obtained from RFAM.

To display only the final structures:

rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU  -ms 5

To display the visited/saved intermediates:

rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU  -ms 5  --traj

The algorithm has two critical parameters:

  • -ms <INT> is the number of saved structures at each folding step (default=1)
  • -n <INT> is the number of positional lag to search for stems (default=100)

The search of stems can be tuned using weights on base pairs types:

  • --GC <FLOAT> GC base pairs weight (default = 3.0)
  • --AU <FLOAT> AU base pairs weight (default = 2.0)
  • --GU <FLOAT> GU base pairs weight (default = 1.0)

Inputs

The input is one sequence in the standard input or a simple text file (it can be in Fasta format).

Outputs

For the trajectory output format: at each step, numbered from 0 to 3, the structures saved are given below the sequence with their stability (computed with Vienna RNA API).

GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU
# ---------0----------
..................................................................................    0.0
# ---------1----------
.....(((((((((((..........))))))))))).............................................  -14.0
..................................((((((((................))))))))................   -6.8
...................(((((.............................................)))))........   -6.4
..................................((((.......)))).................................   -5.5
(((((..............)))))..........................................................   -4.6
# ---------2----------
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9
...................(((((..........(((((((((((.........))).))))))))...)))))........  -18.8
........((((((((..........))))))))((((((((...((....)).....))))))))................  -18.7
.....(((((((((((.((.....)))))))))))))....................((((((.............))))))  -18.2
# ---------3----------
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  -24.0
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  -24.0
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((.((.....))))))))))((((((((..((........))..))))))))................  -21.8
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9

The output of the final structures only is the following.

GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  -24.0
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  -24.0
........((((((((..........))))))))(((((((((((.........))).))))))))................  -23.1
........((((((((.((.....))))))))))((((((((..((........))..))))))))................  -21.8
........((((((((..........))))))))((((((((..((........))..))))))))................  -20.9

Analysis

Fast-paths plot

To create the fast-folding path figures, one can use the utility utility/plot_path.py on rafft output:

It uses VARNA to produce the secondary structure representation, should be download directly from its website

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 5 --traj > rafft.out
python ../utility/plot_path.py rafft.out -he 500 -wi 900 -rv 1 -o path_5.png

example/path_5.png

With 20 saved structures:

example/path_20.png

Kinetic trajectory

From the above fast-folding graph, one can produce kinetic trajectories. Starting from the completely unfolded structures, it simulates the folding process.

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
rafft_kin rafft_20.out -o kinetic.png --plot -mt 40

example/kinetic.png

The output has the following form

[...]
<structure>                                                                   <population> <Energy> <Structure ID>
.....(((((((((((.((.....)))))))))))))((((((((.........))).)))))...................  0.009 -23.2 44
........((((((((..........))))))))(((((((((((.........))).))))))))................  0.011 -23.1 21
((((((((.......))).)))))(((.....(((((((((((((.........))).))))))))...))......)))..  0.014 -23.6 62
(((((.(((.......))))))))(((.....((((((((((..((........))..))))))))...))......)))..  0.014 -23.6 63
........((((((((.((.....))))))))))(((((((((((.........))).))))))))................  0.049 -24.0 42
........((((((((.((.....))))))))))(((((((((((((....)).))).))))))))................  0.049 -24.0 43
(((((.(((.......))))))))(((((.....(((((((((((.........))).))))))))..))))).........  0.063 -24.5 41
(((((.(((.......))))))))(((((.....(((((((((((((....)).))).))))))))..))))).........  0.063 -24.5 61
(((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))....)).....)))..  0.168 -25.1 60
(((((.(((.......))))))))(((.....(((((((((((((.........))).))))))))...))......)))..  0.531 -25.8 59

Folding landscape

From the fast-folding graph, one can also draw a landscape using the multidimensional scaling algorithm to map the structures onto a plan. It tries to preserve as much as possible the base pair distance between structures.

cd example
rafft -s GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU -ms 20 --traj > rafft_20.out
python ../utility/surface.py rafft_20.out -o landscape.png

(Initial and minimum energy structure are circled in black)

example/landscape.png

Usage as a package

The folding function and the kinetic function can both be called from the rafft package.

from rafft.rafft import fold
from rafft.rafft_kin import kinetics

seq = "GGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUGACAU"
final_set_struct, trajectory = fold(seq, max_stack=20, traj=True)
traj_k, times, struct_list, equi_pop = kinetics(trajectory, 40, 32)
for st, nrj, prob, sid in equi_pop[::-1][:10]:
    print(f"{st} {prob:5.2f}")

Reproducibility of the benchmarks

The dataset curated we used for the benchmarks is in benchmarks_results/benchmark_cleaned_all_length.csv.

The benchmark results files (and associated script to produce them) are given in the following table (for details about those results, see the associated reference):

MethodfileNotes
RAFFTrafft_100n_50ms_best_nrj_scores.csv-n 100 -ms 50 (best energy)
rafft_100n_50ms_scores.csv-n 100 -ms 50 (best score)
rafft_200n_200ms_scores.csv-n 200 -ms 200 (best score)
MFEmfe_scores.csvbench_mfe.py
MLmxfold_scores.csvbench_mxfold.py

analysis.org and utils_analysis.py contain the pieces of script used to perform the analysis and the figures.

For the test case, we used the command line given in the Usage section above. Figures were derived from their output.