aRNAque is a simple but efficient evolutionary algorithm for inverse RNA folding inspired by Lévy flights.

Given a target structure in dot-bracket notation, the tool generates good-quality RNA sequences (low ensemble defect, ED, and low minimum free energy, MFE) whose predicted structure is close to the input target. The method relies on local mutations of nucleotides and base pairs, applied independently and drawn according to the probability distributions P_N and P_C. More details about the choice of P_N and P_C are provided in the SI of our paper.
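The mutation step described above can be sketched roughly as follows. This is a minimal illustration, not the code in aRNAque.py: it assumes that P_N weights the four nucleotides at unpaired positions, that P_C weights the six canonical base pairs at paired positions, and that the number of sites mutated at once is drawn from a Zipf distribution (the heavy tail is what gives the Lévy-flight flavour). The names `pair_table`, `zipf_step` and `mutate`, the default exponent and the example weights are all hypothetical, and the target is assumed to be non-empty, balanced and pseudoknot-free.

```python
import random

NUCLEOTIDES = ["A", "C", "G", "U"]
BASE_PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]


def pair_table(structure):
    """Return (paired, unpaired): list of (i, j) base pairs and list of unpaired indices."""
    stack, paired, unpaired = [], [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            paired.append((stack.pop(), i))
        else:
            unpaired.append(i)
    return paired, unpaired


def zipf_step(c, max_step, rng):
    """Draw k in 1..max_step with probability proportional to 1 / k**c."""
    steps = list(range(1, max_step + 1))
    weights = [1.0 / k ** c for k in steps]
    return rng.choices(steps, weights=weights)[0]


def mutate(sequence, structure, p_n, p_c, c=1.5, rng=random):
    """Mutate a Zipf-distributed number of sites (assumed scheme, see lead-in)."""
    paired, unpaired = pair_table(structure)
    sites = unpaired + paired  # a site is either an index or an (i, j) pair
    seq = list(sequence)
    for _ in range(zipf_step(c, len(seq), rng)):
        site = rng.choice(sites)
        if isinstance(site, tuple):
            # Paired site: redraw both partners together so they stay compatible.
            i, j = site
            seq[i], seq[j] = rng.choices(BASE_PAIRS, weights=p_c)[0]
        else:
            # Unpaired site: redraw a single nucleotide.
            seq[site] = rng.choices(NUCLEOTIDES, weights=p_n)[0]
    return "".join(seq)


if __name__ == "__main__":
    p_n = [0.25, 0.25, 0.25, 0.25]            # illustrative weights only
    p_c = [0.4, 0.4, 0.05, 0.05, 0.05, 0.05]  # illustrative weights only
    print(mutate("GCAAAAGC", "((....))", p_n, p_c, rng=random.Random(0)))
```

Most draws change a single site, while occasional large draws change many sites at once, which is the behaviour a Lévy-flight-inspired search is after.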

The repo is organised as follows:

  • data: The cleaned data used to produce the plots presented in our paper, obtained by cleaning up the data generated from our file. For more details, please refer to the Python notebook here

  • docs: The files describing the evolutionary algorithm implemented.

  • images: The plots (in PDF) used in the paper; the corresponding Python notebook code is in notebook/plots.ipynb.

  • src: The source code is organised into the following parts:

    • Utilities: a set of basic Python functions used by our EA implementation, plus the script implementing the folding tool wrappers.

    • aRNAque.py: contains the EA implementation, i.e. the initialization, mutation, selection and EA functions.

Requirements

To run aRNAque, the following software is required:

To install all requirements automatically, including the setup of a conda environment called aRNAque via miniconda, type the following command:

make requirements

The installation was tested on the following operating systems:

  • MacOS Mojave
  • Debian Xfce 4.12

For the pseudoknotted RNA targets

  • IPknot: the version we used in this work is 0.0.5, and it can be downloaded from the official website. We can also share our copy with you upon request.
  • HotKnots: the patched version of HotKnots we used for our benchmark results can be found in the thirdparty folder.

After installing the folding tools, make sure you have set an environment variable for each tool pointing to the directory that contains its binaries: HOTKNOTS_ROOT for HotKnots should be set to /bin, and IPKNOT for IPknot should be set to /build.
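Before launching a run on a pseudoknotted target, it can help to check that the relevant variable is actually set. The sketch below is one way to do that; the helper name `missing_tool_variable` is hypothetical, and the mapping from the folding-tool codes (`ip`, `hk`) to the variable names is taken from this README rather than from the aRNAque source.

```python
import os

# Folding-tool code (as passed to --folding_tool / -ft) -> environment variable.
# RNAfold ("v") needs no variable according to this README.
TOOL_ENV_VARS = {"hk": "HOTKNOTS_ROOT", "ip": "IPKNOT"}


def missing_tool_variable(folding_tool, environ=None):
    """Return the name of the unset variable for folding_tool, or None if all is fine."""
    environ = os.environ if environ is None else environ
    name = TOOL_ENV_VARS.get(folding_tool)
    if name is None or environ.get(name):
        return None
    return name


if __name__ == "__main__":
    for tool in ("ip", "hk"):
        missing = missing_tool_variable(tool)
        if missing:
            print(f"{missing} is not set; -ft {tool} will likely fail")
```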

How to run the program?

First, clone the git repo, install the dependencies if needed, and run the tool using the following commands:

  $ git clone [repo link](#)
  $ cd aRNAque
  $ make requirements  # only needed if the dependencies are not yet installed
  $ cd aRNAque/src/
  $ python aRNAque.py --target="((....)).((....)).((.....)).((....))"

For more details about the parameters please use:

  $ python aRNAque.py --help
usage: aRNAque.py [-h] [--target TARGET] [--job JOB] [-g G] [-n N] [-msf MSF]
                  [-sm SM] [-bp BP] [--Cs CS] [-EDg EDG] [-c C]
                  [--hairpin_boosting] [--folding_tool FOLDING_TOOL] [--log]
                  [--verbose] [--turner1999] [-seed SEED]

optional arguments:
  -h, --help            show this help message and exit
  --target TARGET, -t TARGET
                        Target RNA secondary structure in dot bracket
                        representation
  --job JOB, -j JOB     Number of EA runs (default: 1)
  -g G                  Number of generation (default: 150)
  -n N                  Population Size (default: 100)
  -msf MSF              maximum sequence found (default: 10)
  -sm SM                Selection method: the only possible values are {F,NED}
                        (default: NED)
  -bp BP                Distribution of nucleotide and base pairs. Possible
                        values are {GC,GC1,GC2,GC3,GC4,GC25, GC50, GC75,ALL},
                        please check the online doc for more details (default:
                        GC2)
  --Cs CS               sequence constraints: the lenght of the sequence
                        should be the same as the target. Example:
                        target=((....)), C=GNNNANNC (default: None)
  -EDg EDG              number of generation for Ensemble defect refinement
                        (default: 0)
  -c C                  Exponent of the zipf's distribution (default: None)
  --hairpin_boosting    Boost the hairpin loops. When false no hairpins
                        boosting (default: False)
  --folding_tool FOLDING_TOOL, -ft FOLDING_TOOL
                        folding tool to be used: v for RNAfold from viennarna,
                        ip for IPknot and hk for Hotknots (default: v)
  --log                 Store the population for each instance of the inverse
                        folding in a folder (default: False)
  --verbose             Print the mean fitness evolution on a standard output
                        (default: False)
  --turner1999          Use the old energy parameters (default: False)
  -seed SEED            Seed for the initial population (default: None)

For pseudoknotted targets, please choose the appropriate folding tool using the option -ft "ip" (IPknot) or -ft "hk" (HotKnots).
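When scripting many runs, a small wrapper that validates the inputs before calling the tool can catch mistakes early. The following is a sketch under stated assumptions: the helper names `is_balanced` and `build_command` are hypothetical, pseudoknots are assumed to be written with additional bracket types (`[ ]`, `{ }`) as is common in extended dot-bracket notation, and only flags shown in the help output above (`--target`, `-ft`, `--Cs`) are used.

```python
import sys

CLOSERS = {")": "(", "]": "[", "}": "{"}


def is_balanced(structure):
    """Check each bracket type independently, so crossing (pseudoknotted) pairs pass."""
    depth = {opener: 0 for opener in CLOSERS.values()}
    for symbol in structure:
        if symbol in depth:
            depth[symbol] += 1
        elif symbol in CLOSERS:
            opener = CLOSERS[symbol]
            if depth[opener] == 0:
                return False
            depth[opener] -= 1
        elif symbol != ".":
            return False
    return all(count == 0 for count in depth.values())


def build_command(target, folding_tool="v", constraint=None):
    """Return an argument list for aRNAque.py after basic input validation."""
    if not is_balanced(target):
        raise ValueError("target is not a balanced dot-bracket string")
    if folding_tool not in {"v", "ip", "hk"}:
        raise ValueError("folding_tool must be one of: v, ip, hk")
    command = [sys.executable, "aRNAque.py", "--target", target, "-ft", folding_tool]
    if constraint is not None:
        # The help text requires the constraint to match the target length.
        if len(constraint) != len(target):
            raise ValueError("constraint and target must have the same length")
        command += ["--Cs", constraint]
    return command


if __name__ == "__main__":
    print(build_command("((..[[..))..]]", folding_tool="ip"))
```

The returned list can be handed to `subprocess.run` from within the src directory; keeping command construction separate from execution makes the validation easy to test without the folding tools installed.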

Citations

If you use this tool, please cite the following article:

Merleau, N.S.C., Smerlak, M. aRNAque: an evolutionary algorithm for inverse pseudoknotted RNA folding inspired by Lévy flights. BMC Bioinformatics 23, 335 (2022). https://doi.org/10.1186/s12859-022-04866-w