5. Extracting target contigs

Tobias Hofmann edited this page May 17, 2016 · 3 revisions

extract_contigs.py

In the previous step we identified the target loci that are present in the contig file of each sample. That step also produced an SQLite database (probe.matches.sqlite, stored in the output folder). The database records which target loci could be found for which sample and, for each identified locus, the header of the corresponding contig in the contig fasta file. With that information we can now extract all the contigs listed in the database. If you want to view the database, you can open it in sqlite3 by typing sqlite3 probe.matches.sqlite, but this is optional: extract_contigs.py accesses the database automatically.
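If you would rather inspect the database from Python than from the sqlite3 shell, a minimal sketch along the following lines lists the tables it contains. The table layout of probe.matches.sqlite is not described on this page, so the sketch only queries SQLite's built-in sqlite_master catalogue and assumes nothing about table or column names. The helper name list_tables is hypothetical and not part of extract_contigs.py.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path.

    Hypothetical helper for illustration; it makes no assumption about
    the schema and only reads SQLite's own catalogue table.
    """
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Calling list_tables("path/to/matches-folder/probe.matches.sqlite") would return the table names as a list of strings; the path is a placeholder for your own matches folder.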

Run the script

All the files you need to extract the target loci (apart from the contigs folder) are in the output folder from the previous step (referred to as path/to/matches-folder in the example command). The script needs to know where the contigs are stored (flag --contigs), where to find the database with the locus information (flag --locus-db), and where the config file is located (flag --config); the config file was created automatically by the previous command. Finally, tell the script where to store the output (flag --output), which will be a fasta file containing all the target contigs for all samples.

Example:

python2.7 extract_contigs.py --contigs path/to/contig-folder --locus-db path/to/matches-folder/probe.matches.sqlite --config path/to/matches-folder/config --output path/to/matches-folder/matches.fasta
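After the run, a quick sanity check is to count how many sequences ended up in the output file. The following is a minimal sketch that counts fasta header lines (lines starting with ">"); it assumes nothing about how extract_contigs.py formats the headers, and the function name count_fasta_records is hypothetical.

```python
def count_fasta_records(fasta_path):
    """Count the sequences in a fasta file by counting its header lines."""
    with open(fasta_path) as handle:
        # Every fasta record begins with exactly one line starting with '>'.
        return sum(1 for line in handle if line.startswith(">"))
```

For example, count_fasta_records("path/to/matches-folder/matches.fasta") returns the total number of extracted contigs across all samples, assuming the output path from the example command above.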