SoloTE v1.09 -- 09/13/2023
MAJOR CHANGES / Changes in behaviour
- SoloTE now generates 5 output matrices:
classtes_MATRIX: Results summarized at the class (LTR, LINE, SINE, DNA, RC) level.
familytes_MATRIX: Results summarized at the family level.
subfamilytes_MATRIX: Results summarized at the subfamily level (similar to scTE results).
locustes_MATRIX: Results summarized at the locus level. Only uniquely mapped reads are considered here.
legacytes_MATRIX: Results summarized at the locus level for uniquely mapped reads, and at the subfamily level for multi-mapped reads. This was the default output until v1.08, and the one used in the publication.
- Performance improvements
The code has been cleaned up by taking advantage of the versatile options available in SAMtools for BAM processing and filtering.
In turn, the number of I/O operations has been reduced, which shrinks the temporary files and speeds up the pipeline.
- Input TE BED file generation
A helper utility, SoloTE_RepeatMasker_to_BED.py, is now included to streamline generation of the Transposable Element BED file required by SoloTE.
For example, for the human genome assembly hg38, the following command can be run:
python SoloTE_RepeatMasker_to_BED.py -g hg38
This will generate the BED file hg38_rmsk.bed:
chr1 11505 11675 chr1|11505|11675|L1MC5a:L1:LINE|25.1|- 25.1 -
chr1 11678 11780 chr1|11678|11780|MER5B:hAT-Charlie:DNA|29.4|- 29.4 -
chr1 15265 15355 chr1|15265|15355|MIR3:MIR:SINE|23.0|- 23.0 -
chr1 18907 19048 chr1|18907|19048|L2a:L2:LINE|33.8|+ 33.8 +
chr1 19972 20405 chr1|19972|20405|L3:CR1:LINE|31.2|+ 31.2 +
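The fourth column of each line above packs several fields into a single identifier. As a minimal sketch, assuming the layout visible in the sample lines (chrom|start|end|subfamily:family:class|value|strand), one entry can be split into its parts as follows. The names TEAnnotation and parse_te_bed_line are hypothetical and not part of SoloTE, and the meaning of the numeric fifth column is not stated here, so it is kept as a plain float.

```python
from typing import NamedTuple


class TEAnnotation(NamedTuple):
    """One TE entry from the BED file (hypothetical container, not part of SoloTE)."""
    chrom: str
    start: int
    end: int
    subfamily: str
    family: str
    te_class: str
    value: float  # fifth BED column; its meaning is not stated in these notes
    strand: str


def parse_te_bed_line(line: str) -> TEAnnotation:
    """Parse one line of the TE BED file, assuming six whitespace-separated columns."""
    chrom, start, end, name, value, strand = line.split()
    # Assumed name layout: chrom|start|end|subfamily:family:class|value|strand
    te_id = name.split("|")[3]
    subfamily, family, te_class = te_id.split(":")
    return TEAnnotation(chrom, int(start), int(end),
                        subfamily, family, te_class, float(value), strand)


example = "chr1 11505 11675 chr1|11505|11675|L1MC5a:L1:LINE|25.1|- 25.1 -"
te = parse_te_bed_line(example)
print(te.subfamily, te.family, te.te_class)
```

The three parts of the identifier correspond to the levels at which the subfamily, family and class matrices are summarized.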
Minor changes
Added the --minoverlap command-line argument. This parameter defines the minimum number of base pairs by which a read must overlap a TE to be annotated to it (default: 1 bp).
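To illustrate what a minimum-overlap threshold means, here is a small sketch. It is not SoloTE's implementation, which relies on external tools for this step. It only shows the arithmetic, assuming 0-based half-open intervals as in BED. The function names overlap_bp and read_hits_te are hypothetical.

```python
def overlap_bp(read_start: int, read_end: int, te_start: int, te_end: int) -> int:
    """Number of base pairs shared by two 0-based half-open intervals."""
    return max(0, min(read_end, te_end) - max(read_start, te_start))


def read_hits_te(read_start: int, read_end: int,
                 te_start: int, te_end: int, min_overlap: int = 1) -> bool:
    """True if the read overlaps the TE by at least min_overlap base pairs."""
    return overlap_bp(read_start, read_end, te_start, te_end) >= min_overlap


# A hypothetical read at chr1:11600-11700 against the TE at chr1:11505-11675.
shared = overlap_bp(11600, 11700, 11505, 11675)
print(shared)
print(read_hits_te(11600, 11700, 11505, 11675, min_overlap=1))
print(read_hits_te(11600, 11700, 11505, 11675, min_overlap=100))
```

With the default of 1 bp, any read touching the TE is annotated to it. Raising the threshold excludes reads that only graze the edge of an element.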
Bug fixes