SoloTE v1.09 -- 09/13/2023

@bvaldebenitom bvaldebenitom released this 13 Sep 13:27

MAJOR CHANGES / Changes in behaviour

  • SoloTE now generates 5 output matrices:

classtes_MATRIX: Results summarized at the class (LTR, LINE, SINE, DNA, RC) level.
familytes_MATRIX: Results summarized at the family level.
subfamilytes_MATRIX: Results summarized at the subfamily level (similar to scTE results).
locustes_MATRIX: Results summarized at the locus level. Only uniquely mapped reads are considered here.
legacytes_MATRIX: Results summarized at the locus level for uniquely mapped reads, and at the subfamily level for multi-mapped reads. This was the default output until v1.08, and the one used in the publication.
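
How the class, family, and subfamily levels relate can be sketched as follows. This is only an illustration, not SoloTE's implementation: it assumes TE labels of the form subfamily:family:class (as in the BED example in these notes), and both the counts and the `summarize` helper are hypothetical.

```python
from collections import Counter

# Hypothetical read counts for one cell, keyed by "subfamily:family:class".
counts = {
    "L1MC5a:L1:LINE": 3,
    "L2a:L2:LINE": 2,
    "MIR3:MIR:SINE": 4,
}


def summarize(te_counts, level):
    """Sum counts at the requested level: 'subfamily', 'family' or 'class'."""
    index = {"subfamily": 0, "family": 1, "class": 2}[level]
    summary = Counter()
    for te_label, n in te_counts.items():
        # Pick the part of the label that corresponds to the requested level.
        summary[te_label.split(":")[index]] += n
    return dict(summary)


class_counts = summarize(counts, "class")    # LINE and SINE totals
family_counts = summarize(counts, "family")  # L1, L2 and MIR totals
```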

  • Performance improvements

The code has been cleaned up by taking advantage of the versatile options available in SAMtools for BAM processing and filtering.
In turn, the number of I/O operations has been reduced, shrinking the space occupied by temporary files and speeding up the pipeline.

  • Input TE BED file generation

A helper utility, SoloTE_RepeatMasker_to_BED.py, is now included to streamline generation of the Transposable Element BED file required by SoloTE.
For example, for the human hg38 assembly, the following command can be run:
python SoloTE_RepeatMasker_to_BED.py -g hg38

This will generate the BED file hg38_rmsk.bed:

chr1    11505   11675   chr1|11505|11675|L1MC5a:L1:LINE|25.1|-  25.1    -
chr1    11678   11780   chr1|11678|11780|MER5B:hAT-Charlie:DNA|29.4|-   29.4    -
chr1    15265   15355   chr1|15265|15355|MIR3:MIR:SINE|23.0|-   23.0    -
chr1    18907   19048   chr1|18907|19048|L2a:L2:LINE|33.8|+     33.8    +
chr1    19972   20405   chr1|19972|20405|L3:CR1:LINE|31.2|+     31.2    + 
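
For downstream scripting, the rows above can be split into their components. The sketch below is a minimal example, assuming six whitespace-separated columns and a name column laid out as chrom|start|end|subfamily:family:class|score|strand, as in the rows shown; the `TEAnnotation` type and `parse_bed_line` function are hypothetical helpers, not part of SoloTE.

```python
from typing import NamedTuple


class TEAnnotation(NamedTuple):
    chrom: str
    start: int
    end: int
    subfamily: str
    family: str
    te_class: str
    score: float
    strand: str


def parse_bed_line(line):
    """Parse one row of the TE BED file into a TEAnnotation."""
    # split() accepts both tabs and runs of spaces between columns.
    chrom, start, end, name, score, strand = line.split()
    # The fourth "|"-separated field of the name holds subfamily:family:class.
    subfamily, family, te_class = name.split("|")[3].split(":")
    return TEAnnotation(chrom, int(start), int(end),
                        subfamily, family, te_class, float(score), strand)


example = "chr1\t11505\t11675\tchr1|11505|11675|L1MC5a:L1:LINE|25.1|-\t25.1\t-"
te = parse_bed_line(example)
```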

Minor changes
Added the --minoverlap command line argument. This parameter defines the minimum number of overlapping base pairs required to annotate a read to a TE (Default = 1 bp).
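
The effect of such a threshold can be sketched as follows. This is an illustration under stated assumptions, not SoloTE's code: intervals are treated as half-open [start, end) as in BED, and `overlap_bp` and `passes_minoverlap` are hypothetical helpers.

```python
def overlap_bp(read_start, read_end, te_start, te_end):
    """Number of base pairs shared by two half-open intervals [start, end)."""
    return max(0, min(read_end, te_end) - max(read_start, te_start))


def passes_minoverlap(read, te, minoverlap=1):
    """True if the read overlaps the TE by at least `minoverlap` bp."""
    return overlap_bp(read[0], read[1], te[0], te[1]) >= minoverlap


te = (11505, 11675)    # TE interval taken from the BED example
read = (11670, 11760)  # hypothetical read sharing 5 bp with the TE

kept_default = passes_minoverlap(read, te)                # threshold of 1 bp
kept_strict = passes_minoverlap(read, te, minoverlap=10)  # threshold of 10 bp
```

With the default of 1 bp the read would be assigned to the TE; raising the threshold to 10 bp would exclude it.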

Bug fixes

  • Issues #14, #17: Long-standing matrix generation issues that arose on different systems (e.g., one implementation would work on OSX but not on Linux) are now handled in a platform-independent manner.