Repeat Annotations #245
Comments
Hello,
DNA transposons 142553 36526713 bp 2.78 %
Rolling-circles 63104 18793057 bp 1.43 %
Unclassified: 1798068 422243089 bp 32.18 %
Total interspersed repeats: 965967429 bp 73.62 %
Small RNA: 8616 1717953 bp 0.13 %
Satellites: 7189 1514437 bp 0.12 %
Here, if I'm understanding this table correctly, Retroelements = SINEs + Penelope + LINEs + LTR elements. However, the Retroelements value as presented in the table appears to be the sum of SINEs, LINEs, and LTR elements only. Is this intentional? Could this be related to #200?
In a similar vein to the above and to the question posed by @anna-elisabet: in the above example, no matter how I calculate total interspersed repeats manually, the reported value is less than the sum of the individual subcategories. I am aware of #228 and that DNA transposons isn't a category total but a subcategory itself, but that only exacerbates the difference. It's very likely I'm missing a crucial connection and misunderstanding something. Any guidance is appreciated. I'm using RepeatMasker ver. 4.1.5, if that's helpful.
Hi @genmor, as a second comment: interestingly, I also see that summing Retroelements, DNA transposons, and Unclassified (excluding Rolling-circles) gives the reported "Total interspersed repeats". My guess is still that RepeatMasker does not count "Rolling-circles" as an interspersed repeat, which, according to my understanding of the biology, is wrong.
Hi @anna-elisabet, I'll note that in #228, the answer given was that DNA transposons isn't a sum of the categories below it, but that they are unspecified DNA transposons. As you note though, the arithmetic here shows that it (i.e., DNA transposons) actually is being treated as a category sum to calculate the total interspersed repeats. I think the long and short of this is that we need @rmhubley to confirm, lol.
Hi @genmor! I think I finally cracked the code: to get the total interspersed repeats as reported in the .tbl file, sum the following: Retroelements, Penelope, DNA transposons, Unclassified.
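The rule above can be checked with a few lines of arithmetic. This is only a sketch: the three bp values are the ones quoted earlier in this thread, the Retroelements and Penelope rows were not quoted, so the script derives what they would have to contribute if the rule holds. The helper name `matching_subsets` is hypothetical and is not part of RepeatMasker.

```python
from itertools import combinations


def matching_subsets(categories, reported_total):
    """Return every combination of category names whose bp values sum to reported_total."""
    names = list(categories)
    hits = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if sum(categories[name] for name in combo) == reported_total:
                hits.append(combo)
    return hits


# bp values quoted in this thread
reported_total = 965967429
dna_transposons = 36526713
rolling_circles = 18793057
unclassified = 422243089

# Derived, NOT read from the table: what Retroelements + Penelope must
# contribute if total = Retroelements + Penelope + DNA transposons + Unclassified.
implied_retro_plus_penelope = reported_total - dna_transposons - unclassified
print(implied_retro_plus_penelope)  # 507197627

categories = {
    "Retroelements + Penelope (implied)": implied_retro_plus_penelope,
    "DNA transposons": dna_transposons,
    "Rolling-circles": rolling_circles,
    "Unclassified": unclassified,
}
print(matching_subsets(categories, reported_total))
```

To test the rule on your own run, replace the implied value with the Retroelements and Penelope bp values from your .tbl file as separate entries and see whether the expected combination is among the hits.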
Hi @anna-elisabet,
Oh my, sorry for the confusion, and thanks for supporting each other through this. The *.tbl format is a static format that hasn't changed much since Arian first released RepeatMasker. There are different formats for primates, mice, mammals, and one for everything else. Unfortunately, these formats have sometimes lagged behind changes to the classification, and I believe you have identified a bug with Penelope here. Penelope used to be classified as "LINE/Penelope" and is now "PLE/" (with quite a few subtypes such as "Athena" and "Chlamys"). The table section "Retroelements" is missing the new type "PLE" in its tabulation. I will correct this in the next release. Since there isn't a one-size-fits-all approach to summarizing results, we also provide the util/buildSummary.pl script, which performs a per-class and per-family tabulation of the *.out file and would be more useful to you in this circumstance.
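For readers who want a quick look at how a per-class tally differs from the .tbl grouping, here is a rough sketch in Python. It is not buildSummary.pl and makes no attempt to reproduce it: it assumes the usual whitespace-separated *.out layout (query begin and end in columns 6 and 7, class/family in column 11), simply adds up annotation lengths, and does not resolve overlapping annotations, so its totals can exceed the masked bp. The function name `bp_per_class` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def bp_per_class(out_lines):
    """Naively sum annotated bp per top-level class from RepeatMasker *.out lines."""
    totals = defaultdict(int)
    for line in out_lines:
        fields = line.split()
        # Skip blank lines and the header lines, whose first field is not an integer score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        # "PLE/Athena" -> "PLE"; a class without a family (e.g. "Unknown") is kept as is.
        repeat_class = fields[10].split("/")[0]
        totals[repeat_class] += end - begin + 1
    return dict(totals)


# Hypothetical rows, for illustration only.
sample = [
    "   SW   perc perc perc  query  position in query  matching  repeat  position in repeat",
    "score   div. del. ins.  sequence  begin  end  (left)  repeat  class/family  begin  end  (left)  ID",
    "",
    " 1320 15.6 6.2 0.0 chr1 1001 1200 (5000) + rnd-1_family-7  PLE/Athena  1 200 (0)  1",
    "  980 20.1 3.0 1.2 chr1 2001 2100 (4100) C rnd-1_family-12 LINE/L2     (10) 100 1 2",
    "  450 25.0 0.0 0.0 chr1 3001 3050 (3150) + rnd-1_family-3  PLE/Chlamys 1 50 (0)   3",
]
print(bp_per_class(sample))
```

With a real file you would pass an open file handle, for example `bp_per_class(open("genome.fa.out"))`, where the file name is a placeholder. Keeping PLE as its own class makes it easy to see how much of the genome the "Retroelements" row in the .tbl file leaves out.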
Hi,
I would like to know whether the "Total interspersed repeats" percentage in the tbl output file is the sum of the "Retroelements", "DNA transposons", "Rolling-circles", and "Unclassified". When I add these categories together, I do not get the total as given.
If it is not the exact total, how is the total calculated?
I have run RepeatMasker on different genomes with different libraries, and in some cases I found that excluding the "Rolling-circles" makes the rest add up to the total given interspersed repeats, but that was not always the case.
I used a custom repeat library generated by RepeatModeler, subsequently classified by DeepTE.
Thanks!