How to tag intron retention, alternative first and last exon from exon-junction relationships? #2
Comments
I have some ideas or points to discuss further, which right now are mostly focused on leafcutter.

Concerning the AFE and ALE: this might be a more complex problem if we just work with the junction coordinates and the exon coordinates. We could try to supplement the exon information with Ensembl-based exon identifiers, but those would differ depending on which transcript we look at (at least in many multi-isoform genes). We also have to ask ourselves how we define AFE and ALE. Do both the exon start AND end coordinates need to be different? Can it overlap with the "normal" exon?

Concerning the exon inclusion: if I am not mistaken, this could be resolved using just the coordinates, though maybe we need to extract information from the leafcutter "clusters" as well. I guess the big question here is how we differentiate between exon inclusion and simple A3SS/A5SS. If we just look at one isolated junction whose START or END coordinate does not match any exon, we would normally assume it is an A3SS/A5SS, right? From the leafcutter cluster information we could probably see whether another junction exists that represents the "other" junction flanking the included exon (see scheme below).

For intron retention, I guess it is hard to do that with leafcutter... any ideas on your end?
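To make the cluster idea concrete, here is a minimal Python sketch. It is not leafcutter or Baltica code: the function name `find_partner_junction`, the `max_exon_len` cutoff, and the coordinate convention (plus strand, junction end equal to the start of the downstream exon) are all assumptions made for illustration. Given a junction whose end does not match an annotated exon start, it looks in the same cluster for a second junction that starts shortly downstream, which would suggest an included exon rather than a plain A3SS.

```python
from typing import Iterable, Optional, Set, Tuple

Junction = Tuple[int, int]  # (start, end); plus strand assumed throughout


def find_partner_junction(
    junction: Junction,
    cluster: Iterable[Junction],
    exon_starts: Set[int],
    max_exon_len: int = 1000,  # hypothetical cutoff, not taken from any tool
) -> Optional[Junction]:
    """Return the closest junction in the cluster that starts downstream of
    this junction's end, or None if the end is already an annotated exon
    start or no such partner exists."""
    start, end = junction
    if end in exon_starts:
        # The acceptor matches a known exon: nothing to resolve here.
        return None
    candidates = [
        k for k in cluster
        if k != junction and end < k[0] <= end + max_exon_len
    ]
    # The nearest downstream junction would bound the putative included exon.
    return min(candidates, key=lambda k: k[0]) if candidates else None


# Cluster with a long junction and two shorter ones flanking a novel exon.
cluster = [(200, 500), (200, 300), (350, 500)]
partner = find_partner_junction((200, 300), cluster, exon_starts={500})
```

If a partner is found, the interval between the first junction's end and the partner's start is a candidate included exon; if none is found, treating the junction as A3SS/A5SS remains the simpler reading.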
Thanks for the feedback. There are well-established tools to find intron retention, the most notable being IRfinder2. Although I think it would be nice to have such information, I would need to think about how to integrate the intron information into Baltica. I opened a separate issue (#4) to handle that. Maybe it would be possible to detect IR with leafcutter, as Majiq does, but I would rather test this in context with the established tool. I had to generalize the algorithm that we used in the previous paper to make it work for the many tools in Baltica.
Because it only has the current exon information, and not the information from the cluster or the intron chain, I can't tell which junction is canonical or alternative. I particularly like this idea because StringTie now also supports long reads, meaning that it would be in line with what we proposed to integrate the Illumina and nanopore data. As you point out in the second part of the diagram, there is a hierarchy problem that can be a bit hard to deal with. I will bring some examples so we have this documented.
Good points you raised here. I would also see IRFinder as something rather in the middle of the priority list. For some of our projects it might be really useful though, so especially Lena would be happy if we could integrate it at some point. I didn't know you would use the de novo exons (coming from StringTie, I assume?) as well. I have to be honest that I do not fully understand how the intron counts from StringTie help you to resolve the EI/AFE/ALE problem. Maybe you can share your ideas whenever you have time :)
Given a junction j and the vector of all exons overlapping this junction, we can determine alternative splicing events of type ES, A3SS and A5SS.
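As an illustration of that rule, here is a minimal Python sketch. It is not Baltica's implementation: the names `Interval` and `classify_junction` are hypothetical, and it assumes a coordinate convention in which a canonical junction starts exactly at the end of the upstream exon and ends exactly at the start of the downstream exon. On the minus strand the donor and acceptor labels are swapped.

```python
from typing import Iterable, NamedTuple, Set


class Interval(NamedTuple):
    start: int
    end: int


def classify_junction(
    junction: Interval, exons: Iterable[Interval], strand: str = "+"
) -> Set[str]:
    """Label a junction with ES, A5SS and/or A3SS based on how the
    overlapping exons sit relative to the junction boundaries."""
    events: Set[str] = set()
    for exon in exons:
        # Exon lies strictly inside the junction: it is skipped.
        if junction.start < exon.start and exon.end < junction.end:
            events.add("ES")
        # Junction starts inside the exon instead of at its end:
        # alternative donor on "+", alternative acceptor on "-".
        if exon.start < junction.start < exon.end:
            events.add("A5SS" if strand == "+" else "A3SS")
        # Junction ends inside the exon instead of at its start:
        # alternative acceptor on "+", alternative donor on "-".
        if exon.start < junction.end < exon.end:
            events.add("A3SS" if strand == "+" else "A5SS")
    return events


exons = [Interval(100, 200), Interval(300, 400), Interval(500, 600)]
skipping = classify_junction(Interval(200, 500), exons)
alt_donor = classify_junction(Interval(150, 300), exons)
alt_acceptor = classify_junction(Interval(200, 350), exons)
```

A junction can receive more than one label here, which is one place where the hierarchy question discussed in this thread shows up.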
We need to define how to classify events of the following types: