How to tag intron retention, alternative first and last exons from exon-junction relationships? #2

Closed
tbrittoborges opened this issue Jan 31, 2020 · 3 comments

Comments

@tbrittoborges
Member

Given a junction j and the vector of all exons overlapping this junction, we can determine alternative splicing events of type ES, A3SS and A5SS.
We still need to define how to classify events of the following types:

  • Exon inclusion
  • AFE (not splicing)
  • ALE
  • Intron retention
@boehmv
Collaborator

boehmv commented Feb 19, 2020

I have some ideas or points to further discuss, which right now are rather focussed on leafcutter:

Concerning the AFE and ALE: this might be a more complex problem if we just work with the junction coordinates and the exon coordinates. We could try to supplement the exon information with Ensembl-based exon identifiers, but those would differ depending on which transcript we look at (at least in many multi-isoform genes). We also have to ask ourselves how we define AFE and ALE. Do both the exon start AND end coordinates need to be different? Can it overlap with the "normal" exon?

Concerning the exon inclusion: if I am not mistaken, this could be resolved using just the coordinates, though maybe we need to extract information from the leafcutter "clusters" as well. I guess the big question here is, how do we differentiate between exon inclusion and simple A3SS/A5SS? If we just look at one isolated junction which has START or END coordinates that do not match any exons, we would normally assume it is an A3SS/A5SS, right? From the leafcutter cluster information we could probably see whether another junction exists that represents the "other" junction flanking the included exon (see scheme below).

AS_events
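
The pairing idea described above can be sketched in Python. This is only an illustration under stated assumptions: a cluster is a plain list of `(start, end)` intron coordinates (1-based, inclusive), not leafcutter's actual output format, and the function name `find_included_exons` is hypothetical. A long junction is reported as an exon inclusion candidate only when two shorter junctions in the same cluster share its start and its end and leave room for an exon between them.

```python
from itertools import product

def find_included_exons(cluster):
    """Return (long_junction, included_exon) pairs found in one cluster.

    cluster: iterable of (start, end) intron coordinates, 1-based inclusive.
    This input layout is an assumption for illustration only.
    """
    juncs = sorted(set(cluster))
    events = []
    for long_start, long_end in juncs:
        # Shorter junctions sharing the same start (upstream side).
        upstream = [j for j in juncs if j[0] == long_start and j[1] < long_end]
        # Shorter junctions sharing the same end (downstream side).
        downstream = [j for j in juncs if j[1] == long_end and j[0] > long_start]
        for up, down in product(upstream, downstream):
            # The region between the two short introns is the candidate exon.
            exon = (up[1] + 1, down[0] - 1)
            if exon[0] <= exon[1]:
                events.append(((long_start, long_end), exon))
    return events

# A cluster with a skipping junction and both flanking junctions.
cassette = [(201, 499), (201, 299), (401, 499)]
# A cluster where only the junction end differs (looks like A3SS/A5SS).
alt_site = [(201, 499), (201, 519)]

print(find_included_exons(cassette))
print(find_included_exons(alt_site))
```

With an isolated junction, or a cluster such as `alt_site` where no partner junction exists, the function returns an empty list, which matches the default assumption of an A3SS/A5SS event.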

For intron retention, I guess it is hard to do that with leafcutter... any ideas on your end?

@tbrittoborges
Member Author

Thanks for the feedback.

There are well-established tools to find intron retention; the most notable is IRfinder2. Although I think it would be nice to have such information, I would need to think about how to integrate the intron information into Baltica. I opened a separate issue (#4) to handle that. Maybe it would be possible to detect IR with leafcutter, as Majiq does, but I would rather test this in context with the established tool.

I had to generalize the algorithm that we used in the previous paper to make it work for the many tools in Baltica.
The way I implemented it can be thought of as follows:

  • For each duj (differentially spliced junction), find the overlapping exons (including de novo exons)
  • Compute the relationship for each duj and exon pair

That's it. What we end up with is a list of AS types.
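
A minimal Python sketch of these two steps follows. It is not the Baltica implementation: the names `Interval`, `classify_pair` and `annotate` are hypothetical, coordinates are assumed to be 1-based and inclusive, junctions are given as intron intervals, and the rules below are one plausible reading of "relationship" for ES, A3SS and A5SS only.

```python
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

def overlaps(a: Interval, b: Interval) -> bool:
    return a.start <= b.end and b.start <= a.end

def classify_pair(junction: Interval, exon: Interval, strand: str = "+") -> Optional[str]:
    """Label one junction/exon pair, or return None if no rule applies."""
    if not overlaps(junction, exon):
        return None
    # The junction spans the whole exon: the exon is skipped.
    if junction.start < exon.start and exon.end < junction.end:
        return "ES"
    # The junction lies entirely inside the exon: not handled in this sketch.
    if exon.start < junction.start and junction.end < exon.end:
        return None
    # The junction starts inside the exon: alternative donor on "+".
    if exon.start < junction.start <= exon.end:
        return "A5SS" if strand == "+" else "A3SS"
    # The junction ends inside the exon: alternative acceptor on "+".
    if exon.start <= junction.end < exon.end:
        return "A3SS" if strand == "+" else "A5SS"
    return None

def annotate(junctions, exons, strand="+"):
    """Map each junction to the list of AS types from its overlapping exons."""
    result = {}
    for j in junctions:
        labels = (classify_pair(j, e, strand) for e in exons)
        result[j] = [label for label in labels if label is not None]
    return result

exons = [Interval(100, 200), Interval(300, 400), Interval(500, 600)]
junctions = [Interval(201, 499), Interval(181, 299), Interval(201, 319), Interval(201, 299)]
for junction, labels in annotate(junctions, exons).items():
    print(junction, labels)
```

A junction that matches the annotated exon boundaries exactly, such as `Interval(201, 299)` here, overlaps no exon and so receives an empty list.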

Because it only has the current exon information, and not the information from the cluster or the intron chain, I can't tell which junction is canonical and which is alternative.
In addition, some tools (e.g. JunctionSeq) won't output any cluster information. One alternative may be to use the intron counts from StringTie to define the canonical intron. By doing that I could probably resolve both the EI problem and the AFE and ALE problem.

I particularly like this idea because StringTie now also supports long reads, meaning that would be in line with what we proposed to integrate the Illumina and nanopore data.
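
One way the count-based idea could look is sketched below in Python. Everything here is an assumption made for illustration: the counts are a plain dictionary keyed by `(start, end)` intron coordinates rather than any real StringTie output, the function name `label_introns` is hypothetical, and the rule (an intron is alternative if some overlapping intron has a strictly higher count) is just one possible definition of "canonical".

```python
def overlaps(a, b):
    """True if two (start, end) inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def label_introns(counts):
    """Label each intron 'canonical' or 'alternative' from its read support.

    counts: dict mapping (start, end) intron coordinates to a count.
    The input layout is hypothetical, not a StringTie file format.
    """
    labels = {}
    for intron, n in counts.items():
        # An intron is alternative if a better-supported intron overlaps it.
        dominated = any(
            other != intron and overlaps(intron, other) and m > n
            for other, m in counts.items()
        )
        labels[intron] = "alternative" if dominated else "canonical"
    return labels

counts = {
    (201, 299): 50,  # inclusion intron, upstream
    (401, 499): 48,  # inclusion intron, downstream
    (201, 499): 5,   # skipping intron with low support
    (601, 699): 30,  # unrelated intron
}
for intron, label in label_introns(counts).items():
    print(intron, label)
```

Under this rule both inclusion introns stay canonical and only the weakly supported skipping intron is flagged, which is the information the per-pair classification lacks.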

As you point out in the second part of the diagram, there is a hierarchy problem that can be a bit hard to deal with. I will bring some examples so we have this documented.

@boehmv
Collaborator

boehmv commented Feb 21, 2020

Good points you raised here. I would also put IRFinder somewhere in the middle of the priority list. For some of our projects it might be really useful though, so Lena especially would be happy if we could integrate it at some point.

I didn't know you would use the de novo exons (coming from StringTie I assume?) as well. I have to be honest that I do not fully understand how the intron counts from StringTie help you to resolve the EI/AFE/ALE problem. Maybe you can share your ideas whenever you have time :)
Generally, I am also in favor of implementing StringTie into the whole process.

tbrittoborges added the "question" (Further information is requested) label on Jul 30, 2020