Scoring of read pairs on same vs. different chromosome #321
Intuitively, I would say yes, at least in the following sense: If there are two alternatives for mapping a read pair, then I would prefer one that places both on the same chromosome. I guess it depends on our expectations of what can happen in a sample. I assume that a rearrangement that leads to the reads mapping far apart on the same chromosome is likelier than one that leads to them being on a different chromosome. It’s not quite the same, but as an experiment, I increased the
But then I don’t think our test data has this type of event, so it’s natural that this isn’t helpful.
I think we have detected a bug if my reasoning below is correct, nice! The score was implemented so that the further the isize is from the mean, the lower the score. As seen here https://github.com/ksahlin/strobealign/blob/main/src/aln.cpp#L1185, read pairs that map more than 5*sigma away get a score lower than disjoint reads. So in the scoring that happens here: https://github.com/ksahlin/strobealign/blob/main/src/aln.cpp#L1169, we should probably have something like:
In the proposed correction above we have capped the penalty to that of disjoint reads, and we have added a small positive value. The reason I claim that our current scoring is a bug is that we actually penalize proper but far-apart mates more than completely separated reads (if the SW scores are identical), so we then prefer disjoint reads. This is likely bad for detecting larger deletions.
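The capping idea described here can be sketched roughly as follows. This is an illustration in Python, not the strobealign code: it assumes the insert-size term is the log of a normal density, and the names `pair_score`, `DISJOINT_PENALTY` and `EPSILON` as well as their values (20 and 0.001) are hypothetical.

```python
import math

# Hypothetical constants for illustration; not taken from strobealign.
DISJOINT_PENALTY = 20.0  # assumed penalty applied to disjoint (unpaired) reads
EPSILON = 0.001          # small positive value so a capped proper pair still wins


def log_normal_pdf(x, mu, sigma):
    """Natural log of the normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def pair_score(score1, score2, distance, mu, sigma, proper):
    """Combined score of two mate alignments.

    Disjoint reads get a fixed penalty. Proper pairs get the insert-size
    term, capped so that it is never worse than the disjoint penalty.
    """
    if not proper:
        return score1 + score2 - DISJOINT_PENALTY
    insert_term = max(-DISJOINT_PENALTY + EPSILON,
                      log_normal_pdf(distance, mu, sigma))
    return score1 + score2 + insert_term
```

With the cap in place, a proper pair with identical SW scores always scores at least `EPSILON` above the disjoint alternative, no matter how far apart the mates are.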
To add a bit of data, I get the penalties below for libraries with mu=300, sigma=30 (which I think we use in our simulations?). So proper pairs with a distance higher than about 470 would be penalised more than individual best alignments (e.g. on different chromosomes). Also, we set
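As a rough check of the distance quoted above, the following sketch computes the insert-size penalty for mu=300, sigma=30 and solves for the distance at which it exceeds a fixed penalty for disjoint reads. It assumes the penalty is the negative log of a normal density and that disjoint reads are penalised by 20; both are assumptions for illustration, and `crossover_distance` is a hypothetical helper.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Natural log of the normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def crossover_distance(mu, sigma, penalty):
    """Distance above mu at which -log_normal_pdf(x) equals `penalty`.

    Solves 0.5 * z^2 + log(sigma * sqrt(2 * pi)) = penalty for z >= 0.
    """
    excess = penalty - math.log(sigma * math.sqrt(2.0 * math.pi))
    if excess < 0:
        raise ValueError("penalty is smaller than the penalty at the mean")
    return mu + sigma * math.sqrt(2.0 * excess)


if __name__ == "__main__":
    mu, sigma = 300.0, 30.0
    assumed_disjoint_penalty = 20.0  # assumption, not read from the source code
    for distance in (300, 350, 400, 450, 470, 500):
        print(distance, round(-log_normal_pdf(distance, mu, sigma), 2))
    print("crossover:", round(crossover_distance(mu, sigma, assumed_disjoint_penalty)))
```

Under these assumptions the crossover lands at roughly 468, which is in line with the figure of about 470 mentioned above.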
Originally posted by @ksahlin in #319 (comment)