If R1 and R2 map to same position, treat them as proper pair #319
This changes behavior slightly because the conditions differed at the two call sites (`a-b < mu+10*sigma` vs. `a-b < 2000`), but it seems to make more sense to have consistent behavior. Also, accuracy is nearly unchanged (+/-0.0001).
Very nice! Approved.
btw, the change in cd26078 highlights that we use
Sure, I can make that more consistent. There’s also the Lines 311 to 321 in aed4738
But that one is only for setting the "properly paired" flag in SAM output, not for making any mapping decisions, so I guess it's ok if it is a bit more restrictive.
I added 47cb5c2 to make things more consistent. For completeness, these are the accuracy changes from that commit alone:
Ah, I see. Yes, agreed. Most important is that pairs within 10 sigma are scored higher than, e.g., reads on different chromosomes. Related: do you think it would make sense / be worthwhile to score read pairs on the same chromosome (but far apart) higher than reads on separate chromosomes? Currently I think there is only the binary 'close and proper' or 'not close and proper' classification.
I think we're on the same page about this PR, so merging it now.
The first change is to factor out an `is_proper_nam_pair()` function. This changes behavior slightly as the code wasn't identical at the two call sites. The condition for the distance was `a-b < 2000` in one case and `a-b < mu+10*sigma` in the other. The factored-out function uses the latter because taking the distribution into account seems to be the intended behavior.

The second change is to treat a pair of NAM as being a "proper pair" even if the query start positions are the same.
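To make the two changes concrete, here is a rough Python sketch of what such a check could look like. It is not the code from this PR: the `Nam` fields (`ref_id`, `ref_start`, `query_start`, `is_rc`), the way `a` and `b` are derived, and the orientation tests are assumptions made for illustration. Only the function name, the `mu+10*sigma` distance bound, and the acceptance of equal start positions come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Nam:
    # Hypothetical fields, chosen for illustration only.
    ref_id: int        # index of the reference sequence (chromosome)
    ref_start: int     # start of the match on the reference
    query_start: int   # start of the match on the read
    is_rc: bool        # True if the read matches the reverse strand


def is_proper_nam_pair(nam1: Nam, nam2: Nam, mu: float, sigma: float) -> bool:
    """Return True if the two NAMs look like a proper pair (sketch)."""
    # Assumption: both mates must be on the same reference and
    # on opposite strands.
    if nam1.ref_id != nam2.ref_id or nam1.is_rc == nam2.is_rc:
        return False

    # Assumption: a and b are the projected read start positions.
    a = max(0, nam1.ref_start - nam1.query_start)
    b = max(0, nam2.ref_start - nam2.query_start)

    # Distance bound derived from the insert-size distribution,
    # used instead of the fixed 2000 cutoff.
    max_dist = mu + 10 * sigma

    # "<=" rather than "<" so that identical start positions
    # still count as a proper pair.
    r1_r2 = nam2.is_rc and a <= b and (b - a) < max_dist
    r2_r1 = nam1.is_rc and b <= a and (a - b) < max_dist
    return r1_r2 or r2_r1
```

With `mu = 300` and `sigma = 50` the bound is 800, so in this sketch two mates starting at the same position are accepted, while mates 1500 bases apart are rejected even though they would have passed a fixed `a-b < 2000` cutoff.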
Both commits change accuracy slightly. The overall difference is as follows:
The case that R1 and R2 map to the same location doesn’t occur in our test datasets as far as I can tell, so it’s expected that the numbers don’t improve. The actual benefit would be on real data.
Closes #317