-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing MergeBamAlignment creates invalid BAMs in the presence of secondary alignments
#932
base: master
Are you sure you want to change the base?
Fixing MergeBamAlignment creates invalid BAMs in the presence of secondary alignments
#932
Conversation
MergeBamAlignment creates invalid BAMs in the presence of secondary alignments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments about the Java code, not about the code's meaning.
final Integer hiTagValueOfFirst = firstOfPairOrFragment.get(i).getIntegerAttribute(SAMTag.HI.name()); | ||
final Integer hiTagValueOfSecond = secondOfPair.get(i).getIntegerAttribute(SAMTag.HI.name()); | ||
|
||
if (nonNull(hiTagValueOfFirst) && nonNull(hiTagValueOfSecond)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for using these functions instead of comparing against null
directly? My understanding is that these methods are provided so we have convenient lambda forms for streaming operations, etc. Using them inline like this doesn't look like an improvement to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use functional style instead of procedural even if it could decrease performance a little bit because it usually increases code readability. Especially for functions of Java standard library. I could revert all cases when I use static import if it is the concern. So is it critical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean about functional vs procedural... one form is an expression consisting of a function call, the other is a binary expression using the!=
operator. Both are equally functional and procedural, as I understand these terms.
Readability is a subjective argument; to me, using != null
is more readable. Given that it requires a static import to avoid making the code much wordier than the original, I don't think it's worth the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking about operators vs functions, I have always thought that since Java doesn't allow to make partial application of an operator it is convenient to use functions in such cases. Let's imagine we have equals
function that takes two objects and checks their equality. So isNull
function is just a partial application of equals
function with fixed null
as one of the arguments. Then nonNull
is negated isNull
. So that way, for the sake of consistency it is easier to use functions instead of operators in Java not only in streams but everywhere in mine opinion. Is it a little bit excessively? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds like you have a more enlightened view of functional programming in Java than I do. :)
Thanks for the explanation, I now understand the distinction you're making between expressions and functions, and why it's more consistent to use functions everywhere. I can't say I'm convinced but I'll definitely be giving it more thought.
import java.util.Comparator; | ||
import java.util.List; | ||
import java.util.Objects; | ||
|
||
import static java.util.Objects.isNull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general static imports should be avoided as they make code harder to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced by operators as was discussed here.
final Integer hiTagValueOfSecond = secondOfPair.get(i).getIntegerAttribute(SAMTag.HI.name()); | ||
|
||
if (nonNull(hiTagValueOfFirst) && nonNull(hiTagValueOfSecond)) { | ||
if (hiTagValueOfFirst < hiTagValueOfSecond) secondOfPair.add(i, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code didn't do this, but the new code should use {
}
around all if/else body statements, even if they're a single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done here.
return nonNull(first) && nonNull(second) | ||
&& first.getMateAlignmentStart() == second.getAlignmentStart() | ||
&& first.getAlignmentStart() == second.getMateAlignmentStart() | ||
&& Objects.equals(first.getReferenceName(), second.getMateReferenceName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use Object.equals()
here? is it possible that one of these is null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since both SamRecod#mReferenceName
and SamRecord#mMateReferenceName
are nullable we need to ensure that they are non-null or simply delegate it to Objects.equals
to avoid NullPointerException
. This is a theoretical case that may hardly ever occur in runtime. So, do we need to prevent this case or can I unwrap this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's OK, I was wondering if this was added to fix a known problem or was just precautionary.
Looking at SamRecord
I see that it's common to check these fields for null
so it seems wise to do it here too.
It would be great if picard had nullable constraints in the code base! That would be a huge PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | ||
* @return true if records belong to the same pairwise alignment | ||
*/ | ||
private static boolean arePair(SAMRecord first, SAMRecord second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it should be a non-static method on SAMRecord
, as it only deals with fields in that object and takes that object as its argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAMRecord
is part of htsjdk
and this was the only constraint I haven't done it. Could I just make this function a SamRecord#arePair
method and make associated pull-request to htsjdk
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please review associated request to htsjdk
?
e418168
to
11a9c1f
Compare
1 similar comment
@DenisVerkhoturov there are many "esthetic" changes mixed in with substantive changes. Could you please rebase and separate into two commits, one that is purely esthetic and one that is substantive? it will make it much easier to understand what you are doing. |
96bc12a
to
58082fa
Compare
@yfarjoun Sure, done=) |
} | ||
|
||
/** | ||
* Identifies weather records present pairwise alignment or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weather -=> whether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed here.
void coordinateByMate() { | ||
final List<SAMRecord> newFirstOfPairOrFragment = new ArrayList<>(); | ||
final List<SAMRecord> newSecondOfPair = new ArrayList<>(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this file preferred ++i to i++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you might see (here for instance), there was a usage of a postfix increment in this file before. So I just followed this example.
All increment operators in this file changed to prefix form.
|
||
hitsForInsert.addFirstOfPairOrFragment(second); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 3 some function of HitsForInsert::numHits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed here.
|
||
// Now renumber any correlated alignments, and remove hit index if no correlated read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for breaking this out and cleaning up hi, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback=)
58082fa
to
e4c8c0d
Compare
Description
The problem originates in
HitsForInserts#coordinateByHitIndex
. This method makes decision whether read is paired or not on assumption that all the records have HI-tag but ignores other criteria such as RNEXT (Ref. name of the mate/next read) and PNEXT (Position of the mate/next read) SAM fields. So paired reads are determined as two unrelated ones in the case when HI-tag is absent.Implementation
coorditanteByMate
) for correlating pair-aware alignments. It does not require presence of HI-tags and performs work based on the data contained in fields: RNAME, RNEXT, POS, PNEXT.MergeBamAlignmentTest#testPairedReadsAreProcessedProperlyInAbsenceOfHitIndexTags
running MergeBamAlignment with SAM file contained pair-aware alignments without HI-tag.testCoordinateByMateMethodWorksProperly
for the new method.BestMapqPrimaryAlignmentSelectionStrategy
. It uses implemented methodcoordinateByMate
instead ofcoordinateByHitIndex
now. We kept the old method due to backward compatibility.Performance
All the benchmarks were measured by JMH (Mode - thrpt, Cnt - 100, Units - ops/s)
Resolves
MergeBamAlignment creates invalid BAMs in the presence of secondary alignments
Checklist (never delete this)
Never delete this, it is our record that procedure was followed. If you find that for whatever reason one of the checklist points doesn't apply to your PR, you can leave it unchecked but please add an explanation below.
Content
Review