Silent failure with nw_trace_scan_16 for larger sequences #46

ifiddes · 2019-11-07T20:55:45Z

I tried aligning an 81kb sequence to a 101kb sequence with nw_trace_scan_16 and the resulting CIGAR is just 1X. If I repeat this with nw_trace_scan_32, it works fine. Is this intended behavior?

Are there any heuristics I can use to predict which bit size I can use based on the length of the sequences?


jeffdaily · 2019-11-07T21:49:45Z

You might try nw_trace_scan_sat, where "sat" stands for "saturate". It will try the _8 first, then _16, then _32. It will try to detect integer overflow during the calculation, where the bits for an integer fully saturate (all 1's). Unfortunately, if saturation is detected near the end of the _16 attempt, that's a lot of wasted work. So your performance might suffer.
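To illustrate the idea of trying narrow widths first and falling back on saturation, here is a toy sketch in Python. It is not parasail's implementation: `saturating_add` and `score_with_fallback` are hypothetical names, the "alignment" is just a running sum of match scores over a perfect match, and signed integers are assumed.

```python
def saturating_add(a, b, bits):
    """Add a and b, clamping to the signed range of the given bit width."""
    hi = 2 ** (bits - 1) - 1
    lo = -(2 ** (bits - 1))
    return max(lo, min(hi, a + b))


def score_with_fallback(length, match_score, widths=(8, 16, 32)):
    """Toy model: accumulate a perfect-match score, widening on saturation.

    Returns (bits, score) for the first width that did not saturate.
    """
    for bits in widths:
        hi = 2 ** (bits - 1) - 1
        score = 0
        saturated = False
        for _ in range(length):
            score = saturating_add(score, match_score, bits)
            # Conservative check: reaching the maximum is treated as overflow,
            # because a clamped value is indistinguishable from a real one.
            if score == hi:
                saturated = True
                break
        if not saturated:
            return bits, score
        # All work done at this width is thrown away before retrying.
    raise OverflowError("score does not fit in any of the given widths")


print(score_with_fallback(50, 2))
print(score_with_fallback(100, 2))
```

The later the saturation shows up in a narrow attempt, the more work is discarded before the wider attempt starts, which is the performance cost described above.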

Honestly, _32 should be the widest bit width you need for most alignments. The _64 variant will be quite slow because most CPU ISAs don't have a complete set of 64-bit integer vector operations, so I'm forced to emulate them with something slower.

A good heuristic is to estimate the biggest score you would get if your two sequences aligned perfectly: max(length(A), length(B)) * match score. If that value fits into a 16-bit integer, you're probably okay. A 16-bit integer has 2^16 = 65536 distinct values, so a signed one tops out at 2^15 - 1 = 32767. A signed 32-bit integer can store up to 2^31 - 1 = 2147483647. Your longest sequence was 101,000 characters, so even with a match score of 1 the perfect score is already larger than a 16-bit value can store.
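The heuristic above can be sketched as a small helper. This is a minimal sketch, not part of the library: `smallest_bit_width` is a hypothetical name, signed integers are assumed, and only the positive bound is checked (gap penalties that drive scores strongly negative are ignored).

```python
def smallest_bit_width(len_a, len_b, match_score, widths=(8, 16, 32, 64)):
    """Return the narrowest signed width that can hold a perfect-match score.

    The estimate is max(len_a, len_b) * match_score, as in the heuristic above.
    """
    best_case = max(len_a, len_b) * match_score
    for bits in widths:
        # Largest value representable in a signed integer of this width.
        if best_case <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("estimated score exceeds the widest supported width")


# Sequence lengths from this issue; the match score of 1 is an assumption.
print(smallest_bit_width(81_000, 101_000, match_score=1))
```

Because this is only an upper-bound estimate on the score, a result of 16 means "probably okay", not a guarantee.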
