perf: Add bounds for haplotype matrix
We know that the value of any cell in the occurrence matrix can be at most
the number of reads. Leverage that knowledge to pick a narrower element type
and shrink the size of the arrays that are needed.
MillironX committed Dec 22, 2023
1 parent c605145 commit 4928e71
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion src/haplotypecalling.jl
@@ -253,7 +253,16 @@ dimensional matrix.
 function occurrence_matrix(
     haplotype::AbstractArray{Variation{S,T}}, reads::AbstractArray{Haplotype{S,T}}
 ) where {S<:BioSequence,T<:BioSymbol}
-    hapcounts = SparseArray{UInt}(undef, Tuple(repeat([2], length(haplotype))))
+    Q = UInt
+    for int_type in [UInt8, UInt16, UInt32, UInt64, UInt128]
+        if length(reads) < typemax(int_type)
+            Q = int_type
+            break
+        end #if
+        error("Too many reads to represent in memory")
+    end #for
+
+    hapcounts = SparseArray{Q}(undef, Tuple(repeat([2], length(haplotype))))
 
     for read in reads
         coordinates = zeros(Int, size(haplotype))
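
Note on the hunk above: as it reads here, the `error` call sits inside the loop body, so it appears it would be reached on the first iteration whenever `length(reads)` is 255 or more, before the wider types are tried. A minimal sketch of the same type-selection idea follows, with the failure path moved after the loop. The helper name `smallest_count_type` is hypothetical and not part of this commit, and the comparison uses `<=` on the assumption that a cell may equal the read count exactly.

```julia
# Hypothetical helper (not in the commit): return the narrowest unsigned
# integer type that can hold a count of up to `n`.
function smallest_count_type(n::Integer)
    for int_type in (UInt8, UInt16, UInt32, UInt64, UInt128)
        # A cell can be at most `n`, so any type with typemax >= n is enough.
        n <= typemax(int_type) && return int_type
    end
    # Only reached when none of the candidate types is wide enough.
    error("Too many reads to represent in memory")
end
```

Under those assumptions, the element type in the hunk could be chosen with `Q = smallest_count_type(length(reads))` before constructing `SparseArray{Q}`.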