Fixes to FillImputer #289
if Missing <: elscitype(vnew)
    w = copy(vnew) # transform must be non-mutating
    w[ismissing.(w)] .= filler
    w_tight = convert.(nonmissing(eltype(w)), w)
@ablaom I know this has already been merged. I hope you don't mind me making a slight change here.
w = copy(vnew) # transform must be non-mutating
w[ismissing.(w)] .= filler
w_tight = convert.(nonmissing(eltype(w)), w)
The code above allocates more than it needs to: both `w = copy(vnew)` and `w_tight = convert.(nonmissing(eltype(w)), w)` create a new array. Since a new array must always be created anyway, because `transform` must be non-mutating, I feel that the following rewrite is slightly more efficient (any small gain in efficiency is worthwhile, right?):
w_tight = similar(vnew, nonmissing(eltype(vnew)))
@inbounds for i in eachindex(vnew)
    ismissing(vnew[i]) ? (w_tight[i] = filler) : (w_tight[i] = vnew[i])
end
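As a self-contained sketch of the rewrite above (assuming Base's `nonmissingtype` in place of the `nonmissing` helper used in the PR, and with `fill_missing` as a hypothetical function name, not something in the package), the single-pass version can be checked for the two properties that matter here: the input is left untouched and the result has a tight element type.

```julia
# Hypothetical helper name, for illustration only; `nonmissingtype` is from Base.
function fill_missing(vnew::AbstractVector, filler)
    # Allocate the output once, with `Missing` stripped from the element type.
    w_tight = similar(vnew, nonmissingtype(eltype(vnew)))
    @inbounds for i in eachindex(vnew)
        x = vnew[i]
        # `filler` is converted to the output element type on assignment.
        w_tight[i] = ismissing(x) ? filler : x
    end
    return w_tight
end

v = [1.0, missing, 3.0]
w = fill_missing(v, 0)
```

Here `v` still contains its `missing` entry after the call, and `eltype(w)` is `Float64` rather than `Union{Missing, Float64}`.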
The following contrived benchmark may help show why:
using Random, BenchmarkTools

function h1!(vnew, filler) # code avoiding double allocation
    w = similar(vnew, nonmissingtype(eltype(vnew)))
    @inbounds for i in eachindex(vnew)
        ismissing(vnew[i]) ? (w[i] = filler) : (w[i] = vnew[i])
    end
    w
end

function h2!(vnew, filler) # code with double allocations
    w = copy(vnew) # transform must be non-mutating
    w[ismissing.(w)] .= filler
    w_tight = convert.(nonmissingtype(eltype(w)), w)
    w_tight
end

n = [repeat([missing], 10000)..., rand(20000)...]; # array containing missing values
n1 = copy(n)
n2 = copy(n)
shuffle!(n);
n3 = copy(n);
n4 = copy(n);
julia> @btime h1!($n1, 0);
38.010 μs (2 allocations: 234.45 KiB)
julia> @btime h2!($n2, 0);
144.883 μs (9 allocations: 584.45 KiB)
julia> @btime h1!($n3, 0);
166.380 μs (2 allocations: 234.45 KiB)
julia> @btime h2!($n4, 0);
173.802 μs (9 allocations: 584.45 KiB)
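`h1!` is faster than `h2!` (about 4x on the unshuffled input, only marginally on the shuffled one) and allocates far less: 2 allocations and 234.45 KiB versus 9 allocations and 584.45 KiB. This becomes important if the code is called by other functions.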
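For a quick check of the allocation difference without BenchmarkTools, a rough sketch using Base's `@allocated` could look like the following. The names `single_pass` and `two_pass` are hypothetical stand-ins for the two variants discussed above, and the byte counts will vary by Julia version and machine, so only the relative order is meaningful.

```julia
# Hypothetical names; these mirror the two variants compared in the benchmark.
function single_pass(vnew, filler)
    w = similar(vnew, nonmissingtype(eltype(vnew)))
    @inbounds for i in eachindex(vnew)
        x = vnew[i]
        w[i] = ismissing(x) ? filler : x
    end
    return w
end

function two_pass(vnew, filler)
    w = copy(vnew)                 # first full-size allocation
    w[ismissing.(w)] .= filler     # the mask is another temporary
    return convert.(nonmissingtype(eltype(w)), w)  # second full-size allocation
end

# Test data: a Float64 vector with every third entry set to missing.
v = Vector{Union{Missing,Float64}}(rand(30_000))
v[1:3:end] .= missing

# Call each function once first so compilation is not counted.
single_pass(v, 0); two_pass(v, 0)

bytes_single = @allocated single_pass(v, 0)
bytes_two = @allocated two_pass(v, 0)
println("single pass: ", bytes_single, " bytes; two pass: ", bytes_two, " bytes")
```

Both functions return the same filled vector; the sketch only reports how many bytes each one allocates along the way.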
Yes great idea! Can you make a new PR?
And thank you for investigating.
This PR addresses #287 and #286. To this end I have added a UnivariateFillImputer. This should make refactoring after #288 easier, or might serve as a model for implementing #288.
To do: