Store first value in Dict directly in innerjoin #29
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I saw some of the discussion in
JuliaData/DataFrames.jl#2340 and got
curious about what was possible.
This avoids allocating a Vector for the case where
l
does not havemultiple indices with the same value. For the smoke-test benchmark in
JuliaData/DataFrames.jl#2340 (comment),
this reduces allocations by half and overall runtime by 10%.
Most of the allocations still come from this
line
which it's much less clear how to reduce allocations in. I'm not sure
how much JuliaLang/julia#24909 affects
performance in this case. One option would be to heuristically
estimate the size of
out
based on the size ofl
andr
and callsizehint!
on it; this didn't seem to help in my testing.I realize I'm only optimizing a single method of
innerjoin
, but I'mnot super familiar with this field nor with the inner workings of this
package, so I leave it to you to decide if this is a worthwhile
complication and if it's relevant elsewhere in the package.