-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up fromList for IntMap #653
base: master
Are you sure you want to change the base?
Conversation
Make `fromList` and `fromListWithKey` for `IntMap` smarter. Rather than rebuilding the path from the root for each element, insert as many elements as possible into each subtree before backing out.
@jwaldmann, your review/benchmarking would be greatly appreciated. |
Make `fromList` for `IntSet` better for partially sorted input. Performance seems to be similar to the old implementation for random input, but nearly as fast as `fromDistinctAscList` for sorted or reverse sorted input. There are pathological cases where the new implementation is significantly but not horribly slower than the old. In particular, I noticed that ```haskell iterate (\n -> (n ^ 2) `rem` (2^32-1)) 20 ``` is pretty bad for the new implementation for some reason.
Interestingly, it seems that the new |
Hmm..... I have the feeling that part of the trouble with |
Hi. I was reading your code and did some experiments. Code: the description should make it more clear that Experiment: since the algorithm does not have look-ahead, it can be fooled. In the following, I am using
This is a ghci session on Data.IntMap.Internal (using your version of
Do you want to export
|
Thanks, @jwaldmann! If your results are for a |
Oh, and yes, I would prefer to export |
they were for
|
Could you do me a favor and also run some tests with longer lists? I saw much bigger differences for contiguous data in my own testing, where I was using millions of elements. |
one millon:
|
30 million:
|
It really looks like the effects we're seeing are real. The new |
Perhaps it makes sense to expose the old version as |
@3noch I'm not generally a big fan of cluttering the API with one or two line functions, especially when anyone likely to need them will almost certainly know how to write them. |
Perhaps it makes sense then to simply describe where |
@3noch, yes, definitely. I'm just still hoping there's a way to cut down on the regression.... |
Somewhat surprisingly it looks like we can indeed speed things up slightly by avoiding the repeated destruction of the same list, based around the following return type: data Inserted' a
= InsertedNil !(IntMap a)
| InsertedCons !(IntMap a) !Key a ![(Key,a)] See https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-hs-L102-L163 for code and https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-out for some benchmarks. (Functions: But for the most part I believe the slowdown for that adverserial alternating list case is the price one has to pay for checking, on each level of the tree, whether the new key fits or not, where the |
That is somewhat surprising. I wonder what the Core looks like. I'd have thought the best bet for that sort of thing would've been an unboxed sum as a result, but maybe GHC is doing something clever.... Did you experiment with passing along a key in the map and a mask the way you do for |
Yes, there is certainly some inherent arithmetic cost to that. All we can hope to do is minimize additional costs from allocation and pattern matching. |
No I didn't try that... it's not directly applicable. The point with P.S. the term "prefix" is hopelessly overloaded in this context... a) on a high level, a prefix is just a finite bit string that may be shorter than a word. b) in IntMap, it's also used for a key with the lower bits masked. c) in IntSet there's a split into a "prefix" and a "bitmask"... |
I'm not so sure. In the bad case, we call |
IIRC GHC's "enter" operation on the STG level actually takes several continuations for multi-constructor datatypes. So the |
insertMany' Nil k x kxs' = InsertedNil Nil
doesn't look so great, because it makes insertMany' lazy in the key and
list. Since we don't reach that anyway, it might as well force those things.
…On Sun, Jul 7, 2019, 7:41 PM Bertram Felgenhauer ***@***.***> wrote:
Somewhat surprisingly it looks like we can indeed speed things up slightly
by avoiding the repeated destruction of the same list, based around the
following return type:
data Inserted' a
= InsertedNil !(IntMap a)
| InsertedCons !(IntMap a) !Key a ![(Key,a)]
See
https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-hs-L102-L163
for code and
https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfl-out
for some benchmarks.
But for the most part I believe the slowdown for that adverserial
alternating list case is the price one has to pay for checking, on each
level of the tree, whether the new key fits or not, where the naive
version simply starts at the root again each time.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#653?email_source=notifications&email_token=AAOOF7JT3564HU25PV7FVJ3P6J5KPA5CNFSM4H5BDIM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZLVJAA#issuecomment-509039744>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAOOF7OLXSXZJOZI4QV7TMTP6J5KPANCNFSM4H5BDIMQ>
.
|
I changed that, but it had no effect on the running time. |
Probably got inlined away.... Did you check the CSE between link and
branchMask calls? If I'm not mistaken, link makes essentially the same
branchMask call you do, but with the arguments flipped.
…On Sun, Jul 7, 2019, 8:40 PM Bertram Felgenhauer ***@***.***> wrote:
insertMany' Nil k x kxs' = InsertedNil Nil doesn't look so great, because
it makes insertMany' lazy in the key and list. Since we don't reach that
anyway, it might as well force those things.
I changed that, but it had no effect on the running time.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#653?email_source=notifications&email_token=AAOOF7IAX3MVZCWFH2J2J5TP6KEGNA5CNFSM4H5BDIM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZLWLBY#issuecomment-509044103>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAOOF7OKYZQ6NVWA7BIBSDDP6KEGNANCNFSM4H5BDIMQ>
.
|
Btw, we should also consider that the resulting speedup is miniscule and the code becomes more difficult to read as a result. |
No I had not checked that... and no, it did not do the CSE, regardless of the order of arguments to linkWithMask :: Mask -> Prefix -> IntMap a -> Prefix -> IntMap a -> IntMap a
linkWithMask m p1 t1 p2 t2
| zero p1 m = Bin p m t1 t2
| otherwise = Bin p m t2 t1
where
p = mask p1 m
{-# INLINE linkWithMask #-}
|
I'm currently doing a bit of triage on the open PRs. Could someone please summarize the current state of this PR, and possibly also sketch out what needs to be done until it can be merged? |
A lot going on here. I'll attempt a summary.
|
Thanks, @int-e! :) Could the benchmarks be merged separately, so they are not lost or bit-rotted, and so they could potentially be used for other attempts at speeding up Also, how about incorporating your code improvements into this PR? I guess the easiest way to do that would be if you'd make a PR against @treeowl's branch. Of course it would also be nice to make progress on the original problem, but since that is currently stalled, let's try to ensure that the value created so far is saved. |
I'm not going to work on this. The benchmarks are in a very ad-hoc state. Ideally, the benchmarks would use realistic inputs for As for my improvements, after looking at the benchmark data again, there's either no speedup at all or it's so small that it looks like noise. While my changes were relatively minor tweaks and refactorings, they make the code more complicated too, so they're almost certainly not worthwhile. The real question is... is there a better approach? But that requires some serious meditation (and time). |
Make
fromList
andfromListWithKey
forIntMap
smarter. Ratherthan rebuilding the path from the root for each element, insert
as many elements as possible into each subtree before backing out.