Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'between' to 'Data.Map' and 'Data.Set' #803

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

TOTBWF
Copy link
Contributor

@TOTBWF TOTBWF commented Oct 2, 2021

Patch Description

This PR adds 2 new functions to containers: Data.Map.between, and Data.Set.between. These functions will consume a map/set (respectively), and will filter the map/set so that the keys/values lie between an upper and lower bound (inclusive). This operation comes up pretty often in certain applications, and feels generally useful.

Notes

I'm by no means convinced that between is a good name, so if anyone feels that they have something more evocative please chime in!

@treeowl
Copy link
Contributor

treeowl commented Oct 2, 2021

Interesting. But this is only one of four kinds of "between". What about the others?

@TOTBWF
Copy link
Contributor Author

TOTBWF commented Oct 2, 2021

Perhaps a more generalized version could be added, along with something like:

data Bound a
  = Inclusive a
  | Exclusive a

If this was added, I still think the simpler inclusive version ought to be added for ergonomics sake, but I might just be biased because that's the one I need 🙂

@treeowl
Copy link
Contributor

treeowl commented Oct 2, 2021

Yeah, I think you're biased. I imagine just as many people will need the others.

@sjakobi
Copy link
Member

sjakobi commented Mar 9, 2022

For reference: GHC contains a related restrictMap utility:

https://gitlab.haskell.org/ghc/ghc/-/blob/18b9ba5602121c75f184f29e5b3e70bd7d4779c4/compiler/GHC/Cmm/Switch.hs#L467-470

Note that it's implemented using split, which I think might be more efficient.

@jwaldmann
Copy link
Contributor

why would implementation via split be more efficient? It's something like

between lo hi = snd . M.split lo . fst . M.split hi

and the first split it might waste some work to build (re-balance) a tree that will be cut down by the second split. Consider the extreme case lo = hi = some mid-range key. The suggested implementation will do no balancing, but just return the singleton (or empty) map.

I'm all for keeping interfaces small, but this could be a case where the current interface does not allow an efficient implementation.

@sjakobi
Copy link
Member

sjakobi commented Mar 10, 2022

Thanks for pointing this out, @jwaldmann! I think I misunderstood the current implementation and also missed the bit where split is strict in both the trees it returns.

In light of this, I think the current implementation is actually pretty good.

@jwaldmann
Copy link
Contributor

jwaldmann commented Mar 10, 2022

Strictness - I did not check, but yes, I thought that all balanced maps are strict in the structure. Note, my argument (too much work) also holds if only one of the components returned by split is evaluated.

Perhaps the suggested function (and implementation) can be generalized to take a "bitonic filter" predicate? (a function f :: Key -> Ordering that must be monotonic - which we don't check - , and the result is the submap of all keys k with f k == EQ) But that's speculative generality ... It would solve the inclusive/exclusive variants, we could write filterBitonic (\ k -> if k <= lo then LT else if k > hi then GT else EQ), etc. Well, looks ugly, and looks like more operations - unless it's inlined everywhere. Altough I guess the cost is not in doing these comparisons, but in the construction (the balancing) of the result map.

[EDIT] the between function could detect and short-cut the case lo > hi (giving empty result). My suggested filterBitonic function could not do this (it cannot know beforehand that there is no k with f k == EQ).

@jwaldmann
Copy link
Contributor

I made a note of this at https://gitlab.haskell.org/ghc/ghc/-/issues/21217

@jwaldmann
Copy link
Contributor

See also #778 . It says that implementation via split is "at most a constant factor off", which I believe for time, but not for space.

@treeowl
Copy link
Contributor

treeowl commented Jul 21, 2022

I wonder if we can generalize this idea some more, taking some inspiration from mergeA. Suppose we have two predicates, p, q :: k -> Bool, with the following properties:

  1. If x <= y then p x <= p y and q x <= q y.
  2. For any x, q x <= p x.

Now p and q will divide any set/map into three portions:

  1. An initial part where p x = q x = False.
  2. A middle part where p x = True but q x = False.
  3. A final part where p x = q x = True.

We might want to do something different with each part: preserve it, drop it, or transform it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants