-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String key ergonomics #13
Comments
I like the idea of a Perhaps |
Definitely so, although there might be a slight misunderstanding here: in the opening message, I considered equality of byte slices (“bytewise equality”) a concrete property as opposed to abstract; with abstract, I meant things like Perhaps you meant instead that keys should be agnostic to ownership, which is the mandate of pub fn get<'a, Q: ?Sized>(&self, key: &'a Q) -> Option<&V>
where
Q: Borrow<K::Borrowed>
It's as you say: there can be no “canonical” order over UTF-8 strings other than the lexicographic order on bytes, which is the one we already use; going beyond requires collation and locales, but even those only concern islands of codepoints (because there is no meaningful ordering relationship between symbols from different scripts, or between letters and punctuation, and so on).
I agree that it's reasonable to isolate different capabilities to different traits, and similarly apply bounds to affected methods only. Moreover, the notion of |
@tapeinosyne sorry for the late response! I have to admit that it never crossed my mind when I first wrote this that the equality/ordering between keys is strictly "bytewise" - mostly because when I wrote this, I was intending to use it strictly for bytes. The idea of using it for strings and other types came later (although I now find myself wanting to use it for paths.) I think it's definitely a must that keys be agnostic to ownership in the same manner as |
@tapeinosyne on the topic of |
I'd say so, certainly. (Although Going back to |
Gladly! Can't promise to be the most responsive for the next month or so but if you submit a PR I'll do my best to get to it in a timely manner. You want me to take a look at it as-is or do you have changes to make first? |
By all means, take your time. (Hardly unreasonable, innit, considering that it took me four months to reply "yes" above!) The PR can already be reviewed for issues of substances with I'll update the opening message of #14 to better reflect this. |
@sdleffler, would you be available for revisiting this issue? I would be happy to revise or update #14 as we like. |
I would! There were a few bugfixes since #14 was submitted - would you mind rebasing #14? It honestly looks good enough to me now that I've finally had the time to go over it that I don't think I can argue for much in the way of revision. Let's get it rebased and then I'll check it out and probably merge it. |
Certainly. I'll need to read it over myself—it's been a while since I wrote it. |
A question: do we want to deprecate the |
Let's deprecate it for now. If it gets in the way of the API change, then it can go. |
#14 is ready for review, although documentation and tests are yet to be fully updated. The reason I have left them midway is that, hands-on, I rather feel that
All of which leaves [1] Notably, the |
Currently, string keys are rather unwieldy. They cannot be used directly in QP-tries, and must either be wrapped individually or funnelled through ancillary inherent methods – loose methods which exist as duplicates without a supporting trait.
Primarily, the issue arises from the requirement that keys be
Borrow<[u8]>
:Borrow
is concerned with abstract properties such asEq
orHash
, but qp-tries manipulate keys on a strictly structural basis, byte by byte. From this stems a certain tension that can be felt most clearly in the public API, where users must rely on different inherent methods depending on the key type.I outline here three possible paths to better key ergonomics, namely:
AsKey
trait,AsRef<[u8]>
rather thanBorrow<[u8]>
,Of the three, I believe the first would be the best choice, as it not only guarantees ergonomic string keys but also gives us the opportunity to admit more key types. I will follow up with a PR to get a better feel for the proposal.
Introduce an
AsKey
traitShift the bound on keys from
Borrow
to a public traitAsKey
(orRadical
, if we're feeling whimsical), which provides both a slice view – akin toBorrow
itself – and a byte view; the slice view subsumesBorrow
and the byte view makesBStr
wrappers redundant. (Note that qp-tries already requireBreak
as a non-std trait.) As an additional benefit,AsKey
could later be implemented for more types expected of a radix tree – notably integers, integer vectors and arrays, and possibly tuples.The basic idea can be illustrated thus:
The main disadvantage, here, is that coherence rules prevent us from providing much in the way of blanket impls: we can cover concrete types from the standard library, and that is about it. Nevertheless such impls should suffice, as users can rely on conversion traits and deref coercions at the call site.
(In order to provide blanket impls, we would need to parametrize the trait, turning it into
AsKey<Borrowed>
, and trackBorrowed
withPhantomData
internally. That should allow us to implement the trait forK : Borrow<[u8]>
andK : Borrow<str>
without overlap, by parametrizing each impl with its respective borrowed form. Not complicated, just… cluttered.)Make keys
AsRef<[u8]>
Relax the bounds from
Borrow<[u8]>
toAsRef<[u8]>
, allowing tries to accept string keys without further changes. This is by and large analogous to what is done now, internally, as we only ever work with byte slices. Other byte-based tries published on Crates.io (fst
comes to mind) also admitAsRef
keys.To justify the transition, I will note that
Borrow
is meant to enforce abstract properties, forming a contract between things that – be they owned or borrowed – hash the same way, remain equal underEq
, have the same order underOrd
, and so on. However, QP-tries need only rely on structural equality, i.e. they need only know the actual bytes that form a key, and nothing beyond that. From this perspective,AsRef<[u8]>
is better tailored to our needs thanBorrow
, as its purpose is to offer a cheap reference-to-reference conversion to the parameter, and a reference to[u8]
can be little else than an agnostic byte view.Having said that, the fact that trie keys would not follow
Eq
may prove confusing, and the new bounds should be clearly documented with explanations and examples.Wrap the trie, not the keys
Given that keys are homogeneous, there is little benefit in wrapping them individually and using ad-hoc methods to reduce boilerplate. Rather, we can introduce a newtype
StringTrie<V>
, wrapping a plain qp-trie overBorrow<[u8]>
keys. Trie methods are moved to a trait, whichStringTrie
can implement by delegation to the inner trie – with the exception of prefix lookup, whereStringTrie
requires index validation.There are degrees of sophistication to this approach and how generic it should be, but the resulting interface should allow users to forgo wrappers after trie instantiation:
This is the least impactful approach, in that it does not ultimately alter bounds, but merely lifts the newtype to the outer structure of the trie. However, it does not address the issues underlying string keys.
The text was updated successfully, but these errors were encountered: