-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: 1.0 release? #40
Comments
My main question would be: do the That said, part of the reason I don't use them much has been due to the fact that the library is pre-1.0 and these seem the most likely item to be removed to me. In principal I like the idea of being able to specify something is a I think the main other changes I can think of would be API additions, and thus compatible. |
I think so. There are two main use cases I envision for them:
|
Yep, that all makes sense to me, and I hadn't considered point 1 at all. |
Another place that |
Now that For reference, not using |
Could you say more about this? In particular, I would like to hear more about its importance in public APIs. |
A good example for its use is a freshly parsed commit whose memory is backed by a buffer somewhere else. The goal is to make clear that these fields represent user-readable strings, but in Rust there is no way to do this without enforcing an encoding or a custom type. That custom type is It also works nicely for the mutable version of such a commit. When testing, working with Note that I put the From there it only proliferated and it's now used in no less than 13
To find a more concise answer: |
@Byron Ah wow, that is great feedback. I agree that using I don't think there is much work to be done to get When do you plan to release |
Thanks so much! Part of me was afraid I'd be told it's all wrong as it was so strongly discouraged 😅.
Also thanks for the prioritization, even though I think no matter how long it takes you, I will be slower. If it happens at the end of the year I would be served well already.
|
Ah perfect. My time is super limited and devoting pretty much all of what little free time I have to |
Having this stable would also enable its use with the riot-wrappers crate, where for example process names are exposed (they're by all probability ASCII even, but an unsafe assume-it-is opens up for UB if a C user on the same system does something weird, and converting them to &str means pulling in UTF-8 checks, and either fallible operations or panics). No hurry from me (I'll need to wait with this for my next API bump anyway), just a data point, and thanks for working on this! |
Since there hasn't been a lot of movement in the past 9 months and since I am hereby happily offering to step in as collaborator or maintainer and serve as helping hand for whatever @BurntSushi deems right. Thank a lot :)! |
It currently uses 'char::is_whitespace', but this is more of an implementation detail. While 'char::is_whitespace' is available in 'core', it's plausible that we might use our own data some data. In particular, 'trim' already uses its own data. I believe this is the only routine that makes direct use of some kind of Unicode data that wasn't previously gated behind the 'unicode' feature. Ref #40
It currently uses 'char::is_whitespace', but this is more of an implementation detail. While 'char::is_whitespace' is available in 'core', it's plausible that we might use our own data some data. In particular, 'trim' already uses its own data. I believe this is the only routine that makes direct use of some kind of Unicode data that wasn't previously gated behind the 'unicode' feature. Ref #40
It currently uses 'char::is_whitespace', but this is more of an implementation detail. While 'char::is_whitespace' is available in 'core', it's plausible that we might use our own data some data. In particular, 'trim' already uses its own data. I believe this is the only routine that makes direct use of some kind of Unicode data that wasn't previously gated behind the 'unicode' feature. Ref #40
Folks, I've updated the top comment of this issue to include a list of breaking changes (all of which are minor). My plan is to push out the 1.0 release on Monday, July 11, 2022. So please, if you have comments or feedback or other ideas for breaking changes, now is the time to do it.
For what it's worth, it would have probably been an order of magnitude more work to onboard someone else to do the 1.0 release than to just do it myself. (Which ended up taking almost a full day of work.) It's important to remember this when offering to maintain a project. It's not as simple as me just clicking a button and giving someone permissions. :-) |
That's certainly true, yet I find it worth clarifying that my offer wasn't limited to the 1.0 release. In any case, thanks for this wonderful |
I sincerely hope that the version of Thank you for such a fundamental building block of the ecosystem. |
Note that I've published |
Indeed, this is I believe a good point. In fact, all of the methods on |
This comment was marked as off-topic.
This comment was marked as off-topic.
OK, the docs for what's currently on master have been published: https://docs.rs/bstr/1.0.0-pre.1/bstr/ (I haven't made the changes to stop consuming |
Can you create a new issue with these? That is, just copy and paste your helper routines. (Or if they fit better in an existing issue, that's fine too.) |
I've added another breaking change: restructure serde feature flags Now that bstr has an 'alloc' feature, we need to rethink how we setup |
I've added another breaking change: several methods on |
Would it be worth bumping the MSRV to 1.60 and use weak / namespaced feature flags? This way you have
|
@epage Yeah I mused about this here: #111 (comment) Basically, I kind of feel like 1.60 is really way too new. My first approximation is to track Debian stable (which is at Rust 1.48), but in principle, I'm okay with something newer. Unfortunately, there does seem to be a fundamental conflict here. Namely, if we move forward with the existing trichotomy of serde features, then it's going to have to stay that way since I think moving to the scheme you propose would be a breaking change. (Not the MSRV bump, that's fine, but changing the features around.) Since I don't see myself putting out a I personally don't mind being a bit aggressive, but I suppose it depends on my users. @lopopolo @thomcc @Byron @TethysSvensson How do you feel about Rust 1.60 as the MSRV? |
With respect to MSRV, if Rust 1.60 is too new for some folks, then I suppose they could stick with |
Given there's a question open here, I'm going to delay the 1.0 release another week. |
Speaking of
In the docs, it says
(1) could be simplified, depending on what is done with Should the existing functions have more specific names to open the door for the more general functions to be added later, if possible? Non-blocker for 1.0: would it also be worth pointing people to |
As a usability note, I wrote an application with I'm assuming this is fully intentional and won't be changing but figured it was still worth noting. EDIT: Split out into #117 |
@BurntSushi 1.60.0 MSRV works for me. Applications I build with On a related note, I've been using the weak features and namespaced features in recent cargo and have found them to be quite pleasant to work with. Using |
Right yeah. That's what bstr 0.1 was: https://docs.rs/bstr/0.1.4/bstr/ |
I think it's probably the right choice in this situation, given that you are trying to stabilize. In planus, our base policy is to only bump the MSRV to versions that are at least six months old, which would mean that we could not start using bstr until October. That being said, it's a policy we try to follow, but break when there is sufficient reason to do so. (While writing this, I am also realizing that our MSRV simultaneously says 4 versions of rust and 6 months, which are two very different things) |
@lopopolo Yeah the new Cargo feature stuff is awesome. I very much like that it gives more control over what's a public feature. @epage See #5 and #8 for more details on why Happy to answer more questions on that but would prefer opening a new issue for it if you have more questions. |
I'd be OK with it, thanks for asking. |
Hi @BurntSushi I know this is late in the game for 1.0, but I wanted to float an idea of an optional dependency on Lines 463 to 471 in 0d9d222
I'm not sure how you'd want to structure this, but maybe an optional Edit: coming back to this, I'm actually not sure whether I should expect the DFA in bstr or SIMD routines in simdutf8 to be faster. I just associate SIMD with 🏎️ |
Yeah that's not a 1.0 concern. I'm unlikely to take a dependency for that. Instead, I would rather see the implementation ported to bstr or maybe even memchr. That said, I've never looked at it in depth, so I'm unsure of its complexity. Feel free to open a new issue about this to avoid cluttering up the 1.0 release thread. For this thread, I would really like to keep it scoped to 1.0 concerns. |
September update on Rust versions in the Linux world:
1.60 seems a completely reasonable MSRV to start with for 1.0. Linux distros will catch up to 1.60 "very soon", and the 0.2 version is still usable until then. You could also document that the MSRV will stay up to date with Debian Stable once they eventually get past 1.60, or whatever other reference point. |
And switch to Rust 2011 and 'resolver = 2'. The idea here is to overhaul bstr features to make use of the new resolver in Cargo. But this requires Rust 1.60. This does make the MSRV a bit newer than I would hope, but it's at least a few releases old at this point. And if I don't do this now, then I either have to accept a sub-optimal feature setup for 1.0 forever, or delay 1.0 until Rust 1.60 is "old enough." Instead, we can just bump to Rust 1.60 now. For people too squeamish to move to such a new Rust, they can stick to bstr 0.2.0 until they're ready to move on. Ref: #40 (comment)
This gets rid of the 'serde1-{std,alloc,core}' features and just replaces it with a single proper 'serde' feature. We make it work as one would expect with the other crate features by using the new 'dep:' and 'pkg?' Cargo.toml feature syntax. This is a breaking change because the 'serde1' features no longer exist. Instead, you just need to enable the 'serde' feature and it should automatically do the right thing depending on whether you also have 'std', 'alloc' or neither set. Ref: https://doc.rust-lang.org/cargo/reference/features.html Ref: #40 (comment)
And switch to Rust 2011 and 'resolver = 2'. The idea here is to overhaul bstr features to make use of the new resolver in Cargo. But this requires Rust 1.60. This does make the MSRV a bit newer than I would hope, but it's at least a few releases old at this point. And if I don't do this now, then I either have to accept a sub-optimal feature setup for 1.0 forever, or delay 1.0 until Rust 1.60 is "old enough." Instead, we can just bump to Rust 1.60 now. For people too squeamish to move to such a new Rust, they can stick to bstr 0.2.0 until they're ready to move on. Ref: #40 (comment)
This gets rid of the 'serde1-{std,alloc,core}' features and just replaces it with a single proper 'serde' feature. We make it work as one would expect with the other crate features by using the new 'dep:' and 'pkg?' Cargo.toml feature syntax. This is a breaking change because the 'serde1' features no longer exist. Instead, you just need to enable the 'serde' feature and it should automatically do the right thing depending on whether you also have 'std', 'alloc' or neither set. Ref: https://doc.rust-lang.org/cargo/reference/features.html Ref: #40 (comment)
OK, I've rejiggered/simplified the features (just a My new 1.0 release date is September 6, 2022. If there are any last comments on backward incompatible changes, now is the time to make them. :) |
Done! 1.0 tag: https://github.com/BurntSushi/bstr/releases/tag/1.0.0 Blog post: https://blog.burntsushi.net/bstr/ |
Congrats @BurntSushi! This is huge. Thank you so much for all of the effort put into this high quality, foundational crate. bstr makes so much possible for me and my projects. 🙇 |
For those coming here that don't know what
bstr
is: it is a string library for&[u8]
. The essential difference between the strings inbstr
and the&str
type in std is thatbstr
treats&[u8]
as conventionally UTF-8 instead of requiring that it be UTF-8. Its main utility is in contexts where you believe your data is UTF-8 (but it might not be completely UTF-8) and you either don't have any information about what its actual encoding is or do not want to pay for the UTF-8 validity check. A common example of this is reading data from files. Thebstr
documentation says a lot more.This issue is about releasing 1.0. Since I do not currently have any plans for a 2.0, I would like to get as many eyes on this as possible. If you have any feedback with respect to API breaking changes, I would love to hear about it.
OK, so I promise that the 1.0 release is imminent. Here is an exhaustive list of planned breaking API changes, all of which are currently present on
master
(some brought in via #104, others via #123):Bytes::as_slice
is renamed toBytes::as_bytes
.ByteVec::into_os_string
now returnsResult<OsString, FromUtf8Error>
instead ofResult<OsString, Vec<u8>>
.ByteVec::into_path_buf
now returnsResult<PathBuf, FromUtf8Error>
instead ofResult<PathBuf, Vec<u8>>
.Find<'a>
has been changed toFind<'h, 'n>
, which represents the lifetimes of both the haystack and the needle, instead of the shorter of the two.FindReverse<'a>
has been changed toFindReverse<'h, 'n>
, which represents the lifetimes of both the haystack and the needle, instead of the shorter of the two.Split<'a>
has been changed toSplit<h, 's>
, which represents the lifetimes of both the haystack and the splitter, instead of the shorter of the two.SplitReverse<'a>
has been changed toSplitReverse<'h, 's>
, which represents the lifetimes of both the haystack and the splitter, instead of the shorter of the two.SplitN<'a>
has been changed toSplitN<h, 's>
, which represents the lifetimes of both the haystack and the splitter, instead of the shorter of the two.SplitNReverse<'a>
has been changed toSplitNReverse<h, 's>
, which represents the lifetimes of both the haystack and the splitter, instead of the shorter of the two.ByteSlice::fields
is now gated behind theunicode
feature. Previously, it was available unconditionally.serde1
has been renamed toserde1-std
, andserde1-nostd
has been split intoserde1-alloc
andserde1-core
.BufReadExt::for_byte_line
now accepts&mut self
instead ofself
.BufReadExt::for_byte_record
now accepts&mut self
instead ofself
.BufReadExt::for_byte_line_with_terminator
now accepts&mut self
instead ofself
.BufReadExt::for_byte_record_with_terminator
now accepts&mut self
instead ofself
.OsStr
andPath
conversion routines had their API docs tweaked slightly so that they could defer to a possibleOsStr::as_bytes
(andOsStr::from_bytes
) routine in the future, if it's added. But their behavior otherwise currently remains the same.serde1-*
features have been dropped.bstr
now just has aserde
feature and uses the newdep:
andpkg?
syntax so that it will combine as one would expect with other features.ByteSlice::copy_within_str
has been removed, sinceslice::copy_within
has been stable since Rust 1.37.slice::copy_within
does the exact same thing.Assuming this is an exhaustive list, and given that these are all very minor changes, I'm hopefully that migration from
0.2
to1.0
will be very easy. Hopefully requiring no changes in most cases.My plan is to release 1.0 on
July 11, 2022July 18, 2022September 6, 2022. If you have feedback to give, please do so now. :-)Note that I've published
1.0.0-pre.3
to crates.io. Docs: https://docs.rs/bstr/1.0.0-pre.3(Below is the message I initial wrote a couple years ago. I'm way late to the party.)
It has been almost a year since I released
bstr 0.2
with the major breaking change of moving most of the routines to extension traits. It seems like this has been a success. Namely,bstr
's reverse dependency list is growing. More generally, I personally like working with the new API better than the old one. While I still bemoan the loss of a distinct type and its correspondingDebug
impl as the One True Byte String, I think the benefits of the extension trait API have ended up outweighing that cost.I've been giving some thought to bstr's API and its future evolution, and nothing immediately comes to mind in terms of breaking changes. That is, most everything I can think of are API additions, or at worst, deprecations. The only breaking change I can think of is to more carefully audit which API routines are available when
unicode
mode is disabled. I just want to make sure I'm not boxing myself into any corners there. (e.g., Some extant implementations might currently rely on std for its Unicode support, but it may wind up being the case that we want to re-implement some of those, which will require bringing in our own Unicode data. If that occurs, then those APIs should be gated behind theunicode
feature.)Otherwise, my feeling is that, unless I hear otherwise, I will make a
1.0
release in a few months. June 2020 is the 1 year anniversay of the 0.2 release, so that sounds like a good a time as any.Thoughts?
cc @thomcc I know you've done some work on bstr and are actually using it, so would definitely appreciate if you have any thoughts here! Mostly what I'm looking for are things that we might want to do that will break the current API. While 1.0 doesn't necessarily mean "breaking changes must stop," I generally try to commit to a long termish period of stability for each major version in core libraries.
The text was updated successfully, but these errors were encountered: