chore(rust): replace lexical by atoi to parse integer #10655
The failed test Furthermore, the following is also true: assert_eq!(Some(42), atoi::<u32>(b"42 is the answer to life, the universe and everything")); It doesn't seem to provide an option to prevent this (at least I didn't find one), and I'm a bit unclear about what our guiding principle is: more lenient or more strict? 🤔
https://docs.rs/atoi/latest/atoi/trait.FromRadix10Checked.html#tymethod.from_radix_10_checked returns also the Btw, this is the code from that function:
fn from_radix_10_checked(text: &[u8]) -> (Option<I>, usize) {
    let max_safe_digits = max(1, I::max_num_digits_negative(nth(10))) - 1;
    let (number, mut index) = I::from_radix_10(&text[..min(text.len(), max_safe_digits)]);
    let mut number = Some(number);
    // We parsed the digits, which do not need checking now lets see the next one:
    while index != text.len() {
        if let Some(digit) = ascii_to_digit(text[index]) {
            number = number.and_then(|n| n.checked_mul(&nth(10)));
            number = number.and_then(|n| n.checked_add(&digit));
            index += 1;
        } else {
            break;
        }
    }
    (number, index)
}
I think we can inline that and raise on the
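A minimal sketch of the stricter behaviour discussed here, assuming the goal is to reject trailing non-digit bytes and overflow instead of stopping at the first non-digit. The name `parse_u32_strict` is hypothetical, and the code uses only the standard library, not atoi:

```rust
/// Hypothetical strict parser (not part of atoi or Polars).
/// Returns `None` for empty input, for any non-digit byte, and on overflow.
fn parse_u32_strict(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() {
        return None;
    }
    let mut number: u32 = 0;
    for &b in bytes {
        let digit = match b {
            b'0'..=b'9' => (b - b'0') as u32,
            // Unlike the lenient behaviour shown above, any other byte is an error.
            _ => return None,
        };
        // Checked arithmetic turns overflow into `None` instead of wrapping.
        number = number.checked_mul(10)?.checked_add(digit)?;
    }
    Some(number)
}
```

With this sketch, `b"42"` parses to `Some(42)`, while `b"42 is the answer"` and `b"4294967296"` both yield `None`.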
Ah, thanks for the quick reply! This is a really helpful suggestion 👍. I will go down this road tomorrow.
Hi @ritchie46, I just inlined (almost copied) this part of the code from The current implementation does not support
We used to directly return Additionally, do we have some benchmark to cover this change?
Additionally, do we have some benchmark to cover this change?
We don't have benchmarks. But I am definitely interested in some.
Probably hoist the parsing code into a generic function and benchmark that against lexical. This can be done independent of the csv parsing of course.
fn parse(bytes: &[u8]) -> Option<i32> {
    lexical::parse(bytes).ok()
}
macro_rules! impl_integer_parser {
Could we make this work with generics and then a macro that dispatches to the generics?
crates/polars-io/src/csv/buffer.rs
Outdated
// `number += digit * signum`.
let value = match sign {
    Sign::Plus => {
        let max_safe_digits =
It is not clear to me whether this will run in constant time. It should be computable as a constant, if I am not mistaken. If we are creating generics, we could also set it on the type?
If we are creating generics, we could also set it on the type?
Intuitively, I like this idea. But when it came to implementation, I was left scratching my head, and I probably didn't find the trick.
Say we want to pre-compute this value: i32::max_num_digits(nth(10))
and set it as the first const generic parameter:
fn parse_from_radix_10<const MAX_DIGITS: usize, /* other generics */>() {
    // ...
}
Since neither max_num_digits nor nth is a const fn, the compiler is unhappy, and this also prevents me from implementing nth as a const fn. I don't have much experience in this area, so I would like to hear your advice. 🙇
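One possible way around this, sketched under the assumption that a hand-written `const fn` may stand in for atoi's `max_num_digits` and `nth`: compute the digit count in a `const fn` and expose it as an associated constant on a trait, so the value is "set on the type". The names `decimal_digits`, `SafeDigits`, and `MAX_SAFE_DIGITS` are hypothetical:

```rust
/// Hypothetical helper: number of decimal digits needed to write `n`.
/// `while` loops are allowed in `const fn`, so this is evaluated at compile time
/// when used in a constant.
const fn decimal_digits(mut n: u128) -> usize {
    let mut digits = 1;
    while n >= 10 {
        n /= 10;
        digits += 1;
    }
    digits
}

/// Hypothetical trait carrying the per-type constant.
trait SafeDigits {
    /// Count of leading digits that can be accumulated without overflow checks:
    /// one fewer than the digit count of the type's maximum value.
    const MAX_SAFE_DIGITS: usize;
}

// The macro only wires each integer type to the const computation.
macro_rules! impl_safe_digits {
    ($($t:ty),*) => {$(
        impl SafeDigits for $t {
            const MAX_SAFE_DIGITS: usize = decimal_digits(<$t>::MAX as u128) - 1;
        }
    )*};
}
impl_safe_digits!(u8, u16, u32, u64, i8, i16, i32, i64);
```

A generic parser could then read `T::MAX_SAFE_DIGITS` instead of taking a const generic parameter; for `i32` the constant is 9, because every 9-digit number fits below `i32::MAX`.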
Going on holiday today, but I will see if I can find time to tinker with this.
I think you are correct about the const not working, but the generic part should.
Yes, I will refactor it into a generic function, then macro only used for dispatching.
Most importantly: Have a pleasant holiday! 😄
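The "generic function plus dispatching macro" split could look roughly like the sketch below. It assumes a small hypothetical trait `CheckedDecimal` instead of the traits used by atoi or Polars, and it handles only unsigned digit strings (no sign, no whitespace):

```rust
/// Hypothetical trait: the minimum a generic decimal parser needs from a type.
trait CheckedDecimal: Sized + Copy {
    const ZERO: Self;
    /// Computes `self * 10 + digit`, or `None` on overflow.
    fn push_digit(self, digit: u8) -> Option<Self>;
}

// The macro only dispatches each concrete integer type to the shared logic.
macro_rules! impl_checked_decimal {
    ($($t:ty),*) => {$(
        impl CheckedDecimal for $t {
            const ZERO: Self = 0;
            fn push_digit(self, digit: u8) -> Option<Self> {
                self.checked_mul(10)?.checked_add(digit as $t)
            }
        }
    )*};
}
impl_checked_decimal!(u8, u16, u32, u64, i32, i64);

/// Generic parser: all parsing logic lives here, once.
fn parse_digits<T: CheckedDecimal>(bytes: &[u8]) -> Option<T> {
    if bytes.is_empty() {
        return None;
    }
    let mut number = T::ZERO;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        number = number.push_digit(b - b'0')?;
    }
    Some(number)
}
```

Callers pick the target type at the call site, for example `parse_digits::<i32>(b"2147483647")`, and the compiler monomorphizes one copy of the loop per type.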
I don't think we should merge this. This approach will be much slower than
use anyhow::Result;
fn main() -> Result<()> {
    let x: i32 = lexical::parse("2147483647")?;
    let y: i32 = lexical::parse("2147483648")?;
    dbg!(x, y);
    Ok(())
}
Actually,
let y: i32 = lexical::parse("9589934592")?;
dbg!(y);
1000000000
Perhaps we can fix it upstream? If that is not viable, we should implement an optimized integer parser ourselves or find another dependency, because radix-by-radix multiplying is rather slow.
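The reported value 1000000000 is what 32-bit wrapping arithmetic produces for that input, since 9589934592 - 2 * 2^32 = 1000000000. The sketch below reproduces the arithmetic with the standard library only; it is an illustration of wrapping versus checked accumulation, not a claim about how lexical is implemented. Both function names are hypothetical, and both assume the input contains only ASCII digits:

```rust
/// Accumulates digits with wrapping arithmetic: overflow is silently
/// reduced modulo 2^32. Assumes every byte is an ASCII digit.
fn parse_wrapping(bytes: &[u8]) -> i32 {
    bytes.iter().fold(0i32, |n, &b| {
        n.wrapping_mul(10).wrapping_add((b - b'0') as i32)
    })
}

/// Accumulates digits with checked arithmetic: overflow becomes `None`.
/// Assumes every byte is an ASCII digit.
fn parse_checked(bytes: &[u8]) -> Option<i32> {
    bytes.iter().try_fold(0i32, |n, &b| {
        n.checked_mul(10)?.checked_add((b - b'0') as i32)
    })
}
```

Here `parse_wrapping(b"9589934592")` returns `1000000000`, matching the output quoted above, while `parse_checked` returns `None` for the same input.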
I think fixing
Maybe we should benchmark atoi vs std.
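A rough starting point for such a benchmark, using only the standard library. Neither atoi nor lexical is called here: `parse_digit_by_digit` is a hypothetical stand-in for the candidate parser, and `time_parser` is a hypothetical harness. A real comparison would swap in the actual crates and preferably use a dedicated benchmarking tool:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `parse` over every input `rounds` times. Returns the elapsed time and
/// a checksum of the parsed values so the work cannot be optimized away.
fn time_parser<F>(inputs: &[&str], rounds: usize, parse: F) -> (Duration, i64)
where
    F: Fn(&[u8]) -> Option<i32>,
{
    let start = Instant::now();
    let mut checksum = 0i64;
    for _ in 0..rounds {
        for s in inputs {
            if let Some(v) = parse(black_box(s.as_bytes())) {
                checksum += v as i64;
            }
        }
    }
    (start.elapsed(), checksum)
}

/// Baseline: the standard library's `str::parse`.
fn parse_std(bytes: &[u8]) -> Option<i32> {
    std::str::from_utf8(bytes).ok()?.parse().ok()
}

/// Hypothetical candidate: checked digit-by-digit parsing, unsigned input only.
fn parse_digit_by_digit(bytes: &[u8]) -> Option<i32> {
    if bytes.is_empty() {
        return None;
    }
    bytes.iter().try_fold(0i32, |n, &b| {
        if !b.is_ascii_digit() {
            return None;
        }
        n.checked_mul(10)?.checked_add((b - b'0') as i32)
    })
}
```

Comparing the two checksums first confirms that both parsers agree on the inputs before the timings are compared.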
Yes, it sometimes wraps the overflowed value instead of returning an error. That's why we want to replace it with another library.
This is just a PoC to demonstrate whether this is feasible. A reasonable next step is to find a way to benchmark it. If there is indeed a significant regression, feel free to close this. :)
Also, looking at the code, @reswqa you copy/pasted significant chunks of code from the That's not ok!
Sorry, I accidentally marked it as
My bad. I asked to inline part of it to cache some part. We do need to mention the original code.
Ah, I did not see that you asked for it. Either way we should mention it, but I think we should have a more optimized implementation regardless. Perhaps we can accept a simplified implementation knowing it's a speed regression for now and then find/develop a faster one later. I know I can do that if needed.
Ah, that's not your fault. TBH, I realized this copyright problem at the beginning, but since it's only a draft at the moment, I was slacking off (this is definitely not a good habit) :) You guys are doing something very meaningful (Polars is really great), which is also why I started to be active here. At first, I posted this on Discord to see if anyone knew the best alternative to
Your help is very welcome. You contributed a lot of great patches in the last few days. Really great to have your help! :)
Closed as completed via #12512.
This fixes #10635.