Right-aligned numbers in dataframe printout #7378
Comments
I'd also add a Rust tag as the python's |
I'm not sold on this as the default yet, if only because our float formatting doesn't fix the number of decimal places. I think if we could also add that at the same time (e.g. default to 6dp, with an option to modify it), it might be more compelling? Example:
import polars as pl
df = pl.DataFrame({
    "str": ["aaa", "bbb", "ccc"],
    "flt": [1.23456789, 12435.645, 31.999],
})
Currently...
vs (right aligned)
As the decimal point is still all over the place (because the number of decimal places isn't constant), we don't actually gain much (any?) readability 🤔 However, with fixed dp precision, such as...
...or...
...I think it would have a lot more value, as the decimal point (and hence the magnitude of the value) is immediately comparable between rows. |
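A minimal pure-Python sketch of the point above (this is not Polars' formatting code; `format_floats` is a hypothetical helper). With variable precision, right-aligning the `flt` values still leaves the decimal points in different columns, whereas a fixed number of decimal places lines them up.

```python
def format_floats(values, decimals=None):
    """Right-align a column of floats as strings (hypothetical helper).

    decimals=None keeps variable precision (Python's shortest repr);
    an int fixes the number of decimal places for every value.
    """
    if decimals is None:
        cells = [repr(v) for v in values]
    else:
        cells = [f"{v:.{decimals}f}" for v in values]
    # Pad on the left so every cell has the same width (right alignment).
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


flt = [1.23456789, 12435.645, 31.999]

print("variable precision:")
for cell in format_floats(flt):
    print(cell)

print("fixed 3dp:")
for cell in format_floats(flt, decimals=3):
    print(cell)
```

Keeping `decimals=None` as the default mirrors the idea of making fixed precision an opt-in setting rather than switching it on for everyone.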
Yeah, I agree it's weird for decimals. An alternative, admittedly nonstandard, is to align the decimal points without showing the same number of decimal places for every value, e.g.:
Ok, please don't do that, it's ugly as heck. |
Let's say that would be... "novel" :) |
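For reference, the "novel" alternative (aligning decimal points without a common precision) can be sketched in pure Python as follows. `align_on_decimal_point` is a hypothetical helper, and it assumes every value's `repr` is in plain positional notation, with no exponent.

```python
def align_on_decimal_point(values):
    """Pad each number so the decimal points line up, without forcing
    a common number of decimal places (hypothetical helper).

    Assumes repr(v) contains no exponent (e.g. not '1e+16').
    """
    cells = [repr(v) for v in values]
    # Split each cell into (integer part, '.', fractional digits).
    parts = [c.partition(".") for c in cells]
    int_width = max(len(head) for head, _, _ in parts)
    frac_width = max(len(sep + tail) for _, sep, tail in parts)
    # Right-pad the integer part's left side, left-pad the fraction's right side.
    return [
        head.rjust(int_width) + (sep + tail).ljust(frac_width)
        for head, sep, tail in parts
    ]


for row in align_on_decimal_point([1.23456789, 12435.645, 31.999]):
    print(repr(row))
```

The ragged right edge this produces is what makes the layout look unusual next to the fixed-precision version.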
I like 3 decimals as the default, with the option to increase it. |
While on the topic of float formatting, how should we handle scientific notation? It kinda messes up the alignment:
pl.Config.set_float_precision(3)
pl.DataFrame({
    'a': [45231.1, 2.22, 99999999.333],
    'b': [4.10, 115.23, 6.3200000004570000024],
    'c': [714.3, 8.424, 9.24222]
})
shape: (3, 3)
┌───────────┬─────────┬─────────┐
│ a         ┆ b       ┆ c       │
│ ---       ┆ ---     ┆ ---     │
│ f64       ┆ f64     ┆ f64     │
╞═══════════╪═════════╪═════════╡
│ 45231.100 ┆   4.100 ┆ 714.300 │
│     2.220 ┆ 115.230 ┆   8.424 │
│   1.000e8 ┆   6.320 ┆   9.242 │
└───────────┴─────────┴─────────┘ |
This might be helpful on that front? Also, let's go with 6dp to start with; I remember implementing the same sort of thing years ago at JPMorgan, and the feedback was that 3dp was actually too few for a lot of common cases. Better to offer a fuller picture by default and have the option to tune it down to people's preferences/use cases.
Update: actually, as a default, let's not enable fixed-precision decimal places at all; having the option will let us experiment with different values first. |
Sorry, perhaps I wasn't clear: I meant to ask how we would handle the case where a single column mixes notations, as column a does in the following example:
These can currently arise when some of the values exceed a certain magnitude or length threshold. I suppose one solution would be to always display floats in scientific notation, regardless of magnitude, as shown in column b. However, this would mean column c would be displayed as column d, which to me feels like a downgrade in readability.
Another approach would be to use scientific notation on a per-column basis whenever one of the values in that column is over the threshold. The downside is that, for some use cases, a large enough dataset is bound to contain a single outlier that causes a whole column of otherwise small values to be displayed in scientific notation.
Lastly, we could apply this per-column rule while considering only the values currently being printed, rather than the whole column. However, printing different slices of a DataFrame could then show the same column in different notations, which would be rather unintuitive.
I'm almost tempted not to worry about cases like column a: there is not much point in aligning values in different notations, since even when aligned they wouldn't be comparable due to their different magnitudes. It just really looks ugly, though. |
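A rough pure-Python sketch contrasting the per-value behaviour (which produces mixed notations, as in column a) with the per-column rule described above. The function names and the `threshold=1e6` cut-off are illustrative assumptions chosen arbitrarily, not Polars' actual rule.

```python
def format_per_value(values, decimals=3, threshold=1e6):
    """Each value picks its own notation, so a column can mix both.

    threshold is an arbitrary illustrative cut-off, not Polars' rule.
    """
    return [
        f"{v:.{decimals}e}" if abs(v) >= threshold else f"{v:.{decimals}f}"
        for v in values
    ]


def format_per_column(values, decimals=3, threshold=1e6):
    """A single value at or above the threshold switches the whole column."""
    use_sci = any(abs(v) >= threshold for v in values)
    spec = f".{decimals}e" if use_sci else f".{decimals}f"
    return [format(v, spec) for v in values]


def right_align(cells):
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


mixed = [45231.1, 2.22, 123456789.0]  # one large outlier
small = [714.3, 8.424, 9.24222]       # no value over the threshold

for label, fmt in [("per value", format_per_value),
                   ("per column", format_per_column)]:
    print(label)
    for cell in right_align(fmt(mixed)):
        print(cell)
```

The "visible values only" variant amounts to calling `format_per_column` on just the rows being printed, which is exactly why two slices of the same column could come out in different notations.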
Good point... I think the best (and most straightforward) option for now is probably This would probably net the best consistency, though with the downside that scientific notation is most likely to kick in when the magnitude is really large, and you'd probably appreciate the brevity, hmm. What do you think? Stick with |
I have implemented With the other option it'd be more of a case of not packing the value in the first place. Some threshold should exist though, as printing Perhaps the threshold should be its own setting too? Say The issue raised in that SO thread is also worth implementing, but I just haven't gotten around to that yet :-) |
Nice; I'll review shortly :) FYI: I spotted we have |
I think this fell by the wayside; any chance of reviving it? |
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
It's pretty standard in tables to right-align numbers and left-align text. It looks a bit prettier IMO.
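A minimal sketch of the requested behaviour in plain Python (`render_table` is a hypothetical helper, not Polars' table renderer): numeric columns are right-aligned, text columns left-aligned, and header names stay left-aligned.

```python
from numbers import Number


def render_table(columns):
    """Render a dict of column name -> list of values as text lines.

    Numeric columns are right-aligned and all other columns left-aligned;
    header names are always left-aligned.
    """
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    widths, bodies = {}, {}
    for name in names:
        values = columns[name]
        cells = [str(v) for v in values]
        widths[name] = max([len(name)] + [len(c) for c in cells])
        # bool is a subclass of int, so exclude it explicitly.
        numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        pad = str.rjust if numeric else str.ljust
        bodies[name] = [pad(c, widths[name]) for c in cells]
    lines = [" | ".join(name.ljust(widths[name]) for name in names)]
    for i in range(n_rows):
        lines.append(" | ".join(bodies[name][i] for name in names))
    return lines


table = render_table({
    "str": ["aaa", "bbb", "ccc"],
    "flt": [1.235, 12435.645, 31.999],
})
print("\n".join(table))
```

Choosing alignment per column from the values' types (rather than per cell) keeps every column internally consistent, which is the property the issue is asking for.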
Reproducible example
Expected behavior
Installed versions