Right-aligned numbers in dataframe printout #7378

mcrumiller · 2023-03-06T19:21:54Z

Polars version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Issue description

It's pretty standard in tables to right-align numbers and left-align text. It looks a bit prettier IMO.

Reproducible example

import polars as pl

print(
    pl.DataFrame({
        'a': ['aa', 'b', 'cc', 'd', 'ee', 'f', 'gg', 'h'],
        'b': [1, 31, 2, 4, 5, 66, 99, 103],
    })
)

shape: (8, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ aa  ┆ 1   │
│ b   ┆ 31  │
│ cc  ┆ 2   │
│ d   ┆ 4   │
│ ee  ┆ 5   │
│ f   ┆ 66  │
│ gg  ┆ 99  │
│ h   ┆ 103 │
└─────┴─────┘

Expected behavior

shape: (8, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ aa  ┆   1 │
│ b   ┆  31 │
│ cc  ┆   2 │
│ d   ┆   4 │
│ ee  ┆   5 │
│ f   ┆  66 │
│ gg  ┆  99 │
│ h   ┆ 103 │
└─────┴─────┘
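The requested behavior can be sketched in a few lines of plain Python. This is only an illustration, not how Polars' Rust Display implementation works: the helper name format_column and the NUMERIC_DTYPES set are hypothetical, and alignment is decided per column from the dtype label shown in the header.

```python
# Hypothetical set of dtype labels treated as numeric for alignment purposes.
NUMERIC_DTYPES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64", "f32", "f64"}


def format_column(values, dtype, width):
    """Pad every cell to `width`: right-align numeric dtypes, left-align the rest."""
    pad = str.rjust if dtype in NUMERIC_DTYPES else str.ljust
    return [pad(str(v), width) for v in values]


a = ["aa", "b", "cc", "d", "ee", "f", "gg", "h"]
b = [1, 31, 2, 4, 5, 66, 99, 103]
width = 3  # widest cell in either column: "str", "i64" and "103" are all 3 chars

# Print only the data rows of the expected table.
for left, right in zip(format_column(a, "str", width), format_column(b, "i64", width)):
    print(f"│ {left} ┆ {right} │")
```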

Installed versions

---Version info---
Polars: 0.16.11
Index type: UInt32
Platform: Windows-10-10.0.19044-SP0
Python: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
---Optional dependencies---
pyarrow: 9.0.0
pandas: 1.5.3
numpy: 1.24.2
fsspec: <not installed>
connectorx: 0.3.0
xlsx2csv: 0.8
deltalake: <not installed>
matplotlib: 3.6.1


alicja-januszkiewicz · 2023-03-10T09:56:29Z

I'd also add a Rust tag, as Python's DataFrame.__str__ simply calls Rust's PyDataFrame.as_str, so impl Display for DataFrame would need to be updated.

alexander-beedie · 2023-03-13T16:39:49Z

I'm not sold on this as the default yet, if only because our float formatting doesn't fix the number of decimals. I think if we could also add that at the same time (e.g. default to 6dp with an option to modify it) then it might be more compelling?

Example:

import polars as pl
df = pl.DataFrame({
    "str": ["aaa", "bbb", "ccc"],
    "flt": [1.23456789, 12435.645, 31.999],
})
print(df)

Currently...

shape: (3, 2)
┌─────┬───────────┐
│ str ┆ flt       │
│ --- ┆ ---       │
│ str ┆ f64       │
╞═════╪═══════════╡
│ aaa ┆ 1.234568  │
│ bbb ┆ 12435.645 │
│ ccc ┆ 31.999    │
└─────┴───────────┘

vs (right aligned)

shape: (3, 2)
┌─────┬───────────┐
│ str ┆       flt │
│ --- ┆       --- │
│ str ┆       f64 │
╞═════╪═══════════╡
│ aaa ┆  1.234568 │
│ bbb ┆ 12435.645 │
│ ccc ┆    31.999 │
└─────┴───────────┘

As the decimal point is still all over the place (the number of dp is not constant), we don't actually gain much (any?) readability 🤔

However, with fixed dp precision, such as...

shape: (3, 2)
┌─────┬──────────────┐
│ str ┆          flt │
│ --- ┆          --- │
│ str ┆          f64 │
╞═════╪══════════════╡
│ aaa ┆     1.234568 │
│ bbb ┆ 12435.645000 │
│ ccc ┆    31.999000 │
└─────┴──────────────┘

...or...

shape: (3, 2)
┌─────┬───────────┐
│ str ┆       flt │
│ --- ┆       --- │
│ str ┆       f64 │
╞═════╪═══════════╡
│ aaa ┆     1.235 │
│ bbb ┆ 12435.645 │
│ ccc ┆    31.999 │
└─────┴───────────┘

...I think it would have a lot more value, as the decimal point (and hence the magnitude of the value) is immediately comparable between rows.
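A quick way to check this claim is to format the same three values both ways in plain Python and record where the decimal point lands in each right-aligned cell. Here repr() stands in for the current variable-precision output (an assumption; Polars' own float formatting differs in detail), and an f-string with a fixed number of decimals stands in for the proposed fixed-dp mode. The helper names are hypothetical.

```python
def right_align(cells):
    """Right-align strings to the width of the widest one."""
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


def decimal_positions(cells):
    """Character index of the decimal point in each cell."""
    return [c.index(".") for c in cells]


values = [1.23456789, 12435.645, 31.999]

# Variable precision: shortest round-trip repr, so the number of decimals varies.
free = right_align([repr(v) for v in values])

# Fixed precision: every value rounded to exactly `dp` decimals.
dp = 3
fixed = right_align([f"{v:.{dp}f}" for v in values])

print(decimal_positions(free))   # [1, 6, 6]: points do not line up
print(decimal_positions(fixed))  # [5, 5, 5]: same index in every row
```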

mcrumiller · 2023-03-13T16:47:37Z

Yeah, I agree for decimals it's weird.

An alternative that may be nonstandard is to align decimals but not display all, e.g.:

shape: (3, 2)
┌─────┬──────────────┐
│ str ┆          flt │
│ --- ┆          --- │
│ str ┆          f64 │
╞═════╪══════════════╡
│ aaa ┆     1.234568 │
│ bbb ┆ 12435.645    │
│ ccc ┆    31.999    │
└─────┴──────────────┘

Ok, please don't do that, it's ugly as heck.
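For the record, the layout above amounts to splitting each number at the decimal point, right-aligning the integer parts and left-aligning the fractional parts. A minimal plain-Python sketch, assuming every value's repr() contains exactly one '.' (so no scientific notation, inf or nan); align_on_decimal is a hypothetical name.

```python
def align_on_decimal(values):
    """Line up the decimal points without padding the fractions with zeros.

    Assumes repr(v) has exactly one '.', i.e. no scientific notation, inf or nan.
    """
    texts = [repr(v) for v in values]
    ints, fracs = zip(*(t.split(".") for t in texts))
    int_width = max(len(i) for i in ints)
    frac_width = max(len(f) for f in fracs)
    # Integer part padded on the left, fractional part padded on the right.
    return [f"{i.rjust(int_width)}.{f.ljust(frac_width)}" for i, f in zip(ints, fracs)]


for cell in align_on_decimal([1.234568, 12435.645, 31.999]):
    print(f"│ {cell} │")
```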

alexander-beedie · 2023-03-13T16:53:25Z

Ok, please don't do that, it's ugly as heck.

Let's say that would be... "novel" :)

mcrumiller · 2023-03-13T16:58:01Z

I like 3 decimals as default, with the option to increase.

alicja-januszkiewicz · 2023-03-13T23:06:59Z

While on the topic of float formatting, how should we handle scientific notation? It kinda messes up the alignment:

import polars as pl

pl.Config.set_float_precision(3)
print(pl.DataFrame({
    'a': [45231.1, 2.22, 99999999.333],
    'b': [4.10, 115.23, 6.3200000004570000024],
    'c': [714.3, 8.424, 9.24222],
}))

shape: (3, 3)
┌───────────┬─────────┬─────────┐
│         a ┆       b ┆       c │
│       --- ┆     --- ┆     --- │
│       f64 ┆     f64 ┆     f64 │
╞═══════════╪═════════╪═════════╡
│ 45231.100 ┆   4.100 ┆ 714.300 │
│     2.220 ┆ 115.230 ┆   8.424 │
│   1.000e8 ┆   6.320 ┆   9.242 │
└───────────┴─────────┴─────────┘
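One way mixed notation like this can arise is a per-value rule that falls back to scientific notation only when the fixed-point string gets too long. The sketch below reproduces column a under that assumption; the 10-character limit and the helper names are hypothetical, not Polars' actual threshold. Python's e-format writes the exponent as e+08, so it is rewritten to the e8 style seen in the output above.

```python
def sci(v, dp):
    """Scientific notation with the exponent written as e8 rather than e+08."""
    mantissa, exp = f"{v:.{dp}e}".split("e")
    return f"{mantissa}e{int(exp)}"


def format_value(v, dp=3, max_len=10):
    """Per-value rule: fixed notation unless it exceeds max_len characters."""
    fixed = f"{v:.{dp}f}"
    return fixed if len(fixed) <= max_len else sci(v, dp)


column_a = [45231.1, 2.22, 99999999.333]
cells = [format_value(v) for v in column_a]
width = max(len(c) for c in cells)
for cell in cells:
    print(cell.rjust(width))
```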

alexander-beedie · 2023-03-14T05:42:14Z

While on the topic of float formatting, how should we handle scientific notation? It kinda messes up the alignment:

This might be helpful on that front?

Stack Overflow: "Alignment of floating point numbers printed in scientific notation"

Also, let's go with 6dp to start with; I remember implementing the same sort of thing years ago at JPMorgan, and the feedback was that 3dp was actually too few for a lot of common cases. It's better to offer a fuller picture by default and have the option to tune it down to people's preferences / use-cases.

Update: actually, as a default, let's not enable fixed-precision decimal places at all; having the option will allow us to experiment with different values first.

alicja-januszkiewicz · 2023-03-14T21:27:55Z

Sorry, perhaps I wasn't clear, I meant to ask how would we handle the case where we have mixed notations in a single column, like we do in column a in the following example:

┌─────────┬─────────┬─────────┬─────────┐
│       a ┆       b ┆       c ┆       d │
│     --- ┆     --- ┆     --- ┆     --- │
│     f64 ┆     f64 ┆     f64 ┆     f64 │
╞═════════╪═════════╪═════════╪═════════╡
│  24.010 ┆ 2.401e1 ┆   1.020 ┆ 1.020e0 │
│ 8.252e7 ┆ 8.252e7 ┆  14.500 ┆ 1.450e1 │
│ 342.040 ┆ 3.420e2 ┆ 342.042 ┆ 3.420e2 │
│ 4.295e6 ┆ 4.295e6 ┆   9.420 ┆ 9.420e0 │
│   4.400 ┆ 4.400e0 ┆   4.422 ┆ 4.422e0 │
│ 9.922e7 ┆ 9.922e7 ┆ 122.230 ┆ 1.222e2 │
└─────────┴─────────┴─────────┴─────────┘

These currently can come about when some of the values are above a certain magnitude or length threshold.

I suppose one solution could be to simply always display the floats in scientific notation, no matter their magnitude, as shown in column b. However, this would mean column c would be displayed as column d, which to me feels like a downgrade in terms of readability.

Another approach would be to use scientific notation on a per-column basis when one of the values in that column is over the threshold. The downside is that, for some use cases, a large enough dataset is bound to contain a single outlier value that would cause the whole column of otherwise small values to be displayed in scientific notation.

Lastly, we could apply this per-column rule while only considering the values currently being printed, rather than the whole column. However, this would mean that printing different df slices could potentially print the same column in different notations, which would be rather unintuitive.

I'm almost tempted not to worry about cases like column a as there is not much point in aligning those values in different notations, as even when aligned they wouldn't be comparable due to their different magnitudes. It just really looks ugly though.
Edit: I suppose their magnitudes would be comparable.
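The second approach (scientific notation for the whole column as soon as one value crosses the threshold) could look like the sketch below. The helper names and the 10-character threshold are hypothetical, and the sample values are chosen to reproduce the printed columns a and c above, not taken from real data.

```python
def sci(v, dp):
    """Scientific notation with the exponent written as e7 rather than e+07."""
    mantissa, exp = f"{v:.{dp}e}".split("e")
    return f"{mantissa}e{int(exp)}"


def format_column(values, dp=3, max_len=10):
    """Per-column rule: if any value is too long in fixed notation, use scientific for all."""
    fixed = [f"{v:.{dp}f}" for v in values]
    if all(len(s) <= max_len for s in fixed):
        cells = fixed
    else:
        cells = [sci(v, dp) for v in values]
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


col_a = [24.01, 82_520_000.0, 342.04, 4_295_000.0, 4.4, 99_220_000.0]
col_c = [1.02, 14.5, 342.042, 9.42, 4.422, 122.23]

print(format_column(col_a))  # switches entirely to scientific, like column b
print(format_column(col_c))  # stays fixed-point, like column c
```

With this rule a single large outlier in a column of small values is enough to push every row into scientific notation, which is the downside described above.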

alexander-beedie · 2023-03-15T21:22:26Z

I meant to ask how would we handle the case where we have mixed notations in a single column

Good point... I think the best (and most straightforward) option for now is probably a, despite aesthetic reservations ;) Another option (which is what I initially thought the SO answer was referring to - oops) is to unpack the value according to the magnitude of eNN, such that 8.252e7 => 82520000.000, ditching scientific notation entirely (ideally not losing any precision that may be 'behind the scenes', as it were).

This would probably net the best consistency, though with the downside that scientific notation is most likely to kick in when the magnitude is really large, and you'd probably appreciate the brevity, hmm. What do you think? Stick with a for now and iterate in a second pass, or unpack so everything lines up? (I agree with your thoughts about c => d).
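The precision concern is easy to demonstrate in plain Python: re-expanding the already-rounded display string loses digits, whereas formatting the original value positionally keeps the digits that the 3-dp scientific display had hidden. The value below is hypothetical, picked so that it prints as 8.252e7 at 3 decimal places; the sci helper is likewise hypothetical.

```python
def sci(v, dp=3):
    """Scientific notation with the exponent written as e7 rather than e+07."""
    mantissa, exp = f"{v:.{dp}e}".split("e")
    return f"{mantissa}e{int(exp)}"


v = 82_523_456.789  # hypothetical underlying value
shown = sci(v)      # what a 3-dp scientific display would show

# Unpacking the display string only recovers what survived the rounding...
from_display = f"{float(shown):.3f}"
# ...while formatting the original value positionally keeps the hidden digits.
from_value = f"{v:.3f}"

print(shown)         # 8.252e7
print(from_display)  # 82520000.000
print(from_value)    # 82523456.789
```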

alicja-januszkiewicz · 2023-03-16T02:41:33Z

I have implemented a in #7475 for the time being.

With the other option it'd be more of a case of not packing the value in the first place. Some threshold should exist though, as printing 1e30 or 1e60 in normal notation seems like a bad default.

Perhaps the threshold should be its own setting too? Say POLARS_FMT_NUM_LEN, or perhaps we could rename POLARS_FMT_STR_LEN to something like POLARS_FMT_COL_LEN and use that for both?
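A sketch of how such a length threshold might behave, in plain Python. POLARS_FMT_NUM_LEN is only the name proposed here and is treated as hypothetical; the default of 15 characters is an arbitrary assumption, as are the helper names.

```python
import os


def num_len_limit(default=15):
    """Read the proposed (hypothetical) POLARS_FMT_NUM_LEN setting, if set."""
    raw = os.environ.get("POLARS_FMT_NUM_LEN")
    return int(raw) if raw else default


def format_number(v, dp=3):
    """Positional notation up to the length limit, scientific beyond it."""
    fixed = f"{v:.{dp}f}"
    if len(fixed) <= num_len_limit():
        return fixed
    mantissa, exp = f"{v:.{dp}e}".split("e")
    return f"{mantissa}e{int(exp)}"


print(format_number(8.252e7))  # fits within the limit: positional
print(format_number(1e30))     # far too long: falls back to scientific
```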

The issue raised in that SO thread is also worth implementing, but I just haven't gotten around to that yet :-)

alexander-beedie · 2023-03-16T16:20:16Z

I have implemented a in #7475 for the time being.

Nice; I'll review shortly :)

FYI: I spotted we have fn fmt_float, which seems to address some of these issues; perhaps we can add some extra options there, as needed? Something to look at in a second pass.

mcrumiller · 2023-07-13T14:53:45Z

I think this fell by the wayside, any chance of reviving?

mcrumiller added the bug (Something isn't working) and python (Related to Python Polars) labels Mar 6, 2023

mcrumiller changed the title from "Numbers should be right-aligned in dataframe printout" to "Right-aligned numbers in dataframe printout" Mar 6, 2023

zundertj added the enhancement (New feature or an improvement of an existing feature) label and removed the bug (Something isn't working) label Mar 7, 2023

alicja-januszkiewicz mentioned this issue Mar 10, 2023

feat(rust): right-align numeric columns #7475 (Merged)

ritchie46 closed this as completed in #7475 Oct 16, 2023
