Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(rust): right-align numeric columns #7475

Merged
merged 11 commits into from
Oct 16, 2023

Conversation

alicja-januszkiewicz
Copy link
Contributor

@alicja-januszkiewicz alicja-januszkiewicz commented Mar 10, 2023

closes #7378

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature rust Related to Rust Polars labels Mar 10, 2023
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Mar 13, 2023

Hey, thanks for this :)

I put a follow-up comment on the original issue; do you think you could update the PR to include an option for fixing the number of floating point decimal places offered? I suspect right-alignment would be a lot more attractive/useful then.

Also, if you could make this opt-in for now, via a suitable Config option and associated environment variable, we can test it out a bit before deciding to change the default behaviour 👍

@alicja-januszkiewicz
Copy link
Contributor Author

Hey, thanks for the reply. This looks doable for me, I will give it a shot.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Mar 14, 2023

Hey, thanks for the reply. This looks doable for me, I will give it a shot.

Fantastic :)
I'd suggest introducing two new config options via the following environment vars:

  • POLARS_FMT_DECIMAL_PLACES: default to None, otherwise a +ve integer.
  • POLARS_FMT_TABLE_CELL_NUMERIC_ALIGNMENT: default to None, allowing "LEFT", "RIGHT", "CENTER" - same asPOLARS_FMT_TABLE_CELL_ALIGNMENT, which it should override for numeric cols (if not None).

Add them into the Config class on the python side (can pattern after set_tbl_cell_alignment), and then we can experiment with the two settings independently to see what might work best for a potential new default 👍

@alexander-beedie
Copy link
Collaborator

Nice, looks like it's almost there - and POLARS_FMT_FLOAT_PRECISION is a good choice (fits well with the existing float_precision param in write_excel/write_csv) 👍

@alicja-januszkiewicz
Copy link
Contributor Author

ready for review @alexander-beedie :)

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Mar 21, 2023

Apologies for the delay; I managed to convince myself that GitHub was acting up last night after accidentally disabling JavaScript in my browser...

Have checked out the branch and compiled it up, so...

  • Right-alignment of numeric cols:
    • Looks good to me - let's consider that part done.
  • Float precision:
    • Instead of the env var, let's use the same mechanism as set_float_format & get_float_format to set/load the integer value of decimal places directly; reading the env var for each float is expensive given that this function is used by Display (I didn't connect those dots earlier).
    • Experimenting with more values, it seems that the switch to scientific notation throws things off earlier than desirable, as the .len() > 9 check kicks in much too soon (for instance, with 6dp); we should set this quite a bit higher when in fixed-dp mode. Maybe even as high as 19?? I'm open to counter-offers on competing values for that, heh. We need to maintain fixed-dp into at least the hundreds of millions (or early/mid billions?) though 💭

With those two changes (and a rebase) we should be good to go :)

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Mar 23, 2023

FYI, I added a mechanism for allowing directly applied values (those not mediated by env vars) to better participate in Config load/save/scoping yesterday: #7696.

@alicja-januszkiewicz
Copy link
Contributor Author

Right I've updated the PR to use that mechanism for float_precision.

This is sort of ready for review again, but there is a bit of an issue regarding the float_precision option. Namely, that when using set_float_format & get_float_format instead of using env vars, there isn't a straightforward way of checking if the option was ever set in the first place, so I have reserved the value of 255 (u8:MAX) to indicate an unset value. I admit this is a bit lazy but the alternative would require something like FLOAT_PRECISION: RwLock<Option<AtomicU8>>, which seems a bit over-complicated. Or perhaps that would have been better? What do you think?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Apr 6, 2023

I admit this is a bit lazy but the alternative would require something like FLOAT_PRECISION: RwLock<Option<AtomicU8>>, which seems a bit over-complicated. Or perhaps that would have been better? What do you think?

I think that's fine; keeps it simple - can you raise an error if anyone tries to set a nonsense value? Probably anything above 16, which (if I recall correctly) is about the precision limit of an f64. Certainly anything much above that is living in real denial about floating point accuracy :)

Have had family in town recently, so got a bit backlogged; will download the latest updates at the w/e and run through them (could you do a quick rebase?) 👍

@alicja-januszkiewicz
Copy link
Contributor Author

Sorry I was a bit busy myself over the Easter. I've tried to rebase tonight but I can't figure out how to without adding in hundreds of commits to the PR. I will try to sort this out next weekend.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Apr 18, 2023

Sorry I was a bit busy myself over the Easter. I've tried to rebase tonight but I can't figure out how to without adding in hundreds of commits to the PR. I will try to sort this out next weekend.

No problem; I think it'll be a great addition :) We have a 4 day weekend coming up here (national holidays), so I'll have plenty of time to review and/or help out.

I can't figure out how to without adding in hundreds of commits

A rebase + force-push should do the trick? Initially it'll look like you're incorporating tons of things, but after the force you should just be left with your own changes.

@alicja-januszkiewicz
Copy link
Contributor Author

alicja-januszkiewicz commented Apr 23, 2023

@alexander-beedie Hey sorry I only just got around to fixing this, I've added the check for float precision being greater than 16 and rebased.

A rebase + force-push should do the trick?

Yeah that's what I tried originally, but I ended up getting some commits from main in the commits/changes tabs to this PR. It was probably because I merged main into this branch at some earlier point, so I tried removing that merge commit first and that seems to have fixed it.

@stevenlis
Copy link

I'd suggest introducing two new config options via the following environment vars:

  • POLARS_FMT_DECIMAL_PLACES: default to None, otherwise a +ve integer.
  • POLARS_FMT_TABLE_CELL_NUMERIC_ALIGNMENT: default to None, allowing "LEFT", "RIGHT", "CENTER" - same asPOLARS_FMT_TABLE_CELL_ALIGNMENT, which it should override for numeric cols (if not None).

Add them into the Config class on the python side (can pattern after set_tbl_cell_alignment), and then we can experiment with the two settings independently to see what might work best for a potential new default 👍

Sorry to bother you guys. But is this still happening? @alexander-beedie @alicja-januszkiewicz

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 27, 2023

Sorry to bother you guys. But is this still happening? @alexander-beedie @alicja-januszkiewicz

Argh... this is 100% my fault; I must have missed some GitHub notifications and let it slip through the cracks. Short version is that "yes", I'd love to get this merged - it's a great contribution, and (potentially) could even become the default format.

@alicja-januszkiewicz; since this is my bad, I have checked out this branch and rebased/merged locally; unfortunately I don't seem able to push it back out, though the "Maintainers are allowed to edit this pull request" option has been checked. I'll see if I can work out why not and will update accordingly 🤔

error: failed to push some refs to 'https://github.com/alicja-januszkiewicz/polars.git'
To https://github.com/alicja-januszkiewicz/polars.git
!	refs/heads/right_aligned_numbers:refs/heads/right_aligned_numbers	[remote rejected] 

@ritchie46
Copy link
Member

@alexander-beedie do you use the github cli? This always works much, much easier for pushing to a PR.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 27, 2023

@alexander-beedie do you use the github cli?

I do not, but I'm willing to try ;)


Update: Ah! It's ok, all working now.
Just needed to update my GitHub personal access token with the workflow capability.

@alexander-beedie alexander-beedie force-pushed the right_aligned_numbers branch 2 times, most recently from b67e7f1 to ead4fe8 Compare August 27, 2023 12:23
@alicja-januszkiewicz
Copy link
Contributor Author

Hi,

I'd be happy to continue working on this, but I'm not sure what is the next step here - this is still pending review right? @alexander-beedie

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Sep 2, 2023

I'd be happy to continue working on this, but I'm not sure what is the next step here - this is still pending review right? @alexander-beedie

From my perspective I think it's in good shape; @ritchie46 or @orlp, would one of you mind having a quick look? (Note that this is a new formatting option, not enabled by default). Then let's merge and we can iterate further if necessary :)

crates/polars-core/src/fmt.rs Outdated Show resolved Hide resolved
crates/polars-core/src/fmt.rs Outdated Show resolved Hide resolved
@@ -710,6 +724,15 @@ const SCIENTIFIC_BOUND: f64 = 999999.0;

fn fmt_float<T: Num + NumCast>(f: &mut Formatter<'_>, width: usize, v: T) -> fmt::Result {
let v: f64 = NumCast::from(v).unwrap();

let precision = get_float_precision();
if precision != u8::MAX {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of usingu8::MAX to signal 'no option set' I think we should use an Option<T> initialized with None.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alicja-januszkiewicz: want to take this one? :) I'm going to take a look at the max precision value again, as we should consider Decimal too (support for which is going to be turned on by default soon).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll try to get this over by the end of the week.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the type of FLOAT_PRECISION from AtomicU8 to RwLock<Option<usize>>.

I've picked usize for now as that's the type the write! and format! macros expect, so this saves having to repeatedly cast the value.

I've wrapped it in an RwLock to hopefully retain whatever protection AtomicU8 was providing, as there doesn't seem to be an atomic equivalent of Option.

py-polars/polars/config.py Outdated Show resolved Hide resolved
py-polars/src/functions/meta.rs Outdated Show resolved Hide resolved
@svaningelgem
Copy link
Contributor

@stinodego : This is a PR that I think could be merged as @alicja-januszkiewicz is still active on it (congratz mate!) and I think the feature would be well received.
CC: @alexander-beedie

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Oct 11, 2023

@stinodego : This is a PR that I think could be merged as @alicja-januszkiewicz is still active on it (congratz mate!) and I think the feature would be well received.

CC: @alexander-beedie

Definitely. Give me a few minutes with this one; has been a (too) long road getting this one over the line (which is on me 😅)

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Oct 11, 2023

Ok, rebased it and added an example to the docstring; pending a look from @orlp, I think we're ready to ship! 🥳
@alicja-januszkiewicz: sincere thanks for this one - and for your patience :)

@ritchie46 ritchie46 merged commit 716cd56 into pola-rs:main Oct 16, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Right-aligned numbers in dataframe printount
6 participants