fix: Remove duplicated content in error messages #8107

rben01 · 2023-04-10T04:30:29Z

Should fix #8079 , but please avoid merging for now, as I have only tested a small subset of the things this might affect (and I have not yet added any formal tests at all; what's below is the extent of my testing). I would also like input on this to see if this the ideal way to handle this problem and to report errors in general.

Changes

Prevented exponential growth of error messages by avoiding nesting of error messages
Instead, used a counter to track how many operations had been applied to a LazyFrame after an error
Error message contains stack depth and original error
Renamed ErrorState to ErrorStateEncounters and made it crate-private
Renamed ErrorStateSync to ErrorState and removed its Deref impl so that the only way to access its data is through (crate-private) accessors, thus reducing the size of its public API

Error messages are now O(1) in length (well, technically O(log(n)), if you take into account that the depth counter has log10(n) digits, but this is effectively O(1)).

Test case (singular)

Input

In each case we print > title of test case, = df.describe_plan(), and # df.collect().unwrap_err()

fn main() -> Result<(), Box<dyn std::error::Error>> {
    fn report_err(df: LazyFrame, ops_after: usize) {
        println!("> select xyz, then {ops_after} ops after");
        println!("= {}", df.describe_plan());
        println!("# {:?}", df.collect().unwrap_err());
        println!();
    }

    let df = df! [
        "c1" => [0, 1],
    ]
    .unwrap()
    .lazy();

    println!("> initial df");
    println!("= {}", df.describe_plan());
    println!("# NO ERROR");
    println!();

    let df0 = df.clone().select([col("xyz")]);
    report_err(df0, 0);

    let df1 = df.clone().select([col("xyz")]).select([col("c1")]);
    report_err(df1, 1);

    let df2 = df
        .clone()
        .select([col("xyz")])
        .select([col("c1")])
        .select([col("c2")]);
    report_err(df2, 2);

    let df3 = df
        .clone()
        .select([col("xyz")])
        .select([col("c1")])
        .select([col("c2")])
        .select([col("c3")]);
    report_err(df3, 3);

    Ok(())
}

Output

In each case we print > title of test case, = df.describe_plan(), and # df.collect().unwrap_err()

> initial df
= DF ["c1"]; PROJECT */1 COLUMNS; SELECTION: "None"
# NO ERROR

> select xyz, then 0 ops after
= ErrorState(NotYetEncountered { err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) })
# ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\""))

> select xyz, then 1 ops after
= ErrorState(AlreadyEncountered { n_times: 1, prev_err_msg: "not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"" })
# ComputeError(ErrString("LogicalPlan already failed (depth: 1) with error: 'not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 2 ops after
= ErrorState(AlreadyEncountered { n_times: 2, prev_err_msg: "not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"" })
# ComputeError(ErrString("LogicalPlan already failed (depth: 2) with error: 'not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 3 ops after
= ErrorState(AlreadyEncountered { n_times: 3, prev_err_msg: "not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"" })
# ComputeError(ErrString("LogicalPlan already failed (depth: 3) with error: 'not found: xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

ritchie46 · 2023-05-14T06:10:30Z

Any news on this @rben01?

rben01 · 2023-05-14T12:51:17Z

Thanks for the reminder. I'll circle back on this this week and if nothing else write some tests. But first I'd like to see if there's a better way to include multiple error messages than just the depth.

rben01 · 2023-05-18T03:34:46Z

@ritchie46 I'm thinking about implementing a system where subsequent operations are tracked in some detail — not just the number of them, but an entire Vec indicating the operations attempted after the error, e.g., [Projection, Sort, Filter]. The idea is that it should make tracking down the source of an error much easier. I've experimented with it a little and it's feasible to store the names of the operations (corresponding to the methods of LogicalPlanBuilder) and some of the data that would be relevant to a user, such as the columns used in the projection. Do you think this is a good idea, or should we just stick with the count for now?

rben01 · 2023-06-15T20:34:38Z

So I think this is fine to merge in its current state. It's better than the current behavior. I'll think more about what improving the error messages might look like in the future, but for now this PR is a strict improvement on the existing behavior.

svaningelgem · 2023-10-11T18:57:56Z

@rben01 , if the maintainers of polars like your PR as much as I do: could you update it with the main branch and add a test? This can be based on your example with an added assertion that the contents looks correct.

rben01 · 2023-10-17T02:44:34Z

@rben01 , if the maintainers of polars like your PR as much as I do: could you update it with the main branch and add a test? This can be based on your example with an added assertion that the contents looks correct.

I pulled in main and wrote some tests, but I'm unable to run them because it won't build now (looks like some upstream compilation errors). I'll check back on this periodically and keep pulling from main and hopefully eventually I can run the new tests and have them pass.

rben01 · 2023-10-18T02:45:11Z

Ok, test added, should be ok to merge now.

ritchie46 · 2023-10-18T07:53:49Z

Ok, test added, should be ok to merge now.

Can you ensure CI is green? :)

rben01 · 2023-10-18T22:17:55Z

Can you ensure CI is green? :)

Nothing I can do about python but rust CI should be all green now :)

svaningelgem · 2023-10-19T05:01:22Z

Hi @rben01 , I think it is because the way the errors are now handled.
You do some kind of "compression", and likely this is loosing the Duplicate error that is being thrown.

What if you add in your tests something like this:

    let df4 = df
        .clone()
        .select([col("c1"), col("c1")])
        .collect();

That should give you the DuplicateError that Python is complaining about. And likely fix that specific test as well. Because Python expects a DuplicateError (on rust side it's called Duplicate or polars_bail!(duplicate=), but it gets a ComputeError instead.

So I think it has to do with the "compression"/"unwrapping" that you implemented.

rben01 · 2023-10-20T15:32:59Z

Thanks @svaningelgem , you're absolutely right. I changed things so that instead of wrapping everything in ComputeError, I wrap them in their original error variant. Here is the new output:

> initial df
= DF ["c1"]; PROJECT */1 COLUMNS; SELECTION: "None"
# NO ERROR

> select xyz, then 0 ops after
= ErrorState(NotYetEncountered { err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) })
# ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\""))

> select xyz, then 1 ops after
= ErrorState(AlreadyEncountered { n_times: 1, orig_err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) })
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 1) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 2 ops after
= ErrorState(AlreadyEncountered { n_times: 2, orig_err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) })
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 2) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 3 ops after
= ErrorState(AlreadyEncountered { n_times: 3, orig_err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) })
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 3) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

And now we're all green.

rben01 · 2023-10-23T05:00:14Z

Updated again to simplify the implementation and remove some of the pointless stuff in debug output. Hopefully this remains all green.

> initial df
= DF ["c1"]; PROJECT */1 COLUMNS; SELECTION: "None"
# NO ERROR

> select xyz, then 0 ops after
= ErrorState { n_times: 0, err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) }
# ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\""))

> select xyz, then 1 ops after
= ErrorState { n_times: 1, err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) }
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 1) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 2 ops after
= ErrorState { n_times: 2, err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) }
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 2) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

> select xyz, then 3 ops after
= ErrorState { n_times: 3, err: ColumnNotFound(ErrString("xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"")) }
# ColumnNotFound(ErrString("LogicalPlan already failed (depth: 3) with error: 'xyz\n\nError originated just after this operation:\nDF [\"c1\"]; PROJECT */1 COLUMNS; SELECTION: \"None\"'"))

rben01 · 2023-11-01T13:56:37Z

@ritchie46 does this look good to you?

rben01 · 2024-02-05T02:45:01Z

@ritchie46 Any update on merging this?

stinodego · 2024-02-05T07:54:03Z

@rben01 could you rebase this? Then I'll have a look or ask Ritchie to look at it.

Also, the linked issue has been closed. Is there a MRE we can run to see the fix in action? Adding a Python test with such an example would be preferable.

Prevented exponential growth of error messages by avoiding nesting of error messages Instead, used a counter to track how many operations had been applied to a LazyFrame after an error Renamed `ErrorState` to `ErrorStateEncounters` and made it crate-private Renamed `ErrorStateSync` to `ErrorState` and removed its Deref impl so that the only way to access its data is through (crate-private) accessors, thus reducing the size of its public API

…slashes in output Minor cleanup + commenting

Prevented exponential growth of error messages by avoiding nesting of error messages Instead, used a counter to track how many operations had been applied to a LazyFrame after an error Renamed `ErrorState` to `ErrorStateEncounters` and made it crate-private Renamed `ErrorStateSync` to `ErrorState` and removed its Deref impl so that the only way to access its data is through (crate-private) accessors, thus reducing the size of its public API

Instead of leaving a `String` in place of `PolarsError`, leave in place the original `PolarsError`. Then, when it's time to produce a new error, we clone the original error if possible; and it's almost always possible, as all `PolarsError` variants contain ErrString as their data except one: `Io`. So just convert `Io` to `ComputeError` with an error message saying so, otherwise reuse the `PolarsError` variant with a cloned message. (This is based off, and utilizes, the method `PolarsError::wrap_msg`.) This way we can retain the "shape" of the original error and just augment its message to indicate that it's already failed n times. Also cleaned up the `err_msg.rs` test in anticipation of adding a bit more to that file.

Now that errors aren't taken and replaced by a String - the original is stored and cloned as needed - we can simplify the implementation. Instead of NotYetEncountered and AlreadyEncountered, we can just have a single struct with a counteer (no enum needed). This massively simplifies the implementation Also, simplified the implemtnation of tests Also cleaned up ErrorState's Debug implementation so that users don't see the nuts and bolts inside the struct - they only see the important debugging info

rben01 · 2024-02-06T04:02:01Z

Also, the linked issue has been closed. Is there a MRE we can run to see the fix in action? Adding a Python test with such an example would be preferable.

I closed it with a smaller PR that prevented the escaping of backslashes leading to O(2^n) error message length, but there was still duplication within error messages, as all old messages were printed — that's what this PR fixes.

Here is an example from Python of compounding error messages (edited after posting to replace c, c, c with c, d, e to make it clear that the failure from the first select destroys the LazyFrame):

In [10]: pl.__version__
Out[10]: '0.20.7'

In [11]: df = pl.DataFrame({'a': 1, 'b': range(2)})

In [12]: df.lazy().select('c').select('d').select('e')
Out[12]: ---------------------------------------------------------------------------
ColumnNotFoundError                       Traceback (most recent call last)
<<< backtrace omitted >>>
ColumnNotFoundError: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"

Error originated just after this operation:
ErrorStateSync(AlreadyEncountered(not found: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"))
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"

Error originated just after this operation:
ErrorStateSync(AlreadyEncountered(not found: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"

Error originated just after this operation:
ErrorStateSync(AlreadyEncountered(not found: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"))
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"))
ErrorStateSync(AlreadyEncountered(not found: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"))
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"

stinodego

I added a test for this.

However, if we look at the resulting exception message now, it's not quite correct:

import polars as pl

lf = pl.LazyFrame({"a": 1, "b": range(2)})
lf.select("c").select("d").select("e").collect()

Traceback (most recent call last):
  File "/home/stijn/code/polars/py-polars/repro.py", line 29, in <module>
    df.select("c").select("d").select("e").collect()
  File "/home/stijn/code/polars/py-polars/polars/lazyframe/frame.py", line 1927, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ColumnNotFoundError: LogicalPlan already failed (depth: 2) with error: 'c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"'

Two things:

The quote around the c column name ends up at the end (e.g. we have 'c now with the last quote at the end of this message.
I would expect a simple ColumnNotFound: 'c' - not something about LogicalPlan already failed.

rben01 · 2024-02-07T04:20:05Z

Oh, I see. Looks like the Rust error messages have been specially crafted so they look nice in Python. The error is not ColumnNotFoundError: c, but rather ColumnNotFoundError: c\n\nError originated just after….

On the Rust side, the current (v0.37.0) error message looks like this:

Err(ColumnNotFound(ErrString("c\n\nError originated just after this operation:\nDF
[\"a\", \"b\"]; PROJECT */2 COLUMNS; SELECTION: \"None\"")))

When it makes it to Python, I guess everything but the message itself is stripped off and converted to a ColumnNotFoundError and then the message is printed. So if I muck with the message I have to make sure it has the property that it looks nice in Python. Shouldn't be too hard to clean this up.

…e display of Python error messages

rben01 · 2024-02-07T04:43:31Z

So now, the error messages should look like:

polars.exceptions.ColumnNotFoundError: c

Error originated just after this operation:
DF ["a", "b"]; PROJECT */2 COLUMNS; SELECTION: "None"'

LogicalPlan had already failed with the above error; after failure, 3 additional operations were attempted on the LazyFrame

I wasn't able to test in Python; I'm just going off my understanding of how the error messages on the rust side are translated into Python.

ritchie46 · 2024-02-07T11:31:05Z

If @stinodego is happy with the error messages, this can be merged. Thanks a lot for this @rben01

stinodego

Looks great, thanks!

rben01 requested review from ritchie46 and orlp as code owners October 17, 2023 02:43

rben01 requested review from stinodego and c-peters as code owners January 15, 2024 18:48

stinodego changed the title ~~Removed duplicated content in error messages~~ fix: Remove duplicated content in error messages Feb 5, 2024

stinodego added the A-exceptions Area: exception handling label Feb 5, 2024

github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Feb 5, 2024

rben01 added 6 commits February 5, 2024 22:33

Improved formatting of ErrorState to avoid exponential growth of back…

38310d7

…slashes in output Minor cleanup + commenting

fixed clippy lint (removed unused variable)

af4f778

Cleaned up raise_err! and updated its docs

c76729a

Added tests for error messages

f51b7af

rben01 added 5 commits February 5, 2024 22:36

Removed inadvertent duplicate variable

f850f3f

Fixed up test of error messages for LazyFrame with error plan

1887688

Fixed formatting issues

1ef40a9

rben01 force-pushed the fix-error-fmt-exponential-length branch from f801640 to 7058c89 Compare February 6, 2024 03:37

Removed unused import

4a4283e

Add test

dcb4fe0

rben01 requested review from alexander-beedie and MarcoGorelli as code owners February 6, 2024 10:44

stinodego requested changes Feb 6, 2024

View reviewed changes

Swapped order of elements in "already failed" error message to improv…

1f9b933

…e display of Python error messages

rben01 requested a review from stinodego February 7, 2024 04:41

stinodego approved these changes Feb 7, 2024

View reviewed changes

stinodego merged commit 9dfc9fd into pola-rs:main Feb 7, 2024
22 checks passed

This was referenced Feb 14, 2024

Slow error formatting on large plan presents like hang #8277

Closed

fix(rust): Prevent recursive AlreadyEncountered error formatting #12097

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Remove duplicated content in error messages #8107

fix: Remove duplicated content in error messages #8107

rben01 commented Apr 10, 2023 •

edited

Loading

ritchie46 commented May 14, 2023

rben01 commented May 14, 2023

rben01 commented May 18, 2023

rben01 commented Jun 15, 2023

svaningelgem commented Oct 11, 2023

rben01 commented Oct 17, 2023

rben01 commented Oct 18, 2023

ritchie46 commented Oct 18, 2023

rben01 commented Oct 18, 2023

svaningelgem commented Oct 19, 2023

rben01 commented Oct 20, 2023

rben01 commented Oct 23, 2023

rben01 commented Nov 1, 2023

rben01 commented Feb 5, 2024

stinodego commented Feb 5, 2024 •

edited

Loading

rben01 commented Feb 6, 2024 •

edited

Loading

stinodego left a comment •

edited

Loading

rben01 commented Feb 7, 2024

rben01 commented Feb 7, 2024 •

edited

Loading

ritchie46 commented Feb 7, 2024

stinodego left a comment

fix: Remove duplicated content in error messages #8107

fix: Remove duplicated content in error messages #8107

Conversation

rben01 commented Apr 10, 2023 • edited Loading

Changes

Test case (singular)

Input

Output

ritchie46 commented May 14, 2023

rben01 commented May 14, 2023

rben01 commented May 18, 2023

rben01 commented Jun 15, 2023

svaningelgem commented Oct 11, 2023

rben01 commented Oct 17, 2023

rben01 commented Oct 18, 2023

ritchie46 commented Oct 18, 2023

rben01 commented Oct 18, 2023

svaningelgem commented Oct 19, 2023

rben01 commented Oct 20, 2023

rben01 commented Oct 23, 2023

rben01 commented Nov 1, 2023

rben01 commented Feb 5, 2024

stinodego commented Feb 5, 2024 • edited Loading

rben01 commented Feb 6, 2024 • edited Loading

stinodego left a comment • edited Loading

Choose a reason for hiding this comment

rben01 commented Feb 7, 2024

rben01 commented Feb 7, 2024 • edited Loading

ritchie46 commented Feb 7, 2024

stinodego left a comment

Choose a reason for hiding this comment

rben01 commented Apr 10, 2023 •

edited

Loading

stinodego commented Feb 5, 2024 •

edited

Loading

rben01 commented Feb 6, 2024 •

edited

Loading

stinodego left a comment •

edited

Loading

rben01 commented Feb 7, 2024 •

edited

Loading