Automatically check "invariants" #13652
Comments
take
Possible tasks
For the physical optimization invariants, the schema of the output physical plan cannot change: the output results must stay the same, even though how we get them can change. What else should be included as a responsibility or check? Maintaining input ordering, if required? (The idea is to have a check, perhaps run in debug mode, that errors if a user-defined physical plan or optimization pass fails to maintain an invariant, so the error is thrown closer to its source when debugging.)
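A minimal sketch of the kind of check described above, assuming a simplified model of plans and passes. `Field`, `Plan`, `Pass`, `run_checked`, `add_sort`, and `drop_columns` are hypothetical names for illustration only; none of them are DataFusion types or APIs.

```rust
// Hypothetical stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Plan {
    description: String,
    schema: Vec<Field>,
}

/// A hypothetical optimizer pass: consumes a plan and returns a rewritten plan.
type Pass = fn(Plan) -> Plan;

/// Runs `pass` and returns an error naming the pass if it changed the output schema.
fn run_checked(pass_name: &str, pass: Pass, plan: Plan) -> Result<Plan, String> {
    let schema_before = plan.schema.clone();
    let optimized = pass(plan);
    if optimized.schema != schema_before {
        return Err(format!(
            "pass '{}' changed the output schema from {:?} to {:?}",
            pass_name, schema_before, optimized.schema
        ));
    }
    Ok(optimized)
}

/// A well-behaved pass: changes how results are produced, not their schema.
fn add_sort(plan: Plan) -> Plan {
    Plan {
        description: format!("Sort({})", plan.description),
        schema: plan.schema,
    }
}

/// A misbehaving pass: silently drops every output column.
fn drop_columns(plan: Plan) -> Plan {
    Plan {
        description: plan.description,
        schema: Vec::new(),
    }
}
```

Calling `run_checked("drop_columns", drop_columns, plan)` returns an error that names the offending pass, which is the "closer to the source" behavior discussed above. One possible design choice is to gate the comparison behind `cfg!(debug_assertions)` so that release builds skip it.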
Here are some ideas based on bugs we have hit / my memory of what has changed and caused us pain:
My impression was that the plan construction occurred with the LP (as we do), and not by constructing their own de novo physical plan. Is this correct? If so, then I think the above list of invariants would most likely be checked at the LP level (not the physical plan). I can definitely put up a PR for those. Thank you! For the physical plan invariants (not LP), do we want any invariant checking there? I looked at the Apache docs and the physical plan APIs, and from my (naive) understanding these are the only two invariants to check after physical plan mutations (i.e. after a PhysicalOptimizerRule is applied). Is this correct? 🤔
I am sorry -- I don't understand what you are asking (is LP LogicalPlan?) What does a de novo physical plan mean? You mean like creating a
I am not sure what the actual invariants are (part of this project, I think, is to discover that information). In my opinion, we should seek to discover what the existing implicit assumptions are and encode them explicitly in the invariant check. Once we have all the existing assumptions encoded, we can move on to trying to add more assumptions.
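As one illustration of encoding an implicit assumption explicitly, the issue description's example that UnionExec must have the same schema could be written as a check roughly like the sketch below. It reads that example as "all inputs to a union share one schema", which is an interpretation. `Column`, `Schema`, and `check_union_inputs` are hypothetical names, not DataFusion APIs, and treating an empty input list as an error is also an assumption.

```rust
// Hypothetical stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    data_type: String,
}

type Schema = Vec<Column>;

/// Checks that every union input has exactly the same schema as the first input.
fn check_union_inputs(inputs: &[Schema]) -> Result<(), String> {
    let (first, rest) = match inputs.split_first() {
        Some(parts) => parts,
        // Assumption: a union with no inputs is treated as invalid.
        None => return Err("union requires at least one input".to_string()),
    };
    for (i, schema) in rest.iter().enumerate() {
        if schema != first {
            // `i + 1` because `rest` starts at the second input.
            return Err(format!(
                "union input {} has schema {:?}, expected {:?}",
                i + 1,
                schema,
                first
            ));
        }
    }
    Ok(())
}
```

The error message reports which input diverged and what was expected, so a violation can be traced to the plan node that introduced it.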
I just discovered that @houqp basically filed this same ticket 2 years ago:
Agreed. Modifying the list above, we have these infrastructure components:
Is your feature request related to a problem or challenge?
I extracted this from #13651 so it was more visible
During upgrades, downstream systems often experience issues due to implicit changes (not explicit API changes) in what DataFusion code begins to assume about LogicalPlans. These changes have unintended consequences when upgrading to a new version of DataFusion (see #13525).
Describe the solution you'd like
The idea is to make the current implicit assumptions ("invariants" in more formal language) explicit and automatically check them.
Examples of implicit assumptions:
UnionExec must have the same schema
Describe alternatives you've considered
I like the approach @wiedld took in #13651 :
Additional context
Sub tasks: