Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide user-defined invariants for logical node extensions. #14329

Merged
merged 5 commits into from
Feb 4, 2025

Conversation

wiedld
Copy link
Contributor

@wiedld wiedld commented Jan 27, 2025

Which issue does this PR close?

Part of #14029

Rationale for this change

Enable logical plan invariants to be defined by the user.
For this first use case (this PR), the user-defined invariants are provided on the extension nodes.

What changes are included in this PR?

At a high level:

  • document existing behavior for logical plan checks.
  • provide the structure for a user-defined Invariant
  • start by permitting a user-defined Invariant on an extension node.

Are these changes tested?

Yes

Are there any user-facing changes?

A new trait method is available (UserDefinedLogicalNode::invariants()), but is not required for existing implementations -- since the default implementation is given.

@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Jan 27, 2025
Comment on lines +300 to +301
/// Run invariant checks on the logical plan extension [`TopKPlanNode`].
async fn topk_invariants() -> Result<()> {
Copy link
Contributor Author

@wiedld wiedld Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test: demonstrate the basic use case. That user-defined invariants will fail for an invalid extension node.

Comment on lines +349 to +350
#[tokio::test]
async fn topk_invariants_after_invalid_mutation() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test: demonstrate a failed invariant check after logical plan mutation (during optimizer run).

Comment on lines +72 to 74
/// Visit the plan nodes, and confirm the [`InvariantLevel::Executable`]
/// as well as the less stringent [`InvariantLevel::Always`] checks.
pub fn assert_executable_invariants(plan: &LogicalPlan) -> Result<()> {
Copy link
Contributor Author

@wiedld wiedld Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documents the existing behavior.

  • The assert_always_invariants() (renamed to assert_always_invariants_at_current_node) assess only the current node, and does not assess the remaining DAG.
  • whereas the assert_executable_invariants() (a) visits the subplan, and (b) validates the always and executable.

Copy link
Contributor Author

@wiedld wiedld Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The does feel like a blurring of definitions. The previous decision was made based upon minimizing the performance impact; at the time, we wanted the "always" invariants to be be cheap and able to be checked more frequently.

However, as of now the frequency is:

  • LP always invariants checked before analyzer
  • LP executable (including always) invariants checked:
    • after analyzer
    • once before all optimizer runs
    • once after all optimizer runs

Should I undo this blurring, and have the assert_always_invariants also be recursive?

Comment on lines 75 to 80
// Always invariants
assert_always_invariants(plan)?;
assert_valid_extension_nodes(plan, InvariantLevel::Always)?;

// Executable invariants
assert_valid_extension_nodes(plan, InvariantLevel::Executable)?;
Copy link
Contributor Author

@wiedld wiedld Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presuming that the extension nodes are not at the plan root, it does not make sense to check during the assert_always_invariants_at_current_node.

Instead, the extension node invariants are checked during the recursive assertion (a.k.a. assert_executable_invariants). This recursive assertion is done less frequently during the planning -- for example, after all of the optimizers run.

@wiedld wiedld force-pushed the 13525/lp-extension-invariants branch from 590c115 to 3836444 Compare January 27, 2025 19:59
@wiedld wiedld marked this pull request as ready for review January 27, 2025 20:22
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wiedld -- this looks great. Very nicely tested too.

The only question I have before I think this PR would be ready to merge is if we need both new API functions (invariants and check_invariants).

The other comments are just style comments.

datafusion/expr/src/logical_plan/invariants.rs Outdated Show resolved Hide resolved
datafusion/expr/src/logical_plan/extension.rs Outdated Show resolved Hide resolved
///
/// Refer to [`UserDefinedLogicalNode::check_invariants`](super::UserDefinedLogicalNode)
/// for more details of user-provided extension node invariants.
fn assert_valid_extension_nodes(plan: &LogicalPlan, check: InvariantLevel) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written I think this this does a separate walk of the tree than assert_subqueries_are_valid

Maybe as a follow on PR we could unify the walks (so the tree gets walked once and all checks applied) rather than two separate plans

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wiedld -- this looks good to me

I merged up from main to get the CI tests to run again to make sure everything looks good. But otherwise I think this is good to go

@alamb alamb merged commit d8bc49f into apache:main Feb 4, 2025
25 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 4, 2025

Thanks again @wiedld

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants