Provide user-defined invariants for logical node extensions. #14329

wiedld · 2025-01-27T19:44:36Z

Which issue does this PR close?

Rationale for this change

Enable logical plan invariants to be defined by the user.
For this first use case (this PR), the user-defined invariants are provided on the extension nodes.

What changes are included in this PR?

At a high level:

- Document the existing behavior for logical plan checks.
- Provide the structure for a user-defined Invariant.
- Start by permitting a user-defined Invariant on an extension node.

Are these changes tested?

Yes

Are there any user-facing changes?

A new trait method is available (UserDefinedLogicalNode::invariants()). Existing implementations do not need to change, because a default implementation is provided.
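As a rough illustration of why a default implementation keeps existing code compiling, here is a minimal standalone sketch. The names `ExtensionNode`, `LegacyNode`, `TopK`, the `check_invariants` signature, and the `k > 0` rule are all hypothetical stand-ins chosen for the example; they are not DataFusion's actual API.

```rust
// Standalone model. Every type here is a hypothetical stand-in,
// not DataFusion's real trait or enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InvariantLevel {
    Always,
    Executable,
}

pub trait ExtensionNode {
    fn name(&self) -> &str;

    /// Default: no extra invariants, so existing implementors
    /// compile without any change.
    fn check_invariants(&self, _level: InvariantLevel) -> Result<(), String> {
        Ok(())
    }
}

/// An existing node written before the new method existed.
pub struct LegacyNode;

impl ExtensionNode for LegacyNode {
    fn name(&self) -> &str {
        "Legacy"
    }
}

/// A node that opts in with its own (made-up) rule: `k` must be positive.
pub struct TopK {
    pub k: usize,
}

impl ExtensionNode for TopK {
    fn name(&self) -> &str {
        "TopK"
    }

    fn check_invariants(&self, _level: InvariantLevel) -> Result<(), String> {
        if self.k == 0 {
            return Err(format!("{}: k must be greater than zero", self.name()));
        }
        Ok(())
    }
}
```

`LegacyNode` inherits the permissive default, while `TopK` overrides it and rejects `k == 0` at either level.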

wiedld · 2025-01-27T19:46:01Z

datafusion/core/tests/user_defined/user_defined_plan.rs

+/// Run invariant checks on the logical plan extension [`TopKPlanNode`].
+async fn topk_invariants() -> Result<()> {


test: demonstrate the basic use case, namely that user-defined invariants fail for an invalid extension node.

wiedld · 2025-01-27T19:46:42Z

datafusion/core/tests/user_defined/user_defined_plan.rs

+#[tokio::test]
+async fn topk_invariants_after_invalid_mutation() -> Result<()> {


test: demonstrate a failed invariant check after logical plan mutation (during optimizer run).
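The scenario this test targets can be modelled in a few lines. The sketch below is a simplification under stated assumptions: `TopKNode`, `Rule`, `optimize`, and both rules are hypothetical names invented for illustration, and the real test exercises DataFusion's optimizer rather than this toy loop.

```rust
// Hypothetical stand-in types; not DataFusion's actual API.
#[derive(Debug, Clone, PartialEq)]
pub struct TopKNode {
    pub k: usize,
}

impl TopKNode {
    /// Made-up user-defined invariant: a top-k node must keep at least one row.
    pub fn check_invariants(&self) -> Result<(), String> {
        if self.k == 0 {
            Err("TopKNode invariant violated: k must be > 0".to_string())
        } else {
            Ok(())
        }
    }
}

/// A rewrite rule takes the node and returns a (possibly mutated) node.
pub type Rule = fn(TopKNode) -> TopKNode;

/// Check invariants once before all rules and once after all rules.
pub fn optimize(node: TopKNode, rules: &[Rule]) -> Result<TopKNode, String> {
    node.check_invariants()?;
    let rewritten = rules.iter().fold(node, |n, rule| rule(n));
    rewritten.check_invariants()?;
    Ok(rewritten)
}

/// A well-behaved rule that leaves the node untouched.
pub fn identity_rule(node: TopKNode) -> TopKNode {
    node
}

/// A buggy rule that produces an invalid node.
pub fn buggy_rule(_node: TopKNode) -> TopKNode {
    TopKNode { k: 0 }
}
```

With only `identity_rule` the plan passes through unchanged; adding `buggy_rule` makes the post-optimization check return an error instead of silently handing back an invalid plan.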

datafusion/expr/src/logical_plan/extension.rs

datafusion/expr/src/logical_plan/invariants.rs

wiedld · 2025-01-27T19:52:05Z

datafusion/expr/src/logical_plan/invariants.rs

+/// Visit the plan nodes, and confirm the [`InvariantLevel::Executable`]
+/// as well as the less stringent [`InvariantLevel::Always`] checks.
 pub fn assert_executable_invariants(plan: &LogicalPlan) -> Result<()> {


Documents the existing behavior.

assert_always_invariants() (renamed to assert_always_invariants_at_current_node) assesses only the current node; it does not assess the rest of the DAG.

In contrast, assert_executable_invariants() (a) visits the subplan, and (b) validates both the always and the executable invariants.

This does feel like a blurring of definitions. The previous decision was made to minimize the performance impact; at the time, we wanted the "always" invariants to be cheap so that they could be checked more frequently.

However, as of now the frequency is:

- LP always invariants are checked before the analyzer.
- LP executable (including always) invariants are checked:
  - after the analyzer
  - once before all optimizer runs
  - once after all optimizer runs

Should I undo this blurring, and have the assert_always_invariants also be recursive?
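To make the distinction concrete, here is a toy model of the two entry points. It is only a sketch: the `Plan` struct and its boolean flags are hypothetical stand-ins for the real checks, and the function names are shortened versions chosen for the example.

```rust
// Hypothetical miniature plan tree; not DataFusion's LogicalPlan.
#[derive(Debug)]
pub struct Plan {
    pub name: &'static str,
    /// Stands in for "passes the always invariants".
    pub always_ok: bool,
    /// Stands in for "passes the executable invariants".
    pub executable_ok: bool,
    pub inputs: Vec<Plan>,
}

/// Cheap check: inspects only this node and never looks at its inputs.
pub fn assert_always_at_current_node(plan: &Plan) -> Result<(), String> {
    if plan.always_ok {
        Ok(())
    } else {
        Err(format!("{}: always invariant failed", plan.name))
    }
}

/// Recursive check: visits the whole subplan and applies both levels
/// to every node it reaches.
pub fn assert_executable(plan: &Plan) -> Result<(), String> {
    assert_always_at_current_node(plan)?;
    if !plan.executable_ok {
        return Err(format!("{}: executable invariant failed", plan.name));
    }
    for input in &plan.inputs {
        assert_executable(input)?;
    }
    Ok(())
}
```

In this model, a problem in a child node is invisible to the current-node check and only surfaces in the recursive one, which is the asymmetry the question above is about.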

wiedld · 2025-01-27T19:55:06Z

datafusion/expr/src/logical_plan/invariants.rs

+    // Always invariants
    assert_always_invariants(plan)?;
+    assert_valid_extension_nodes(plan, InvariantLevel::Always)?;
+
+    // Executable invariants
+    assert_valid_extension_nodes(plan, InvariantLevel::Executable)?;


Presuming that the extension nodes are not at the plan root, it does not make sense to check them in assert_always_invariants_at_current_node.

Instead, the extension node invariants are checked during the recursive assertion (a.k.a. assert_executable_invariants). This recursive assertion is done less frequently during planning -- for example, after all of the optimizers run.
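A simplified picture of that recursive extension-node check follows. It assumes a hypothetical `Plan` enum with a function pointer standing in for the user-provided check; the real code walks a `LogicalPlan` and calls into the extension node's trait object instead.

```rust
// Hypothetical stand-ins; not DataFusion's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InvariantLevel {
    Always,
    Executable,
}

pub enum Plan {
    Builtin {
        inputs: Vec<Plan>,
    },
    Extension {
        /// Stand-in for the user-provided invariant check.
        check: fn(InvariantLevel) -> Result<(), String>,
        inputs: Vec<Plan>,
    },
}

impl Plan {
    fn inputs(&self) -> &[Plan] {
        match self {
            Plan::Builtin { inputs } | Plan::Extension { inputs, .. } => inputs.as_slice(),
        }
    }
}

/// Walk the plan and run the user check on every extension node found.
pub fn assert_valid_extension_nodes(plan: &Plan, level: InvariantLevel) -> Result<(), String> {
    if let Plan::Extension { check, .. } = plan {
        check(level)?;
    }
    for input in plan.inputs() {
        assert_valid_extension_nodes(input, level)?;
    }
    Ok(())
}

/// The recursive assertion applies the Always level first, then Executable.
pub fn assert_executable(plan: &Plan) -> Result<(), String> {
    assert_valid_extension_nodes(plan, InvariantLevel::Always)?;
    assert_valid_extension_nodes(plan, InvariantLevel::Executable)?;
    Ok(())
}

/// Example user check: fine at the Always level, rejected when Executable.
pub fn strict_only_when_executable(level: InvariantLevel) -> Result<(), String> {
    match level {
        InvariantLevel::Always => Ok(()),
        InvariantLevel::Executable => Err("extension node is not executable".to_string()),
    }
}
```

Because the extension node sits below the root here, only a walk of the whole plan can reach it, which is why a current-node check at the root would not help.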

alamb

Thank you @wiedld -- this looks great. Very nicely tested too.

The only question I have before I think this PR would be ready to merge is if we need both new API functions (invariants and check_invariants).

The other comments are just style comments.

datafusion/expr/src/logical_plan/invariants.rs

datafusion/expr/src/logical_plan/extension.rs

alamb · 2025-01-29T20:05:29Z

datafusion/expr/src/logical_plan/invariants.rs

+///
+/// Refer to [`UserDefinedLogicalNode::check_invariants`](super::UserDefinedLogicalNode)
+/// for more details of user-provided extension node invariants.
+fn assert_valid_extension_nodes(plan: &LogicalPlan, check: InvariantLevel) -> Result<()> {


As written, I think this does a separate walk of the tree from the one in assert_subqueries_are_valid.

Maybe as a follow-on PR we could unify the walks (so the tree gets walked once and all checks are applied) rather than doing two separate walks.
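One possible shape for such a unified walk is sketched below. It is only an illustration: `Plan`, `NodeCheck`, the two example checks, and the `visited` counter are hypothetical, and a real follow-up would presumably build on DataFusion's existing tree traversal rather than hand-rolled recursion.

```rust
// Hypothetical miniature plan tree; not DataFusion's LogicalPlan.
pub struct Plan {
    pub name: &'static str,
    pub inputs: Vec<Plan>,
}

/// A per-node check that can be combined with others in a single walk.
pub type NodeCheck = fn(&Plan) -> Result<(), String>;

/// Visit each node exactly once (pre-order) and apply every check to it.
/// `visited` counts node visits so the cost of the walk is observable.
pub fn check_all_in_one_walk(
    plan: &Plan,
    checks: &[NodeCheck],
    visited: &mut usize,
) -> Result<(), String> {
    *visited += 1;
    for check in checks {
        check(plan)?;
    }
    for input in &plan.inputs {
        check_all_in_one_walk(input, checks, visited)?;
    }
    Ok(())
}

/// Example check: every node needs a non-empty name.
pub fn name_is_not_empty(plan: &Plan) -> Result<(), String> {
    if plan.name.is_empty() {
        Err("node has an empty name".to_string())
    } else {
        Ok(())
    }
}

/// Example check: no node may have more than two inputs.
pub fn at_most_two_inputs(plan: &Plan) -> Result<(), String> {
    if plan.inputs.len() > 2 {
        Err(format!("{}: too many inputs", plan.name))
    } else {
        Ok(())
    }
}
```

Running both checks through one walk touches each node once, whereas giving each check its own walk would visit every node once per check.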

alamb

Thank you @wiedld -- this looks good to me

I merged up from main to get the CI tests to run again to make sure everything looks good. But otherwise I think this is good to go

alamb · 2025-02-04T20:42:46Z

Thanks again @wiedld

feat(13525): permit user-defined invariants on logical plan extensions

876959d

github-actions bot added the logical-expr and core labels Jan 27, 2025

wiedld force-pushed the 13525/lp-extension-invariants branch from 590c115 to 3836444 Compare January 27, 2025 19:59

wiedld marked this pull request as ready for review January 27, 2025 20:22

test(13525): demonstrate extension node invariants catching improper …

3836444

…mutation during an optimizer pass

alamb reviewed Jan 29, 2025

wiedld added 2 commits January 30, 2025 19:20

chore: update docs

0cda030

refactor: remove the extra Invariant interface around an FnMut, since…

159d62d

… it doesn't make sense for the extension node's checks

github-actions bot added the substrait label Jan 31, 2025

Merge remote-tracking branch 'apache/main' into 13525/lp-extension-in…

b8e6a57

…variants

alamb approved these changes Feb 3, 2025

alamb merged commit d8bc49f into apache:main Feb 4, 2025
25 checks passed

alamb mentioned this pull request Feb 4, 2025

Define extension API for user-defined invariants. #14029

Closed
