Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double-check whether predicate expressions have correct nullability semantics #51

Open
jvanstraten opened this issue Sep 13, 2022 · 8 comments
Assignees

Comments

@jvanstraten
Copy link
Collaborator

jvanstraten commented Sep 13, 2022

When I wrote the bulk of the validation code, I was properly confused about nullable booleans in the context of predicate expressions (both things that want one, and things that return one, like set containment subqueries). There is a good summary about what the behavior of these things should be here, but it's probably worth double-checking corner cases against query engine documentation or just by experimentation. When a nullable boolean is received by a predicate expression slot, the validator should probably also emit a note of some kind about what null means in that particular context.

Todo list (to check):

@mbasmanova
Copy link

@jvanstraten Curious what is the "validation code"? Is there some sort of a test that specifies query plans and inputs and expected outputs? I'd be curious to see what we have for the joins.

@jvanstraten
Copy link
Collaborator Author

The validator (this repository) only checks whether a plan is valid, and sort of describes in English how the individual relations and expressions should behave; it doesn't know how to actually execute a plan given some input tables. The descriptions are probably not precise enough for joins in this case, because all the descriptions are currently limited to my own personal understanding. I would love some double-checking of the descriptions it does emit, though. Running the validator should be super easy if you have some plans already (see toplevel readme file); if you disagree with something it says or if you think the spec disagrees with it, please open an issue here.

@mbasmanova
Copy link

@jvanstraten This sounds super interesting. I'd love to try it out. Did you mean to include a link for "(this repository)"? For some reason it shows up as regular text.

@mbasmanova
Copy link

Oh... perhaps, you literally meant this repository: https://github.com/substrait-io/substrait-validator

Let me try it.

@mbasmanova
Copy link

@jvanstraten I was able to install and run the validator on one of Substrait plan checked in to Velox as JSON: velox/substrait/tests/data/q6_first_stage.json

I see a lot of errors, but I assume we can fix these. I also see the query plan. This is pretty cool.

Screen Shot 2022-09-13 at 10 20 58 AM

The description for the Filter node says "This relation discards all rows for which and:bool_bool(and:bool_bool(and:bool_bool(..), ..), lt:fp64_fp64(?.0, ..)) yields false. Behavior for a null condition is unspecified." I wonder why it says "Behavior for a null condition is unspecified." Should this be removed or, perhaps, clarified to explain that if the filter condition is evaluated to NULL, the row is not included?

@jvanstraten
Copy link
Collaborator Author

Yeah, that's exactly what this issue is for. @mbrobbel is already working on it.

@mbasmanova
Copy link

@jvanstraten Great! Thank you for clarifying.

@mbrobbel
Copy link
Member

@jvanstraten you can assign this one to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants