Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: update VirtualTable to use an expression for defining table records #786

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jcamachor
Copy link
Contributor

@jcamachor jcamachor commented Feb 14, 2025

feat: This PR modifies the VirtualTable message to introduce a new record_list_expression field. The record_list_expression field is an Expression that evaluates into a list of structs (LIST<STRUCT<T1, ..., Tn>>).

This improvement leverages Substrait's support for nested types and provides greater flexibility than the existing representation, i.e., it allows using a single dynamic parameter expression (#780) to represent the records in a virtual table.

@EpsilonPrime
Copy link
Member

While I like this simplification there is on thing I'm concerned about. That is the lack of widespread support for complex types (here list and structs). A simplistic Substrait consumer could still be considered compliant without that support. But if we require support that means systems like DuckDB will need that support before they can even define a virtual table which is a way of providing data without a physical table. I suppose we could implement this as an alternative with the expectation that both methods could still be options a few years from now.

@jcamachor
Copy link
Contributor Author

Thanks, @EpsilonPrime ! That makes sense--rather than deprecating the field, I've updated it to use a oneof. I've already addressed that in the latest commit, but there's still an issue since repeated isn't allowed within oneof. I'll take a look at that later today.

@jcamachor
Copy link
Contributor Author

@EpsilonPrime, could you take a look at the latest iteration? I had to wrap the repeated fields in a message, but this approach isn't ideal since it breaks backward compatibility. Other possible options include: (1) introducing a new VirtualTable message, or (2) keeping the fields separate and not using oneof, shifting correctness responsibility to producers and validation responsibility to consumers. I'm not sure what's the best path forward. Please let me know your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants