Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly support Options for Scalar/Aggregate/Window functions #278

Merged
merged 3 commits into from
Jul 11, 2024

Conversation

Blizzara
Copy link
Contributor

@Blizzara Blizzara commented Jul 2, 2024

Options were supported in the Java immutables but in most cases dropped when converting into protobuf

This

  • fixes the protobuf conversion,
  • adds a builder for FunctionOption
  • adds a ExpressionCreator functions with options
  • does some refactoring

@@ -12,7 +11,7 @@ public abstract class AggregateFunctionInvocation {

public abstract List<FunctionArg> arguments();

public abstract Map<String, FunctionOption> options();
Copy link
Contributor Author

@Blizzara Blizzara Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed weird to me, since the name is already kept inside the FunctionOption, so this just duplicates the source of truth. That said I don't mind reverting this if it's preferable to keep the map.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this is/was intended to allow for easy lookup of options given a name, so that consumers could look up the relevant options for a given function. However given that:

  // Name of the option to set. If the consumer does not recognize the
  // option, it must reject the plan. The name is matched case-insensitively
  // with option names defined for the function.

https://github.com/substrait-io/substrait/blob/7dbbf0468083d932a61b9c720700bd6083558fa9/proto/substrait/algebra.proto#L741-L744

It probably is safer to have this be a list and force consumers to process all of them.

}

public static Expression.ScalarFunctionInvocation scalarFunction(
SimpleExtension.ScalarFunctionVariant declaration,
Type outputType,
List<? extends FunctionOption> options,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if it's better to change the existing functions or add new ones given this is an optional parameter, so I added new ones. But happy to remove the old ones too if that'd be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think were getting close to the point where this pattern (individual constructor variants with different field combinations) may not be good as we start to explode the number of function combinations. These are fine for now, but if were to add more beyond these it might make sense to revisit this utility function approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair - I decided to revert the additions and add some docstrings to point to using the builder directly in 9aae3ec, does that seem alright to you?

@Blizzara Blizzara force-pushed the avo/support-function-options branch from 984583b to 484c512 Compare July 2, 2024 12:24
Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started taking a look. This seems fairly reasonable so far. I can spend some more time on this tomorrow.

.name(o.getName())
.addAllValues(o.getPreferenceList())
.build();
public static FunctionOption fromFunctionOption(io.substrait.proto.FunctionOption o) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is somewhat awkward to make this static so it can be re-used in the ProtoAggregateFunctionConverter, but I also can't think of a better place to put this.


public static ImmutableFunctionOption.Builder builder() {
return ImmutableFunctionOption.builder();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition ✨

@@ -12,7 +11,7 @@ public abstract class AggregateFunctionInvocation {

public abstract List<FunctionArg> arguments();

public abstract Map<String, FunctionOption> options();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this is/was intended to allow for easy lookup of options given a name, so that consumers could look up the relevant options for a given function. However given that:

  // Name of the option to set. If the consumer does not recognize the
  // option, it must reject the plan. The name is matched case-insensitively
  // with option names defined for the function.

https://github.com/substrait-io/substrait/blob/7dbbf0468083d932a61b9c720700bd6083558fa9/proto/substrait/algebra.proto#L741-L744

It probably is safer to have this be a list and force consumers to process all of them.

FunctionOption.builder()
.name("option")
.addValues("VALUE1", "VALUE2")
.build()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta: The lead function doesn't actually define any options. Potentially, we should reject both:

  • Building plans w/ invalid options.
  • Reading protobuf plans in w/ invalid options.

I think this is outside the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair 😅 but yep agreed it'd be a separate PR

@Blizzara
Copy link
Contributor Author

Blizzara commented Jul 5, 2024

@vbarua did you have a chance to give this another look? :)

Also, any ideas why the editorconfig check is complaining? The files it mentions seem unrelated to this PR:

Run editorconfig-checker
.gitmodules:
	2: Wrong indent style found (tabs instead of spaces)
	3: Wrong indent style found (tabs instead of spaces)
core/src/main/antlr/SubstraitType.g[4](https://github.com/substrait-io/substrait-java/actions/runs/9796466765/job/27050988486?pr=278#step:4:5):
	110: Wrong indent style found (tabs instead of spaces)
...

@vbarua
Copy link
Member

vbarua commented Jul 5, 2024

any ideas why the editorconfig check is complaining?

Not only where they unrelated to your PR, they also haven't been modified in a long time. I've fixed the issues in:
#280

https://substrait.io/expressions/scalar_functions/#options
Options were supported in the Java classes but in most cases dropped when converting into protobuf

This
- fixes the protobuf conversion,
- adds a builder for FunctionOption
- adds a ExpressionCreator functions with options
- does some refactoring
@Blizzara Blizzara force-pushed the avo/support-function-options branch from 92f800c to e786345 Compare July 8, 2024 09:01
Copy link
Member

@vbarua vbarua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks reasonable to me.

.addAllPreference(o.getValue().values())
.build())
f.options().stream()
.map(ExpressionProtoConverter::from)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta: I think re-using this is good, but also think it's a bit weird that this lives in ExpressionProtoConverter and is used in RelProtoConverter. I'm not sure what a better place for this is yet, so I think it's fine to keep this here for now.

.outputType(protoTypeConverter.from(measure.getOutputType()))
.aggregationPhase(Expression.AggregationPhase.fromProto(measure.getPhase()))
.invocation(Expression.AggregationInvocation.fromProto(measure.getInvocation()))
.options(options)
.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other advantage of this is that by forcing callers to wrap this in a Measure themselves, it also makes it easier for them to include the PreMeasure filter when it's available.

}

public static Expression.ScalarFunctionInvocation scalarFunction(
SimpleExtension.ScalarFunctionVariant declaration,
Type outputType,
List<? extends FunctionOption> options,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think were getting close to the point where this pattern (individual constructor variants with different field combinations) may not be good as we start to explode the number of function combinations. These are fine for now, but if were to add more beyond these it might make sense to revisit this utility function approach.

@vbarua vbarua merged commit e574913 into substrait-io:main Jul 11, 2024
12 checks passed
@Blizzara Blizzara deleted the avo/support-function-options branch August 20, 2024 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants