Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should LUB be partial function for records? #2539

Closed
leafpetersen opened this issue Sep 30, 2022 · 6 comments
Closed

Should LUB be partial function for records? #2539

leafpetersen opened this issue Sep 30, 2022 · 6 comments
Assignees
Labels
records Issues related to records.

Comments

@leafpetersen
Copy link
Member

In the discussion of issue #2528, @munificent made the point here that in general upper bounds on records with different shapes is probably more likely to indicate a programming error than a desired outcome, and that the result of using upper bound is likely to be confusing. I'm not 100% sure, but I think it would be workable to specify the upper bound meta-function to be a partial function for record types, which is undefined when the two inputs are record types with different shapes. We would have to adjust the various places where upper bounds are used in the meta-theory to make it a static error (usually an inference failure) if the upper bound did not exist. But at least for the obvious places I can think of this should work out. It does imply that constraint merging during inference also becomes a partial function, which is also something new - so there may be some implementation concerns, but that code is (or at least was) fairly local. Some examples:

var a0 = true ? (3, 3) : ("hello", "hello"); // Ok, static type (Object, Object)
var a1 = true ? (3, 3) : ("hello",); // Static error, no type can be inferred
var a2 = <Record>[(3, 3), (3,)]; // Ok - no upper bound result is required
var a3 = [(3, 3), (3,)]; // Error, upper bound does not exist, not type can be inferred
var a4 = [(3, 3) , (3,) as Record]; // Ok?  We could choose either way, but as described above, this upper bound exists

Thoughts on this?

cc @munificent @lrhn @eernstg @jakemac53 @natebosch @stereotype441 @kallentu @scheglov @chloestefantsova

@lrhn
Copy link
Member

lrhn commented Sep 30, 2022

TL;DR: I'm split.

I can see the issue of getting unusable LUB results, but restricting a solution to record types feels spurious.

As I read the idea here, it's:

If we find an upper bound on a number of types, T1, ..., Tn, and all of the types are record types, but not all
of the record types have the same shape, then we find no upper bound. Otherwise we find the pointwise
upper bounds of the same-position field types.

Basically, it's never OK to infer Record unless one of the types is Record to begin with.

And then we have to recover from not finding a type where we expect one, typically by introducing a compile-time error.

So UP((1, 1), (1, (2, 3)) would be (int, Object).
It means that [1, (1,), (1, 2)] is a List<Object>, but [(1,), (1,2)] is invalid instead of a List<Record>.
I guess that's OK, and somewhat explainable.

The example, in the issue, of var list = [1, 2, ...nums, othernums];, where you forgot to spread othernums, is just as bad as the record case.
Same if you extracted a list to a variable, and forgot the braces in place, var list = [1, 2, ...[nums]]; (braces shouldn't have been there, whoops).
It's a list of elements, where one is not like the others. Sometimes that's deliberate, sometimes it's an error, so it sounds like a job for a heuristic helper tool, not the language specification, to decide whether it makes sense.

The argument for making records special is that you can more easily make typos in record literal field names, and nothing will stop you because all names are valid for some type. That is a concern. All other names in the program must be declared somewhere, but record literal named field names can be anything. The corresponding type is silently introduced along with the literal, no name is invalid.

I hope our tools will eventually become smart enough that to suggest "You wrote '(fooo: 42), did you mean (foo: 42)" when ({int foo})already occurs in the program, and({int fooo})doesn't. And allowing you to complete()` to common record field names, prioritized by relevance.

Maybe it won't be a problem then, but we're not there yet.
So that's an argument for helping people by forcing them to acknowledge whenever there s such a potential record shape clash. But it may also be sufficient to make it a hint, not a language rule, because the analyzer might be able to spot a reason for the code being completely valid.

Hmm. Take a "tagged value" list like var taggedValues = [(pointKind, (1.0, 2.0)), (colorPointKind, (1.0, 2.0, Color.red))];.
That list is valid and deliberate in having different record types in the second position. Maybe. Maybe maybe it should be <(int, Object)> instead of <(int, Record)>. It's impossible to know.

The more I try to come up with examples, the harder it becomes to come up with an example where I want to infer Record. Which means that not inferring Record may not get in anyone's way, and when it does, the code might still be better/more readable with a explicit type.

I'm always worried when LUB starts inventing types.
As long as we can prevent you from getting dynamic as the LUB result, the hypothesis is that you'll get a type error further down, and you'll track it back to the original error. Which is usually true, but not when the result ends up going into a string interpolation or something else which accepts any value.
That's not always true, and not always a good developer experience even when it is true.

@eernstg
Copy link
Member

eernstg commented Sep 30, 2022

We used to have a similar discussion about the treatment of the conditional expression and other locations where an UP could yield Object or a top type.

const one = 1.5;
const two = "two";

void main() {
  List<int> numbers = [one, two]; // Silently accepted back when we had implicit downcasts.
}

This was a very serious footgun, which is now gone. However, this problem still exists in more subtle forms, and the cases involving record types seem to be perfectly good examples of the same issue.

For example, this behavior of UP may give rise to unexpectedly imprecise type arguments (foo([one, two]) gets inferred as foo<Object>(...) rather than foo<int>(...)), and/or it may cause unexpected compile-time errors. Examples involving record types are given elsewhere in this issue.

In any case, the underlying cause is that we incur a massive loss of precision at an upcast from any record type to Record, as well as at a non-trivial upcast from any type to Object, and at any upcast from a function type to Function. The point is that the upcast target is a type with a very large number of subtypes, and this fact motivates special caution.

I think we could consider the same remedy which was proposed back then: Whenever UP yields the result Record, Function, Object, or a top type, and none of the arguments to UP is said result type, that type is marked as having a massiveLossOfInformation. Subsequently, if a type with a massiveLossOfInformation is used to infer the type of a variable, or it is used to bind an actual type argument, a diagnostic message is emitted.

Given that there is nothing unsound about these situations, I'd suggest that we use a lint or a hint to give developers this heads-up.

@munificent
Copy link
Member

I can see the issue of getting unusable LUB results, but restricting a solution to record types feels spurious.

Me too. It's good to explore the idea, but it feels weird to take record types out of LUB when really it's LUB in general that's the problem.

The argument for making records special is that you can more easily make typos in record literal field names, and nothing will stop you because all names are valid for some type.

That's equally true of function types, which are also structural. Even worse, the Function type lets you dynamically call it with any argument list:

test(bool b) {
  var f = b ? (({required int foo}) => 1) : (({required int fooo}) => 2);
  f(fooooo: 'nooooo');
}

main() {
  test(false);
}

This program compiles fine and then throws at runtime. At least when LUB gives you Record, you get a type you can't try to do anything with.

I think we could consider the same remedy which was proposed back then: Whenever UP yields the result Record, Function, Object, or a top type, and none of the arguments to UP is said result type, that type is marked as having a massiveLossOfInformation. Subsequently, if a type with a massiveLossOfInformation is used to infer the type of a variable, or it is used to bind an actual type argument, a diagnostic message is emitted.

Given that there is nothing unsound about these situations, I'd suggest that we use a lint or a hint to give developers this heads-up.

I like this a lot.

I could see us long term maybe trying to move away from LUB in general (to something like MAX()), but I think just doing it for record types would feel weird and not actually help that much.

@leafpetersen
Copy link
Member Author

I can see the issue of getting unusable LUB results, but restricting a solution to record types feels spurious.

Me too. It's good to explore the idea, but it feels weird to take record types out of LUB when really it's LUB in general that's the problem.

FWIW, I wish that I had done so for other types back when we designed strong mode (except that of course, the P0 with strong mode was accepting as much existing code as possible, so probably I couldn't have done so).

That's equally true of function types, which are also structural. Even worse, the Function type lets you dynamically call it with any argument list:

I only somewhat agree here, because functions have width subtyping, unlike records. So then two function types combine to form another actual function type, it can be useful. When they combine to form Function, not so much.

Given that there is nothing unsound about these situations, I'd suggest that we use a lint or a hint to give developers this heads-up.

I like this a lot.

Yeah, we could add something like this to the strict inference flag. I don't know whether we could do so for Function without it being too breaking, but we could definitely do this for Record. cc @srawlins this could be something that @kallentu could take on if we decide to do it.

@srawlins
Copy link
Member

srawlins commented Oct 8, 2022

Great idea!

We haven't implemented the checks, but there is the proposal for two types LUB-ing to Object, so I could see similar checks for two record types or two function types LUB-ing to Record, or Function, respectively.

@munificent
Copy link
Member

I'm going to close this out. I think the answer is "no", and we'll just stick with the current specified LUB. I could definitely see us adding lints in the future to avoid LUB with weird pairs of types and in that case, records with different shapes are a prime candidate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
records Issues related to records.
Projects
None yet
Development

No branches or pull requests

5 participants