Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Patterns] Could we allow == patterns to have a receiver pattern? #2620

Closed
leafpetersen opened this issue Nov 9, 2022 · 18 comments
Closed
Assignees
Labels
patterns Issues related to pattern matching.

Comments

@leafpetersen
Copy link
Member

The current proposal for patterns has a pattern of the form == c for a constant c, and a derived pattern != c. The semantics of these are the same as the usual v == c and v != c expressions, where the receiver of the equality method call in each case is the implicit matched value of the pattern.

This pattern does not allow matching further on the matched value. Could we either allow (optionally) or require a receiver pattern? That is, we could have p == c which is essentially equivalent to == c && p and p != c which is essentially equivalent to != c && p?

This is nice syntactic sugar in general, but also allows for an (arguably) more readable if case syntax for checking for null:

if (nullableField case var name != null) {
  // name is non-nullable in here

If combined with #2563 , then for explicit field reads you could replace if (foo.bar case var bar?) with if (foo case (:var bar != null)) which might arguably be more readable and more concise (when bar is a long name)?

cc @mit-mit @munificent @lrhn @eernstg @kallentu @stereotype441 @jakemac53 @natebosch

@leafpetersen leafpetersen added the patterns Issues related to pattern matching. label Nov 9, 2022
@lrhn
Copy link
Member

lrhn commented Nov 9, 2022

Using relation patters as pattern suffixes instead of stand-alone can work.
If we don't want to have both, it would be case _ == 4: instead of just case == 4:, which actually reads pretty well.

I worry that someone will start piling the operators up, so we get var x >= 0 <= 4, which doesn't read that well.

But then, if it's not a stand-alone pattern, we can allow both a prefix and a suffix versions, so you can write:

 case 0 <= var x < 10: ...

You can also write a lot of ugly things, like 10 > 0 <= var x or var x != 1 != 2 != 4, but it's code. You can always make it ugly. As long as you can also make it pretty, we'll just hope you do that instead.

And we can get rid of the ? suffix operator and use != null instead, on any pattern, not just using & as != null & ....

Looks promising :)

But in retrospect (the day after), it makes parsing horrible because of constant patterns, where 4 < 5 can be mean either 4 & < 5 or >= 4 & 5 (one of which matches only 4, the other matches only 5).
If we only allow one real pattern as operand of a relational pattern, and the other one must be a constant value, then we probably can't allow the constant to go on either side of the operator, not without significant ambiguity.

Then we're back to allowing relationalPattern ::= primaryPattern? relOp constant, with an omitted pattern being equivalent to _.
I still like that. I particularly like var x != null as a more readable alternative to var x?.

Maybe we should restrict it to one relational operator, to avoid var x >4 <6. Or maybe it's fine, case >=0 <5: ... reads well enough. But we probably want to read the operators left-to-right, so that case !=null >5 works on an int?.
Which again suggests at most one, and using & for the rest.

@stereotype441
Copy link
Member

This is nice syntactic sugar in general, but also allows for an (arguably) more readable if case syntax for checking for null:

if (nullableField case var name != null) {
  // name is non-nullable in here

Ooh, I really like how this looks. In fact I like it so much that I would be tempted to drop the null-check pattern from the language entirely and just make users always write != null instead of ? when they want a null-check pattern. Yeah, it's way more typing, but it's so much more intuitive that it might be worth it.

Using relation patters as pattern suffixes instead of stand-alone can work. If we don't want to have both, it would be case _ == 4: instead of just case == 4:, which actually reads pretty well.

Agreed. I think the extra _ is a small price to pay for the extra readability.

I worry that someone will start piling the operators up, so we get var x >= 0 <= 4, which doesn't read that well.

I would just prohibit this by just saying that these operators are non-associative in patterns just like they are in expressions. If someone really wants to pile them up they can always do (var x >= 0) <= 4 (which is still ugly but IMHO nowhere near as bad as the unparenthesized version).

But then, if it's not a stand-alone pattern, we can allow both a prefix and a suffix versions, so you can write:

 case 0 <= var x < 10: ...

As appealing as this is, I don't want to go there because of the parsing difficulties Lasse mentioned above.

@DanTup
Copy link

DanTup commented Nov 10, 2022

I particularly like var x != null as a more readable alternative to var x?.

I just learned about var x? today, and it wasn't particularly obvious to me what it meant. I do usually prefer terse code, but I think I would prefer to always write x != null here. I think I'd also find it odd if x? means x != null here but I'm not also able to use it in an if``/else statement to mean the same:

if (x?) {

(if it worked there too, maybe I'd be more likely to use it in patterns, as it's then it wouldn't feel a little obscure)

@munificent
Copy link
Member

Really cool idea. I like how it looks. I like that using _ on the left makes relational patterns look less weird. Here's some before/after of the null check patterns I used in my test refactoring of dart_style:

if (_contiguousFunctions(arguments) case (var start, var end)?) {
if (_contiguousFunctions(arguments) case (var start, var end) != null) {

if (_blockToken(argument) case var bracket?) blocks[argument] = bracket;
if (_blockToken(argument) case var bracket != null) blocks[argument] = bracket;

case var block?:
case var block != null:

case MethodInvocation(:var target?) =>
case MethodInvocation(:var target != null) =>

if (block.argument case argument?) {
if (block.argument case var argument != null) {
// Or:
if (block case Block(:var argument != null)) {

I think all of the != null ones are clearer, though they are more verbose.

I definitely don't think relational patterns should be associative. No chaining. Dart already prohibits chaining comparison expressions, and I can't see any value in allowing chaining == or != patterns.

Range and comparisons

If we require a left operand on the relational pattern, then range and comparison patterns get more verbose:

switch (n) {
  case _ < 0: print('negative');
  case _ >= 0 & _ <= 100: print('small');
  case _ > 100: print('big');
}

// Versus:
switch (n) {
  case < 0: print('negative');
  case >= 0 & <= 100: print('small');
  case > 100: print('big');
}

I'm on the fence with this. I like the simplicity and clarity of always requiring a LHS, but it feels weird to require a bunch of essentially pointless _ patterns.

Resolved type for the operator

Here's an interesting example:

Object foo();
switch (foo()) {
  case int _ < 3: print('A number less than three.');
}

This is a compile error if we call < on the matched value type (Object) and not the type we know it must have for the LHS pattern to match (int). I can see users expecting this to work, and it would be nice if it did.

Specifying that is a little interesting. We could say that we invoke the operator method using the required type of the LHS pattern, which means defining required type for all patterns, even ones that don't themselves type check. (Currently, we only define the required type for patterns that need it themselves.)

Or we could rely on something like type promotion. We'd say the semantics of p op c are closer to p & p op c and have the LHS of the & potentially promote the matched value to the matched type.

Expanding required type doesn't quite work because the required type of null-assert and cast patterns needs to be loose to allow them to be used in pattern variable declarations. Probably the easiest way is to just specify the "demonstrated type" for each kind of pattern (which will be the required type for most, GLB for &, LUB for |, and the casted type for ! and as).

Evaluation order

Pattern matching can have side effects, so the evaluation order is visible. I think we probably want to evaluate the LHS pattern first before calling the operator. That's the order you get for expression evaluation where operands are (by necessity!) evaluated first, and it's what you see reading right to left.

Unary patterns

We may want to allow a unary pattern on the left for things like:

List<int?> allowLeadingNulls = ...
switch (allowLeadingNulls) {
  // If the first element is null, we know the second won't be:
  case [0, _! < 3]: ...
}

And:

List<Object?> json = ...
switch (json) {
  case ['update-age', _ as int < 0]:
    throw FormatException('New age must be a positive number.');
}

Those also require the relational operator to be called on the value after we know its type based on the LHS pattern.

I don't see much need to support constant op pattern forms where the pattern is on the right. Any type that doesn't define the equality and comparison operators symmetrically has already lost the game in my book.

No unary patterns

The other option is to require the LHS pattern to be a primaryPattern. That's variables, destructuring patterns, and () for grouping. The above examples would then be written:

// Instead of:
case [0, _! < 3]: ...
// Write:
case [0, (< 3)!]: ...

// Instead of:
case ['update-age', _ as int < 0]:
// Write:
case ['update-age', (< 0) as int]:

In that case, I'd probably want to allow omitting the LHS.

But I lean towards allowing unary patterns since ! and as expressions are both valid LHS operands to relational expressions.

Next steps

I'd like to get more feedback since this is a fairly large change, but I'm definitely interested. If it were up to me, I would probably:

  • Require a pattern on the LHS of relational patterns. I wouldn't make it optional. Brevity is nice, but there's something appealing about these infix operators always showing up in an infix location. We could always make it optional later if we want.
  • Get rid of null-check patterns since != null patterns cover them. Also, I don't know anyone who actually likes the ? null-check syntax.
  • Make the LHS pattern be a unaryPattern. That allows null asserts and casts on the LHS but prohibits chaining relational patterns.

@munificent
Copy link
Member

The more I think about this, the less sure I actually understand the semantics around how types get inferred or promoted. Consider:

int? maybeInt = 0;
switch (maybeInt) {
  case var x != null: print('x is not null');
  case int y > 0: print('$y is positive');
  case var z! < 0: print('$z is negative');
}

The first question is do we expect all of these cases to be valid?

If so, the intent is that x, y, and z should all have type int, but for different reasons. In the first case, the way I can see this flowing is:

  1. We invoke != null on the matched value.
  2. If that returns true, then we promote the value to non-null and recurse into the subpattern.
  3. There we use the now-promoted type of the matched value to infer the type int for x.

For the second case, I would imagine the execution is like:

  1. We send the matched value into the int y subpattern.
  2. If that matches, then we promote the value to the demonstrated type of that pattern, int and then invoke the > operator on that type.

The third case is similar to the second.

These are obviously different semantics. Which do we pick?

I suspect the right answer is that we pick the first semantics (which is what Leaf proposed) and the other two cases here do not work. Instead, if you want to call a relational operator that isn't defined on the matched value's type, you have to explicitly get it promoted first, using something like:

case int y & y > 0:
case _! & var z < 0:

Another option is to not treat equality and comparison patterns uniformly. We could have == and != work like Leaf proposes. It's always safe to call the operator before recursing into the LHS because those operators are defined on all types anyway.

Then for the comparison operators, we could continue to not allow a pattern on the LHS at all, which makes range patterns (which are the main use case) terse:

case > 0 & < 100: print('within (0:100)');

@leafpetersen
Copy link
Member Author

int? maybeInt = 0;
switch (maybeInt) {
  case var x != null: print('x is not null');
  case int y > 0: print('$y is positive');
  case var z! < 0: print('$z is negative');
}

My thinking is that probably the second two shouldn't work. We could potentially say that for p > c we use the "demonstrated type" for p as discussed in #2622 when checking for the > operator. If it just falls out, then that seems fine, but it's a little weird.

For var z! < 0, can't this be written anyway as (var z < 0)!?

We could consider adding an instance check pattern: p is Type which checks that the matched value is Type, and then checks p using Type as the matched value type. Then you get (y < 0) is int as an option.

Personally, I mostly find the variants written using & a bit more readable in general.

For == and != specifically, it seems to me that we do want to/need to do something special there to reflect the fact that == is not just a method call, but also a null check (in which case the == method is not called).

So I think you would say that for p == c:

  • If c has type Null
    • The matched value type for p is Null
    • At runtime, the match succeeds if the matched value is null, and the matched value matches p and otherwise fails
      • In the null case, we do run p, can it have side effects? I think with extension getters on an object pattern it can:
        • Example: (Null(extensionGetter: _) == null
      • In the not null case, p is never run.
        • Example: with matched value [], the pattern List(:first) == null will fail, rather than throw an exception.
        • We could choose to run p as well?
    • The demonstrated type is Null
  • If the type of c is not Null
    • The matched value type for p is the matched value type for the whole pattern
    • At runtime, the match succeeds:
      • if c is null and the matched value is null, and the matched value matches p
      • if c is not null and the matched value is not null, and the matched value matches p, and the equality operator on the matched value returns true when called with c
      • Otherwise, the match fails.
      • In the last case, p is never run.
        • Example: with matched value [], and const int? c = null the pattern List(:first) == c will fail, rather than throw an exception.
        • We could choose to run p as well?
    • The demonstrated type is the matched value type

I've based the above on the static type of c rather than the value of c to avoid entangling constant evaluation. We could further restrict this to only be the literal null if desired.

There's a further tweak we could maybe do when the static type of c is non-nullable: in that case we could make the matched value type for p be the non-nullable version of the incoming matched value type, and also use that as the demonstrated type.

The p == c case isn't that interesting, the interesting one is the p != c case.

For p != c:

  • If c has type Null
    • The matched value type for p is the non-nullable version of the matched value type.
    • At runtime, the match succeeds if the matched value is not null, and the matched value matches p and otherwise fails
      • In the null case, p is never run.
    • The demonstrated type is the non-nullable version of the matched value type.
  • If the type of c is not Null
    • The matched value type for p is the matched value type for the whole pattern
    • At runtime, the match succeeds:
      • if c is null and the matched value is not null, and the matched value matches p
      • if c is not null and the matched value matches p, and the equality operator on the matched value returns true when called with c
      • Otherwise, the match fails.
      • In the last case, p is never run.
    • The demonstrated type is the matched value type

I'm not sure I 100% have all that right, but I think something like that is where we might go with that?

@leafpetersen
Copy link
Member Author

Thinking about this a bit more, I think that we can probably just boil this down to #2622, and treat this entirely as syntactic sugar. That is, given a matched value type of T, we say that:

  • The demonstrated type of == c is Null when c has type Null
  • The demonstrated type of == c is NONNULL(T) when c has non-nullable type
  • The demonstrated type of != c is NONNULL(T) when c has type Null

The we say that p op c is always just shorthand for op c & p.

I think this gives us what we want. != null & p promotes the matched value type to the non-nullable version on the RHS, and == null | p does the same.

On order of evaluation, you say:

I think we probably want to evaluate the LHS pattern first before calling the operator. That's the order you get for expression evaluation where operands are (by necessity!) evaluated first, and it's what you see reading right to left.

The semantics I propose above doesn't correspond to this: you call the operator before matching the LHS pattern. I think this is actually what you want - it corresponds to "outside-in" evaluation, which is what all of the other patterns do. This is different from expression evaluation order, but it feels like the right thing to me. Some examples:

  • [var x] as List<int> is specified to do the cast first, and the sub-pattern second (which makes sense to me, otherwise this is not an irrefutable pattern)
  • [var x]? is specified to do the null check first, then the inner pattern, which again makes sense to me, since otherwise there is an implied (redundant) null check on the inner pattern.

So I think the consistent (and right) thing to do for p op c is to do op c first, then p, which conveniently gives the right promotion behavior, and matches the de-sugaring semantics I propose above.

@stereotype441
Copy link
Member

Thinking about this a bit more, I think that we can probably just boil this down to #2622, and treat this entirely as syntactic sugar. That is, given a matched value type of T, we say that:

  • The demonstrated type of == c is Null when c has type Null
  • The demonstrated type of == c is NONNULL(T) when c has non-nullable type
  • The demonstrated type of != c is NONNULL(T) when c has type Null

If we did this, the demonstrated type would actually "promote" things in a lot more cases than we currently do type promotion for == in non-pattern code. Currently:

  • if (x == c), where c has type Null, does not promote x to Null. See 'null' isn't promoted to 'Null'. #1505 (comment) for details of what happened when I tried to add that functionality.
  • if (x == c), where c has a non-nullable type, does not promote x to NONNULL(T).
  • if (x != c), where c has type Null, only promotes x to NONNULL(T) if c is a literal null.

Is there a consistency argument for making demonstrated types behave similarly?

The we say that p op c is always just shorthand for op c & p.

I think this gives us what we want. != null & p promotes the matched value type to the non-nullable version on the RHS, and == null | p does the same.

I'm ok with this as a way to think about it. My usual caveat applies, which is "please don't actually spec it this way, though, because the implementations will not desugar it before running type analysis, so if we spec it in terms of a desugaring, we place a big cognitive burden on the implementers to "re-sugar" the spec while doing the implementation; and that leads to mistakes." 😃

On order of evaluation, you say:

I think we probably want to evaluate the LHS pattern first before calling the operator. That's the order you get for expression evaluation where operands are (by necessity!) evaluated first, and it's what you see reading right to left.

The semantics I propose above doesn't correspond to this: you call the operator before matching the LHS pattern. I think this is actually what you want - it corresponds to "outside-in" evaluation, which is what all of the other patterns do. This is different from expression evaluation order, but it feels like the right thing to me. Some examples:

  • [var x] as List<int> is specified to do the cast first, and the sub-pattern second (which makes sense to me, otherwise this is not an irrefutable pattern)
  • [var x]? is specified to do the null check first, then the inner pattern, which again makes sense to me, since otherwise there is an implied (redundant) null check on the inner pattern.

So I think the consistent (and right) thing to do for p op c is to do op c first, then p, which conveniently gives the right promotion behavior, and matches the de-sugaring semantics I propose above.

Agreed.

@lrhn
Copy link
Member

lrhn commented Nov 14, 2022

Looking at var x != null & > 0, for that to actually work, we need to treat p != null as doing the != null test before p, and p > 0 to do > 0 after p.

That makes sense, because != is type promoting, and > 0 is type restricted. The >0 only works on some types, but doesn't imply anything new about that type. The != null` works for all types, but does imply something about the type of the value.

So, we should not treat == and > the same, for good, practical reasons - they are not the same. (When either order would work, then it also doesn't matter which one we pick.)

If we don't like doing the == and > operations in different order, I'd just drop the combining >s, and require you to write p & > 0 instead of p > 0. Because that's the order you actually want, and it's completely clear what it means.
Then we still have p != null and p == null working outside-in.

@leafpetersen
Copy link
Member Author

Looking at var x != null & > 0, for that to actually work, we need to treat p != null as doing the != null test before p, and p > 0 to do > 0 after p.

I don't follow you here. There is no p to stage here WRT the comparison. That example is (parenthesized and "de-sugared": ((!= null) & var x) & (> 0), for which all of the order works out fine. Did you mean perhaps var x != null > 0? I agree that that doesn't work, you need to do the null check first. So you'd need to write (var x > 0) != null (parenthesizing to make it clearer).

Am I missing something? I think you can say p > 0 does the comparison and then matches p just fine.

@lrhn
Copy link
Member

lrhn commented Nov 14, 2022

My bad, there shouldn't be a & in the first code, and I did mean var x != null > 0.

The examples I would want to work are:

case (nullableInt) {
  case int x > 0: // x is non-negative int
  case var x != null: // x is int

If the matchee has type int?, then the first case must promote before checking >0, and the second case must promote before binding x (if we want x to be non-nullable, a p != null to replace p?).

If we can't do case int x > 0, then I don't see much reason to allow a combining pattern > 0 at all, and we might as well require case int x & > 0. And that's fine, perfectly happy with that.

@leafpetersen
Copy link
Member Author

@lrhn yes, that's a good example. I think this is perhaps pointing at the fact that there isn't a general instance check pattern (other than Object patterns). The only instance checks are either Object patterns or variable bindings. You could imagine have a general instance check pattern, e.g. p is int. I don't think that reads very nicely for your example: (var > 0) is int. Another syntax you could imagine is to generalize int x to int p. Then your example could be int (var x > 0).

I don't know that I'm strongly opposed to evaluating p > c in the "opposite" order, but... it's a little weird. I'm slightly worried that there will be other cases where we'd expect the opposite, but maybe not. Mostly it shouldn't matter what order you do things in, I think it's mostly around promotion that it matters.

@lrhn
Copy link
Member

lrhn commented Nov 15, 2022

Generalizing int x to int p, and allowing int x, suggests that we allow un-prefixed identifiers to be binding inside the int, so also allowing int (x).
We likely don't want to allow int (var x). (Or we can allow it, and just assume our users aren't actively trying to hurt themselves). Still means that the int(...) changes the meaning of identifiers inside the (...), so int _ & someConstant is not the same as int (someConstant). (But it's obviously also not the same as int someConstant.)

It also means we have both prefix and suffix operators, so we need precedence. Previously we only had suffix operators. I think making prefix operators always bind stronger than suffix operators might work, so var x as int should be (var x) as int. Suffix operators are generally promoting, which you want to do first. Prefix int can also be promoting in refutable patterns, but then you likely don't need both, and it's probably not wrong to do the suffix ones first. Except if we want int x > 0, which is another strike against that.

If int is a pattern prefix, then int (bitLength: >0) should mean that the object/property/extractor pattern (bitLength: >0) can work even without the prefix int. And that would be awesome, and I think we want that anyway. (If the matched value type is not a record type and not a supertype of record, then (extractors) is equivalent to MatchedValueType(extractors)).

@stereotype441
Copy link
Member

Reflecting back on this after some time away from it, I think what I would personally favor is:

  • Add support for pattern == expression and pattern != expression, with the semantics of "first do the equality check, then match the subpattern", and appropriate type promotion in the case of pattern != null.
  • Drop support for pattern?, on the grounds that it's equivalent to pattern != null, but people find it confusing.
  • Drop support for == expression and != expression patterns, on the grounds that they're not likely to be used very often, and can be equivalently expessed using _ == expression and _ != expression.
  • Leave the existing < expression, <= expression, > expression, and >= expression patterns. Don't support pattern < expression, pattern <= expression, pattern > expression, or pattern >= expression. My rationale: I don't want to have to explain to users about the execution order issues we've been discussion here. Also it feels like a half baked feature if we give users a really intuitive way to express an open ended range (e.g. int x >= 0) but every way of expressing a closed range is weird and ugly (e.g. (int x >= 0) < 10).

This means that if you want to match a variable and range check it you have to use some extra characters (e.g. int x & >= 0 & < 10) but at least that means there's no confusion about ordering semantics, and checking a closed range is just as ugly as checking an open range.

I'm a little sad that this means that equality operators and relational operators aren't treated consistently, but I think it's justified because their use cases are so different.

@lrhn
Copy link
Member

lrhn commented Nov 30, 2022

Would there ever be a reason to do pattern == constant?

When you've already recognized that the value == constant, what more is there to test for? Which means it probably always boils down to var x == constant or _ == constant (and even binding seems spurious - why not just constant instead of x? - unless you are somehow de-canonicalizing the value.)

Which means that pattern != constant is the only variant which makes any sense.

And still not a lot of sense when constant is anything but null, because you don't get a lot of information from something not being equal to a specific value.
I don't expect to see case (var x != 1) != 2 && complicatedRemainingPattern: ....
(If not for the complicated remaining pattern, I'd definitely put the !=s in a when clause).
And you can still do != 1 & != 2 & var x for that, which is more readable IMO.

So maybe it really is just pattern != null that we want, as a replacement for pattern?.
It's longer, but possibly more readable to people who haven't been noodling on the pattern syntax for months.

(So, TL;DR, remove pattern ? and introduce pattern != null instead. Keep everything else the same.)

@stereotype441
Copy link
Member

(So, TL;DR, remove pattern ? and introduce pattern != null instead. Keep everything else the same.)

I could be talked into this. It has the advantage that it's way less implemetnation work 😃

@munificent munificent mentioned this issue Dec 9, 2022
@munificent
Copy link
Member

Lasse and I spent a bunch of time talking about this today. As Lasse states elsewhere in this issue, it's not generally useful to allow a receiver pattern before == or != for any other use except != null. So I think the answer to the original proposal here is, no, let's not generalize in that way.

The other more limited suggestion here is to change the null-check pattern's syntax from a postfix ? to a postfix != null. That's nice for clarity, but it's much more verbose. More worrisome is how it would potentially interact with other pattern syntax. I think we sidestep the worst monstrosities already by not allowing unary patterns to nest. So there's no:

case var foo != null! is int: ...

But there is still weird things like:

case (var foo is int) != null:

Overall, I feel like != null looks too much like an expression. It feels like it's in the uncanny valley between an expression and a pattern and I don't quite know how to read it. After discussing with Lasse a bunch, we think the best approach is to keep the current postfix ? syntax. It doesn't make a good first impression, but it is terse and once you've learned it, the semantics are pretty clear.

If we do something like let in the future, then a null-check pattern may end up superseded by that. But in the meantime, it's a useful terse pattern that I believe will end up being used fairly often. When I've migrated code to use patterns, it comes up pretty frequently.

Unless anyone strongly objects, I'll close this issue in a day or two and call it resolved.

@munificent
Copy link
Member

OK, closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patterns Issues related to pattern matching.
Projects
None yet
Development

No branches or pull requests

5 participants