Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[patterns] Can we make type declarations and type tests look different? #2677

Closed
eernstg opened this issue Dec 1, 2022 · 4 comments
Closed
Labels
patterns Issues related to pattern matching. question Further information is requested

Comments

@eernstg
Copy link
Member

eernstg commented Dec 1, 2022

I'm a little bit worried about patterns taking a step backwards, so to speak, in that they use a syntax which is otherwise declaring the type of a variable in a way that turns it into a dynamic type test. We're going to get a match failure rather than a dynamic error, but otherwise it's very similar to the implicit downcasts that we had a couple of years ago.

My main worries are (1) it is difficult to discern the actual semantics based on reading the source code, and (2) it's easy to get those type annotations wrong, and then they are just silently turned into type tests (this makes the construct similar to implicit downcasts, except that we're getting different/buggy control flow rather than getting an exception).

void main() {
  List<num> xs = [1, 2, 3];
  switch (xs) {
    case [Object o, num n, int i]: ...;
  }
}

The static type of xs is used to determine some elements of the list pattern (in particular, inferring the type argument num). The three variables are treated differently.

We might want to introduce a variable like o with a more general type because we might want to assign other values to o than the value that it gets via pattern matching. So it's not necessarily a mistake to give o the declared type Object, and we have to do it because the inferred type would otherwise be num, and then we couldn't assign those other values later on.

The variable n gets the type num that we would otherwise have gotten by inference. This may be useful for documentation purposes.

The variable i gets the specified type int, but pattern matching will now include a type test: The element at [2] which is known to be a num may or may not be an int as well, but we will test this at run time.

I think it would be helpful if we used a different syntax for the cases where the declared type is guaranteed to be satisfied by the matched value, and the cases where it may or may not hold, and there will be a type test at run time.

One possible approach could be to simply say that the version with a dynamic type test is not supported: If we declare a type which is not statically known to hold for the matched value then it's a compile-time error. The developer must then perform the dynamic type test in some other way (for instance, when i is int).

Another possibility would be to say that the type test can't be requested using the syntax int i (that's still an error), but you could use a different syntax, like var i is int. This would be similar to patterns using != null, and it would provide the promotion based on syntax which is already recognizable as such.

If we do this then num n would actually be more useful: It would serve as a contract between n and the surrounding source code: I'm expecting n to be able to have the type num, and I'd like to get a heads-up if this is not the case.

(Note that this wouldn't change anything for object patterns: We can still match an object using a dynamic type test for int by matching it with the object pattern int()—it is unsurprising that this kind of matching includes a type test, because the ability to handle different cases corresponding to a set of distinct subtypes of the scrutinee type is absolutely standard for all pattern matching mechanisms.)

@eernstg eernstg added question Further information is requested patterns Issues related to pattern matching. labels Dec 1, 2022
@munificent
Copy link
Member

munificent commented Dec 2, 2022

Sorry for the essay...

I spent a little time refreshing my memory of what other languages in the space do:

Swift

Swift doesn't support normal type annotations inside patterns at all. So you can write this:

let a: Int = 1

But not this:

let (a: Int, b: Int) = (1, 2)

They must special case the variable declaration syntax for single-variable declarations.

Variable binders in patterns are thus usually unannotated and their types are inferred. If you want to test the matched value's type at runtime, you can use a type cast pattern:

class A {}
class B : A {}

let a: A = B()
switch a {
  case let b as B: print("was actually a B")
  //         ^^^^ type cast pattern
  //   ^^^^^ subpattern
  case _: print("default")
}

Type annotations aren't the only possible patterns that do type tests, though. What about destructuring patterns like lists and tuples? As far as I can tell, Swift sidesteps that. They have no list patterns at all. Tuple patterns can only be used to match against values that are statically known to be tuples. If you upcast the tuple to Any, it won't let you use a tuple pattern to match it:

let value: Any = (1, 2)
switch value {
  case (let a, let b): print("was tuple")
  case _: print("default")
}

This fails to compile with:

tuple pattern cannot match values of the non-tuple type 'Any'

So it looks like Swift does what you want here where any runtime type tests are clearly marked as such in patterns.

Scala

You can type annotate variables in patterns. When you do, it performs a runtime type test:

class A {}
class B extends A {}

val a: A = new B()
a match {
  case b: B => println("it was a B")
  case _ => println("it was not a B")
}

This prints "it was a B".

Likewise, tuple patterns will do a runtime type test that the object is a tuple if the matched value isn't statically known to be a tuple type:

val a: Object = (1, 2)
a match {
  case (_, _) => println("it was a tuple")
  case _ => println("it was not a tuple")
}

C#

C# has declaration patterns which use the normal type annotation syntax and perform a type test at runtime:

Object value = "a string";
switch (value)
{
  case String s:
    Console.WriteLine("it was a string");
    break;
  default:
    Console.WriteLine("it was not a string");
    break;
}

This compiles fine and primts "it was a string".

However, list patterns do not do a runtime type test. This is fine:

String[] value = {"a string"};
switch (value)
{
  case [String s]:
    Console.WriteLine("it was a list of string");
    break;
  default:
    Console.WriteLine("it was not a string");
    break;
}

But if we change the matched value's type:

String[] array = {"a string"};
Object value = array;
switch (value)
{
  case [String s]:
    Console.WriteLine("it was a list of string");
    break;
  default:
    Console.WriteLine("it was not a string");
    break;
}

You get a compile error:

Compilation error (line 12, col 9): List patterns may not be used for a value of type 'object'. No suitable 'Length' or 'Count' property was found.
Compilation error (line 12, col 9): Cannot apply indexing with [] to an expression of type 'object'

Other languages

There aren't too many other languages we can use as a reference. To be relevant, the language needs to:

  1. Be statically typed.
  2. Have subtyping.
  3. Have pattern matching.

Rust doesn't have subtyping (except for with regard to lifetime annotations) as I understand it.

Elixir is dynamically typed.

Haxe has all three. It doesn't seem to have any type test patterns or type annotation patterns as far as I can tell.

OCaml has all three but objects are sorted of bolted on. As far as I can tell, it doesn't use normal type annotations (which are supported in patterns) to match on different polymorphic variants (which are I think it's name for inheritance?).

Dart

When I first designed this part of the proposal, I was mostly looking at Scala which does allow this. I can definitely see the concern here. It's a little odd that type annotations have no runtime effect anywhere else in the program but affect control flow when used in a refutable pattern.

If we don't want type annotated variables to type test, then we need either:

  1. A general type test pattern that takes an arbitrary subpattern. Maybe something like:

    subpattern is type
    
  2. A specifically typed tested variable pattern. Perhaps something like:

    is type identifier
    

Either way, we also have to think about the other patterns that do type tests too: lists, maps, records, and objects. You could argue equally well that you might use a list pattern when you don't intend it to do a type test and not realize that it does if the matched value's type is some non-List type as in:

Object obj = ...
switch (obj) {
  case [var a, var b]: print('two elements');
}

If you do want to test that something is a list and then apply a list pattern to destructure it, then we need either some general "test type and then match subpattern" form like the is one above. That would I guess look like:

Object obj = ...
switch (obj) {
  case [var a, var b] is List<Object?>: print('two elements');
}

That's pretty verbose. Or we could maybe have explicit type tested forms of all of the patterns. So there's be a list pattern and a tested list pattern, map pattern and tested map pattern, etc. Maybe a prefix is:

Object obj = ...
switch (obj) {
  case is [var a, var b]: print('list with two elements');
  case is (var a, var b): print('record with two fields');
  case is {'a': var a, 'b': var b): print('map with two entries');
}

I don't hate it. If we did this, we could also say that the type tested variable pattern can let you omit the identifier if you just want to type test but not bind:

Object obj = ...
switch (obj) {
  case is num: print('a number');
}

Note that this is technically more verbose than using an object pattern or using today's type testing variables:

Object obj = ...
switch (obj) {
  case is num: print('a number');
  case num(): print('a number');
  case num _: print('a number');
}

But the difference is marginal and I like that is num is pretty clear about what it means. It's also consistent with the other patterns for infix operators like == and <.

This would mean that when you know the type of what you're matching on, you can use the destructuring patterns directly:

switch ([1, 2, 3]) {
  case [var a, var b, var c]: print('three');
}

switch ((1, 2, 3)) {
  case (var a, var b): print('three');
}

switch ({'a': 1, 'b': 2, 'c': 3}) {
  case {'a': var a, 'b': var b, 'c': var c}: print('three');
}

But if the matched value is a supertype, you need the explicit is forms:

Object? json = ...;
switch (json) {
  case is [var a, var b]: print('list');
  case is {'a': var a, 'b' var b}: print('map');
  case is num n: print('number');
  case is String s: print('string');
}

is expressions

Of course, if we do this, users will naturally want to be able to use these patterns in is expressions. That might actually be doable? The main challenge with allowing patterns after is is that identifier patterns are already valid patterns to test against a named constant.

But here we wouldn't allow an arbitrary pattern after is. We'd only allow the pattern forms that start with is. So you'd get:

if (obj is String s) // Variable.
if (obj is [1, 1])   // List.
if (obj is {'a': 1}) // Map.
if (obj is (1, 2))   // Record.

But not:

if (obj is 2)       // Literal.
if (obj is math.pi) // Constant.

The lack of the latter avoids ambiguity with existing is expressions. I know @leafpetersen has suggested this before. My main concern then and now is the irregularity between the outermost and nested patterns:

print(1 is int); // true.
print([1] is [int]); // false (because `1` is not equal to the *type literal* `int`)
print([1] is [is int]); // true (what you actually have to write)

That may make this a bad approach.

is in if conditions

If we allow some patterns in is expressions, then if you use an is expression as an if condition, we'd say any variables the pattern binds are in scope in the then branch:

if (obj is [var a, var b]) {
  print('list with elements $a and $b');
}

Of course, this means there's no way to use this form when you want to destructure but not test the type. (I am certain we don't want to say that if (obj is [1]) only destructures and you have to write if (obj is is [1]) if you want to test the type.)

The odd one out here is object patterns. The main point of an object pattern is to type test. If you don't need to type test, I would like you to be able to omit the name (#2563). So it would be redundant to require:

case is SomeClass():

Instead of just:

case SomeClass():

Also, I think it's generally important to support the latter syntax verbatim because that's what folks who want to program in an algebraic datatype style expect.

We could maybe special case this by allowing object patterns in is expressions too even though it isn't as regular. But I don't know if that makes things too weird.

Overall, the current state of the proposal doesn't bother me. Yes, it could potentially lead to some unintented type tests. But I think the odds of that are slow. Scala and C# work that way, so it doesn't seem untenable.

But I don't mind the way the leading is patterns look. And generalizing from those potentially lets us solve two minor problems in the current proposal:

  1. No nice idiomatic way to just test a type. You either do SomeType _ or SomeType(). Having is SomeType is longer but clearer.

  2. A better syntax than the not-much-loved if (expr case pattern) form.

I'm interested in exploring this, but I worry a lot about whether we have the time to make this change.

@lrhn
Copy link
Member

lrhn commented Dec 2, 2022

The problem here is that we cannot distinguish an up-cast from a type check.
Which means we can't give the author a warning if an intended up-cast is actually a (probably failing) type check.
Using the same syntax for both prevents the author from showing intent.

We have the same problem with casts today, where you have to use an as cast for an up-cast in expression position. If you do it correctly, the cast should be compiled to a no-op, so you are wasting good syntax on doing nothing. If you make a mistake, it'll be a runtime error.

We are using an operation which can fail at runtime, to do something which we know cannot fail.

And that hurts when you make mistakes, because it becomes a run-time error.
That's true for

int x = foo;
var y = x as double; // whoops, meant `num`. And it works on the web too!

It's true for:

if (o case Object(:double hashCode)) ...

which will never work. (Except on the web, whoops.)

OK, I admit it's a real problem. I like the terseness of the current syntax, but it does mean you have to conflate different intentions into the same syntax.

Using a suffix pattern is type gets too wordy, num name is int is four words with no punctuation, only spaces. That's hard to read.

Prefix is type is slightly better, but not as general. We'd need is int num x to match an int and upcast to num in one test. Probably have to require is int (num x) for that, and only allow is int x to bind to the same type as matched.
Reasonable for the common case, upcasting after a type check is going to be very rare (why not just match num?).

I like it (and I'll say more below). I like using is instead of case for if-patterns. (I'd want it as a general expression to be used anywhere an e is type can be used today, but it'll probably need parentheses outside of very limited contexts).

The cost is more verbosity.

case (int x, int y): ...

becomes

case (is int x, is int y): ...
// or
case is (int, int) (var x, var y): ...

Unless we can do:

case is (int x, int y): ...

and infer the required type of (int x, int y) as the type to test for. (We totally should!)

So:

  • is type (id | pattern)? means check the value is the type type, then bind it to id with type type or match it against pattern with matched value type type. (And is type pattern is just the same as is type & pattern).
  • is pattern means: Let S be the type schema of pattern. (Find the matched value type if necessary.) Fill out any holes in the schema based on the matched value type, call it T. Then treat as is T (pattern). The pattern cannot be a type literal, since then it matches the first item instead.
  • A plain typed binding pattern of type id must be valid against the matched value type. It's not a type check.
  • An object pattern of the form SomeType(...) is also not a type check. You can want to extract properties from a supertype of the matched value type. With inline class types, or even just extension methods, that might change the available getters. So it should also require a leading is.
  • A list or map pattern should probably be type checking inherently. It's rare that you'd want to up-cast to List or Map (but you may want to up-cast the type parameters, again because of non-virtual methods, so maybe those require is too, and compile-time failures if the cast from matched value type to the list/map type is not an upcast.)

It starts to feel like there will be entire switches where every case starts with is. Should we allow those to use is instead of case? So:

switch (e) {
  case 0: ....; // A constant
  is [int x, int y]: // ...  type schema is List<int>, so that's what we check for.
  is Foo(: var x, : var y): // ...
  case null: ....
  case _: .... // Binding with no type check, but also cannot be a type failure.
}

I don't know. It gets complicated. The ability to distinguish up-casts from type-check means that we have to be explicit about it in every pattern. Most patterns are not up-casting, so requiring the new/extra syntax for type checks might be going in the wrong direction.
But the failure mode of making a mistake is to do a type check, so it should be very you have to opt in.
It's ... complicated.

About is in if-patterns:
We don't need to switch to is in if-patterns, we can do everything the same with case instead.
Switching to is should be because it's better.
Allowing both if (e is int) and if (e is int ix) feels nice to me. It may not look much better than case for more complicated checks, but probably also not worse.

The main worry would be that int now means something different in if (e is int) and switch (e) { case int: (and var int = e;). The context-sensitivity still mainly applies to raw identifiers, but now also extends to entire types with type arguments (which can be constants - damn you, type literals!).
They would all mean the same thing as what they means today, which is a bonus, and we don't have to explain why it's e is T but e case T x.
And you can still switch/add context by doing switch (e) case is int: ..., var (int _) = e; or is (int).

I like it that. Even if it's just allowing e is type-pattern in if-patterns. I think it can hold water. I'd like to explore it more! :)

For generally requiring is for any type check, so that we can catch mis-written upcasts, I fear it brings us too far from the syntactic sweet spot, and from other pattern languages.

@munificent
Copy link
Member

OK, I spent some time thinking about this more. I'm definitely sympathetic with the desire to distinguish type annotations from type tests. But given how many patterns do type tests now, I think it would be a very large effort to change the proposal to make this distinction. I think it's very unlikely we could come up with a coherent, complete design and get all the issues with it shaken out in time.

Even if we could, it's not clear to me that doing so would be an improvement. The current proposal is fairly similar to how Scala and C# behave and it's seemed to work out well for those languages.

Unless others feels strongly, I'm inclined to say that we should leave the current proposal as it is. We've had a lot of eyeballs on it from the community, and this aspect isn't one that I can recall many raising issues with, so I suspect the current design is OK.

@munificent
Copy link
Member

Closing this because we're going to stick with the current design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patterns Issues related to pattern matching. question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants