Skip to content

Latest commit

 

History

History
340 lines (256 loc) · 14.3 KB

correctness-exhaustive-pattern-matching.md

File metadata and controls

340 lines (256 loc) · 14.3 KB
layout title description nav seriesId seriesOrder categories
post
Exhaustive pattern matching
A powerful technique to ensure correctness
why-use-fsharp
Why use F#?
20
Correctness
Patterns

We briefly noted earlier that when pattern matching there is a requirement to match all possible cases. This turns out be a very powerful technique to ensure correctness.

Let's compare some C# to F# again. Here's some C# code that uses a switch statement to handle different types of state.

enum State { New, Draft, Published, Inactive, Discontinued }
void HandleState(State state)
{
    switch (state)
    {
        case State.Inactive: // code for Inactive
            break;
        case State.Draft: // code for Draft
            break;
        case State.New: // code for New
            break;
        case State.Discontinued: // code for Discontinued
            break;
    }
}

This code will compile, but there is an obvious bug! The compiler couldn't see it ? can you? If you can, and you fixed it, would it stay fixed if I added another State to the list?

Here's the F# equivalent:

type State = New | Draft | Published | Inactive | Discontinued
let handleState state = 
   match state with
   | Inactive -> () // code for Inactive
   | Draft -> () // code for Draft
   | New -> () // code for New
   | Discontinued -> () // code for Discontinued

Now try running this code. What does the compiler tell you?

The fact that exhaustive matching is always done means that certain common errors will be detected by the compiler immediately:

  • A missing case (often caused when a new choice has been added due to changed requirements or refactoring).
  • An impossible case (when an existing choice has been removed).
  • A redundant case that could never be reached (the case has been subsumed in a previous case -- this can sometimes be non-obvious).

Now let's look at some real examples of how exhaustive matching can help you write correct code.

Avoiding nulls with the Option type

We'll start with an extremely common scenario where the caller should always check for an invalid case, namely testing for nulls. A typical C# program is littered with code like this:

if (myObject != null)
{
  // do something
}

Unfortunately, this test is not required by the compiler. All it takes is for one piece of code to forget to do this, and the program can crash. Over the years, a huge amount of programming effort has been devoted to handling nulls ? the invention of nulls has even been called a billion dollar mistake!

In pure F#, nulls cannot exist accidentally. A string or object must always be assigned to something at creation, and is immutable thereafter.

However, there are many situations where the design intent is to distinguish between valid and invalid values, and you require the caller to handle both cases.

In C#, this can be managed in certain situations by using nullable value types (such as Nullable<int>) to make the design decision clear. When a nullable is encountered the compiler will force you to be aware of it. You can then test the validity of the value before using it. But nullables do not work for standard classes (i.e. reference types), and it is easy to accidentally bypass the tests too and just call Value directly.

In F# there is a similar but more powerful concept to convey the design intent: the generic wrapper type called Option, with two choices: Some or None. The Some choice wraps a valid value, and None represents a missing value.

Here's an example where Some is returned if a file exists, but a missing file returns None.

let getFileInfo filePath =
   let fi = new System.IO.FileInfo(filePath)
   if fi.Exists then Some(fi) else None

let goodFileName = "good.txt"
let badFileName = "bad.txt"

let goodFileInfo = getFileInfo goodFileName // Some(fileinfo)
let badFileInfo = getFileInfo badFileName   // None

If we want to do anything with these values, we must always handle both possible cases.

match goodFileInfo with
  | Some fileInfo -> 
      printfn "the file %s exists" fileInfo.FullName
  | None -> 
      printfn "the file doesn't exist" 

match badFileInfo with
  | Some fileInfo -> 
      printfn "the file %s exists" fileInfo.FullName
  | None -> 
      printfn "the file doesn't exist" 

We have no choice about this. Not handling a case is a compile-time error, not a run-time error. By avoiding nulls and by using Option types in this way, F# completely eliminates a large class of null reference exceptions.

Caveat: F# does allow you to access the value without testing, just like C#, but that is considered extremely bad practice.

Exhaustive pattern matching for edge cases

Here's some C# code that creates a list by averaging pairs of numbers from an input list:

public IList<float> MovingAverages(IList<int> list)
{
    var averages = new List<float>();
    for (int i = 0; i < list.Count; i++)
    {
        var avg = (list[i] + list[i+1]) / 2;
        averages.Add(avg);
    }
    return averages;
}

It compiles correctly, but it actually has a couple of issues. Can you find them quickly? If you're lucky, your unit tests will find them for you, assuming you have thought of all the edge cases.

Now let's try the same thing in F#:

let rec movingAverages list = 
    match list with
    // if input is empty, return an empty list
    | [] -> []
    // otherwise process pairs of items from the input 
    | x::y::rest -> 
        let avg = (x+y)/2.0 
        //build the result by recursing the rest of the list
        avg :: movingAverages (y::rest)

This code also has a bug. But unlike C#, this code will not even compile until I fix it. The compiler will tell me that I haven't handled the case when I have a single item in my list. Not only has it found a bug, it has revealed a gap in the requirements: what should happen when there is only one item?

Here's the fixed up version:

let rec movingAverages list = 
    match list with
    // if input is empty, return an empty list
    | [] -> []
    // otherwise process pairs of items from the input 
    | x::y::rest -> 
        let avg = (x+y)/2.0 
        //build the result by recursing the rest of the list
        avg :: movingAverages (y::rest)
    // for one item, return an empty list
    | [_] -> []

// test
movingAverages [1.0]
movingAverages [1.0; 2.0]
movingAverages [1.0; 2.0; 3.0]

As an additional benefit, the F# code is also much more self-documenting. It explicitly describes the consequences of each case. In the C# code, it is not at all obvious what happens if a list is empty or only has one item. You would have to read the code carefully to find out.

Exhaustive pattern matching as an error handling technique

The fact that all choices must be matched can also be used as a useful alternative to throwing exceptions. For example consider the following common scenario:

  • There is a utility function in the lowest tier of your app that opens a file and performs an arbitrary operation on it (that you pass in as a callback function)
  • The result is then passed back up through to tiers to the top level.
  • A client calls the top level code, and the result is processed and any error handling done.

In a procedural or OO language, propagating and handling exceptions across layers of code is a common problem. Top level functions are not easily able to tell the difference between an exception that they should recover from (FileNotFound say) vs. an exception that they needn't handle (OutOfMemory say). In Java, there has been an attempt to do this with checked exceptions, but with mixed results.

In the functional world, a common technique is to create a new structure to hold both the good and bad possibilities, rather than throwing an exception if the file is missing.

// define a "union" of two different alternatives
type Result<'a, 'b> = 
    | Success of 'a  // 'a means generic type. The actual type
                     // will be determined when it is used.
    | Failure of 'b  // generic failure type as well

// define all possible errors
type FileErrorReason = 
    | FileNotFound of string
    | UnauthorizedAccess of string * System.Exception

// define a low level function in the bottom layer
let performActionOnFile action filePath =
   try
      //open file, do the action and return the result
      use sr = new System.IO.StreamReader(filePath:string)
      let result = action sr  //do the action to the reader
      sr.Close()
      Success (result)        // return a Success
   with      // catch some exceptions and convert them to errors
      | :? System.IO.FileNotFoundException as ex 
          -> Failure (FileNotFound filePath)      
      | :? System.Security.SecurityException as ex 
          -> Failure (UnauthorizedAccess (filePath,ex))  
      // other exceptions are unhandled

The code demonstrates how performActionOnFile returns a Result object which has two alternatives: Success and Failure. The Failure alternative in turn has two alternatives as well: FileNotFound and UnauthorizedAccess.

Now the intermediate layers can call each other, passing around the result type without worrying what its structure is, as long as they don't access it:

// a function in the middle layer
let middleLayerDo action filePath = 
    let fileResult = performActionOnFile action filePath
    // do some stuff
    fileResult //return

// a function in the top layer
let topLayerDo action filePath = 
    let fileResult = middleLayerDo action filePath
    // do some stuff
    fileResult //return

Because of type inference, the middle and top layers do not need to specify the exact types returned. If the lower layer changes the type definition at all, the intermediate layers will not be affected.

Obviously at some point, a client of the top layer does want to access the result. And here is where the requirement to match all patterns is enforced. The client must handle the case with a Failure or else the compiler will complain. And furthermore, when handling the Failure branch, it must handle the possible reasons as well. In other words, special case handling of this sort can be enforced at compile time, not at runtime! And in addition the possible reasons are explicitly documented by examining the reason type.

Here is an example of a client function that accesses the top layer:

/// get the first line of the file
let printFirstLineOfFile filePath = 
    let fileResult = topLayerDo (fun fs->fs.ReadLine()) filePath

    match fileResult with
    | Success result -> 
        // note type-safe string printing with %s
        printfn "first line is: '%s'" result   
    | Failure reason -> 
       match reason with  // must match EVERY reason
       | FileNotFound file -> 
           printfn "File not found: %s" file
       | UnauthorizedAccess (file,_) -> 
           printfn "You do not have access to the file: %s" file

You can see that this code must explicitly handle the Success and Failure cases, and then for the failure case, it explicitly handles the different reasons. If you want to see what happens if it does not handle one of the cases, try commenting out the line that handles UnauthorizedAccess and see what the compiler says.

Now it is not required that you always handle all possible cases explicitly. In the example below, the function uses the underscore wildcard to treat all the failure reasons as one. This can considered bad practice if we want to get the benefits of the strictness, but at least it is clearly done.

/// get the length of the text in the file
let printLengthOfFile filePath = 
   let fileResult = 
     topLayerDo (fun fs->fs.ReadToEnd().Length) filePath

   match fileResult with
   | Success result -> 
      // note type-safe int printing with %i
      printfn "length is: %i" result       
   | Failure _ -> 
      printfn "An error happened but I don't want to be specific"

Now let's see all this code work in practice with some interactive tests.

First set up a good file and a bad file.

/// write some text to a file
let writeSomeText filePath someText = 
    use writer = new System.IO.StreamWriter(filePath:string)
    writer.WriteLine(someText:string)
    writer.Close()

let goodFileName = "good.txt"
let badFileName = "bad.txt"

writeSomeText goodFileName "hello"

And now test interactively:

printFirstLineOfFile goodFileName 
printLengthOfFile goodFileName 

printFirstLineOfFile badFileName 
printLengthOfFile badFileName 

I think you can see that this approach is very attractive:

  • Functions return error types for each expected case (such as FileNotFound), but the handling of these types does not need to make the calling code ugly.
  • Functions continue to throw exceptions for unexpected cases (such as OutOfMemory), which will generally be caught and logged at the top level of the program.

This technique is simple and convenient. Similar (and more generic) approaches are standard in functional programming.

It is feasible to use this approach in C# too, but it is normally impractical, due to the lack of union types and the lack of type inference (we would have to specify generic types everywhere).

Exhaustive pattern matching as a change management tool

Finally, exhaustive pattern matching is a valuable tool for ensuring that code stays correct as requirements change, or during refactoring.

Let's say that the requirements change and we need to handle a third type of error: "Indeterminate". To implement this new requirement, change the first Result type as follows, and re-evaluate all the code. What happens?

type Result<'a, 'b> = 
    | Success of 'a 
    | Failure of 'b
    | Indeterminate

Or sometimes a requirements change will remove a possible choice. To emulate this, change the first Result type to eliminate all but one of the choices.

type Result<'a> = 
    | Success of 'a 

Now re-evaluate the rest of the code. What happens now?

This is very powerful! When we adjust the choices, we immediately know all the places which need to be fixed to handle the change. This is another example of the power of statically checked type errors. It is often said about functional languages like F# that "if it compiles, it must be correct".