Unique specifier for types leading to using different fields in the backend #1219

sakehl · 2024-07-01T12:21:38Z

The wiki is updated in accordance with the changes in this PR. For example: syntax changes, semantics changes, VerCors flags changes, etc.

PR Description

Having many arrays in scope at the same time leads to poor performance in Silicon, to the point where it is not possible to verify files with many arrays present. A workaround is to use different fields for different arrays if you are sure the arrays will never overlap.

With this pull request, I add the ability to do this, by marking the type of an array as unique, indicating that it does not overlap with other arrays. In the backend, this will generate different types of fields for each array.

If you indicate that a type is unique with respect to another type, it is like a 'promise' that the function/program you are written uses the type like they are separate memory entries. Types with different uniqueness can never be assigned to each other, never be compared, etc. I.e. there is no coercion between them. Although, we make an exception with function calls, see below for an explanation.

Important: unique<1> is not fundamentally different from unique<2>. It is just the easiest way to indicate that two types are not of the same uniqueness. The actual numbers do not really matter. The same with the difference between int * and unique<1> int*.

In the frontend it looks like this

/*@
  context_everywhere x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], write));
  context_everywhere x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  context_everywhere x2 != NULL ** \pointer_length(x2) == n ** (\forall* int i; 0<=i && i<n; Perm(&x2[i], 1\2));
  ensures (\forall int i; 0<=i && i<n; x0[i] == x1[i] + x2[i] );
@*/
int main(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1, /*@ unique<2> @*/ int* x2){
  /*@
    loop_invariant 0 <= j && j <= n;
    loop_invariant (\forall int i; 0<=i && i<j; x0[i] == x1[i] + x2[i] );
  @*/
  for(int j=0; j<n; j++){
    x0[j] = x1[j] + x2[j];
  }
}

In the generated viper file we would like to have the following fields and permissions:

field int1: Int
field int2: Int

(forall i: Int ::
      { ptrDeref(ptrAdd(optGet2(x0), i)).int1 }
      0 <= i && i < n ==>
      acc(ptrDeref(optGet2((some(ptrAdd(optGet2(x0), i)): Option[Pointer]))).int1, write))

(forall i: Int ::
      { ptrDeref(ptrAdd(optGet2(x1), i)).int2 }
      0 <= i && i < n ==>
      acc(ptrDeref(optGet2((some(ptrAdd(optGet2(x1), i)): Option[Pointer]))).int2, write))

The above works as intended.

Pointer structs fields

Since many of my examples I wanted to verify, contained structs with pointer fields, I wanted to make this work as well.

struct vec {
  int* xs;
};

/*@
  context xs != NULL;
  context x1 != NULL ** \pointer_length(x1)==1 ** Perm(x1, write) ** Perm(*x1, write);
@*/
int f(/*@unique_pointer_field<xs, 2>@*/ struct vec*  x1, /*@ unique<2> @*/ int* xs){
  x1->xs = xs;
  //@ assert x1->xs != NULL;
  return 0;
}

Here I indicated that field xs of struct vec has an unique type. Thus, it is stored in field int2 in Viper.

Function coercions

There are probably a lot of (library) functions, which do not contain unique types. To rewrite them all such that they work for a specific uniqueness type is unfeasible. Especially since there is not fundamental difference between the types unique<1> int*, unique<2> int* or int *, only that these types should not interact in the program.

Therefor, we make sure that the following programs works:

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  ensures \result == x0[0] + x1[0];
  @*/
  int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
    return h(x0) + h(x1);
  }

  /*@
    context x != NULL ** \pointer_length(x) > 0 ** Perm(&x[0], 1\2);
    ensures \result == x[0];
  @*/
  int h(int* x){
    return x[0];
  }

Internally, there are created two abstract copies of h. We have the original h(int* x) which function body is checked w.r.t. its contract.
Then we make the copies h(unique<1> int* x) and h(unique<2> int* x). Both are created without body, since its body is already checked in the original.

When is a function coercion correct.

What we want to prevent is that we can switch between pointer types (memory types) of different uniqueness after a function call. So for instance:

/*@
  context  x != NULL && y != NULL;
  ensures \old(x)==y && \old(y)==x;
@*/
void swap(int* x, int* y){
  int* temp = x;
  x = y;
  y = temp;
}

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
@*/
int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
  swap(x0, x1);
  // Do other stuff with x0 and x1
}

should never be allowed.

So I have the following rule for when I do allow function coercion's:

Suppose original function h has type unique<A> x* somewhere in its function. And a copy of this function would require unique<B> x*. Then any other instance of unique<A> x* which is in the function should also be coerced towards unique<B> b*.
NB: Instead of unique<A> x* or unique<B> x*, x* is also valid (no unique modifier).

This does mean that we can end up with the following which we allow:

/*@
  context x != NULL ** \pointer_length(x) >0 ** (\forall* int i; 0<=i && i<\pointer_length(x); Perm(&x[i], 1\4));
  context y != NULL ** \pointer_length(y) >0 ** (\forall* int i; 0<=i && i<\pointer_length(y); Perm(&y[i], 1\4));
  ensures \result == x[0] + y[0];
@*/
int h(/*@ unique<1> @*/ int* x, /*@ unique<2> @*/ int* y){
  return x[0] + y[0];
}

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  ensures \result == 2*x0[0] + 2*x1[0];
@*/
int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
  return h(x0, x0) + h(x1, x1);
}

The copies of function h have type signature int h(/*@ unique<1> @*/ int* x, /*@ unique<1> @*/ int* y) and int h(/*@ unique<2> @*/ int* x, /*@ unique<2> @*/ int* y). This is fine. Since different uniqueness types only set restrictions on how the function can use those pointers. So we already checked that h treats these pointers as being stored in different memory. That they are actually stored in the same memory, just loosens the restriction, but does not matter.

To check if all pointer types are correctly coerced, we do a scan of the following of a type signature:

All parameter types
The return type
The given/yield types
The out args (although I don't think this is ever used in C)

Additionally, any type can contain pointers. E.g. union<int *, void *> or seq<int *>or more importantly:struct v {int *x}andstruct w {struct v s}`. So we traverse each type to check what pointers it contains.

Const type qualifier

The unique type is a type qualifier, it was easy to add the const type qualifier as well. I also had some ideas to add const pointers, pointer which elements cannot be changed. Since these kind of pointers could be modeled using sequence.

Thus the following example verifies:

#include <stdlib.h>

/*@
  context_everywhere n > 0;
  context_everywhere x0 != NULL ** \pointer_length(x0) == n;
  context_everywhere x1 != NULL ** \pointer_length(x1) == n;
  ensures \result != NULL ** \pointer_length(\result) == n ** (\forall* int i; 0<=i && i<n; Perm(&\result[i], write));
  ensures (\forall int i; 0 <= i && i <n; \result[i] == x0[i] + x1[i]);
@*/
int* add(int n, const int* x0, const int* x1){
  int* res = (int*) malloc(n*sizeof(int));
  //@ assume res != NULL;
  /*@
    loop_invariant 0 <= i && i <= n;
    loop_invariant res != NULL ** \pointer_length(res) == n ** (\forall* int i; 0<=i && i<n; Perm(&res[i], write));
    loop_invariant (\forall int j; 0 <= j && j < i; res[j] == x0[j] + x1[j]);
  @*/
  for(int i=0; i<n; i++){
    res[i] = x0[i] + x1[i];
  }
  return res;
}

And internally uses sequences for x0 and x1.

Any coercion from regular pointer towards constant pointer I disallow at the moment. Mostly since I think you need to take an arbitrary read permission to cast from int * towards const int*, and I'm not sure how to model that. Also, if this coercion/cast happens like const int* const_xs = (int * xs) xs you want to return the taken read permission whenever const_xs is no longer in scope. (or when you call a function which has const int* as parameter, you want to return the taken read permission after the function call).

Also things like const int x = 0 is now possible, which will error when you try to do x =1 later on, because of const-ness.

Possible extensions

Since struct fields are now also always added as pointers with the same Viper field name as regular pointers, we could add the type qualifier /@unique_field<fieldName, uniqueNumber>@/.
The 'hard' logic is now added. It works for C. But it should be really straightforward to add this towards PVL, C++. Maybe Java as well, although I'm less versed in Java specifics.

pieter-bos · 2024-07-01T12:27:22Z

Nice! You should know that @superaxander (and sometimes me) are working on some proposed silicon changes that address similar concerns, although we've focused on singleton heap chunks (i.e. plain permission without quantification), but I don't see a strong reason why e.g. the proposed chunk ordering heuristics would not improve things. Maybe we should have a look at some slow array examples again with our recently gained knowledge.

superaxander · 2024-07-01T12:52:31Z

Yes this would be quite useful. Ideally there'd be a way we could give Viper a better view of which fields may alias and which may not instead of simply relying on the name of the field. For example I want to be able to say the field int_1 and int_2 never alias but both of them may alias with int_3 which would drastically reduce the amount of heap chunks viper has to consider when dealing with a pointer to a field int_1 or int_2. (additionally during state consolidation Viper will take every combination of two heap chunks with the same field name to attempt to prove that they do or don't alias, this generates an exponential amount of proof goals)

Your example of wanting to specify that the two fields are unique is basically the exact thing I've been working on for the past two weeks with my by-value classes. Because I'm encoding structs as ADTs with pointers inside of them we have for a struct:

struct A {
   int a;
   int b;
}

That we know that &a and &b must always be different. (and this is also true between different structs and their fields) However, a normal int pointer may alias with these field pointers therefore in the general case we cannot give them a separate field name in viper. Being able to annotate/automatically detect the easy cases where no aliasing may occur might help a lot though.

As for specifying a ghost type of e.g. void pointers I have also been thinking about that since this kind of pattern occurs everywhere in LLVM. (all pointers are untyped in LLVM) My idea so far was to extend the notion of permission to being "permission to access as if it were a certain type". For example:

//@ requires Perm(a, write, int)
void foo(void *a)  {
    int b = *((int *) a)
}

You could then call this function with many different kinds of values which can all be treated as int pointers:

struct A a;
a.a = 5;
// A pointer to a value "struct A" can be treated as an int pointer
foo(&a);
// This is already an int pointer
foo(&a.b);

src/rewrite/vct/rewrite/EncodeArrayValues.scala

sakehl · 2025-02-23T20:32:19Z

So this is almost ready to merge.
I added /*@ unique_pointer_field<0>(xs) @*/ where a field of a struct which contains a pointer can be given an unique type.

Adding /@ unique_field @/ should be relatively straightforward, but since I'm on a bit of deadline towards the end of my PhD I don't want to add it in at the moment.

Three question, specifically for @superaxander

All fields of a struct are now stored in the same Viper field as pointer right?
So with struct v {int x, int* xs} v.x we get field "int" in Viper. But also 'v.xs[0]' is stored in Viper field int right? If so I need to edit a thing, and probably change some tests.
I think AddrOff was now added in a more structered way. So there is some extra ways to introduce pointers in a program. Could you maybe give a quick explination so I'm sure I'm not forgetting some corner case for my unique types.
see question in the code comment.

sakehl · 2025-02-24T07:25:41Z

Note: I had to run the build action again on GiHub, since the first time it said "Out of heap memory" or something similar. The same I indicated on the VerCors chat a few days ago.
See here: https://github.com/sakehl/vercors/actions/runs/13486688818/job/37678993996

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "make -j 2 all stdout thread"
java.lang.OutOfMemoryError: Java heap space
Error: Process completed with exit code 1.

Maybe we need to increase the value of "-Xmx2G" to "-Xmx4G"? in ".mill-jvm-opts"

superaxander · 2025-02-24T08:00:46Z

All fields of a struct are now stored in the same Viper field as pointer right?
So with struct v {int x, int* xs} v.x we get field "int" in Viper. But also 'v.xs[0]' is stored in Viper field int right? If so I need to edit a thing, and probably change some tests.

True

I think AddrOff was now added in a more structered way. So there is some extra ways to introduce pointers in a program. Could you maybe give a quick explination so I'm sure I'm not forgetting some corner case for my unique types.

You can now take a pointer to anything you can take a pointer to in C, so heap variables, locals, and struct fields. For locals we turn their type into a pointer type whenever you tried to take their address, for fields they are always pointers, and for heap variables I currently do the same things as for locals but I'm looking at changing that in #1317. (I want to make everything a pointer unless we can determine we can lower it to a normal value because no one takes the address. This is actually how locals that store a struct value already work)

The final way you can get a pointer is by casting an integer to a pointer, but I think casting in general, other than the identity, should probably just be disallowed for unique pointers.

sakehl · 2025-02-24T11:21:04Z

Ok I think I got everything covered now. Everything we discussed above I've fixed now, and there are test present for those things.

I've updated the original PR description to reflect precisely what is changed. The test seem to be passing.

So if something wants to review this we can merge this?

If there is anything I can help with, I'm happy to, since 3000 LoC is quite big.

@superaxander or @bobismijnnaam how should we proceed?

src/col/vct/col/ast/expr/heap/read/AddrOfImpl.scala

src/col/vct/col/ast/expr/heap/read/DerefImpl.scala

src/col/vct/col/origin/Blame.scala

src/rewrite/vct/rewrite/adt/ImportConstPointer.scala

src/col/vct/col/ast/Node.scala

sakehl · 2025-02-24T15:02:19Z

I tried to better fix those decrease failures. Should close
#1232 and #1065

Although getting something like

public class MyClass {

    //@ decreases;
    private int f() {
        //@ decreases;
        while (true) {}
        return 5;
    }

}

to error nicely was a bit to difficult atm.

superaxander · 2025-02-24T15:06:32Z

I tried to better fix those decrease failures. Should close #1232 and #1065

So I guess you mean #1133 and #1065, since #1232 is the example you say was too difficult to get to work.

sakehl · 2025-02-24T15:17:32Z

No I meant #1232

notice the small but important difference between these two:

public class MyClass {

    //@ decreases;
    private int f() {
        //@ decreases;
        while (true) {}
        return 5;
    }

}

public class MyClass {

    //@ decreases;
    private int f()
        while (true) {}
        return 5;
    }

}

The first one is harder to fix, since we get a single "decrease" clause from Viper as what was the reason. And I did not know how to redirect blame towards it (the current code).

In other words this code: https://github.com/utwente-fmt/vercors/blob/1e37de3ebfa7b5f4a85c1c26b025b1b634529385/src/viper/viper/api/backend/SilverBackend.scala#L303C1-L306C68 does not work, since info(node).invariant.get is in error since we try to get a None.

sakehl · 2025-02-24T15:19:18Z

#1133 was just a very different kind of beast. You have to check that the post-condition of a call only has a self reference towards the call, if the call also has an decrease clause. Thus a program analyses that checks this should be added.

superaxander · 2025-02-24T16:21:30Z

The first one is harder to fix, since we get a single "decrease" clause from Viper as what was the reason. And I did not know how to redirect blame towards it (the current code).

In other words this code: https://github.com/utwente-fmt/vercors/blob/1e37de3ebfa7b5f4a85c1c26b025b1b634529385/src/viper/viper/api/backend/SilverBackend.scala#L303C1-L306C68 does not work, since info(node).invariant.get is in error since we try to get a None.

Right for this one it's simply that invariant is not set for the decreases nodes. It's just a matter of changing these lines:

vercors/src/viper/viper/api/transform/ColToSilver.scala

Lines 297 to 311 in 1e37de3

    
           clause match { 
        
             case col.DecreasesClauseAssume() => 
        
               DecreasesWildcard(condition = None)( 
        
                 pos = pos(clause), 
        
                 info = NodeInfo(clause), 
        
               ) 
        
             case col.DecreasesClauseNoRecursion() => 
        
               DecreasesTuple(Nil, condition = None)( 
        
                 pos = pos(clause), 
        
                 info = NodeInfo(clause), 
        
               ) 
        
             case col.DecreasesClauseTuple(exprs) => 
        
               DecreasesTuple(exprs.map(exp), condition = None)( 
        
                 pos = pos(clause), 
        
                 info = NodeInfo(clause),

To set the info to expInfo(clause) instead of NodeInfo(clause)

Apply comment alexander

sakehl · 2025-02-24T17:57:19Z

Thanks works now!

Syntax and parsing of unique specifier

c957125

sakehl self-assigned this Jul 1, 2024

sakehl marked this pull request as draft July 1, 2024 12:22

sakehl added 18 commits July 2, 2024 16:26

First version working for function parameters

020fb35

Small fix for array types

8b43f3f

Added qualifers (const, unique), const pointers, decrease fix

d4b7900

Merge remote-tracking branch 'origin/dev' into unique-fields

135e23d

Fixes for tests

58533fa

Create abstract methods when coercing between unique pointers

b36382d

simpler pointer coercions

db267fc

Coerce return, support methods unique coercion, fix for void

a084769

Merge remote-tracking branch 'origin/dev' into unique-fields

f32a97a

Fix free function defect

2d925c6

Support for qualified functions

529f4b4

Typedef struct works properly now

839a13c

WIP unique fields

3335ff9

Qualifiers with structs now get coerced

fd961dc

Merge remote-tracking branch 'origin/dev' into unique-fields

29ed9c4

Compiling after resolving merge

6d4b91e

Add struct test

bd4018b

Fixed issues, almost ready to merge

853ec53

sakehl marked this pull request as ready for review February 23, 2025 20:23

sakehl commented Feb 23, 2025

View reviewed changes

src/rewrite/vct/rewrite/EncodeArrayValues.scala Show resolved Hide resolved

Move from unsorted

093c99e

More heap mem for mill & cast+field fixes for qualifiers

440d688

Add int to unique pointer cast test

0a6a365

Removed some not needed code (after code coverage test)

e79de04

superaxander reviewed Feb 24, 2025

View reviewed changes

sakehl added 2 commits February 24, 2025 14:39

Fixes after review

2d3c12b

Some decrease error message fix

7f8bf1c

Get loop decrease blame working

36c700d

Apply comment alexander

sakehl merged commit ca0eb76 into utwente-fmt:dev Feb 25, 2025
13 checks passed

This was referenced Feb 25, 2025

Crash report: class vct.col.ast.Loop cannot be cast to class vct.col.ast.Invocation (vct.col.ast.Loop and vct.col. #1232

Closed

Missing termination conditions in functions are encoded with false termination condition #1065

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unique specifier for types leading to using different fields in the backend #1219

Unique specifier for types leading to using different fields in the backend #1219

sakehl commented Jul 1, 2024 •

edited

Loading

pieter-bos commented Jul 1, 2024

superaxander commented Jul 1, 2024

sakehl commented Feb 23, 2025

sakehl commented Feb 24, 2025 •

edited

Loading

superaxander commented Feb 24, 2025

sakehl commented Feb 24, 2025

sakehl commented Feb 24, 2025

superaxander commented Feb 24, 2025

sakehl commented Feb 24, 2025

sakehl commented Feb 24, 2025

superaxander commented Feb 24, 2025 •

edited

Loading

sakehl commented Feb 24, 2025

Unique specifier for types leading to using different fields in the backend #1219

Unique specifier for types leading to using different fields in the backend #1219

Conversation

sakehl commented Jul 1, 2024 • edited Loading

PR Description

Pointer structs fields

Function coercions

When is a function coercion correct.

Const type qualifier

Possible extensions

pieter-bos commented Jul 1, 2024

superaxander commented Jul 1, 2024

sakehl commented Feb 23, 2025

sakehl commented Feb 24, 2025 • edited Loading

superaxander commented Feb 24, 2025

sakehl commented Feb 24, 2025

sakehl commented Feb 24, 2025

superaxander commented Feb 24, 2025

sakehl commented Feb 24, 2025

sakehl commented Feb 24, 2025

superaxander commented Feb 24, 2025 • edited Loading

sakehl commented Feb 24, 2025

sakehl commented Jul 1, 2024 •

edited

Loading

sakehl commented Feb 24, 2025 •

edited

Loading

superaxander commented Feb 24, 2025 •

edited

Loading