Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unique specifier for types leading to using different fields in the backend #1219

Merged
merged 26 commits into from
Feb 25, 2025

Conversation

sakehl
Copy link
Contributor

@sakehl sakehl commented Jul 1, 2024

  • The wiki is updated in accordance with the changes in this PR. For example: syntax changes, semantics changes, VerCors flags changes, etc.

PR Description

Having many arrays in scope at the same time leads to poor performance in Silicon, to the point where it is not possible to verify files with many arrays present. A workaround is to use different fields for different arrays if you are sure the arrays will never overlap.

With this pull request, I add the ability to do this, by marking the type of an array as unique, indicating that it does not overlap with other arrays. In the backend, this will generate different types of fields for each array.

If you indicate that a type is unique with respect to another type, it is like a 'promise' that the function/program you are written uses the type like they are separate memory entries. Types with different uniqueness can never be assigned to each other, never be compared, etc. I.e. there is no coercion between them. Although, we make an exception with function calls, see below for an explanation.

Important: unique<1> is not fundamentally different from unique<2>. It is just the easiest way to indicate that two types are not of the same uniqueness. The actual numbers do not really matter. The same with the difference between int * and unique<1> int*.

In the frontend it looks like this

/*@
  context_everywhere x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], write));
  context_everywhere x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  context_everywhere x2 != NULL ** \pointer_length(x2) == n ** (\forall* int i; 0<=i && i<n; Perm(&x2[i], 1\2));
  ensures (\forall int i; 0<=i && i<n; x0[i] == x1[i] + x2[i] );
@*/
int main(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1, /*@ unique<2> @*/ int* x2){
  /*@
    loop_invariant 0 <= j && j <= n;
    loop_invariant (\forall int i; 0<=i && i<j; x0[i] == x1[i] + x2[i] );
  @*/
  for(int j=0; j<n; j++){
    x0[j] = x1[j] + x2[j];
  }
}

In the generated viper file we would like to have the following fields and permissions:

field int1: Int
field int2: Int

(forall i: Int ::
      { ptrDeref(ptrAdd(optGet2(x0), i)).int1 }
      0 <= i && i < n ==>
      acc(ptrDeref(optGet2((some(ptrAdd(optGet2(x0), i)): Option[Pointer]))).int1, write))

(forall i: Int ::
      { ptrDeref(ptrAdd(optGet2(x1), i)).int2 }
      0 <= i && i < n ==>
      acc(ptrDeref(optGet2((some(ptrAdd(optGet2(x1), i)): Option[Pointer]))).int2, write))

The above works as intended.

Pointer structs fields

Since many of my examples I wanted to verify, contained structs with pointer fields, I wanted to make this work as well.

struct vec {
  int* xs;
};

/*@
  context xs != NULL;
  context x1 != NULL ** \pointer_length(x1)==1 ** Perm(x1, write) ** Perm(*x1, write);
@*/
int f(/*@unique_pointer_field<xs, 2>@*/ struct vec*  x1, /*@ unique<2> @*/ int* xs){
  x1->xs = xs;
  //@ assert x1->xs != NULL;
  return 0;
}

Here I indicated that field xs of struct vec has an unique type. Thus, it is stored in field int2 in Viper.

Function coercions

There are probably a lot of (library) functions, which do not contain unique types. To rewrite them all such that they work for a specific uniqueness type is unfeasible. Especially since there is not fundamental difference between the types unique<1> int*, unique<2> int* or int *, only that these types should not interact in the program.

Therefor, we make sure that the following programs works:

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  ensures \result == x0[0] + x1[0];
  @*/
  int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
    return h(x0) + h(x1);
  }

  /*@
    context x != NULL ** \pointer_length(x) > 0 ** Perm(&x[0], 1\2);
    ensures \result == x[0];
  @*/
  int h(int* x){
    return x[0];
  }

Internally, there are created two abstract copies of h. We have the original h(int* x) which function body is checked w.r.t. its contract.
Then we make the copies h(unique<1> int* x) and h(unique<2> int* x). Both are created without body, since its body is already checked in the original.

When is a function coercion correct.

What we want to prevent is that we can switch between pointer types (memory types) of different uniqueness after a function call. So for instance:

/*@
  context  x != NULL && y != NULL;
  ensures \old(x)==y && \old(y)==x;
@*/
void swap(int* x, int* y){
  int* temp = x;
  x = y;
  y = temp;
}

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
@*/
int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
  swap(x0, x1);
  // Do other stuff with x0 and x1
}

should never be allowed.

So I have the following rule for when I do allow function coercion's:

  1. Suppose original function h has type unique<A> x* somewhere in its function. And a copy of this function would require unique<B> x*. Then any other instance of unique<A> x* which is in the function should also be coerced towards unique<B> b*.
    NB: Instead of unique<A> x* or unique<B> x*, x* is also valid (no unique modifier).

This does mean that we can end up with the following which we allow:

/*@
  context x != NULL ** \pointer_length(x) >0 ** (\forall* int i; 0<=i && i<\pointer_length(x); Perm(&x[i], 1\4));
  context y != NULL ** \pointer_length(y) >0 ** (\forall* int i; 0<=i && i<\pointer_length(y); Perm(&y[i], 1\4));
  ensures \result == x[0] + y[0];
@*/
int h(/*@ unique<1> @*/ int* x, /*@ unique<2> @*/ int* y){
  return x[0] + y[0];
}

/*@
  context n > 0;
  context x0 != NULL ** \pointer_length(x0) == n ** (\forall* int i; 0<=i && i<n; Perm(&x0[i], 1\2));
  context x1 != NULL ** \pointer_length(x1) == n ** (\forall* int i; 0<=i && i<n; Perm(&x1[i], 1\2));
  ensures \result == 2*x0[0] + 2*x1[0];
@*/
int f(int n, /*@ unique<1> @*/ int* x0, /*@ unique<2> @*/ int* x1){
  return h(x0, x0) + h(x1, x1);
}

The copies of function h have type signature int h(/*@ unique<1> @*/ int* x, /*@ unique<1> @*/ int* y) and int h(/*@ unique<2> @*/ int* x, /*@ unique<2> @*/ int* y). This is fine. Since different uniqueness types only set restrictions on how the function can use those pointers. So we already checked that h treats these pointers as being stored in different memory. That they are actually stored in the same memory, just loosens the restriction, but does not matter.

To check if all pointer types are correctly coerced, we do a scan of the following of a type signature:

  • All parameter types
  • The return type
  • The given/yield types
  • The out args (although I don't think this is ever used in C)

Additionally, any type can contain pointers. E.g. union<int *, void *> or seq<int *>or more importantly:struct v {int *x}andstruct w {struct v s}`. So we traverse each type to check what pointers it contains.

Const type qualifier

The unique type is a type qualifier, it was easy to add the const type qualifier as well. I also had some ideas to add const pointers, pointer which elements cannot be changed. Since these kind of pointers could be modeled using sequence.

Thus the following example verifies:

#include <stdlib.h>

/*@
  context_everywhere n > 0;
  context_everywhere x0 != NULL ** \pointer_length(x0) == n;
  context_everywhere x1 != NULL ** \pointer_length(x1) == n;
  ensures \result != NULL ** \pointer_length(\result) == n ** (\forall* int i; 0<=i && i<n; Perm(&\result[i], write));
  ensures (\forall int i; 0 <= i && i <n; \result[i] == x0[i] + x1[i]);
@*/
int* add(int n, const int* x0, const int* x1){
  int* res = (int*) malloc(n*sizeof(int));
  //@ assume res != NULL;
  /*@
    loop_invariant 0 <= i && i <= n;
    loop_invariant res != NULL ** \pointer_length(res) == n ** (\forall* int i; 0<=i && i<n; Perm(&res[i], write));
    loop_invariant (\forall int j; 0 <= j && j < i; res[j] == x0[j] + x1[j]);
  @*/
  for(int i=0; i<n; i++){
    res[i] = x0[i] + x1[i];
  }
  return res;
}

And internally uses sequences for x0 and x1.

Any coercion from regular pointer towards constant pointer I disallow at the moment. Mostly since I think you need to take an arbitrary read permission to cast from int * towards const int*, and I'm not sure how to model that. Also, if this coercion/cast happens like const int* const_xs = (int * xs) xs you want to return the taken read permission whenever const_xs is no longer in scope. (or when you call a function which has const int* as parameter, you want to return the taken read permission after the function call).

Also things like const int x = 0 is now possible, which will error when you try to do x =1 later on, because of const-ness.

Possible extensions

  • Since struct fields are now also always added as pointers with the same Viper field name as regular pointers, we could add the type qualifier /@unique_field<fieldName, uniqueNumber>@/.
  • The 'hard' logic is now added. It works for C. But it should be really straightforward to add this towards PVL, C++. Maybe Java as well, although I'm less versed in Java specifics.

@sakehl sakehl self-assigned this Jul 1, 2024
@sakehl sakehl marked this pull request as draft July 1, 2024 12:22
@pieter-bos
Copy link
Member

Nice! You should know that @superaxander (and sometimes me) are working on some proposed silicon changes that address similar concerns, although we've focused on singleton heap chunks (i.e. plain permission without quantification), but I don't see a strong reason why e.g. the proposed chunk ordering heuristics would not improve things. Maybe we should have a look at some slow array examples again with our recently gained knowledge.

@superaxander
Copy link
Member

Yes this would be quite useful. Ideally there'd be a way we could give Viper a better view of which fields may alias and which may not instead of simply relying on the name of the field. For example I want to be able to say the field int_1 and int_2 never alias but both of them may alias with int_3 which would drastically reduce the amount of heap chunks viper has to consider when dealing with a pointer to a field int_1 or int_2. (additionally during state consolidation Viper will take every combination of two heap chunks with the same field name to attempt to prove that they do or don't alias, this generates an exponential amount of proof goals)

Your example of wanting to specify that the two fields are unique is basically the exact thing I've been working on for the past two weeks with my by-value classes. Because I'm encoding structs as ADTs with pointers inside of them we have for a struct:

struct A {
   int a;
   int b;
}

That we know that &a and &b must always be different. (and this is also true between different structs and their fields) However, a normal int pointer may alias with these field pointers therefore in the general case we cannot give them a separate field name in viper. Being able to annotate/automatically detect the easy cases where no aliasing may occur might help a lot though.

As for specifying a ghost type of e.g. void pointers I have also been thinking about that since this kind of pattern occurs everywhere in LLVM. (all pointers are untyped in LLVM) My idea so far was to extend the notion of permission to being "permission to access as if it were a certain type". For example:

//@ requires Perm(a, write, int)
void foo(void *a)  {
    int b = *((int *) a)
}

You could then call this function with many different kinds of values which can all be treated as int pointers:

struct A a;
a.a = 5;
// A pointer to a value "struct A" can be treated as an int pointer
foo(&a);
// This is already an int pointer
foo(&a.b);

@sakehl sakehl marked this pull request as ready for review February 23, 2025 20:23
@sakehl
Copy link
Contributor Author

sakehl commented Feb 23, 2025

So this is almost ready to merge.
I added /*@ unique_pointer_field<0>(xs) @*/ where a field of a struct which contains a pointer can be given an unique type.

Adding /@ unique_field @/ should be relatively straightforward, but since I'm on a bit of deadline towards the end of my PhD I don't want to add it in at the moment.

Three question, specifically for @superaxander

  1. All fields of a struct are now stored in the same Viper field as pointer right?
    So with struct v {int x, int* xs} v.x we get field "int" in Viper. But also 'v.xs[0]' is stored in Viper field int right? If so I need to edit a thing, and probably change some tests.
  2. I think AddrOff was now added in a more structered way. So there is some extra ways to introduce pointers in a program. Could you maybe give a quick explination so I'm sure I'm not forgetting some corner case for my unique types.
  3. see question in the code comment.

@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

Note: I had to run the build action again on GiHub, since the first time it said "Out of heap memory" or something similar. The same I indicated on the VerCors chat a few days ago.
See here: https://github.com/sakehl/vercors/actions/runs/13486688818/job/37678993996

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "make -j 2 all stdout thread"
java.lang.OutOfMemoryError: Java heap space
Error: Process completed with exit code 1.

Maybe we need to increase the value of "-Xmx2G" to "-Xmx4G"? in ".mill-jvm-opts"

@superaxander
Copy link
Member

  1. All fields of a struct are now stored in the same Viper field as pointer right?
    So with struct v {int x, int* xs} v.x we get field "int" in Viper. But also 'v.xs[0]' is stored in Viper field int right? If so I need to edit a thing, and probably change some tests.

True

  1. I think AddrOff was now added in a more structered way. So there is some extra ways to introduce pointers in a program. Could you maybe give a quick explination so I'm sure I'm not forgetting some corner case for my unique types.

You can now take a pointer to anything you can take a pointer to in C, so heap variables, locals, and struct fields. For locals we turn their type into a pointer type whenever you tried to take their address, for fields they are always pointers, and for heap variables I currently do the same things as for locals but I'm looking at changing that in #1317. (I want to make everything a pointer unless we can determine we can lower it to a normal value because no one takes the address. This is actually how locals that store a struct value already work)

The final way you can get a pointer is by casting an integer to a pointer, but I think casting in general, other than the identity, should probably just be disallowed for unique pointers.

@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

Ok I think I got everything covered now. Everything we discussed above I've fixed now, and there are test present for those things.

I've updated the original PR description to reflect precisely what is changed. The test seem to be passing.

So if something wants to review this we can merge this?

If there is anything I can help with, I'm happy to, since 3000 LoC is quite big.

@superaxander or @bobismijnnaam how should we proceed?

@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

I tried to better fix those decrease failures. Should close
#1232 and #1065

Although getting something like

public class MyClass {

    //@ decreases;
    private int f() {
        //@ decreases;
        while (true) {}
        return 5;
    }

}

to error nicely was a bit to difficult atm.

@superaxander
Copy link
Member

I tried to better fix those decrease failures. Should close #1232 and #1065

So I guess you mean #1133 and #1065, since #1232 is the example you say was too difficult to get to work.

@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

No I meant #1232

notice the small but important difference between these two:

public class MyClass {

    //@ decreases;
    private int f() {
        //@ decreases;
        while (true) {}
        return 5;
    }

}
public class MyClass {

    //@ decreases;
    private int f()
        while (true) {}
        return 5;
    }

}

The first one is harder to fix, since we get a single "decrease" clause from Viper as what was the reason. And I did not know how to redirect blame towards it (the current code).

In other words this code: https://github.com/utwente-fmt/vercors/blob/1e37de3ebfa7b5f4a85c1c26b025b1b634529385/src/viper/viper/api/backend/SilverBackend.scala#L303C1-L306C68 does not work, since info(node).invariant.get is in error since we try to get a None.

@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

#1133 was just a very different kind of beast. You have to check that the post-condition of a call only has a self reference towards the call, if the call also has an decrease clause. Thus a program analyses that checks this should be added.

@superaxander
Copy link
Member

superaxander commented Feb 24, 2025

The first one is harder to fix, since we get a single "decrease" clause from Viper as what was the reason. And I did not know how to redirect blame towards it (the current code).

In other words this code: https://github.com/utwente-fmt/vercors/blob/1e37de3ebfa7b5f4a85c1c26b025b1b634529385/src/viper/viper/api/backend/SilverBackend.scala#L303C1-L306C68 does not work, since info(node).invariant.get is in error since we try to get a None.

Right for this one it's simply that invariant is not set for the decreases nodes. It's just a matter of changing these lines:

clause match {
case col.DecreasesClauseAssume() =>
DecreasesWildcard(condition = None)(
pos = pos(clause),
info = NodeInfo(clause),
)
case col.DecreasesClauseNoRecursion() =>
DecreasesTuple(Nil, condition = None)(
pos = pos(clause),
info = NodeInfo(clause),
)
case col.DecreasesClauseTuple(exprs) =>
DecreasesTuple(exprs.map(exp), condition = None)(
pos = pos(clause),
info = NodeInfo(clause),

To set the info to expInfo(clause) instead of NodeInfo(clause)

Apply comment alexander
@sakehl
Copy link
Contributor Author

sakehl commented Feb 24, 2025

Thanks works now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants