Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage Cargo #1138

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from
Draft

Leverage Cargo #1138

wants to merge 17 commits into from

Conversation

nspin
Copy link

@nspin nspin commented May 23, 2024

This PR proposes integration with Cargo.

The primary changes in this PR are the addition of a custom Cargo subcommand called cargo-verus and a new CLI for rust_verify called verus-driver. These additions leverage Cargo to make using Verus just as easy and ergonomic as using rustc itself from a tooling perspective. This PR could also simplify the build system of the Verus project itself.

An invocation of cargo verus wraps Cargo to process a few Verus-specific flags, and then invoke cargo build or cargo check with RUSTC_WRAPPER set to another new binary called verus-driver. For the sake of modularity, I've split the CLI logic out of rust_verify and into verus-driver, which also handles all of the new Cargo integration logic. This addition required very little modification of rust_verify, but, at least in the first draft of this PR, there is now some redundancy between verus-driver and rust_verify, as the CLI in rust_verify has been left unchanged. Note that verus-driver is designed to also be useful as a rustc driver when used directly, without Cargo.

For each crate in the cargo build or cargo check build plan, cargo-verus uses package metadata tables to determine whether it should be compiled normally by rustc or treated specially by Verus, and informs verus-driver via environment variables.

The result of all of this is a much simpler and more familiar flow for Verus users. The ./source/cargo-verus-illustration/illustrate.sh script in this draft demonstrates such a flow. Note that this script can be run in a fresh checkout of the Verus repository. That is, it does not depend on Vargo or any other current Verus project build system code infrastructure. It does, however, require $VERUS_Z3_PATH and $VERUS_SINGULAR_PATH to be set.

The design implemented in this PR supports two ways of depending on builtin/builtin_macros/vstd. They can be included as normal Cargo project dependencies, or pre-built and bundled into a "Verus sysroot". illustrate.sh demonstrates both.

It seems to me that this integration with Cargo could replace some or even much of the functionality in the custom scripts in this repository (vargo, vstd_build, and verus).

Note that, in this draft, the proposed new tooling does not yet include all features of the verus and rust_verify tools. In particular, history recording and stat reporting have not been implemented.

The design and implementation of this integration was guided that of by Clippy.

I recognize that this PR entails quite a significant proposal for a first-time contributor, so here is some context: I work on Rust support in userspace for seL4, the formally verified microkernel. As folks building high-assurance systems on top of seL4 have been increasingly reaching for Rust, they are looking to tools like Verus. I hope that with this PR, and beyond, I am able to facilitate that adoption.

@nspin nspin force-pushed the pr/cargo-integration branch 3 times, most recently from 3bb30c5 to acbd12f Compare May 24, 2024 04:08
@jaybosamiya
Copy link
Contributor

First off, high-level, this is awesome! I've been meaning to implement (or even write down my thoughts on how one might implement) cargo verus at some point, for quite a while now, but have been busy with other things. I've spoken with some folks about the design, and this is already fairly similar to what I was considering, so that's great. Also, this does save me some time, thanks! :D

I haven't had a chance to dive into the code yet (earliest I can get to a proper read might take quite some time; so I'll let others comment on it, esp wrt specifics), but I just wanted to write down my high-level thoughts as I read the proposal.

  • The package.metadata.verus table is already being used for the IDE (e.g., see here in the Emacs integration, where I'm using the package.metadata.verus.ide portion as a detector of it being a Verus crate), so using it more generally as a detector of Verus-or-not wrt the build plan makes sense to me.

  • With the introduction of verus-driver, it seems that there would now be 3 tools around (rust_verify, verus, and verus-driver) that run Verus (at the level of rustc; cargo-verus is basically at the level of cargo build, so I don't count it in the three). We probably want to consider something that reduces the complexity wrt maintainability (at a high level I don't think this should itself be a blocker to this PR, but definitely a point to consider wrt longer term maintenance).

  • The illustate.sh script as it currently stands does look convenient overall. A couple of questions about it though:

    1. the export RUSTC_BOOTSTRAP=1 early on makes it unclear how much requires that env variable; my guess is that only the first build should, and none of the runs---is that correct?
    2. I expect most people will pull cargo-verus into their PATH, could you include examples of what that might look like?---I expect it'll end up looking nicer than the cargo run -p cargo-verus -- ... invocations, and might make it easier to illustrate your point :)
  • I expect cargo-verus to change less often than Verus itself, but it might itself need upgrades from time to time, and it'd be nice to have version checking built in early on itself (I haven't checked the code to see if you've done this already, but mentioning it as something that should be checked)---potentially similar to how vargo detects that it is outdated if you attempt to run an outdated vargo build (see here), or alternatively, how it checks if verusfmt is up to date (see here), it might be good to do that kind of check for any newly introduced binary (that has a chance of accidentally getting out of date). Similarly, I assume verus-driver should also probably have some sort of versioning check from the get-go.

Overall, I support the addition of cargo verus as a command, since it improves the usability and experience of folks both jumping into Verus, as well as those of us who've been using it a while (and have written some horrendous fun sed scripts to wrangle various things together to connect up various crates, both Verus and not). Thanks for working on this! Hopefully I can get to doing a read of the code soon, but I'm sure others will take a look sooner than that.

nspin added 3 commits May 24, 2024 07:17
To prepare for including vstd in the workspace. Once it is included in
the workspace, it will be visited by cargo fmt.

We guard the rustfmt::skip attributes with:

'#![cfg_attr(rustfmt, rustfmt::skip)]'

because custom inner attributes are unstable.
Just to aid discussion and review.
@nspin
Copy link
Author

nspin commented May 24, 2024

Thank you for this feedback!

i. the export RUSTC_BOOTSTRAP=1 early on makes it unclear how much requires that env variable; my guess is that only the first build should, and none of the runs---is that correct?

When you posted your comment, RUSTC_BOOTSTRAP=1 was required both to build verus-driver and to run it. Building verus-driver requires it because the rustc_* crates are unstable, and running verus-driver requires it because verus! {} code uses unstable features. So, at that time, the only step that didn't require RUSTC_BOOTSTRAP=1 to be set externally was building cargo-verus.

I just modified verus-driver to set RUSTC_BOOTSTRAP=1 for itself when necessary before calling into rust_verify. I think this is a reasonable thing to do. Now, externally setting RUSTC_BOOTSTRAP=1 is no longer required to run verus-driver.

ii. I expect most people will pull cargo-verus into their PATH, could you include examples of what that might look like?---I expect it'll end up looking nicer than the cargo run -p cargo-verus -- ... invocations, and might make it easier to illustrate your point :)

Great point! I've modified illustrate.sh to demonstrate invoking cargo-verus via Cargo.

I expect cargo-verus to change less often than Verus itself, but it might itself need upgrades from time to time, and it'd be nice to have version checking built in early on itself (I haven't checked the code to see if you've done this already, but mentioning it as something that should be checked)---potentially similar to how vargo detects that it is outdated if you attempt to run an outdated vargo build (see here)... Similarly, I assume verus-driver should also probably have some sort of versioning check from the get-go.

To me, this seems like a perfect use-case for cargo run. Wouldn't cargo run -p cargo-verus -- ... and cargo run -p verus-driver -- ... serve this purpose?

As for ensuring that verus-driver compilation output is recognized as dirty after verus-driver is rebuilt, I copied a neat trick from clippy-driver:

https://github.com/nspin/verus/blob/58f656198def6e7a5d27dfb0b04fe5979881d3fb/source/verus-driver/src/main.rs#L106-L112

...or alternatively, how it checks if verusfmt is up to date (see here), it might be good to do that kind of check for any newly introduced binary (that has a chance of accidentally getting out of date).

Yes, I think this is important. I just added such a check in 58f6561.

@utaal utaal self-requested a review May 24, 2024 12:20
@utaal
Copy link
Collaborator

utaal commented May 24, 2024

Hi @nspin! Thank you for working on this and for the PR. I'm self-assigning as a reviewer because I've been responsible for the build process, and I'd like to take an in-depth look at this, and thank you @jaybosamiya for the preliminary review, your input is very much appreciated.

It may also take me a little bit to properly get to this, but I'll try and prioritize it as much as I can because we'd definitely want (something like) this (and as @jaybosamiya said, we've wanted it for a while now), and I really appreciate the effort you put into this.

@tjhance
Copy link
Collaborator

tjhance commented May 24, 2024

What's the difference between carto verus --check and cargo verus --check --just-verify?

From the illustrate.sh script, it looks to me like both of them do 'verification without codegen'.

@nspin
Copy link
Author

nspin commented May 24, 2024

What's the difference between cargo verus --check and cargo verus --check --just-verify?

cargo verus --check just selects cargo check instead of cargo build, meaning that rustc is instructed to stop after typechecking and borrow checking and skip codegen (i.e. to emit .rmeta only, no .rlib).

To rust_verify, the --compile flag means "do a normal rustc pass" (in addition to a verification pass), which could mean just typechecking and borrow checking, or codegen too, depending on the --emit flags present.

cargo verus ensures that --compile is always present for dependencies in the build plan. --just-verify controls whether --compile is present for leaf crates in the build plan. When --just-verify is present, --compile is omitted for leaf crates, meaning they get a verification pass but not a normal rustc pass.

@jonhnet
Copy link
Collaborator

jonhnet commented May 25, 2024

Jumping in to say this is super cool. Thanks for the contribution!

@panda2134
Copy link
Contributor

panda2134 commented May 28, 2024

Thank you so much for this PR! My current filesystem verification project does hugely benefit from cargo integration, as it makes use of external crates. Looking forward to its merge!

This trick makes uses of a mechanism designed for building the Rust
sysroot to ensure that the output of verus-driver via cargo-verus is
distinguished from the output of rustc.

The value of the __CARGO_DEFAULT_LIB_METADATA is incorporated into the
hashes that are included in compilation output filenames.
@tjhance
Copy link
Collaborator

tjhance commented May 31, 2024

Can you explain a bit more about how this integration handles dependencies?

When Verus runs on a crate, it can export the specifications in a .vir file, and Verus can then import it when handling other crates. So if crate A depends on crate B, we would need to export the B.vir file and then import it when running on A. Looking through the source code for verus-driver, I see that it appears to be doing something with the .vir files, but I'm not very familiar with cargo, so I need a bit of a walkthrough.

  • Am I right to assume that, when the user invokes cargo to verify crate A, cargo will automatically invoke Verus to generate the B.vir file, and then import that when invoking Verus on crate B?
  • If so, how is Verus invoked on crate B? Will it run full verification on B, or will it only generate the .vir file?
  • Does verification on crate B need to succeed in order to run verification on crate A?
  • I noticed there's a package.metadata.verus.imports field. What does this do (i.e., how does it affect the answers to the above questions)?

cargo verus --check just selects cargo check instead of cargo build, meaning that rustc is instructed to stop after typechecking and borrow checking and skip codegen (i.e. to emit .rmeta only, no .rlib).

So just to check my understanding: a dependency crate needs to emit the .rmeta, which requires us to pass --compile to Verus. Even when running Verus with --compile we can get the .rmeta while skipping codegen. As a result, it is useful to be able to run Verus with --compile without codegen. Thus, this is what it does for dependency crates, and it's also the default for leaf crates. Is that about right?

@nspin
Copy link
Author

nspin commented Jun 4, 2024

  • Am I right to assume that, when the user invokes cargo to verify crate A, cargo will automatically invoke Verus to generate the B.vir file, and then import that when invoking Verus on crate B?

Yes, this is correct.

  • If so, how is Verus invoked on crate B? Will it run full verification on B, or will it only generate the .vir file?

Cargo (via the cargo-verus wrapper) will apply the same process for invoking verus-driver for all targets in the Cargo build plan. That is, for each target, using information in the package.metadata.verus table. This includes "primary" packages (including those specified with -p), other workspace packages, and non-workspace dependencies (including remote ones). The only exception is when the the --just-verify flag is provided to cargo-verus, in which case, as described above, the --compile flag will be omitted for primary packages. However, thinking about this now, I don't think the --just-verify flag is sound (consider a case where a primary package is also a dependency of another package in the build plan). I'm not sure how useful it is anyways.

All of cargo-verus's verus-driver invocations can be controlled at once using arguments after --. For example cargo-verus -- --verus-arg=--no-verify will cause --no-verify to be passed to the rust_verify crate for all invocations of verus-driver (relevant, of course, only for packages with package.metadata.verus.verify = true).

  • Does verification on crate B need to succeed in order to run verification on crate A?

If a rustc or RUSTC_WRAPPER (in this case, verus-driver) invocation fails, Cargo will abort the rest of the build plan. So, if verification on crate B fails in a way that causes verus-driver to terminate with a non-zero exit code, then verification will not be run on crate A. I've been assuming that all verification errors are propagated out of the rust_verify crate, and will thus result in verus-driver terminating with a non-zero exit code.

  • I noticed there's a package.metadata.verus.imports field. What does this do (i.e., how does it affect the answers to the above questions)?

The answers below may sound like hacks, but they ultimately feel fine to me. I'd be curious to hear your takes on them. The intention behind them is for .vir files to tag along with .rmeta files in the filesystem.

For packages with package.meta.verus.verify = true, .vir files are emitted adjacent to .rmeta files.

Until the change I just force pushed, this is how imports worked:

For each package with package.metadata.verus.verify = true, cargo-verus would pass a --verus-driver-arg=--find-import=<name> argument to verus-drver for each package.metadata.verus.imports = [ <name>, ... ]. For each such --find-import argument, verus-driver would scan the rustc arguments for a corresponding --extern=<name>=<path_to_artifact>.{rmeta,rlib,etc}. It would then look for a .vir file using s/(rmeta|rlib)/vir, and then use that for a rust_verify --import argument.

I've just pushed a commit which changes the behavior to this:

For each package with package.metadata.verus.verify = true, cargo-verus identifies those of its dependencies with package.metadata.verus.verify = true. For each such dependency, it passes a --verus-driver-arg=--import-dep-if-present=<name> argument to verus-drver. For each such --import-dep-if-present argument, verus-driver scans the rustc arguments for a corresponding --extern=<name>=<path_to_artifact>.{rmeta,rlib,etc}. If a .vir file of the form s/(rmeta|rlib)/vir is present, a corresponding rust_verify --import argument is added. The purpose of the "-if-present" part is to handle cases where a package has multiple targets (e.g. library, tests, examples), and some target doesn't actually depend on all of the packages dependencies (e.g. library not depending on [dev-dependencies]).

So just to check my understanding: a dependency crate needs to emit the .rmeta, which requires us to pass --compile to Verus. Even when running Verus with --compile we can get the .rmeta while skipping codegen. As a result, it is useful to be able to run Verus with --compile without codegen. Thus, this is what it does for dependency crates, and it's also the default for leaf crates. Is that about right?

Yes, --compile is passed to dependencies, and is, by default, passed to leaf crates as well. However, codegen is carried out by default (via cargo build), but can be skipped with the --check argument to cargo verus, which replaces cargo build with cargo check, which skips codegen.

@tjhance
Copy link
Collaborator

tjhance commented Jun 5, 2024

Thanks for the detailed explanation.

I finally got around to checking out the PR and actually playing around in the cargo-verus-illustration directory to get a sense for how it all works.

To explain a bit how I'm thinking about this: I foresee that when running Verus in a multi-crate project, 99% of the time I'll want to do one of the following 3 things:

  1. Verify crate A only, potentially passingly along some specific options from the command line, while doing the minimal amount of work necessary on all dependency crates
  2. Verify crate A and all dependencies according to options in the verus metadata file. (Usually, I would want it to re-use cached results, but I might want to do verification from scratch sometimes.)
  3. Codegen

In short, it seems cargo-verus works solidly for (2) and (3) but much less so (1), which is probably the most common thing I want to do during active development.

Here are some issues:

  • --verify-module and --verify-function don't work. All arguments are passed to all crates, but these arguments only make sense for a specific crate. So if you supply --verify-module foo, it will get passed to a dependency as well.
  • Changing the diagnostics options (like --expand-errors) or smt options (like the --rlimit) are all things you might want to do during incremental development. However, changing these options will result in re-verifying all dependencies, which is undesirable.

My biggest (technical) question is: How much control do we have over cargo's decision to re-run on all dependencies when command line arguments are added or removed? Is this a decision cargo makes automatically, or do we have freedom to determine which options are considered "inputs" that should trigger a re-verify when changed?

@utaal
Copy link
Collaborator

utaal commented Jun 12, 2024

(Sorry for the wait -- I was busy with an event last week -- I'm going to start looking at this in detail now, hope to get through it by the end of the week, by Tuesday 18th.)

@utaal
Copy link
Collaborator

utaal commented Jun 18, 2024

A few high level questions to start:

For example cargo-verus -- --verus-arg=--no-verify will cause --no-verify to be passed to the rust_verify crate for all invocations of verus-driver

  1. How does --verus-arg= affect cargo's caching? Does changing --verus-arg invalidate the cache, so that one cannot accidentally rely on a crate that wasn't verified. Cargo handles similar situations by invalidating the cache when --features change for example.

However, thinking about this now, I don't think the --just-verify flag is sound (consider a case where a primary package is also a dependency of another package in the build plan). I'm not sure how useful it is anyways.

  1. Can you elaborate on this? I don't quite get why. If we can still get the rmeta for the crate dependencies without the Verus --compile flags I don't see what's the issue. Is it that cargo verus may somehow try and compile a leaf which depends on another primary package for which the rlib (or similar) wasn't built?

  2. Regarding this note by @tjhance :

  1. Verify crate A only, potentially passingly along some specific options from the command line, while doing the minimal amount of work necessary on all dependency crates
  2. Verify crate A and all dependencies according to options in the verus metadata file. (Usually, I would want it to re-use cached results, but I might want to do verification from scratch sometimes.)
  3. Codegen

2 may be an acceptable substitute for 1 assuming the results of 2 for dependent crates have been cached and don't need recompilation and re-verification. @tjhance does that make sense, or do you see specific value in running verification for just that one specific crate. (I'm asking because the way @nspin is describing current cargo verus matches the way regular cargo behaves, where changes in leaf crates only cause work on those leaf crates, and a minimal amount of work to check that the cache of dependency crates is valid).


Regarding @tjhance's issues:

  • --verify-module and --verify-function don't work. All arguments are passed to all crates, but these arguments only make sense for a specific crate. So if you supply --verify-module foo, it will get passed to a dependency as well.
  • Changing the diagnostics options (like --expand-errors) or smt options (like the --rlimit) are all things you might want to do during incremental development. However, changing these options will result in re-verifying all dependencies, which is undesirable.

I agree these are very important. Perhaps we can add a subcommand that allows for direct invocation of verus on a specific crate, so one can pass these arguments specifically to that crate. (This may also address the prior concern by @tjhance).


@nspin would you be up for hopping on a call later this week to discuss the design, and see if I can help with some of this?

Thanks again. I'm also starting to look at the code. I think we may be able to simplify a couple things a little bit now that we have binary releases: https://github.com/verus-lang/verus/releases

@tjhance
Copy link
Collaborator

tjhance commented Jun 18, 2024

@tjhance does that make sense, or do you see specific value in running verification for just that one specific crate.

I do think it's important, though it certainly doesn't need to be addressed immediately in this PR. Right now, I'm mostly trying to figure out what our technical options are, hence my earlier questions about how cargo handles the command line flags (which I am still wondering about).

The reason I think it's important is just a matter of our experience developing massive projects, which rarely proceeds in strict dependency order. It's not uncommon for large swathes of code to fail verification, but we don't want those issues to block work on other components.

@panda2134
Copy link
Contributor

Sorry to disturb, but what's the current progress of this PR? Is there anything other than the cache issue that prevents it from being merged?

@utaal
Copy link
Collaborator

utaal commented Jul 22, 2024

Hi @panda2134, thanks for bringing this up. There is some duplication with existing build tooling that I'd like to resolve before merging, so that we avoid having multiple ways to set up a tool chain that may get out of sync.

I hope to get to it this week soon.

@tchajed
Copy link
Collaborator

tchajed commented Sep 5, 2024

Andrea and I talked about this and essentially this PR has not been forgotten, but it does need some further work to make the feature maintainable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants