Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add #[diagnostic::blocking] attribute #3639

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

estebank
Copy link
Contributor

@estebank estebank commented May 17, 2024

#[diagnostic::blocking] is a marker attribute for functions that are considered (by their author) to be a blocking operation, and as such shouldn't be invoked from an async function. rustc, clippy and other tools can use this signal to lint against calling these functions in async contexts.

It would allow rustc, clippy and other tools to provide lints like

warning: async function `foo` can block
  --> $DIR/blocking-calls-in-async.rs:28:1
   |
28 | async fn foo() {
   | --------------
29 |     interesting();
   |     ^^^^^^^^^^^^^`foo` is determined to block because it calls into `interesting`
   |
note: `interesting` is considered to be blocking because it was explicitly marked as such
  --> $DIR/blocking-calls-in-async.rs:5:1
   |
5  | #[diagnostic::blocking]
   | ^^^^^^^^^^^^^^^^^^^^^^^
6  | fn interesting() {}
   | ----------------

Rendered

`#[diagnostic::blocking]` is a marker attribute for functions that are
considered (by their author) to be a blocking operation, and as such
shouldn't be invoked from an `async` function. `rustc`, `clippy` and other
tools can use this signal to lint against calling these functions in `async`
contexts.

It would allow `rustc`, `clippy` and other tools to provide lints like

```
warning: async function `foo` can block
  --> $DIR/blocking-calls-in-async.rs:28:1
   |
28 | async fn foo() {
   | --------------
29 |     interesting();
   |     ^^^^^^^^^^^^^`foo` is determined to block because it calls into `interesting`
   |
note: `interesting` is considered to be blocking because it was explicitly marked as such
  --> $DIR/blocking-calls-in-async.rs:5:1
   |
5  | #[diagnostic::blocking]
   | ^^^^^^^^^^^^^^^^^^^^^^^
6  | fn interesting() {}
   | ----------------
```
@compiler-errors
Copy link
Member

How does this interact with RFC #3014? Isn't must_not_suspend the same as diagnostic::blocking?

Would like to see that mentioned in the RFC -- if it's similar enough, then this RFC can just subsume that one.

@estebank
Copy link
Contributor Author

The must_not_suspend lint seems tangentially related. That one is to ensure that a value doesn't go through a yield point, while this is so that we don't call std::thread::sleep in an async function.

@davidbarsky
Copy link

How does this interact with RFC #3014? Isn't must_not_suspend the same as diagnostic::blocking?

As Estaban said, they're kinda related. For example,tracing's Span::enter() returns a guard that is must_not_suspend, but it's not actually blocking.

@compiler-errors
Copy link
Member

Yeah, I think the main distinction is that values must not suspend, but calls are blocking.

I am curious if these can be unified futher, but worst case must_not_suspend should likely be moved to the diagnostic namespace if and before it is stabilized.

@estebank
Copy link
Contributor Author

I am curious if these can be unified futher

It might be that this annotation on functions could influence the logic of the must_not_suspend lint.

worst case must_not_suspend should likely be moved to the diagnostic namespace if and before it is stabilized.

I agree.

@estebank estebank added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 18, 2024
@estebank estebank changed the title Add #[diagnostic::blocking] attribute [RFC] Add #[diagnostic::blocking] attribute May 18, 2024
@kennytm
Copy link
Member

kennytm commented May 18, 2024

what functions besides std::thread::sleep() can be reliably marked as #[diagnostic::blocking]?

Comment on lines +72 to +77
# Unresolved questions
[unresolved-questions]: #unresolved-questions

- What parts of the design do you expect to resolve through the RFC process before this gets merged?
- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fill out or delete the placeholder text?

@estebank
Copy link
Contributor Author

what functions besides std::thread::sleep() can be reliably marked as #[diagnostic::blocking]?

Effectively most of std::io and std::fs. Even println!() should likely be considered as a no-no from an async fn, but marking std::io::_print as blocking might be a bridge too far for some. Anything that performs IO of any kind in the stdlib should not be used and instead use tokio provided APIs.

@weiznich
Copy link
Contributor

As a semi related note: As far as I remember one of the main motivations for the specification of diagnostic attribute namespaces was to outline a set of common rules for all attributes in this namespace, so that adding a new attribute to this namespace does not need a new RFC anymore.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 21, 2024 via email

@SOF3
Copy link

SOF3 commented May 23, 2024

We could be more explicit with the definition of blocking.

  • thread::sleep is obviously blocking
  • anything that leads to futex(2) is probably agreed to be blocking (they should try_lock().expect("unlocked) if they are expected not to)
  • anything that waits for the filesystem or network is blocking.
    • so we agree that DNS resolution is blocking too, even if it might fetch from the system cache?
  • are CPU-bound operations non-blocking?
    • what if it is e.g. waiting for another core or waiting for a graphic card?
  • what about memory allocation? might memory allocation block?
    • if we write to program memory that is actually mmap'ed to a large file, it blocks a lot as well.
  • and probably every other syscall not mentioned here, the blocking behavior of which might even be platform-dependent.

Note that CPU-bound operations and memory access cannot be warned using this lint.

@the8472
Copy link
Member

the8472 commented May 23, 2024

what functions besides std::thread::sleep() can be reliably marked as #[diagnostic::blocking]?

Effectively most of std::io and std::fs. Even println!() should likely be considered as a no-no from an async fn, but marking std::io::_print as blocking might be a bridge too far for some. Anything that performs IO of any kind in the stdlib should not be used and instead use tokio provided APIs.

Using Read::read is perfectly non-blocking if you're doing polled IO. It's even explicitly supported in tokio via AsyncFd.

@kennytm
Copy link
Member

kennytm commented May 23, 2024

Using Read::read is perfectly non-blocking if you're doing polled IO. It's even explicitly supported in tokio via AsyncFd.

I don't think you'll run the poll loop directly inside asynccontext though. You wrap it inside a poll_fn() closure which the blocking lint shouldn't touch (??).

@kornelski
Copy link
Contributor

I like this idea, and I think it's valuable to add it even if it's used just for thread::sleep(), since this is a particularly bad footgun that novice users run into when they explore how async works.

However, there's a big gray area of what is blocking and isn't: https://blog.yoshuawuyts.com/what-is-blocking/

Maybe this diagnostic could support specifying a level of "severity", which could be configurable in clippy?

thread::sleep and thread.join() are very likely to be blocking, but println! may be too pedantic. io::Read and io::Write are "it depends", because they're non-blocking when used with Vec<u8>.

@riking
Copy link

riking commented May 23, 2024

File IO is definitely the category of functions that needs the most programmer education of "yes, this is actually blocking", due to the prevalence of both networked file systems and disk hardware errors leading to unbounded waits.

@the8472
Copy link
Member

the8472 commented May 23, 2024

needs the most programmer education of "yes, this is actually blocking"

That's quite generational, depending on whether one has exp

[seek head noises]

erienced what came before SSDs.

@programmerjake
Copy link
Member

programmerjake commented May 23, 2024

I expect some uses of Mutex and similar to not count as blocking, e.g. the following should not count imho:

pub struct AtomicId(Mutex<[u8; 32]>);

impl AtomicId {
    pub const fn new(v: [u8; 32]) -> Self {
        Self(Mutex::new(v))
    }
    pub fn load(&self) -> [u8; 32] {
        *self.0.lock().unwrap()
    }
    pub fn store(&self, v: [u8; 32]) {
        *self.0.lock().unwrap() = v;
    }
}

@JarredAllen
Copy link

Can we also put this information into rustdoc output? That would have saved me some headache in the past from crates that poorly document whether a function blocks or returns a WouldBlock error if there isn't anything yet.

@estebank
Copy link
Contributor Author

I expect some uses of Mutex and similar to not count as blocking, e.g. the following should not count imho

In the way that I'm doing analysis today, that use would indeed be caught (I'm doing transitive analysis), but in my case I have a complementary annotation to mark an item as "assume non blocking" and in this proposal the only checks being made would be for direct usages, so your wrapper would fly under the radar (regardless of whether it is really blocking or not). The correct analysis for Mutex would instead be if the value from .lock() goes past a yield point, but that's more complex than what's been proposed here.

@tmandry
Copy link
Member

tmandry commented May 24, 2024

The correct analysis for Mutex would instead be if the value from .lock() goes past a yield point, but that's more complex than what's been proposed here.

That's exactly what #[must_not_suspend] is for. Maybe this proposal shouldn't touch locks then.

@kpreid
Copy link

kpreid commented May 25, 2024

I'd love to have this annotation and any sort of analysis for it.

I expect some uses of Mutex and similar to not count as blocking,

IMO, the best approach to this problem is that Mutex::lock() is considered blocking, and usage of it from async (or other should-not-block code) code would include an attribute (an #[allow] or #[expect], perhaps) that effectively declares “I have made sure that this lock is held only momentarily” (much like an unsafe {} block, but for blocking analysis: “you can't prove this is OK, but I tell you it is”).

Similarly, filesystem and stdio operations must in general be considered blocking, but a specific application might hold the position “I don't care about the near-zero amount of blocking that will happen due to these very occasional bits of text I write to stderr under normal circumstances”.

(All of this is out of scope of the RFC — unless it somehow leads to a conclusion that blocking is too hopelessly ambiguous to introduce the attribute at all.)

The correct analysis for Mutex would instead be if the value from .lock() goes past a yield point

That would not catch all cases. If the mutex might be held by non-async code for a long time, then async code calling lock() would block. A Mutex is OK to lock from async code if it is only locked momentarily by all its users, regardless of whether those users are async or blocking code.

- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

# Future possibilities
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another future possibility that I think is worth mentioning: adding syntax to the attribute to (extensibly) classify the kind or degree of blocking, so as to be able to resolve “is this really a blocking operation” edge cases by writing down “it is if you call this blocking” in some fashion.

#[diagnostic::blocking(acquires_lock)]

#[diagnostic::blocking(filesystem)]

#[diagnostic::blocking(stderr)]

#[diagnostic::blocking(::rayon::block_on_task)]

A note on that last one: if you're using Rayon, you care a lot about the difference between “this might block on the completion of another task in the Rayon thread pool” (great when inside the pool due to being able to yield to other tasks using the Rayon-specific mechanism, but bad from an async function) and “this might block acquiring a lock” (very dangerous to do from in the pool, because it can lead to deadlocks, as well as more ordinary CPU starvation due to blocking inside a thread-per-core pool).

Much like async, if you're not already attuned to this class of issue, it can be easy to make a mistake by accident, and it's hard to audit for, so being able to use the same kind of “don't call X from Y because you could deadlock/starve” analysis but with a narrower classification would be quite useful.

@programmerjake
Copy link
Member

one place where blocking should be allowed is in things like genawaiter, which is a library for making generators built on top of the async/await mechanism, where it's like having an iterator that may block, which is fine.
there should probably be a way to opt-out of all blocking diagnostics for an async fn/block but still produce lints if there's another async fn/block nested inside. (actually, i've wanted that kind of allow but not recursively for other lints too)

@kornelski
Copy link
Contributor

To me uncontended Mutex isn't "blocking". There are many usage patterns where the critical section is small and quick.
I don't think it's feasible for Rust to know how long any mutex will take to lock. Warning about all of them may be too noisy.

@kennytm
Copy link
Member

kennytm commented May 26, 2024

one place where blocking should be allowed is in things like genawaiter, which is a library for making generators built on top of the async/await mechanism, where it's like having an iterator that may block, which is fine.

The genawaiter crate is a hack to make up for the lack of gen block, it shouldn't influence the decision made on this RFC.

And given that genawaiter is used like gen!({ ... }) the expanded macro could easily be #[allow(blocking)] async { ... }.

@programmerjake
Copy link
Member

And given that genawaiter is used like gen!({ ... }) the expanded macro could easily be #[allow(blocking)] async { ... }.

I'm saying there should be an attribute like #[allow(blocking)]:

#[nonrecursive_allow(blocking)]
async {
    blocking(); // won't warn
    foo(async {
        blocking(); // warns
    });
}

@SOF3
Copy link

SOF3 commented May 27, 2024

@kornelski technically an uncontended mutex should be .try_lock().unwrap() instead of .lock(). If you expect try_lock to fail, it means the mutex is not uncontended. It might be insignificantly blocking, but it is not uncontended.

@kornelski
Copy link
Contributor

Nah, try_lock introduces unnecessary unreliability. I don't mean literally never blocked, but rather not having many threads trying to lock it, and having critical section quick enough, so that threads won't wait long for the lock.


In practice, blocking is only an issue when poll takes longer than a maximum latency an application will tolerate.
This is why there's a big gray area in what counts as blocking, because many synchronous operations could finish quickly enough, depending on how they're used, and what latency is allowed (e.g. in a server for a realtime multiplayer game 1ms may be long enough to be blocking, but a media file server could have 100ms hiccups without causing problems).

@conradludgate
Copy link

conradludgate commented May 30, 2024

To add some further fuel: I am very much against Mutex::lock being 'blocking'. A mutex with a very short critical section will always be faster than an async lock like tokio's because tokio uses a std::sync::Mutex for the wait list:
https://docs.rs/tokio/1.37.0/src/tokio/sync/batch_semaphore.rs.html#36

If tokio considers a mutex to be fast enough, then so should we. It would be very annoying to have to annotate the many such uses in my codebases.

That being said, I do respect that many use cases of mutex in async code has the potential to be slow, and I am supportive of static analysis and lints to try and improve the status quo. However, even a hashmap realloc I have measured to be several ms. An async lock won't fix that if you do tokio::sync::Mutex<HashMap<K, V>> for your concurrent map.


Onto a slightly related but different topic, I'd like to see some investment into tools for debugging actual blocking in code. For instance, a watcher thread that monitors runtime threads, periodically polling them and sampling stack traces. If it detects that a thread has taken a while to yield, it can dump the stacktrace so you can debug uses of loop {} without any awaits.

@SOF3
Copy link

SOF3 commented May 30, 2024

what about we split them to different types of blocking functions (io, net, mutex) and just expose them as separate (maybe somehow parameterized) lint's?

@the8472
Copy link
Member

the8472 commented May 30, 2024

Those categories do not carve reality along its joints. File IO can be faster or slower than network IO. Mutexes depends on the critical section, contention, potential for priority inversion, ...
If you consider swap even a ptr::read can be blocking.

If you want to know if library calls are fast and compose well, then you need them to be

  • wait-free
  • constant-time regarding their inputs
  • have some latency estimate

Then you can add up all the constant latencies and see if they are below your responsiveness goal for a feature of your application.

If things are not constant-time then you can quickly get O(n²) behavior due to loops. If you do not know the expected latency of calls then all that nice constant-timeness is useless if the constants are huge. If they aren't wait-free then you run into starvation situations based on system state.

Making low-latency code legible to the compiler is hard. Perhaps there's some prior art in hard realtime programming that we can look at?

@JarredAllen
Copy link

For me personally, I'd prefer if mutexes were listed as blocking. If a mutex guards a short enough critical section, so then (using the lint reasons feature that's about to become stable) there will be a place where I can document that assumption in the code:

async fn something_with_mutex() {
    static SHARED_MUTEX: Mutex<_> = ..;

    #[allow(
        blocking_in_async,
        reason = "This mutex only guards `do_something` calls, which finish quickly enough."
    )]
    SHARED_MUTEX.lock().do_something();
}

To me, a good part of the Rust language is the many ways in which it forces me to do my homework and show that what I'm doing is correct, and this feels like another way to do that.

@lolbinarycat
Copy link
Contributor

Perhaps there's some prior art in hard realtime programming that we can look at?

i feel like once your latency requirements become strict enough the only solution is nonlocal control flow (signals, interrupts, or preemptive multitasking)

and to add some fuel to the "what does blocking even mean" fire, is std::thread::sleep(Duration::from_nanos(1)) a blocking operation?

i think trying to estimate performance using static analysis is outside the scope of the compiler, and would be better suited with dynamic analysis (measuring time between waits)

is there even a way to manually yield an async fn when doing long cpu bound processing like keygen? it seems like you would want to await a future that returns "Pending" exactly once.

@conradludgate
Copy link

is there even a way to manually yield an async fn when doing long cpu bound processing like keygen? it seems like you would want to await a future that returns "Pending" exactly once.

Yes. Tokio offers a yield_now function which is very easy to remake. It calls wake immediately and returns pending once. I have previously used that during password hashing, for instance: https://github.com/neondatabase/neon/blob/a5ecca976ec3abf97c8c3db4f78c230b4553b509/proxy/src/scram/exchange.rs#L98

Alternatively tokio has a concept of a "cooperative budget", which is not yet stable api tokio-rs/tokio#6622, but it will only return pending if the budget is fully consumed. Budget is acquired by most tokio functions like reading data from a TcpStream or sending into a channel (both of these otherwise could always be immediately ready in a loop and never yield)

@traviscross traviscross added the WG-async Relevant to the async working group. label Jun 10, 2024
The final version would be to implement a call-graph analyisis lint so that transitive calls to blockign operations also produce a warning, but this is explicitly out of scope of this RFC.

# Drawbacks
[drawbacks]: #drawbacks

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential drawback is that, um, "enthusiastic" contributors might make unwelcome PRs against a large number of crates adding this annotation to basically every method. Thoughtful documentation/messaging around the feature might be able to mitigate that risk

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts on a warning for unnecessarily applying #[diagnostic::blocking] on functions that can already be inferred to be blocking? That could help reduce some noise as well.

@HyeonuPark
Copy link

Prior discussion from IRLO based on runtime detection and panicking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC. WG-async Relevant to the async working group.
Projects
None yet
Development

Successfully merging this pull request may close these issues.