diff --git a/execution.bs b/execution.bs index 3dbdfdb..21ad5ff 100644 --- a/execution.bs +++ b/execution.bs @@ -135,23 +135,41 @@ This paper proposes a self-contained design for a Standard C++ framework for man ## Motivation ## {#motivation} -Today, C++ software is increasingly asynchronous and parallel, a trend that is likely to only continue going forward. -Asynchrony and parallelism appears everywhere, from processor hardware interfaces, to networking, to file I/O, to GUIs, to accelerators. -Every C++ domain and every platform needs to deal with asynchrony and parallelism, from scientific computing to video games to financial services, from the smallest mobile devices to your laptop to GPUs in the world's fastest supercomputer. - -While the C++ Standard Library has a rich set of concurrency primitives (`std::atomic`, `std::mutex`, `std::counting_semaphore`, etc) and lower level building blocks (`std::thread`, etc), we lack a Standard vocabulary and framework for asynchrony and parallelism that C++ programmers desperately need. -`std::async`/`std::future`/`std::promise`, C++11's intended exposure for asynchrony, is inefficient, hard to use correctly, and severely lacking in genericity, making it unusable in many contexts. -We introduced parallel algorithms to the C++ Standard Library in C++17, and while they are an excellent start, they are all inherently synchronous and not composable. - -This paper proposes a Standard C++ model for asynchrony, based around three key abstractions: schedulers, senders, and receivers, and a set of customizable asynchronous algorithms. +Today, C++ software is increasingly asynchronous and parallel, a trend that is
+likely to only continue going forward. Asynchrony and parallelism appear
+everywhere, from processor hardware interfaces, to networking, to file I/O, to
+GUIs, to accelerators.
Every C++ domain and every platform needs to deal with +asynchrony and parallelism, from scientific computing to video games to +financial services, from the smallest mobile devices to your laptop to GPUs in +the world's fastest supercomputer. + +While the C++ Standard Library has a rich set of concurrency primitives +(`std::atomic`, `std::mutex`, `std::counting_semaphore`, etc) and lower level +building blocks (`std::thread`, etc), we lack a Standard vocabulary and +framework for asynchrony and parallelism that C++ programmers desperately need. +`std::async`/`std::future`/`std::promise`, C++11's intended exposure for +asynchrony, is inefficient, hard to use correctly, and severely lacking in +genericity, making it unusable in many contexts. We introduced parallel +algorithms to the C++ Standard Library in C++17, and while they are an excellent +start, they are all inherently synchronous and not composable. + +This paper proposes a Standard C++ model for asynchrony based around three key +abstractions: schedulers, senders, and receivers, and a set of customizable +asynchronous algorithms. ## Priorities ## {#priorities} -* Be composable and generic, allowing users to write code that can be used with many different types of execution resources. -* Encapsulate common asynchronous patterns in customizable and reusable algorithms, so users don't have to invent things themselves. +* Be composable and generic, allowing users to write code that can be used with + many different types of execution resources. +* Encapsulate common asynchronous patterns in customizable and reusable + algorithms, so users don't have to invent things themselves. * Make it easy to be correct by construction. -* Support the diversity of execution resources and execution agents, because not all execution agents are created equal; some are less capable than others, but not less important. 
-* Allow everything to be customized by an execution resource, including transfer to other execution resources, but don't require that execution resources customize everything. +* Support the diversity of execution resources and execution agents, because not + all execution agents are created equal; some are less capable than others, + but not less important. +* Allow everything to be customized by an execution resource, including transfer + to other execution resources, but don't require that execution resources + customize everything. * Care about all reasonable use cases, domains and platforms. * Errors must be propagated, but error handling must not present a burden. * Support cancellation, which is not an error. @@ -160,7 +178,11 @@ This paper proposes a Standard C++ model for asynchrony, based around three key ## Examples: End User ## {#example-end-user} -In this section we demonstrate the end-user experience of asynchronous programming directly with the sender algorithms presented in this paper. See [[#design-sender-factories]], [[#design-sender-adaptors]], and [[#design-sender-consumers]] for short explanations of the algorithms used in these code examples. +In this section we demonstrate the end-user experience of asynchronous +programming directly with the sender algorithms presented in this paper. See +[[#design-sender-factories]], [[#design-sender-adaptors]], and +[[#design-sender-consumers]] for short explanations of the algorithms used in +these code examples. ### Hello world ### {#example-hello-world} @@ -181,11 +203,34 @@ auto [i] = this_thread::sync_wait(add_42).value(); // This example demonstrates the basics of schedulers, senders, and receivers: -1. First we need to get a scheduler from somewhere, such as a thread pool. A scheduler is a lightweight handle to an execution resource. -2. To start a chain of work on a scheduler, we call [[#design-sender-factory-schedule]], which returns a sender that completes on the scheduler. 
A sender describes asynchronous work and sends a signal (value, error, or stopped) to some recipient(s) when that work completes. -3. We use sender algorithms to produce senders and compose asynchronous work. [[#design-sender-adaptor-then]] is a sender adaptor that takes an input sender and a `std::invocable`, and calls the `std::invocable` on the signal sent by the input sender. The sender returned by `then` sends the result of that invocation. In this case, the input sender came from `schedule`, so its `void`, meaning it won't send us a value, so our `std::invocable` takes no parameters. But we return an `int`, which will be sent to the next recipient. -4. Now, we add another operation to the chain, again using [[#design-sender-adaptor-then]]. This time, we get sent a value - the `int` from the previous step. We add `42` to it, and then return the result. -5. Finally, we're ready to submit the entire asynchronous pipeline and wait for its completion. Everything up until this point has been completely asynchronous; the work may not have even started yet. To ensure the work has started and then block pending its completion, we use [[#design-sender-consumer-sync_wait]], which will either return a `std::optional<std::tuple<int>>` with the value sent by the last sender, or an empty `std::optional` if the last sender sent a stopped signal, or it throws an exception if the last sender sent an error. +1. First we need to get a scheduler from somewhere, such as a thread pool. A
+   scheduler is a lightweight handle to an execution resource.
+
+2. To start a chain of work on a scheduler, we call
+   [[#design-sender-factory-schedule]], which returns a sender that completes on
+   the scheduler. A sender describes asynchronous work and sends a signal
+   (value, error, or stopped) to some recipient(s) when that work completes.
+
+3. We use sender algorithms to produce senders and compose asynchronous work.
+   [[#design-sender-adaptor-then]] is a sender adaptor that takes an input
+   sender and a `std::invocable`, and calls the `std::invocable` on the signal
+   sent by the input sender. The sender returned by `then` sends the result of
+   that invocation. In this case, the input sender came from `schedule`, so it's
+   `void`, meaning it won't send us a value, so our `std::invocable` takes no
+   parameters. But we return an `int`, which will be sent to the next recipient.
+
+4. Now, we add another operation to the chain, again using
+   [[#design-sender-adaptor-then]]. This time, we get sent a value - the `int`
+   from the previous step. We add `42` to it, and then return the result.
+
+5. Finally, we're ready to submit the entire asynchronous pipeline and wait for
+   its completion. Everything up until this point has been completely
+   asynchronous; the work may not have even started yet. To ensure the work has
+   started and then block pending its completion, we use
+   [[#design-sender-consumer-sync_wait]], which will either return a
+   `std::optional<std::tuple<int>>` with the value sent by the last sender, or
+   an empty `std::optional` if the last sender sent a stopped signal, or it
+   throws an exception if the last sender sent an error.

### Asynchronous inclusive scan ### {#example-async-inclusive-scan}

@@ -236,21 +281,54 @@ sender auto async_inclusive_scan(scheduler auto sch, //

This example builds an asynchronous computation of an inclusive scan:

-1. It scans a sequence of `double`s (represented as the `std::span<const double>` `input`) and stores the result in another sequence of `double`s (represented as `std::span<double>` `output`). -2. It takes a scheduler, which specifies what execution resource the scan should be launched on. -3. It also takes a `tile_count` parameter that controls the number of execution agents that will be spawned. -4. First we need to allocate temporary storage needed for the algorithm, which we'll do with a `std::vector`, `partials`.
We need one `double` of temporary storage for each execution agent we create. -5. Next we'll create our initial sender with [[#design-sender-factory-just]] and [[#design-sender-adaptor-transfer]]. These senders will send the temporary storage, which we've moved into the sender. The sender has a completion scheduler of `sch`, which means the next item in the chain will use `sch`. -6. Senders and sender adaptors support composition via `operator|`, similar to C++ ranges. We'll use `operator|` to attach the next piece of work, which will spawn `tile_count` execution agents using [[#design-sender-adaptor-bulk]] (see [[#design-pipeable]] for details). -7. Each agent will call a `std::invocable`, passing it two arguments. The first is the agent's index (`i`) in the [[#design-sender-adaptor-bulk]] operation, in this case a unique integer in `[0, tile_count)`. The second argument is what the input sender sent - the temporary storage. -8. We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index. -9. Then we do a sequential `std::inclusive_scan` over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage `partials`. -10. After all computation in that initial [[#design-sender-adaptor-bulk]] pass has completed, every one of the spawned execution agents will have written the sum of its elements into its slot in `partials`. -11. Now we need to scan all of the values in `partials`. We'll do that with a single execution agent which will execute after the [[#design-sender-adaptor-bulk]] completes. We create that execution agent with [[#design-sender-adaptor-then]]. -12. [[#design-sender-adaptor-then]] takes an input sender and an `std::invocable` and calls the `std::invocable` with the value sent by the input sender. 
Inside our `std::invocable`, we call `std::inclusive_scan` on `partials`, which the input senders will send to us. 13. Then we return `partials`, which the next phase will need. -14. Finally we do another [[#design-sender-adaptor-bulk]] of the same shape as before. In this [[#design-sender-adaptor-bulk]], we will use the scanned values in `partials` to integrate the sums from other tiles into our elements, completing the inclusive scan. -15. `async_inclusive_scan` returns a sender that sends the output `std::span<double>`. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which `async_inclusive_scan` returns, the computation may not have completed. In fact, it may not have even started. +1. It scans a sequence of `double`s (represented as the `std::span<const double>` `input`) and stores the result in another sequence of `double`s
+   (represented as `std::span<double>` `output`).
+2. It takes a scheduler, which specifies what execution resource the scan should
+   be launched on.
+3. It also takes a `tile_count` parameter that controls the number of execution
+   agents that will be spawned.
+4. First we need to allocate temporary storage needed for the algorithm, which
+   we'll do with a `std::vector`, `partials`. We need one `double` of temporary
+   storage for each execution agent we create.
+5. Next we'll create our initial sender with [[#design-sender-factory-just]] and
+   [[#design-sender-adaptor-transfer]]. These senders will send the temporary
+   storage, which we've moved into the sender. The sender has a completion
+   scheduler of `sch`, which means the next item in the chain will use `sch`.
+6. Senders and sender adaptors support composition via `operator|`, similar to
+   C++ ranges. We'll use `operator|` to attach the next piece of work, which
+   will spawn `tile_count` execution agents using
+   [[#design-sender-adaptor-bulk]] (see [[#design-pipeable]] for details).
+7. Each agent will call a `std::invocable`, passing it two arguments. The first
+   is the agent's index (`i`) in the [[#design-sender-adaptor-bulk]] operation,
+   in this case a unique integer in `[0, tile_count)`. The second argument is
+   what the input sender sent - the temporary storage.
+8. We start by computing the start and end of the range of input and output
+   elements that this agent is responsible for, based on our agent index.
+9. Then we do a sequential `std::inclusive_scan` over our elements. We store the
+   scan result for our last element, which is the sum of all of our elements,
+   in our temporary storage `partials`.
+10.
After all computation in that initial [[#design-sender-adaptor-bulk]] pass
+    has completed, every one of the spawned execution agents will have written
+    the sum of its elements into its slot in `partials`.
+11. Now we need to scan all of the values in `partials`. We'll do that with a
+    single execution agent which will execute after the
+    [[#design-sender-adaptor-bulk]] completes. We create that execution agent
+    with [[#design-sender-adaptor-then]].
+12. [[#design-sender-adaptor-then]] takes an input sender and an
+    `std::invocable` and calls the `std::invocable` with the value sent by the
+    input sender. Inside our `std::invocable`, we call `std::inclusive_scan`
+    on `partials`, which the input sender will send to us.
13. Then we return `partials`, which the next phase will need.
+14. Finally we do another [[#design-sender-adaptor-bulk]] of the same shape as
+    before. In this [[#design-sender-adaptor-bulk]], we will use the scanned
+    values in `partials` to integrate the sums from other tiles into our
+    elements, completing the inclusive scan.
+15. `async_inclusive_scan` returns a sender that sends the output
+    `std::span<double>`. A consumer of the algorithm can chain additional work
+    that uses the scan result. At the point at which `async_inclusive_scan`
+    returns, the computation may not have completed. In fact, it may not have
+    even started.
### Asynchronous dynamically-sized read ### {#example-async-dynamically-sized-read}

@@ -287,28 +365,57 @@ sender_of<dynamic_buffer> auto async_read_array(auto handle) {
 //
 }
```

-This example demonstrates a common asynchronous I/O pattern - reading a payload of a dynamic size by first reading the size, then reading the number of bytes specified by the size: - -1. `async_read` is a pipeable sender adaptor. It's a customization point object, but this is what it's call signature looks like. It takes a sender parameter which must send an input buffer in the form of a `std::span<std::byte>`, and a handle to an I/O context. It will asynchronously read into the input buffer, up to the size of the `std::span`. It returns a sender which will send the number of bytes read once the read completes. -2. `async_read_array` takes an I/O handle and reads a size from it, and then a buffer of that many bytes. It returns a sender that sends a `dynamic_buffer` object that owns the data that was sent. -3. `dynamic_buffer` is an aggregate struct that contains a `std::unique_ptr<std::byte[]>` and a size. -4. The first thing we do inside of `async_read_array` is create a sender that will send a new, empty `dynamic_array` object using [[#design-sender-factory-just]]. We can attach more work to the pipeline using `operator|` composition (see [[#design-pipeable]] for details). -5. We need the lifetime of this `dynamic_array` object to last for the entire pipeline. So, we use `let_value`, which takes an input sender and a `std::invocable` that must return a sender itself (see [[#design-sender-adaptor-let]] for details). `let_value` sends the value from the input sender to the `std::invocable`. Critically, the lifetime of the sent object will last until the sender returned by the `std::invocable` completes. -6. Inside of the `let_value` `std::invocable`, we have the rest of our logic. First, we want to initiate an `async_read` of the buffer size. To do that, we need to send a `std::span` pointing to `buf.size`.
We can do that with [[#design-sender-factory-just]]. -7. We chain the `async_read` onto the [[#design-sender-factory-just]] sender with `operator|`. -8. Next, we pipe a `std::invocable` that will be invoked after the `async_read` completes using [[#design-sender-adaptor-then]]. +This example demonstrates a common asynchronous I/O pattern - reading a payload
+of a dynamic size by first reading the size, then reading the number of bytes
+specified by the size:
+
+1. `async_read` is a pipeable sender adaptor. It's a customization point object,
+   but this is what its call signature looks like. It takes a sender parameter
+   which must send an input buffer in the form of a `std::span<std::byte>`, and
+   a handle to an I/O context. It will asynchronously read into the input
+   buffer, up to the size of the `std::span`. It returns a sender which will
+   send the number of bytes read once the read completes.
+2. `async_read_array` takes an I/O handle and reads a size from it, and then a
+   buffer of that many bytes. It returns a sender that sends a `dynamic_buffer`
+   object that owns the data that was sent.
+3. `dynamic_buffer` is an aggregate struct that contains a
+   `std::unique_ptr<std::byte[]>` and a size.
+4. The first thing we do inside of `async_read_array` is create a sender that
+   will send a new, empty `dynamic_buffer` object using
+   [[#design-sender-factory-just]]. We can attach more work to the pipeline
+   using `operator|` composition (see [[#design-pipeable]] for details).
+5. We need the lifetime of this `dynamic_buffer` object to last for the entire
+   pipeline. So, we use `let_value`, which takes an input sender and a
+   `std::invocable` that must return a sender itself (see
+   [[#design-sender-adaptor-let]] for details). `let_value` sends the value
+   from the input sender to the `std::invocable`. Critically, the lifetime of
+   the sent object will last until the sender returned by the `std::invocable`
+   completes.
+6.
Inside of the `let_value` `std::invocable`, we have the rest of our logic. + First, we want to initiate an `async_read` of the buffer size. To do that, + we need to send a `std::span` pointing to `buf.size`. We can do that with + [[#design-sender-factory-just]]. +7. We chain the `async_read` onto the [[#design-sender-factory-just]] sender + with `operator|`. +8. Next, we pipe a `std::invocable` that will be invoked after the `async_read` + completes using [[#design-sender-adaptor-then]]. 9. That `std::invocable` gets sent the number of bytes read. 10. We need to check that the number of bytes read is what we expected. 11. Now that we have read the size of the data, we can allocate storage for it. -12. We return a `std::span` to the storage for the data from the `std::invocable`. This will be sent to the next recipient in the pipeline. +12. We return a `std::span` to the storage for the data from the + `std::invocable`. This will be sent to the next recipient in the pipeline. 13. And that recipient will be another `async_read`, which will read the data. -14. Once the data has been read, in another [[#design-sender-adaptor-then]], we confirm that we read the right number of bytes. -15. Finally, we move out of and return our `dynamic_buffer` object. It will get sent by the sender returned by `async_read_array`. We can attach more things to that sender to use the data in the buffer. +14. Once the data has been read, in another [[#design-sender-adaptor-then]], we + confirm that we read the right number of bytes. +15. Finally, we move out of and return our `dynamic_buffer` object. It will get + sent by the sender returned by `async_read_array`. We can attach more + things to that sender to use the data in the buffer. ## Asynchronous Windows socket `recv` ## {#example-async-windows-socket-recv} -To get a better feel for how this interface might be used by low-level operations see this example implementation -of a cancellable `async_recv()` operation for a Windows Socket. 
+To get a better feel for how this interface might be used by low-level
+operations, see this example implementation of a cancellable `async_recv()`
+operation for a Windows Socket.

```c++
struct operation_base : WSAOVERLAPPED {
@@ -320,6 +427,8 @@ struct recv_op : operation_base {
 template <typename Receiver>
 struct recv_op : operation_base {
+  using operation_state_concept = std::execution::operation_state_t;
+
   recv_op(SOCKET s, void* data, size_t len, Receiver r)
   : receiver(std::move(r))
   , sock(s) {
@@ -333,40 +442,40 @@ struct recv_op : operation_base {
     buffer.buf = static_cast<CHAR*>(data);
   }

-  friend void tag_invoke(std::execution::start_t, recv_op& self) noexcept {
+  void start() & noexcept {
     // Avoid even calling WSARecv() if operation already cancelled
     auto st = std::execution::get_stop_token(
-      std::execution::get_env(self.receiver));
+      std::execution::get_env(receiver));
     if (st.stop_requested()) {
-      std::execution::set_stopped(std::move(self.receiver));
+      std::execution::set_stopped(std::move(receiver));
       return;
     }

     // Store and cache result here in case it changes during execution
     const bool stopPossible = st.stop_possible();
     if (!stopPossible) {
-      self.ready.store(true, std::memory_order_relaxed);
+      ready.store(true, std::memory_order_relaxed);
     }

     // Launch the operation
     DWORD bytesTransferred = 0;
     DWORD flags = 0;
-    int result = WSARecv(self.sock, &self.buffer, 1, &bytesTransferred, &flags,
-                         static_cast<WSAOVERLAPPED*>(&self), NULL);
+    int result = WSARecv(sock, &buffer, 1, &bytesTransferred, &flags,
+                         static_cast<WSAOVERLAPPED*>(this), NULL);
     if (result == SOCKET_ERROR) {
       int errorCode = WSAGetLastError();
       if (errorCode != WSA_IO_PENDING) {
         if (errorCode == WSA_OPERATION_ABORTED) {
-          std::execution::set_stopped(std::move(self.receiver));
+          std::execution::set_stopped(std::move(receiver));
         } else {
-          std::execution::set_error(std::move(self.receiver),
+          std::execution::set_error(std::move(receiver),
                                     std::error_code(errorCode, std::system_category()));
         }
         return;
       }
     } else {
       // Completed synchronously
(assuming FILE_SKIP_COMPLETION_PORT_ON_SUCCESS has been set)
-      execution::set_value(std::move(self.receiver), bytesTransferred);
+      std::execution::set_value(std::move(receiver), bytesTransferred);
       return;
     }
@@ -374,20 +483,20 @@ struct recv_op : operation_base {
     // May be completing concurrently on another thread already.
     if (stopPossible) {
       // Register the stop callback
-      self.stopCallback.emplace(std::move(st), cancel_cb{self});
+      stopCallback.emplace(std::move(st), cancel_cb{*this});
       // Mark as 'completed'
-      if (self.ready.load(std::memory_order_acquire) ||
-          self.ready.exchange(true, std::memory_order_acq_rel)) {
+      if (ready.load(std::memory_order_acquire) ||
+          ready.exchange(true, std::memory_order_acq_rel)) {
         // Already completed on another thread
-        self.stopCallback.reset();
+        stopCallback.reset();

-        BOOL ok = WSAGetOverlappedResult(self.sock, (WSAOVERLAPPED*)&self, &bytesTransferred, FALSE, &flags);
+        BOOL ok = WSAGetOverlappedResult(sock, (WSAOVERLAPPED*)this, &bytesTransferred, FALSE, &flags);
         if (ok) {
-          std::execution::set_value(std::move(self.receiver), bytesTransferred);
+          std::execution::set_value(std::move(receiver), bytesTransferred);
         } else {
           int errorCode = WSAGetLastError();
-          std::execution::set_error(std::move(self.receiver),
+          std::execution::set_error(std::move(receiver),
                                     std::error_code(errorCode, std::system_category()));
         }
       }
@@ -405,16 +514,16 @@ struct recv_op : operation_base {
   static void on_complete(operation_base* op, DWORD bytesTransferred, int errorCode) noexcept {
     recv_op& self = *static_cast<recv_op*>(op);
-    if (ready.load(std::memory_order_acquire) ||
-        ready.exchange(true, std::memory_order_acq_rel)) {
+    if (self.ready.load(std::memory_order_acquire) ||
+        self.ready.exchange(true, std::memory_order_acq_rel)) {
       // Unsubscribe any stop-callback so we know that CancelIoEx() is not accessing 'op'
       // any more
-      stopCallback.reset();
+      self.stopCallback.reset();
       if (errorCode == 0) {
-        std::execution::set_value(std::move(receiver),
bytesTransferred);
+        std::execution::set_value(std::move(self.receiver), bytesTransferred);
       } else {
-        std::execution::set_error(std::move(receiver),
+        std::execution::set_error(std::move(self.receiver),
                                   std::error_code(errorCode, std::system_category()));
       }
     }
@@ -435,10 +544,8 @@ struct recv_sender {
   size_t len;

   template <typename Receiver>
-  friend recv_op<Receiver> tag_invoke(std::execution::connect_t,
-                                      const recv_sender& s,
-                                      Receiver r) {
-    return recv_op<Receiver>{s.sock, s.data, s.len, std::move(r)};
+  recv_op<Receiver> connect(Receiver r) const {
+    return recv_op<Receiver>{sock, data, len, std::move(r)};
   }
 };

@@ -451,29 +558,52 @@ recv_sender async_recv(SOCKET s, void* data, size_t len) {
#### Sudoku solver #### {#example-sudoku}

-This example comes from Kirk Shoop, who ported an example from TBB's documentation to sender/receiver in his fork of the libunifex repo. It is a Sudoku solver that uses a configurable number of threads to explore the search space for solutions. +This example comes from Kirk Shoop, who ported an example from TBB's
+documentation to sender/receiver in his fork of the libunifex repo. It is a
+Sudoku solver that uses a configurable number of threads to explore the search
+space for solutions.

-The sender/receiver-based Sudoku solver can be found [here](https://github.com/kirkshoop/libunifex/blob/sudoku/examples/sudoku.cpp). Some things that are worth noting about Kirk's solution: +The sender/receiver-based Sudoku solver can be found
+[here](https://github.com/kirkshoop/libunifex/blob/sudoku/examples/sudoku.cpp).
+Some things that are worth noting about Kirk's solution:

-1. Although it schedules asychronous work onto a thread pool, and each unit of work will schedule more work, its use of structured concurrency patterns make reference counting unnecessary. The solution does not make use of `shared_ptr`. +1. Although it schedules asynchronous work onto a thread pool, and each unit of
+   work will schedule more work, its use of structured concurrency patterns
+   makes reference counting unnecessary.
The solution does not make use of
+   `shared_ptr`.

-2. In addition to eliminating the need for reference counting, the use of structured concurrency makes it easy to ensure that resources are cleaned up on all code paths. In contrast, the TBB example that inspired this one [leaks memory](https://github.com/oneapi-src/oneTBB/issues/568). +2. In addition to eliminating the need for reference counting, the use of
+   structured concurrency makes it easy to ensure that resources are cleaned up
+   on all code paths. In contrast, the TBB example that inspired this one
+   [leaks memory](https://github.com/oneapi-src/oneTBB/issues/568).

-For comparison, the TBB-based Sudoku solver can be found [here](https://github.com/oneapi-src/oneTBB/blob/40a9a1060069d37d5f66912c6ee4cf165144774b/examples/task_group/sudoku/sudoku.cpp). +For comparison, the TBB-based Sudoku solver can be found
+[here](https://github.com/oneapi-src/oneTBB/blob/40a9a1060069d37d5f66912c6ee4cf165144774b/examples/task_group/sudoku/sudoku.cpp).

#### File copy #### {#example-file-copy}

-This example also comes from Kirk Shoop which uses sender/receiver to recursively copy the files a directory tree. It demonstrates how sender/receiver can be used to do IO, using a scheduler that schedules work on Linux's io_uring. +This example, which also comes from Kirk Shoop, uses sender/receiver to
+recursively copy the files in a directory tree. It demonstrates how sender/receiver
+can be used to do IO, using a scheduler that schedules work on Linux's io_uring.

-As with the Sudoku example, this example obviates the need for reference counting by employing structured concurrency. It uses iteration with an upper limit to avoid having too many open file handles. +As with the Sudoku example, this example obviates the need for reference
+counting by employing structured concurrency. It uses iteration with an upper
+limit to avoid having too many open file handles.
-You can find the example [here](https://github.com/kirkshoop/libunifex/blob/filecopy/examples/file_copy.cpp). +You can find the example
+[here](https://github.com/kirkshoop/libunifex/blob/filecopy/examples/file_copy.cpp).

#### Echo server #### {#example-echo-server}

-Dietmar Kuehl has a hobby project that implements networking APIs on top of sender/receiver. He recently implemented an echo server as a demo. His echo server code can be found [here](https://github.com/dietmarkuehl/kuhllib/blob/main/src/examples/echo_server.cpp). +Dietmar Kuehl has proposed networking APIs that use the sender/receiver
+abstraction (see [P2762](https://wg21.link/P2762)). He has implemented an echo
+server as a demo. His echo server code can be found
+[here](https://github.com/dietmarkuehl/kuhllib/blob/main/src/examples/echo_server.cpp).

-Below, I show the part of the echo server code. This code is executed for each client that connects to the echo server. In a loop, it reads input from a socket and echos the input back to the same socket. All of this, including the loop, is implemented with generic async algorithms. +Below, I show part of the echo server code. This code is executed for each
+client that connects to the echo server. In a loop, it reads input from a socket
+and echoes the input back to the same socket. All of this, including the loop, is
+implemented with generic async algorithms.
     outstanding.start(
@@ -498,228 +628,272 @@ Below, I show the part of the echo server code. This code is executed for each c
     );
     
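The control flow of the snippet above can be modeled synchronously. The sketch below is not Dietmar's API: it is a hypothetical in-memory stand-in for the socket, with the `repeat_effect_until` pipeline flattened into a plain loop so the read-then-echo shape is easy to see.

```c++
#include <algorithm>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical in-memory stand-in for the socket used above. read_some()
// returning 0 plays the role of the "connection closed" condition that
// terminates the repeat_effect_until loop.
struct fake_socket {
  std::deque<char> incoming;  // bytes arriving from the client
  std::string echoed;         // bytes the server wrote back

  std::size_t read_some(char* buf, std::size_t len) {
    const std::size_t n = std::min(len, incoming.size());
    for (std::size_t i = 0; i < n; ++i) {
      buf[i] = incoming.front();
      incoming.pop_front();
    }
    return n;
  }

  void write_some(const char* buf, std::size_t len) { echoed.append(buf, len); }
};

// The async pipeline, flattened: read into a buffer, write the bytes back,
// and repeat until the read reports that the connection has closed.
void echo_connection(fake_socket& sock) {
  char buffer[4];
  while (std::size_t n = sock.read_some(buffer, sizeof buffer)) {
    sock.write_some(buffer, n);
  }
}
```

In the sender/receiver version, each iteration of this loop is a chain of asynchronous operations, and the loop itself is the `repeat_effect_until` algorithm, so nothing blocks while waiting for the next read to complete.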
-In this code, `NN::async_read_some` and `NN::async_write_some` are asynchronous socket-based networking APIs that return senders. `EX::repeat_effect_until`, `EX::let_value`, and `EX::then` are fully generic sender adaptor algorithms that accept and return senders. +In this code, `NN::async_read_some` and `NN::async_write_some` are asynchronous
+socket-based networking APIs that return senders. `EX::repeat_effect_until`,
+`EX::let_value`, and `EX::then` are fully generic sender adaptor algorithms that
+accept and return senders.

-This is a good example of seamless composition of async IO functions with non-IO operations. And by composing the senders in this structured way, all the state for the composite operation -- the `repeat_effect_until` expression and all its child operations -- is stored altogether in a single object. +This is a good example of seamless composition of async IO functions with non-IO
+operations. And by composing the senders in this structured way, all the state
+for the composite operation -- the `repeat_effect_until` expression and all its
+child operations -- is stored altogether in a single object.

## Examples: Algorithms ## {#example-algorithm}

-In this section we show a few simple sender/receiver-based algorithm implementations. +In this section we show a few simple sender/receiver-based algorithm
+implementations.

### `then` ### {#example-then}

```c++
-namespace exec = std::execution;
+namespace stdexec = std::execution;

-template <class R, class F>
-class _then_receiver
-  : exec::receiver_adaptor<_then_receiver<R, F>, R> {
-  friend exec::receiver_adaptor<_then_receiver<R, F>, R>;
+template <class R, class F>
+class _then_receiver : public R {
   F f_;

-  // Customize set_value by invoking the callable and passing the result to the inner receiver
-  template <class... As>
-  void set_value(As&&... as) && noexcept try {
-    exec::set_value(std::move(*this).base(), std::invoke((F&&) f_, (As&&) as...));
-  } catch(...)
{ - exec::set_error(std::move(*this).base(), std::current_exception()); - } - public: - _then_receiver(R r, F f) - : exec::receiver_adaptor<_then_receiver, R>{std::move(r)} - , f_(std::move(f)) {} + _then_receiver(R r, F f) : R(std::move(r)), f_(std::move(f)) {} + + // Customize set_value by invoking the callable and passing the result to + // the inner receiver + template + requires std::invocable + void set_value(As&&... as) && noexcept { + try { + stdexec::set_value(std::move(*this).base(), std::invoke((F&&) f_, (As&&) as...)); + } catch(...) { + stdexec::set_error(std::move(*this).base(), std::current_exception()); + } + } }; -template +template struct _then_sender { - using sender_concept = exec::sender_t; + using sender_concept = stdexec::sender_t; S s_; F f_; template - using _set_value_t = exec::completion_signatures< - exec::set_value_t(std::invoke_result_t)>; + using _set_value_t = stdexec::completion_signatures< + stdexec::set_value_t(std::invoke_result_t)>; + + using _except_ptr_sig = + stdexec::completion_signatures; // Compute the completion signatures - template - friend auto tag_invoke(exec::get_completion_signatures_t, _then_sender&&, Env) - -> exec::transform_completion_signatures_of, - _set_value_t>; + template + auto get_completion_signatures(Env&& env) && noexcept + -> stdexec::transform_completion_signatures_of< + S, Env, _except_ptr_sig, _set_value_t> { + return {}; + } // Connect: - template - friend auto tag_invoke(exec::connect_t, _then_sender&& self, R r) - -> exec::connect_result_t> { - return exec::connect( - (S&&) self.s_, _then_receiver{(R&&) r, (F&&) self.f_}); + template + auto connect(R r) && -> stdexec::connect_result_t> { + return stdexec::connect( + (S&&) s_, _then_receiver{(R&&) r, (F&&) f_}); } - friend decltype(auto) tag_invoke(get_env_t, const _then_sender& self) noexcept { - return get_env(self.s_); + decltype(auto) get_env() const noexcept { + return get_env(s_); } }; -template -exec::sender auto then(S s, F f) { +template 
+stdexec::sender auto then(S s, F f) { return _then_sender{(S&&) s, (F&&) f}; } ``` -This code builds a `then` algorithm that transforms the value(s) from the input sender -with a transformation function. The result of the transformation becomes the new value. -The other receiver functions (`set_error` and `set_stopped`), as well as all receiver queries, -are passed through unchanged. +This code builds a `then` algorithm that transforms the value(s) from the input +sender with a transformation function. The result of the transformation becomes +the new value. The other receiver functions (`set_error` and `set_stopped`), as +well as all receiver queries, are passed through unchanged. In detail, it does the following: -1. Defines a receiver in terms of `execution::receiver_adaptor` that aggregates - another receiver and an invocable that: - * Defines a constrained `tag_invoke` overload for transforming the value - channel. - * Defines another constrained overload of `tag_invoke` that passes all other - customizations through unchanged. +1. Defines a receiver in terms of receiver and an invocable that: + * Defines a constrained `set_value` member function for transforming the + value channel. + * Delegates `set_error` and `set_stopped` to the inner receiver. - The `tag_invoke` overloads are actually implemented by - `execution::receiver_adaptor`; they dispatch either to named members, as - shown above with `_then_receiver::set_value`, or to the adapted receiver. -2. Defines a sender that aggregates another sender and the invocable, which defines a `tag_invoke` customization for `std::execution::connect` that wraps the incoming receiver in the receiver from (1) and passes it and the incoming sender to `std::execution::connect`, returning the result. It also defines a `tag_invoke` customization of `get_completion_signatures` that declares the sender's completion signatures when executed within a particular environment. +2. 
Defines a sender that aggregates another sender and the invocable, which + defines a `connect` member function that wraps the incoming receiver in the + receiver from (1) and passes it and the incoming sender to + `std::execution::connect`, returning the result. It also defines a + `get_completion_signatures` member function that declares the sender's + completion signatures when executed within a particular environment. ### `retry` ### {#example-retry} ```c++ using namespace std; -namespace exec = execution; +namespace stdexec = execution; -template +template concept _decays_to = same_as, To>; // _conv needed so we can emplace construct non-movable types into // a std::optional. template - requires is_nothrow_move_constructible_v struct _conv { F f_; + + static_assert(is_nothrow_move_constructible_v); explicit _conv(F f) noexcept : f_((F&&) f) {} + operator invoke_result_t() && { return ((F&&) f_)(); } }; template -struct _op; +struct _retry_op; -// pass through all customizations except set_error, which retries the operation. +// pass through all customizations except set_error, which retries +// the operation. template -struct _retry_receiver - : exec::receiver_adaptor<_retry_receiver> { - _op* o_; - - R&& base() && noexcept { return (R&&) o_->r_; } - const R& base() const & noexcept { return o_->r_; } +struct _retry_receiver { + _retry_op* o_; - explicit _retry_receiver(_op* o) : o_(o) {} + void set_value(auto&&... as) && noexcept { + stdexec::set_value(std::move(o_->r_), (decltype(as)&&) as...); + } void set_error(auto&&) && noexcept { o_->_retry(); // This causes the op to be retried } + + void set_stopped() && noexcept { + stdexec::set_stopped(std::move(o_->r_)); + } + + decltype(auto) get_env() const noexcept { + return get_env(o_->r_); + } }; // Hold the nested operation state in an optional so we can // re-construct and re-start it if the operation fails. 
 template <class S, class R>
-struct _op {
+struct _retry_op {
+  using operation_state_concept = stdexec::operation_state_t;
+  using _child_op_t =
+    stdexec::connect_result_t<S&, _retry_receiver<S, R>>;
+
   S s_;
   R r_;
-  optional<
-    exec::connect_result_t<S&, _retry_receiver<S, R>>> o_;
+  optional<_child_op_t> o_;
 
-  _op(S s, R r): s_((S&&)s), r_((R&&)r), o_{_connect()} {}
-  _op(_op&&) = delete;
+  _retry_op(_retry_op&&) = delete;
+  _retry_op(S s, R r)
+    : s_(std::move(s)), r_(std::move(r)), o_{_connect()} {}
 
   auto _connect() noexcept {
     return _conv{[this] {
-      return exec::connect(s_, _retry_receiver<S, R>{this});
+      return stdexec::connect(s_, _retry_receiver<S, R>{this});
     }};
   }
 
-  void _retry() noexcept try {
-    o_.emplace(_connect()); // potentially-throwing
-    exec::start(*o_);
-  } catch(...) {
-    exec::set_error((R&&) r_, std::current_exception());
+  void _retry() noexcept {
+    try {
+      o_.emplace(_connect()); // potentially-throwing
+      stdexec::start(*o_);
+    } catch(...) {
+      stdexec::set_error(std::move(r_), std::current_exception());
+    }
   }
 
-  friend void tag_invoke(exec::start_t, _op& o) noexcept {
-    exec::start(*o.o_);
+  void start() & noexcept {
+    stdexec::start(*o_);
   }
 };
 
+// Helpers for computing the `retry` sender's completion signatures:
+template <class... Ts>
+  using _value_t =
+    stdexec::completion_signatures<stdexec::set_value_t(Ts...)>;
+
+template <class>
+  using _error_t = stdexec::completion_signatures<>;
+
+using _except_sig =
+  stdexec::completion_signatures<stdexec::set_error_t(std::exception_ptr)>;
+
 template <class S>
 struct _retry_sender {
-  using sender_concept = exec::sender_t;
+  using sender_concept = stdexec::sender_t;
   S s_;
 
-  explicit _retry_sender(S s) : s_((S&&) s) {}
-
-  template <class... Ts>
-  using _value_t =
-    exec::completion_signatures<exec::set_value_t(Ts...)>;
-  template <class>
-  using _error_t = exec::completion_signatures<>;
+  explicit _retry_sender(S s) : s_(std::move(s)) {}
 
   // Declare the signatures with which this sender can complete
-  template <class Env>
-  friend auto tag_invoke(exec::get_completion_signatures_t, const _retry_sender&, Env)
-    -> exec::transform_completion_signatures_of<S&, Env,
-         exec::completion_signatures<exec::set_error_t(std::exception_ptr)>,
-         _value_t, _error_t>;
-
-  template <exec::receiver R>
-  friend _op<S, R> tag_invoke(exec::connect_t, _retry_sender&& self, R r) {
-    return {(S&&) self.s_, (R&&) r};
+  template <class Env>
+  using _compl_sigs =
+    stdexec::transform_completion_signatures_of<
+      S&, Env, _except_sig, _value_t, _error_t>;
+
+  template <class Env>
+  auto get_completion_signatures(Env&&) const noexcept -> _compl_sigs<Env> {
+    return {};
+  }
+
+  template <stdexec::receiver R>
+    requires stdexec::sender_to<S&, _retry_receiver<S, R>>
+  _retry_op<S, R> connect(R r) && {
+    return {std::move(s_), std::move(r)};
   }
 
-  friend decltype(auto) tag_invoke(exec::get_env_t, const _retry_sender& self) noexcept {
-    return get_env(self.s_);
+  decltype(auto) get_env() const noexcept {
+    return stdexec::get_env(s_);
  }
 };
 
-template <exec::sender S>
-exec::sender auto retry(S s) {
-  return _retry_sender{(S&&) s};
+template <stdexec::sender S>
+stdexec::sender auto retry(S s) {
+  return _retry_sender{std::move(s)};
 }
 ```
 
-The `retry` algorithm takes a multi-shot sender and causes it to repeat on error, passing through values and stopped signals. Each time the input sender is restarted, a new receiver is connected and the resulting operation state is stored in an `optional`, which allows us to reinitialize it multiple times.
+The `retry` algorithm takes a multi-shot sender and causes it to repeat on
+error, passing through values and stopped signals. Each time the input sender is
+restarted, a new receiver is connected and the resulting operation state is
+stored in an `optional`, which allows us to reinitialize it multiple times.
 
 This example does the following:
 
-1. Defines a `_conv` utility that takes advantage of C++17's guaranteed copy elision to emplace a non-movable type in a `std::optional`.
+1. Defines a `_conv` utility that takes advantage of C++17's guaranteed copy
+   elision to emplace a non-movable type in a `std::optional`.
 
-2. Defines a `_retry_receiver` that holds a pointer back to the operation state. It passes all customizations through unmodified to the inner receiver owned by the operation state except for `set_error`, which causes a `_retry()` function to be called instead.
+2. Defines a `_retry_receiver` that holds a pointer back to the operation state.
+   It passes all customizations through unmodified to the inner receiver owned
+   by the operation state except for `set_error`, which causes a `_retry()`
+   function to be called instead.
 
-3. Defines an operation state that aggregates the input sender and receiver, and declares storage for the nested operation state in an `optional`. Constructing the operation state constructs a `_retry_receiver` with a pointer to the (under construction) operation state and uses it to connect to the aggregated sender.
+3. Defines an operation state that aggregates the input sender and receiver, and
+   declares storage for the nested operation state in an `optional`.
+   Constructing the operation state constructs a `_retry_receiver` with a
+   pointer to the (under construction) operation state and uses it to connect
+   to the input sender.
 
-4. Starting the operation state dispatches to `start` on the inner operation state.
+4. Starting the operation state dispatches to `start` on the inner operation
+   state.
 
-5. The `_retry()` function reinitializes the inner operation state by connecting the sender to a new receiver, holding a pointer back to the outer operation state as before.
+5. The `_retry()` function reinitializes the inner operation state by connecting
+   the sender to a new receiver, holding a pointer back to the outer operation
+   state as before.
 
-6. After reinitializing the inner operation state, `_retry()` calls `start` on it, causing the failed operation to be rescheduled.
+6. After reinitializing the inner operation state, `_retry()` calls `start` on
+   it, causing the failed operation to be rescheduled.
 
-7. Defines a `_retry_sender` that implements the `connect` customization point to return an operation state constructed from the passed-in sender and receiver.
+7. Defines a `_retry_sender` that implements a `connect` member function to
+   return an operation state constructed from the passed-in sender and
+   receiver.
 
-8. `_retry_sender` also implements the `get_completion_signatures` customization point to describe the ways this sender may complete when executed in a particular execution resource.
+8. `_retry_sender` also implements a `get_completion_signatures` member function
+   to describe the ways this sender may complete when executed in a particular
+   execution resource.
 
 ## Examples: Schedulers ## {#example-schedulers}
 
@@ -728,58 +902,61 @@ In this section we look at some schedulers of varying complexity.
 
 ### Inline scheduler ### {#example-schedulers-inline}
 
 ```c++
+namespace stdexec = std::execution;
+
 class inline_scheduler {
   template <class R>
-  struct _op {
-    [[no_unique_address]] R rec_;
-    friend void tag_invoke(std::execution::start_t, _op& op) noexcept {
-      std::execution::set_value((R&&) op.rec_);
-    }
-  };
+  struct _op {
+    using operation_state_concept = stdexec::operation_state_t;
+    R rec_;
+
+    void start() & noexcept {
+      stdexec::set_value(std::move(rec_));
+    }
+  };
 
   struct _env {
     template <class CPO>
-    friend inline_scheduler tag_invoke(
-        std::execution::get_completion_scheduler_t<CPO>, _env) noexcept {
-      return {};
-    }
+    inline_scheduler query(stdexec::get_completion_scheduler_t<CPO>) const noexcept {
+      return {};
+    }
   };
 
   struct _sender {
-    using sender_concept = std::execution::sender_t;
-    using completion_signatures =
-      std::execution::completion_signatures<std::execution::set_value_t()>;
-
-    template <class R>
-    friend auto tag_invoke(std::execution::connect_t, _sender, R&& rec)
-      noexcept(std::is_nothrow_constructible_v<std::remove_cvref_t<R>, R>)
-      -> _op<std::remove_cvref_t<R>> {
-      return {(R&&) rec};
-    }
+    using sender_concept = stdexec::sender_t;
+    using _compl_sigs = stdexec::completion_signatures<stdexec::set_value_t()>;
+    using completion_signatures = _compl_sigs;
 
-    friend _env tag_invoke(exec::get_env_t, _sender) noexcept {
+    template <stdexec::receiver_of<_compl_sigs> R>
+    _op<R> connect(R rec) noexcept(std::is_nothrow_move_constructible_v<R>) {
+      return {std::move(rec)};
+    }
+
+    _env get_env() const noexcept {
       return {};
     }
  };
 
-  friend _sender tag_invoke(std::execution::schedule_t, const inline_scheduler&) noexcept {
+ public:
+  inline_scheduler() = default;
+
+  _sender schedule() const noexcept {
     return {};
   }
 
- public:
-  inline_scheduler() = default;
 
   bool operator==(const inline_scheduler&) const noexcept = default;
 };
 ```
 
-The inline scheduler is a trivial scheduler that completes immediately and synchronously on the thread that calls `std::execution::start` on the operation state produced by its sender. In other words, start(connect(schedule(inline-scheduler), receiver)) is just a fancy way of saying `set_value(receiver)`, with the exception of the fact that `start` wants to be passed an lvalue.
+The inline scheduler is a trivial scheduler that completes immediately and
+synchronously on the thread that calls `std::execution::start` on the operation
+state produced by its sender. In other words,
+`start(connect(schedule(inline_scheduler()), receiver))` is just a fancy way of
+saying `set_value(receiver)`, with the exception of the fact that `start` wants
+to be passed an lvalue.
 
-Although not a particularly useful scheduler, it serves to illustrate the basics of implementing one. The `inline_scheduler`:
+Although not a particularly useful scheduler, it serves to illustrate the basics
+of implementing one. The `inline_scheduler`:
 
 1. Customizes `execution::schedule` to return an instance of the sender type
    `_sender`.
 
@@ -796,8 +973,9 @@ implementing one. The `inline_scheduler`:
 
 ### Single thread scheduler ### {#example-single-thread}
 
-This example shows how to create a scheduler for an execution resource that consists of a single thread. It is implemented in terms of a lower-level execution resource called `std::execution::run_loop`.
+This example shows how to create a scheduler for an execution resource that
+consists of a single thread. It is implemented in terms of a lower-level
+execution resource called `std::execution::run_loop`.
 ```c++
 class single_thread_context {
@@ -809,6 +987,7 @@ public:
     : loop_()
     , thread_([this] { loop_.run(); }) {}
 
+  single_thread_context(single_thread_context&&) = delete;
+
   ~single_thread_context() {
     loop_.finish();
@@ -825,12 +1004,13 @@ public:
 };
 ```
 
-The `single_thread_context` owns an event loop and a thread to drive it. In the destructor, it tells the event loop to finish up what it's doing and then joins the thread, blocking for the event loop to drain.
+The `single_thread_context` owns an event loop and a thread to drive it. In the
+destructor, it tells the event loop to finish up what it's doing and then joins
+the thread, blocking for the event loop to drain.
 
 The interesting bits are in the `execution::run_loop` context implementation.
 It is slightly too long to include here, so we only provide [a reference to
-it](https://github.com/NVIDIA/stdexec/blob/c2cdb2a2abe2b29a34cf277728319d6ca92ec0bb/include/stdexec/execution.hpp#L3916-L4101),
+it](https://github.com/NVIDIA/stdexec/blob/596707991a321ecf8219c03b79819ff4e8ecd278/include/stdexec/execution.hpp#L4201-L4339),
 but there is one noteworthy detail about its implementation: It uses space in
 its operation states to build an intrusive linked list of work items. In
 structured concurrency patterns, the operation states of nested operations
@@ -841,7 +1021,10 @@ allocations.
 
 ## Examples: Server theme ## {#example-server}
 
-In this section we look at some examples of how one would use senders to implement an HTTP server. The examples ignore the low-level details of the HTTP server and looks at how senders can be combined to achieve the goals of the project.
+In this section we look at some examples of how one would use senders to
+implement an HTTP server. The examples ignore the low-level details of the HTTP
+server and look at how senders can be combined to achieve the goals of the
+project.
 
 General application context:
 * server application that processes images
 
 ### Composability with `execution::let_*` ### {#example-server-let}
 
 Example context:
-- we are looking at the flow of processing an HTTP request and sending back the response
-- show how one can break the (slightly complex) flow into steps with `execution::let_*` functions
-- different phases of processing HTTP requests are broken down into separate concerns
-- each part of the processing might use different execution resources (details not shown in this example)
-- error handling is generic, regardless which component fails; we always send the right response to the clients
+- we are looking at the flow of processing an HTTP request and sending back the
+  response.
+- show how one can break the (slightly complex) flow into steps with
+  `execution::let_*` functions.
+- different phases of processing HTTP requests are broken down into separate
+  concerns.
+- each part of the processing might use different execution resources (details
+  not shown in this example).
+- error handling is generic, regardless of which component fails; we always send
+  the right response to the clients.
 
 Goals:
-- show how one can break more complex flows into steps with let_* functions
-- exemplify the use of `let_value`, `let_error`, `let_stopped`, and `just` algorithms
+- show how one can break more complex flows into steps with `let_*` functions.
+- exemplify the use of `let_value`, `let_error`, `let_stopped`, and `just`
+  algorithms.
 ```c++
-namespace ex = std::execution;
+namespace stdexec = std::execution;
 
 // Returns a sender that yields an http_request object for an incoming request
-ex::sender auto schedule_request_start(read_requests_ctx ctx) {...}
+stdexec::sender auto schedule_request_start(read_requests_ctx ctx) {...}
+
 // Sends a response back to the client; yields a void signal on success
-ex::sender auto send_response(const http_response& resp) {...}
+stdexec::sender auto send_response(const http_response& resp) {...}
+
 // Validate that the HTTP request is well-formed; forwards the request on success
-ex::sender auto validate_request(const http_request& req) {...}
+stdexec::sender auto validate_request(const http_request& req) {...}
 
 // Handle the request; main application logic
-ex::sender auto handle_request(const http_request& req) {
+stdexec::sender auto handle_request(const http_request& req) {
   //...
-  return ex::just(http_response{200, result_body});
+  return stdexec::just(http_response{200, result_body});
 }
 
 // Transforms server errors into responses to be sent to the client
-ex::sender auto error_to_response(std::exception_ptr err) {
+stdexec::sender auto error_to_response(std::exception_ptr err) {
   try {
     std::rethrow_exception(err);
   } catch (const std::invalid_argument& e) {
-    return ex::just(http_response{404, e.what()});
+    return stdexec::just(http_response{404, e.what()});
   } catch (const std::exception& e) {
-    return ex::just(http_response{500, e.what()});
+    return stdexec::just(http_response{500, e.what()});
   } catch (...) {
-    return ex::just(http_response{500, "Unknown server error"});
+    return stdexec::just(http_response{500, "Unknown server error"});
   }
 }
+
 // Transforms cancellation of the server into responses to be sent to the client
-ex::sender auto stopped_to_response() {
-  return ex::just(http_response{503, "Service temporarily unavailable"});
+stdexec::sender auto stopped_to_response() {
+  return stdexec::just(http_response{503, "Service temporarily unavailable"});
 }
+
 //...
+
 // The whole flow for transforming incoming requests into responses
-ex::sender auto snd =
+stdexec::sender auto snd =
   // get a sender when a new request comes
   schedule_request_start(the_read_requests_ctx)
   // make sure the request is valid; throw if not
-  | ex::let_value(validate_request)
+  | stdexec::let_value(validate_request)
   // process the request in a function that may be using a different execution resource
-  | ex::let_value(handle_request)
+  | stdexec::let_value(handle_request)
   // If there are errors transform them into proper responses
-  | ex::let_error(error_to_response)
+  | stdexec::let_error(error_to_response)
   // If the flow is cancelled, send back a proper response
-  | ex::let_stopped(stopped_to_response)
+  | stdexec::let_stopped(stopped_to_response)
   // write the result back to the client
-  | ex::let_value(send_response)
+  | stdexec::let_value(send_response)
   // done
   ;
+
 // execute the whole flow asynchronously
-ex::start_detached(std::move(snd));
+stdexec::start_detached(std::move(snd));
 ```
 
-The example shows how one can separate out the concerns for interpreting requests, validating requests, running the main logic for handling the request, generating error responses, handling cancellation and sending the response back to the client.
-They are all different phases in the application, and can be joined together with the `let_*` functions.
+The example shows how one can separate out the concerns for interpreting
+requests, validating requests, running the main logic for handling the request,
+generating error responses, handling cancellation, and sending the response back
+to the client. They are all different phases of the application, and can be
+joined together with the `let_*` functions.
 
-All our functions return `execution::sender` objects, so that they can all generate success, failure and cancellation paths.
-For example, regardless where an error is generated (reading request, validating request or handling the response), we would have one common block to handle the error, and following error flows is easy.
+All our functions return `execution::sender` objects, so they can all generate
+success, failure, and cancellation paths. For example, regardless of where an
+error is generated (reading the request, validating the request, or handling the
+response), we have one common block to handle the error, and following error
+flows is easy.
 
-Also, because of using `execution::sender` objects at any step, we might expect any of these steps to be completely asynchronous; the overall flow doesn't care.
-Regardless of the execution resource in which the steps, or part of the steps are executed in, the flow is still the same.
+Also, because every step uses `execution::sender` objects, any of these steps
+may be completely asynchronous; the overall flow doesn't care. Regardless of
+the execution resource on which the steps, or parts of the steps, execute, the
+flow is still the same.
 ### Moving between execution resources with `execution::on` and `execution::transfer` ### {#example-server-on}
 
@@ -939,53 +1142,54 @@ Goals:
 - show how one can change the execution resource
 - exemplify the use of `on` and `transfer` algorithms
-
 ```c++
-namespace ex = std::execution;
+namespace stdexec = std::execution;
 
-size_t legacy_read_from_socket(int sock, char* buffer, size_t buffer_len) {}
-void process_read_data(const char* read_data, size_t read_len) {}
+size_t legacy_read_from_socket(int sock, char* buffer, size_t buffer_len);
+void process_read_data(const char* read_data, size_t read_len);
 //...
 
 // A sender that just calls the legacy read function
-auto snd_read = ex::just(sock, buf, buf_len) | ex::then(legacy_read_from_socket);
+auto snd_read = stdexec::just(sock, buf, buf_len)
+              | stdexec::then(legacy_read_from_socket);
+
 // The entire flow
 auto snd =
   // start by reading data on the I/O thread
-  ex::on(io_sched, std::move(snd_read))
+  stdexec::on(io_sched, std::move(snd_read))
   // do the processing on the worker threads pool
-  | ex::transfer(work_sched)
+  | stdexec::transfer(work_sched)
   // process the incoming data (on worker threads)
-  | ex::then([buf](int read_len) { process_read_data(buf, read_len); })
+  | stdexec::then([buf](int read_len) { process_read_data(buf, read_len); })
   // done
   ;
+
 // execute the whole flow asynchronously
-ex::start_detached(std::move(snd));
+stdexec::start_detached(std::move(snd));
 ```
 
-The example assume that we need to wrap some legacy code of reading sockets, and handle execution resource switching.
-(This style of reading from socket may not be the most efficient one, but it's working for our purposes.)
-For performance reasons, the reading from the socket needs to be done on the I/O thread, and all the processing needs to happen on a work-specific execution resource (i.e., thread pool).
-
-Calling `execution::on` will ensure that the given sender will be started on the given scheduler.
-In our example, `snd_read` is going to be started on the I/O scheduler.
-This sender will just call the legacy code.
-
-The completion-signal will be issued in the I/O execution resource, so we have to move it to the work thread pool.
-This is achieved with the help of the `execution::transfer` algorithm.
-The rest of the processing (in our case, the last call to `then`) will happen in the work thread pool.
-
-The reader should notice the difference between `execution::on` and `execution::transfer`.
-The `execution::on` algorithm will ensure that the given sender will start in the specified context, and doesn't care where the completion-signal for that sender is sent.
-The `execution::transfer` algorithm will not care where the given sender is going to be started, but will ensure that the completion-signal of will be transferred to the given context.
-
-## What this proposal is **not** ## {#intro-is-not}
-
-This paper is not a patch on top of [[P0443R14]]; we are not asking to update the existing paper, we are asking to retire it in favor of this paper, which is already self-contained; any example code within this paper can be written in Standard C++, without the need to standardize any further facilities.
-
-This paper is not an alternative design to [[P0443R14]]; rather, we have taken the design in the current executors paper, and applied targeted fixes to allow it to fulfill the promises of the sender/receiver model, as well as provide all the facilities we consider essential when writing user code using standard execution concepts; we have also applied the guidance of removing one-way executors from the paper entirely, and instead provided an algorithm based around senders that serves the same purpose.
+The example assumes that we need to wrap some legacy socket-reading code and
+handle execution resource switching. (This style of reading from a socket may
+not be the most efficient one, but it works for our purposes.) For performance
+reasons, the reading from the socket needs to be done on the I/O thread, and all
+the processing needs to happen on a work-specific execution resource (i.e., a
+thread pool).
+
+Calling `execution::on` will ensure that the given sender will be started on the
+given scheduler. In our example, `snd_read` is going to be started on the I/O
+scheduler. This sender will just call the legacy code.
+
+The completion-signal will be issued in the I/O execution resource, so we have
+to move it to the work thread pool. This is achieved with the help of the
+`execution::transfer` algorithm. The rest of the processing (in our case, the
+last call to `then`) will happen in the work thread pool.
+
+The reader should notice the difference between `execution::on` and
+`execution::transfer`. The `execution::on` algorithm will ensure that the given
+sender will start in the specified context, and doesn't care where the
+completion-signal for that sender is sent. The `execution::transfer` algorithm
+will not care where the given sender is going to be started, but will ensure
+that the completion-signal will be transferred to the given context.
 
 ## Design changes from P0443 ## {#intro-compare}
 
@@ -1014,120 +1218,263 @@ essential when writing user code using standard execution concepts; we have also
 
 10. Some additional utilities are added:
     * `run_loop`: An execution resource that provides a multi-producer,
       single-consumer, first-in-first-out work queue.
-    * `receiver_adaptor`: A utility for algorithm authors for defining one
-      receiver type in terms of another.
     * `completion_signatures` and `transform_completion_signatures`: Utilities
       for describing the ways in which a sender can complete in a declarative
       syntax.
 
 ## Prior art ## {#intro-prior-art}
 
-This proposal builds upon and learns from years of prior art with asynchronous and parallel programming frameworks in C++.
In this section, we discuss async abstractions that have previously been suggested as a possible basis for asynchronous algorithms and why they fall short. +This proposal builds upon and learns from years of prior art with asynchronous +and parallel programming frameworks in C++. In this section, we discuss async +abstractions that have previously been suggested as a possible basis for +asynchronous algorithms and why they fall short. ### Futures ### {#intro-prior-art-futures} -A future is a handle to work that has already been scheduled for execution. It is one end of a communication channel; the other end is a promise, used to receive the result from the concurrent operation and to communicate it to the future. +A future is a handle to work that has already been scheduled for execution. It +is one end of a communication channel; the other end is a promise, used to +receive the result from the concurrent operation and to communicate it to the +future. -Futures, as traditionally realized, require the dynamic allocation and management of a shared state, synchronization, and typically type-erasure of work and continuation. Many of these costs are inherent in the nature of "future" as a handle to work that is already scheduled for execution. These expenses rule out the future abstraction for many uses and makes it a poor choice for a basis of a generic mechanism. +Futures, as traditionally realized, require the dynamic allocation and +management of a shared state, synchronization, and typically type-erasure of +work and continuation. Many of these costs are inherent in the nature of +"future" as a handle to work that is already scheduled for execution. These +expenses rule out the future abstraction for many uses and makes it a poor +choice for a basis of a generic mechanism. ### Coroutines ### {#intro-prior-art-coroutines} -C++20 coroutines are frequently suggested as a basis for asynchronous algorithms. 
It's fair to ask why, if we added coroutines to C++, are we suggesting the addition of a library-based abstraction for asynchrony. Certainly, coroutines come with huge syntactic and semantic advantages over the alternatives. - -Although coroutines are lighter weight than futures, coroutines suffer many of the same problems. Since they typically start suspended, they can avoid synchronizing the chaining of dependent work. However in many cases, coroutine frames require an unavoidable dynamic allocation and indirect function calls. This is done to hide the layout of the coroutine frame from the C++ type system, which in turn makes possible the separate compilation of coroutines and certain compiler optimizations, such as optimization of the coroutine frame size. - -Those advantages come at a cost, though. Because of the dynamic allocation of coroutine frames, coroutines in embedded or heterogeneous environments, which often lack support for dynamic allocation, require great attention to detail. And the allocations and indirections tend to complicate the job of the inliner, often resulting in sub-optimal codegen. - -The coroutine language feature mitigates these shortcomings somewhat with the HALO optimization [[P0981R0]], which leverages existing compiler optimizations such as allocation elision and devirtualization to inline the coroutine, completely eliminating the runtime overhead. However, HALO requires a sophisiticated compiler, and a fair number of stars need to align for the optimization to kick in. In our experience, more often than not in real-world code today's compilers are not able to inline the coroutine, resulting in allocations and indirections in the generated code. - -In a suite of generic async algorithms that are expected to be callable from hot code paths, the extra allocations and indirections are a deal-breaker. It is for these reasons that we consider coroutines a poor choise for a basis of all standard async. 
+C++20 coroutines are frequently suggested as a basis for asynchronous
+algorithms. It's fair to ask why, if we added coroutines to C++, are we
+suggesting the addition of a library-based abstraction for asynchrony.
+Certainly, coroutines come with huge syntactic and semantic advantages over the
+alternatives.
+
+Although coroutines are lighter weight than futures, coroutines suffer many of
+the same problems. Since they typically start suspended, they can avoid
+synchronizing the chaining of dependent work. However, in many cases, coroutine
+frames require an unavoidable dynamic allocation and indirect function calls.
+This is done to hide the layout of the coroutine frame from the C++ type system,
+which in turn makes possible the separate compilation of coroutines and certain
+compiler optimizations, such as optimization of the coroutine frame size.
+
+Those advantages come at a cost, though. Because of the dynamic allocation of
+coroutine frames, coroutines in embedded or heterogeneous environments, which
+often lack support for dynamic allocation, require great attention to detail.
+And the allocations and indirections tend to complicate the job of the inliner,
+often resulting in sub-optimal codegen.
+
+The coroutine language feature mitigates these shortcomings somewhat with the
+HALO optimization [[P0981R0]], which leverages existing compiler optimizations
+such as allocation elision and devirtualization to inline the coroutine,
+completely eliminating the runtime overhead. However, HALO requires a
+sophisticated compiler, and a fair number of stars need to align for the
+optimization to kick in. In our experience, more often than not in real-world
+code today's compilers are not able to inline the coroutine, resulting in
+allocations and indirections in the generated code.
+
+In a suite of generic async algorithms that are expected to be callable from hot
+code paths, the extra allocations and indirections are a deal-breaker.
It is for
+these reasons that we consider coroutines a poor choice for a basis of all
+standard async.

### Callbacks ### {#intro-prior-art-callbacks}

-Callbacks are the oldest, simplest, most powerful, and most efficient mechanism for creating chains of work, but suffer problems of their own. Callbacks must propagate either errors or values. This simple requirement yields many different interface possibilities. The lack of a standard callback shape obstructs generic design.
+Callbacks are the oldest, simplest, most powerful, and most efficient mechanism
+for creating chains of work, but suffer problems of their own. Callbacks must
+propagate either errors or values. This simple requirement yields many different
+interface possibilities. The lack of a standard callback shape obstructs generic
+design.

-Additionally, few of these possibilities accommodate cancellation signals when the user requests upstream work to stop and clean up.
+Additionally, few of these possibilities accommodate cancellation signals when
+the user requests upstream work to stop and clean up.

## Field experience ## {#intro-field-experience}

### libunifex ### {#intro-field-experience-libunifex}

-This proposal draws heavily from our field experience with [libunifex](https://github.com/facebookexperimental/libunifex). Libunifex implements all of the concepts and customization points defined in this paper (with slight variations -- the design of P2300 has evolved due to LEWG feedback), many of this paper's algorithms (some under different names), and much more besides.
-
-Libunifex has several concrete schedulers in addition to the `run_loop` suggested here (where it is called `manual_event_loop`). It has schedulers that dispatch efficiently to epoll and io_uring on Linux and the Windows Thread Pool on Windows.
+This proposal draws heavily from our field experience with
+[libunifex](https://github.com/facebookexperimental/libunifex).
Libunifex +implements all of the concepts and customization points defined in this paper +(with slight variations -- the design of P2300 has evolved due to LEWG +feedback), many of this paper's algorithms (some under different names), and +much more besides. + +Libunifex has several concrete schedulers in addition to the `run_loop` +suggested here (where it is called `manual_event_loop`). It has schedulers that +dispatch efficiently to epoll and io_uring on Linux and the Windows Thread Pool +on Windows. + +In addition to the proposed interfaces and the additional schedulers, it has +several important extensions to the facilities described in this paper, which +demonstrate directions in which these abstractions may be evolved over time, +including: + +* Timed schedulers, which permit scheduling work on an execution resource at a + particular time or after a particular duration has elapsed. In addition, it + provides time-based algorithms. +* File I/O schedulers, which permit filesystem I/O to be scheduled. +* Two complementary abstractions for streams (asynchronous ranges), and a set of + stream-based algorithms. + +Libunifex has seen heavy production use at Meta. An employee summarizes it +as follows: + +> As of June, 2023, Unifex is still used in production at Meta. It's used to +> express the asynchrony in +> [rsys](https://engineering.fb.com/2020/12/21/video-engineering/rsys/), and is +> therefore serving video calling to billions of people every month on Meta's +> social networking apps on iOS, Android, Windows, and macOS. It's also serving +> the Virtual Desktop experience on Oculus Quest devices, and some internal uses +> that run on Linux. +> +> One team at Meta has migrated from `folly::Future` to `unifex::task` and seen +> significant developer efficiency improvements. 
Coroutines are easier to
+> understand than chained futures so the team was able to meet requirements for
+> certain constrained environments that would have been too complicated to
+> maintain with futures.
+>
+> In all the cases mentioned above, developers mix-and-match between the sender
+> algorithms in Unifex and Unifex's coroutine type, `unifex::task`. We also rely
+> on `unifex::task`'s scheduler affinity to minimize surprise when programming
+> with coroutines.

-In addition to the proposed interfaces and the additional schedulers, it has several important extensions to the facilities described in this paper, which demonstrate directions in which these abstractions may be evolved over time, including:
+### stdexec ### {#intro-field-experience-stdexec}

-* Timed schedulers, which permit scheduling work on an execution resource at a particular time or after a particular duration has elapsed. In addition, it provides time-based algorithms.
-* File I/O schedulers, which permit filesystem I/O to be scheduled.
-* Two complementary abstractions for streams (asynchronous ranges), and a set of stream-based algorithms.
+[stdexec](https://github.com/NVIDIA/stdexec) is the reference implementation of
+this proposal. It is a complete implementation, written from the specification
+in this paper, and is current with [R8](https://wg21.link/P2300R8).

-Libunifex has seen heavy production use at Facebook. As of October 2021, it is currently used in production within the following applications and platforms:
+The original purpose of stdexec was to help find specification bugs and to
+harden the wording of the proposal, but it has since become one of NVIDIA's core
+C++ libraries for high-performance computing. In addition to the facilities
+proposed in this paper, stdexec has schedulers for CUDA, Intel TBB, and macOS.
+Like libunifex, its scope has also expanded to include a streaming abstraction
+and stream algorithms, and time-based schedulers and algorithms.
-* Facebook Messenger on iOS, Android, Windows, and macOS
-* Instagram on iOS and Android
-* Facebook on iOS and Android
-* Portal
-* An internal Facebook product that runs on Linux
+The stdexec project has seen significant community interest and contributions.
+At the time of writing (March, 2024), the GitHub repository has 1.2k stars, 130
+forks, and 50 contributors.

-All of these applications are making direct use of the sender/receiver abstraction as presented in this paper. One product (Instagram on iOS) is making use of the sender/coroutine integration as presented. The monthly active users of these products number in the billions.
+stdexec is fit for broad use and for eventual contribution to libc++.

### Other implementations ### {#intro-field-experience-other-implementations}

-The authors are aware of a number of other implementations of sender/receiver from this paper. These are presented here in perceived order of maturity and field experience.
+The authors are aware of a number of other implementations of sender/receiver
+from this paper. These are presented here in perceived order of maturity and
+field experience.

* [[HPX]]

-    HPX is a general purpose C++ runtime system for parallel and distributed applications that has been under active development since 2007. HPX exposes a uniform, standards-oriented API, and keeps abreast of the latest standards and proposals. It is used in a wide variety of high-performance applications.
+    HPX is a general purpose C++ runtime system for parallel and distributed
+    applications that has been under active development since 2007. HPX exposes
+    a uniform, standards-oriented API, and keeps abreast of the latest standards
+    and proposals. It is used in a wide variety of high-performance
+    applications.

-    The sender/receiver implementation in HPX has been under active development since May 2020.
It is used to erase the overhead of futures and to make it possible to write efficient generic asynchronous algorithms that are agnostic to their execution resource. In HPX, algorithms can migrate execution between execution resources, even to GPUs and back, using a uniform standard interface with sender/receiver. + The sender/receiver implementation in HPX has been under active development + since May 2020. It is used to erase the overhead of futures and to make it + possible to write efficient generic asynchronous algorithms that are + agnostic to their execution resource. In HPX, algorithms can migrate + execution between execution resources, even to GPUs and back, using a + uniform standard interface with sender/receiver. - Far and away, the HPX team has the greatest usage experience outside Facebook. Mikael Simberg summarizes the experience as follows: + Far and away, the HPX team has the greatest usage experience outside + Facebook. Mikael Simberg summarizes the experience as follows: - > Summarizing, for us the major benefits of sender/receiver compared to the old model are: + > Summarizing, for us the major benefits of sender/receiver compared to the + > old model are: > > 1. Proper hooks for transitioning between execution resources. > 2. The adaptors. Things like `let_value` are really nice additions. - > 3. Separation of the error channel from the value channel (also cancellation, but we don't have much use for it at the moment). Even from a teaching perspective having to explain that the future `f2` in the continuation will always be ready here `f1.then([](future f2) {...})` is enough of a reason to separate the channels. All the other obvious reasons apply as well of course. - > 4. For futures we have a thing called `hpx::dataflow` which is an optimized version of `when_all(...).then(...)` which avoids intermediate allocations. With the sender/receiver `when_all(...) | then(...)` we get that "for free". + > 3. 
Separation of the error channel from the value channel (also + > cancellation, but we don't have much use for it at the moment). Even + > from a teaching perspective having to explain that the future `f2` in + > the continuation will always be ready here `f1.then([](future f2) + > {...})` is enough of a reason to separate the channels. All the other + > obvious reasons apply as well of course. + > 4. For futures we have a thing called `hpx::dataflow` which is an + > optimized version of `when_all(...).then(...)` which avoids + > intermediate allocations. With the sender/receiver `when_all(...) | + > then(...)` we get that "for free". * [kuhllib](https://github.com/dietmarkuehl/kuhllib/) by Dietmar Kuehl - This is a prototype Standard Template Library with an implementation of sender/receiver that has been under development since May, 2021. It is significant mostly for its support for sender/receiver-based networking interfaces. + This is a prototype Standard Template Library with an implementation of + sender/receiver that has been under development since May, 2021. It is + significant mostly for its support for sender/receiver-based networking + interfaces. - Here, Dietmar Kuehl speaks about the perceived complexity of sender/receiver: + Here, Dietmar Kuehl speaks about the perceived complexity of + sender/receiver: - > ... and, also similar to STL: as I had tried to do things in that space before I recognize sender/receivers as being maybe complicated in one way but a huge simplification in another one: like with STL I think those who use it will benefit - if not from the algorithm from the clarity of abstraction: the separation of concerns of STL (the algorithm being detached from the details of the sequence representation) is a major leap. Here it is rather similar: the separation of the asynchronous algorithm from the details of execution. Sure, there is some glue to tie things back together but each of them is simpler than the combined result. + > ... 
and, also similar to STL: as I had tried to do things in that space + > before I recognize sender/receivers as being maybe complicated in one way + > but a huge simplification in another one: like with STL I think those who + > use it will benefit - if not from the algorithm from the clarity of + > abstraction: the separation of concerns of STL (the algorithm being + > detached from the details of the sequence representation) is a major leap. + > Here it is rather similar: the separation of the asynchronous algorithm + > from the details of execution. Sure, there is some glue to tie things back + > together but each of them is simpler than the combined result. Elsewhere, he said: - > ... to me it feels like sender/receivers are like iterators when STL emerged: they are different from what everybody did in that space. However, everything people are already doing in that space isn’t right. - - Kuehl also has experience teaching sender/receiver at Bloomberg. About that experience he says: - - > When I asked [my students] specifically about how complex they consider the sender/receiver stuff the feedback was quite unanimous that the sender/receiver parts aren’t trivial but not what contributes to the complexity. + > ... to me it feels like sender/receivers are like iterators when STL + > emerged: they are different from what everybody did in that space. + > However, everything people are already doing in that space isn't right. -* [The reference implementation](https://github.com/NVIDIA/stdexec) + Kuehl also has experience teaching sender/receiver at Bloomberg. About that + experience he says: - This is a complete implementation written from the specification in this paper. Its primary purpose is to help find specification bugs and to harden the wording of the proposal. It is - fit for broad use and for contribution to libc++. 
+ > When I asked [my students] specifically about how complex they consider + > the sender/receiver stuff the feedback was quite unanimous that the + > sender/receiver parts aren't trivial but not what contributes to the + > complexity. - It is current with R8 of this paper. +* [C++ Bare Metal Senders and Receivers](https://github.com/intel/cpp-baremetal-senders-and-receivers) from Intel -* [Reference implementation for the Microsoft STL](https://github.com/miscco/STL/tree/proposal/executors) by Michael Schellenberger Costa - - This is another reference implementation of this proposal, this time in a fork of the Mircosoft STL implementation. Michael Schellenberger Costa is not affiliated with Microsoft. He intends to contribute this implementation upstream when it is complete. + This is a prototype implementation of sender/receiver by Intel that has been + under development since August, 2023. It is significant mostly for its + support for bare metal (no operating system) and embedded systems, a domain + for which senders are particularly well-suited due to their very low dynamic + memory requirements. ### Inspirations ### {#intro-field-experience-inspirations} -This proposal also draws heavily from our experience with [Thrust](https://github.com/NVIDIA/thrust) and [Agency](https://github.com/agency-library/agency). It is also inspired by the needs of countless other C++ frameworks for asynchrony, parallelism, and concurrency, including: +This proposal also draws heavily from our experience with +[Thrust](https://github.com/NVIDIA/thrust) and +[Agency](https://github.com/agency-library/agency). 
It is also inspired by the
+needs of countless other C++ frameworks for asynchrony, parallelism, and
+concurrency, including:

-* HPX
+* [HPX](https://github.com/STEllAR-GROUP/hpx)
* [Folly](https://github.com/facebook/folly/blob/master/folly/docs/Futures.md)
* [stlab](https://stlab.cc/libraries/concurrency/)

# Revision history # {#revisions}

+## R9 ## {#r9}
+
+The changes since R8 are as follows:
+
+Fixes:
+
+  * The `tag_invoke` mechanism has been replaced with member functions
+    for customizations as per [P2855](https://wg21.link/p2855).
+
+  * Per guidance from LWG and LEWG, `receiver_adaptor` has been removed.
+
+  * The `receiver` concept is tweaked to require that receiver types are not
+    `final`. Without `receiver_adaptor` and `tag_invoke`, receiver adaptors
+    are easily written using implementation inheritance.
+
+Enhancements:
+
+  * The specification of the `sync_wait` algorithm has been updated
+    for clarity.
+
## R8 ## {#r8}

The changes since R7 are as follows:

@@ -1228,8 +1575,9 @@ The changes since R5 are as follows:
    are no longer needed and are dropped.
  * `ensure_started` and `split` are changed to persist the result of calling
    `get_attrs()` on the input sender.
-  * Reorder constraints of the `scheduler` and `receiver` concepts to avoid constraint recursion
-    when used in tandem with poorly-constrained, implicitly convertible types.
+  * Reorder constraints of the `scheduler` and `receiver` concepts to avoid
+    constraint recursion when used in tandem with poorly-constrained, implicitly
+    convertible types.
* Re-express the `sender_of` concept to be more ergonomic and general.
* Make the specification of the alias templates `value_types_of_t` and `error_types_of_t`, and the variable template `sends_done` more concise by @@ -1415,11 +1763,12 @@ environment entirely, passing then as separate arguments along with the sender t **Impact:** -This change, apart from increasing the expressive power of the sender/receiver abstraction, has the following impact: +This change, apart from increasing the expressive power of the sender/receiver +abstraction, has the following impact: * Typed senders become moderately more challenging to write. (The new - `completion_signatures` and `transform_completion_signatures` utilities are added - to ease this extra burden.) + `completion_signatures` and `transform_completion_signatures` utilities are + added to ease this extra burden.) * Sender adaptor algorithms that previously constrained their sender arguments to satisfy the `typed_sender` concept can no longer do so as the receiver is @@ -1433,14 +1782,21 @@ This change, apart from increasing the expressive power of the sender/receiver a **"Has it been implemented?"** Yes, the reference implementation, which can be found at -https://github.com/NVIDIA/stdexec, has implemented this -design as well as some dependently-typed senders to confirm that it works. +[https://github.com/NVIDIA/stdexec](https://github.com/NVIDIA/stdexec), has +implemented this design as well as some dependently-typed senders to confirm +that it works. **Implementation experience** -Although this change has not yet been made in libunifex, the most widely adopted sender/receiver implementation, a similar design can be found in Folly's coroutine support library. In Folly.Coro, it is possible to await a special awaitable to obtain the current coroutine's associated scheduler (called an executor in Folly). +Although this change has not yet been made in libunifex, the most widely adopted +sender/receiver implementation, a similar design can be found in Folly's +coroutine support library. 
In Folly.Coro, it is possible to await a special +awaitable to obtain the current coroutine's associated scheduler (called an +executor in Folly). -For instance, the following Folly code grabs the current executor, schedules a task for execution on that executor, and starts the resulting (scheduled) task by enqueueing it for execution. +For instance, the following Folly code grabs the current executor, schedules a +task for execution on that executor, and starts the resulting (scheduled) task +by enqueueing it for execution. ```c++ // From Facebook's Folly open source library: @@ -1486,19 +1842,20 @@ R4: * Receiver queries have been moved from the receiver into a separate environment object. * Receivers have an associated environment. The new `get_env` CPO retrieves a - receiver's environment. If a receiver doesn't implement `get_env`, it returns - an unspecified "empty" environment -- an empty struct. + receiver's environment. If a receiver doesn't implement `get_env`, it + returns an unspecified "empty" environment -- an empty struct. * `sender_traits` now takes an optional `Env` parameter that is used to determine the error/value types. -* The primary `sender_traits` template is replaced with a `completion_signatures_of_t` - alias implemented in terms of a new `get_completion_signatures` CPO that dispatches - with `tag_invoke`. `get_completion_signatures` takes a sender and an optional - environment. A sender can customize this to specify its value/error types. +* The primary `sender_traits` template is replaced with a + `completion_signatures_of_t` alias implemented in terms of a new + `get_completion_signatures` CPO that dispatches with `tag_invoke`. + `get_completion_signatures` takes a sender and an optional environment. A + sender can customize this to specify its value/error types. * Support for untyped senders is dropped. The `typed_sender` concept has been renamed to `sender` and now takes an optional environment. 
-* The environment argument to the `sender` concept and the `get_completion_signatures` - CPO defaults to `no_env`. All environment queries fail (are ill-formed) when - passed an instance of `no_env`. +* The environment argument to the `sender` concept and the + `get_completion_signatures` CPO defaults to `no_env`. All environment + queries fail (are ill-formed) when passed an instance of `no_env`. * A type `S` is required to satisfy sender<S> to be considered a sender. If it doesn't know what types it will complete with independent of an environment, it returns an instance of the placeholder @@ -1516,40 +1873,50 @@ The changes since R2 are as follows: Fixes: - * Fix specification of the `on` algorithm to clarify lifetimes of - intermediate operation states and properly scope the `get_scheduler` query. - * Fix a memory safety bug in the implementation of connect-awaitable. - * Fix recursive definition of the `scheduler` concept. +* Fix specification of the `on` algorithm to clarify lifetimes of intermediate + operation states and properly scope the `get_scheduler` query. +* Fix a memory safety bug in the implementation of + connect-awaitable. +* Fix recursive definition of the `scheduler` concept. Enhancements: - * Add `run_loop` execution resource. - * Add `receiver_adaptor` utility to simplify writing receivers. - * Require a scheduler's sender to model `sender_of` and provide a completion scheduler. - * Specify the cancellation scope of the `when_all` algorithm. - * Make `as_awaitable` a customization point. - * Change `connect`'s handling of awaitables to consider those types that are awaitable owing to customization of `as_awaitable`. - * Add `value_types_of_t` and `error_types_of_t` alias templates; rename `stop_token_type_t` to `stop_token_of_t`. - * Add a design rationale for the removal of the possibly eager algorithms. - * Expand the section on field experience. +* Add `run_loop` execution resource. 
+* Add `receiver_adaptor` utility to simplify writing receivers. +* Require a scheduler's sender to model `sender_of` and provide a completion + scheduler. +* Specify the cancellation scope of the `when_all` algorithm. +* Make `as_awaitable` a customization point. +* Change `connect`'s handling of awaitables to consider those types that are + awaitable owing to customization of `as_awaitable`. +* Add `value_types_of_t` and `error_types_of_t` alias templates; rename + `stop_token_type_t` to `stop_token_of_t`. +* Add a design rationale for the removal of the possibly eager algorithms. +* Expand the section on field experience. ## R2 ## {#r2} The changes since R1 are as follows: * Remove the eagerly executing sender algorithms. -* Extend the `execution::connect` customization point and the `sender_traits<>` template to recognize awaitables as `typed_sender`s. -* Add utilities `as_awaitable()` and `with_awaitable_senders<>` so a coroutine type can trivially make senders awaitable with a coroutine. +* Extend the `execution::connect` customization point and the `sender_traits<>` + template to recognize awaitables as `typed_sender`s. +* Add utilities `as_awaitable()` and `with_awaitable_senders<>` so a coroutine + type can trivially make senders awaitable with a coroutine. * Add a section describing the design of the sender/awaitable interactions. -* Add a section describing the design of the cancellation support in sender/receiver. +* Add a section describing the design of the cancellation support in + sender/receiver. * Add a section showing examples of simple sender adaptor algorithms. * Add a section showing examples of simple schedulers. -* Add a few more examples: a sudoku solver, a parallel recursive file copy, and an echo server. +* Add a few more examples: a sudoku solver, a parallel recursive file copy, and + an echo server. * Refined the forward progress guarantees on the `bulk` algorithm. 
-* Add a section describing how to use a range of senders to represent async sequences. +* Add a section describing how to use a range of senders to represent async + sequences. * Add a section showing how to use senders to represent partial success. * Add sender factories `execution::just_error` and `execution::just_stopped`. -* Add sender adaptors `execution::stopped_as_optional` and `execution::stopped_as_error`. +* Add sender adaptors `execution::stopped_as_optional` and + `execution::stopped_as_error`. * Document more production uses of sender/receiver at scale. * Various fixes of typos and bugs. @@ -1572,30 +1939,30 @@ Initial revision. The following three sections describe the entirety of the proposed design. * [[#design-intro]] describes the conventions used through the rest of the - design sections, as well as an example illustrating how we envision code will - be written using this proposal. + design sections, as well as an example illustrating how we envision code + will be written using this proposal. * [[#design-user]] describes all the functionality from the perspective we intend for users: it describes the various concepts they will interact with, and what their programming model is. * [[#design-implementer]] describes the machinery that allows for that programming model to function, and the information contained there is - necessary for people implementing senders and sender algorithms (including the - standard library ones) - but is not necessary to use senders productively. + necessary for people implementing senders and sender algorithms (including + the standard library ones) - but is not necessary to use senders + productively. ## Conventions ## {#design-conventions} The following conventions are used throughout the design section: - 1. The namespace proposed in this paper is the same as in [[P0443R14]]: - `std::execution`; however, for brevity, the `std::` part of this name is - omitted. 
When you see `execution::foo`, treat that as - `std::execution::foo`. - 2. Universal references and explicit calls to `std::move`/`std::forward` are - omitted in code samples and signatures for simplicity; assume universal - references and perfect forwarding unless stated otherwise. - 3. None of the names proposed here are names that we are particularly attached - to; consider the names to be reasonable placeholders that can freely be - changed, should the committee want to do so. +1. The namespace proposed in this paper is the same as in [[P0443R14]]: + `std::execution`; however, for brevity, the `std::` part of this name is + omitted. When you see `execution::foo`, treat that as `std::execution::foo`. +2. Universal references and explicit calls to `std::move`/`std::forward` are + omitted in code samples and signatures for simplicity; assume universal + references and perfect forwarding unless stated otherwise. +3. None of the names proposed here are names that we are particularly attached + to; consider the names to be reasonable placeholders that can freely be + changed, should the committee want to do so. ## Queries and algorithms ## {#design-queries-and-algorithms} @@ -1635,9 +2002,9 @@ execution::sender auto snd = execution::schedule(sch); // on the execution resource associated with sch -Note that a particular scheduler type may provide other kinds of scheduling operations -which are supported by its associated execution resource. It is not limited to scheduling -purely using the `execution::schedule` API. +Note that a particular scheduler type may provide other kinds of scheduling +operations which are supported by its associated execution resource. It is not +limited to scheduling purely using the `execution::schedule` API. Future papers will propose additional scheduler concepts that extend `scheduler` to add other capabilities. 
For example: @@ -1684,12 +2051,15 @@ this_thread::sync_wait(cont); ## Senders are composable through sender algorithms ## {#design-composable} -Asynchronous programming often departs from traditional code structure and control flow that we are familiar with. -A successful asynchronous framework must provide an intuitive story for composition of asynchronous work: expressing dependencies, passing objects, managing object lifetimes, etc. +Asynchronous programming often departs from traditional code structure and +control flow that we are familiar with. A successful asynchronous framework must +provide an intuitive story for composition of asynchronous work: expressing +dependencies, passing objects, managing object lifetimes, etc. -The true power and utility of senders is in their composability. -With senders, users can describe generic execution pipelines and graphs, and then run them on and across a variety of different schedulers. -Senders are composed using [=sender algorithms=]: +The true power and utility of senders is in their composability. With senders, +users can describe generic execution pipelines and graphs, and then run them on +and across a variety of different schedulers. Senders are composed using +[=sender algorithms=]: * [=sender factories=], algorithms that take no senders and return a sender. * [=sender adaptors=], algorithms that take (and potentially @@ -1699,30 +2069,64 @@ Senders are composed using [=sender algorithms=]: ## Senders can propagate completion schedulers ## {#design-propagation} -One of the goals of executors is to support a diverse set of execution resources, including traditional thread pools, task and fiber frameworks (like HPX and [Legion](https://github.com/StanfordLegion/legion)), and GPUs and other accelerators (managed by runtimes such as CUDA or SYCL). -On many of these systems, not all execution agents are created equal and not all functions can be run on all execution agents. 
-Having precise control over the execution resource used for any given function call being submitted is important on such systems, and the users of standard execution facilities will expect to be able to express such requirements.
-
-[[P0443R14]] was not always clear about the place of execution of any given piece of code.
-Precise control was present in the two-way execution API present in earlier executor designs, but it has so far been missing from the senders design. There has been a proposal ([[P1897R3]]) to provide a number of sender algorithms that would enforce certain rules on the places of execution
-of the work described by a sender, but we have found those sender algorithms to be insufficient for achieving the best performance on all platforms that are of interest to us. The implementation strategies that we are aware of result in one of the following situations:
-
- 1. trying to submit work to one execution resource (such as a CPU thread pool) from another execution resource (such as a GPU or a task framework), which assumes that all execution agents are as capable as a `std::thread` (which they aren't).
- 2. forcibly interleaving two adjacent execution graph nodes that are both executing on one execution resource (such as a GPU) with glue code that runs on another execution resource (such as a CPU), which is prohibitively expensive for some execution resources (such as CUDA or SYCL).
- 3. having to customise most or all sender algorithms to support an execution resource, so that you can avoid problems described in 1. and 2, which we believe is impractical and brittle based on months of field experience attempting this in [Agency](https://github.com/agency-library/agency).
-
-None of these implementation strategies are acceptable for many classes of parallel runtimes, such as task frameworks (like HPX) or accelerator runtimes (like CUDA or SYCL).
-
-Therefore, in addition to the `on` sender algorithm from [[P1897R3]], we are proposing a way for senders to advertise what scheduler (and by extension what execution resource) they will complete on.
-Any given sender may have [=completion schedulers=] for some or all of the signals (value, error, or stopped) it completes with (for more detail on the completion-signals, see [[#design-receivers]]).
-When further work is attached to that sender by invoking sender algorithms, that work will also complete on an appropriate completion scheduler.
+One of the goals of executors is to support a diverse set of execution
+resources, including traditional thread pools, task and fiber frameworks (like
+[HPX](https://github.com/STEllAR-GROUP/hpx) and
+[Legion](https://github.com/StanfordLegion/legion)), and GPUs and other
+accelerators (managed by runtimes such as CUDA or SYCL). On many of these
+systems, not all execution agents are created equal and not all functions can be
+run on all execution agents. Having precise control over the execution resource
+used for any given function call being submitted is important on such systems,
+and the users of standard execution facilities will expect to be able to express
+such requirements.
+
+[[P0443R14]] was not always clear about the place of execution of any
+given piece of code. Precise control was present in the two-way execution API
+present in earlier executor designs, but it has so far been missing from the
+senders design. There has been a proposal ([[P1897R3]]) to provide a number of
+sender algorithms that would enforce certain rules on the places of execution of
+the work described by a sender, but we have found those sender algorithms to be
+insufficient for achieving the best performance on all platforms that are of
+interest to us. The implementation strategies that we are aware of result in one
+of the following situations:
+
+ 1. trying to submit work to one execution resource (such as a CPU thread pool)
+    from another execution resource (such as a GPU or a task framework), which
+    assumes that all execution agents are as capable as a `std::thread` (which
+    they aren't).
+ 2. forcibly interleaving two adjacent execution graph nodes that are both
+    executing on one execution resource (such as a GPU) with glue code that
+    runs on another execution resource (such as a CPU), which is prohibitively
+    expensive for some execution resources (such as CUDA or SYCL).
+ 3. having to customise most or all sender algorithms to support an execution
+    resource, so that you can avoid the problems described in 1 and 2, which we
+    believe is impractical and brittle based on months of field experience
+    attempting this in [Agency](https://github.com/agency-library/agency).
+
+None of these implementation strategies are acceptable for many classes of
+parallel runtimes, such as task frameworks (like
+[HPX](https://github.com/STEllAR-GROUP/hpx)) or accelerator runtimes (like CUDA
+or SYCL).
+
+Therefore, in addition to the `on` sender algorithm from [[P1897R3]], we are
+proposing a way for senders to advertise what scheduler (and by extension what
+execution resource) they will complete on. Any given sender may have
+[=completion schedulers=] for some or all of the signals (value, error, or
+stopped) it completes with (for more detail on the completion-signals, see
+[[#design-receivers]]). When further work is attached to that sender by invoking
+sender algorithms, that work will also complete on an appropriate completion
+scheduler.

 ### `execution::get_completion_scheduler` ### {#design-sender-query-get_completion_scheduler}

-`get_completion_scheduler` is a query that retrieves the completion scheduler for a specific completion-signal from a sender's environment.
-For a sender that lacks a completion scheduler query for a given signal, calling `get_completion_scheduler` is ill-formed.
-If a sender advertises a completion scheduler for a signal in this way, that sender must ensure that it [=send|sends=] that signal on an execution agent belonging to an execution resource represented by a scheduler returned from this function.
-See [[#design-propagation]] for more details.
+`get_completion_scheduler` is a query that retrieves the completion scheduler
+for a specific completion-signal from a sender's environment. For a sender that
+lacks a completion scheduler query for a given signal, calling
+`get_completion_scheduler` is ill-formed. If a sender advertises a completion
+scheduler for a signal in this way, that sender must ensure that it
+[=send|sends=] that signal on an execution agent belonging to an execution
+resource represented by a scheduler returned from this function. See
+[[#design-propagation]] for more details.
 execution::scheduler auto cpu_sched = new_thread_scheduler{};
@@ -1751,12 +2155,20 @@ execution::scheduler auto completion_sch3 =
 
 ## Execution resource transitions are explicit ## {#design-transitions}
 
-[[P0443R14]] does not contain any mechanisms for performing an execution resource transition. The only sender algorithm that can create a sender that will move execution to a *specific* execution resource is `execution::schedule`, which does not take an input sender.
-That means that there's no way to construct sender chains that traverse different execution resources. This is necessary to fulfill the promise of senders being able to replace two-way executors, which had this capability.
+[[P0443R14]] does not contain any mechanisms for performing an execution
+resource transition. The only sender algorithm that can create a sender that
+will move execution to a *specific* execution resource is `execution::schedule`,
+which does not take an input sender. That means that there's no way to construct
+sender chains that traverse different execution resources. This is necessary to
+fulfill the promise of senders being able to replace two-way executors, which
+had this capability.
 
-We propose that, for senders advertising their [=completion scheduler=], all execution resource transitions must be explicit; running user code anywhere but where they defined it to run must be considered a bug.
+We propose that, for senders advertising their [=completion scheduler=], all
+execution resource transitions must be explicit; running user code anywhere
+other than where the user defined it to run must be considered a bug.
 
-The `execution::transfer` sender adaptor performs a transition from one execution resource to another:
+The `execution::transfer` sender adaptor performs a transition from one
+execution resource to another:
 
 
 execution::scheduler auto sch1 = ...;
@@ -1801,31 +2213,41 @@ overloads. Multi-shot senders should also define overloads of
 `execution::connect` that accept rvalue-qualified senders to allow the sender to
 be also used in places where only a single-shot sender is required.
 
-If the user of a sender does not require the sender to remain valid after connecting it to a
-receiver then it can pass an rvalue-reference to the sender to the call to `execution::connect`.
-Such usages should be able to accept either single-shot or multi-shot senders.
+If the user of a sender does not require the sender to remain valid after
+connecting it to a receiver then it can pass an rvalue-reference to the sender
+to the call to `execution::connect`. Such usages should be able to accept either
+single-shot or multi-shot senders.
 
-If the caller does wish for the sender to remain valid after the call then it can pass an lvalue-qualified sender
-to the call to `execution::connect`. Such usages will only accept multi-shot senders.
+If the caller does wish for the sender to remain valid after the call then it
+can pass an lvalue-qualified sender to the call to `execution::connect`. Such
+usages will only accept multi-shot senders.
 
-Algorithms that accept senders will typically either decay-copy an input sender and store it somewhere
-for later usage (for example as a data-member of the returned sender) or will immediately call
-`execution::connect` on the input sender, such as in `this_thread::sync_wait` or `execution::start_detached`.
+Algorithms that accept senders will typically either decay-copy an input sender
+and store it somewhere for later usage (for example as a data-member of the
+returned sender) or will immediately call `execution::connect` on the input
+sender, such as in `this_thread::sync_wait` or `execution::start_detached`.
 
-Some multi-use sender algorithms may require that an input sender be copy-constructible but will only call
-`execution::connect` on an rvalue of each copy, which still results in effectively executing the operation multiple times.
-Other multi-use sender algorithms may require that the sender is move-constructible but will invoke `execution::connect`
-on an lvalue reference to the sender.
+Some multi-use sender algorithms may require that an input sender be
+copy-constructible but will only call `execution::connect` on an rvalue of each
+copy, which still results in effectively executing the operation multiple times.
+Other multi-use sender algorithms may require that the sender is
+move-constructible but will invoke `execution::connect` on an lvalue reference
+to the sender.
 
-For a sender to be usable in both multi-use scenarios, it will generally be required to be both copy-constructible and lvalue-connectable.
+For a sender to be usable in both multi-use scenarios, it will generally be
+required to be both copy-constructible and lvalue-connectable.
 
 ## Senders are forkable ## {#design-forkable}
 
-Any non-trivial program will eventually want to fork a chain of senders into independent streams of work, regardless of whether they are single-shot or multi-shot.
-For instance, an incoming event to a middleware system may be required to trigger events on more than one downstream system.
-This requires that we provide well defined mechanisms for making sure that connecting a sender multiple times is possible and correct.
+Any non-trivial program will eventually want to fork a chain of senders into
+independent streams of work, regardless of whether they are single-shot or
+multi-shot. For instance, an incoming event to a middleware system may be
+required to trigger events on more than one downstream system. This requires
+that we provide well defined mechanisms for making sure that connecting a sender
+multiple times is possible and correct.
 
-The `split` sender adaptor facilitates connecting to a sender multiple times, regardless of whether it is single-shot or multi-shot:
+The `split` sender adaptor facilitates connecting to a sender multiple times,
+regardless of whether it is single-shot or multi-shot:
 
 
 auto some_algorithm(execution::sender auto&& input) {
@@ -1842,44 +2264,52 @@ auto some_algorithm(execution::sender auto&& input) {
 
 ## Senders support cancellation ## {#design-cancellation}
 
-Senders are often used in scenarios where the application may be concurrently executing
-multiple strategies for achieving some program goal. When one of these strategies succeeds
-(or fails) it may not make sense to continue pursuing the other strategies as their results
-are no longer useful.
+Senders are often used in scenarios where the application may be concurrently
+executing multiple strategies for achieving some program goal. When one of these
+strategies succeeds (or fails) it may not make sense to continue pursuing the
+other strategies as their results are no longer useful.
 
-For example, we may want to try to simultaneously connect to multiple network servers and use
-whichever server responds first. Once the first server responds we no longer need to continue
-trying to connect to the other servers.
+For example, we may want to try to simultaneously connect to multiple network
+servers and use whichever server responds first. Once the first server responds
+we no longer need to continue trying to connect to the other servers.
 
-Ideally, in these scenarios, we would somehow be able to request that those other strategies
-stop executing promptly so that their resources (e.g. cpu, memory, I/O bandwidth) can be
-released and used for other work.
+Ideally, in these scenarios, we would somehow be able to request that those
+other strategies stop executing promptly so that their resources (e.g. CPU,
+memory, I/O bandwidth) can be released and used for other work.
 
-While the design of senders has support for cancelling an operation before it starts
-by simply destroying the sender or the operation-state returned from `execution::connect()`
-before calling `execution::start()`, there also needs to be a standard, generic mechanism
-to ask for an already-started operation to complete early.
+While the design of senders has support for cancelling an operation before it
+starts by simply destroying the sender or the operation-state returned from
+`execution::connect()` before calling `execution::start()`, there also needs to
+be a standard, generic mechanism to ask for an already-started operation to
+complete early.
 
-The ability to be able to cancel in-flight operations is fundamental to supporting some kinds
-of generic concurrency algorithms.
+The ability to cancel in-flight operations is fundamental to supporting some
+kinds of generic concurrency algorithms.
 
 For example:
-* a `when_all(ops...)` algorithm should cancel other operations as soon as one operation fails
-* a `first_successful(ops...)` algorithm should cancel the other operations as soon as one operation completes successfuly
-* a generic `timeout(src, duration)` algorithm needs to be able to cancel the `src` operation after the timeout duration has elapsed.
-* a `stop_when(src, trigger)` algorithm should cancel `src` if `trigger` completes first and cancel `trigger` if `src` completes first
-
-
-The mechanism used for communcating cancellation-requests, or stop-requests, needs to have a uniform interface
-so that generic algorithms that compose sender-based operations, such as the ones listed above, are able to
-communicate these cancellation requests to senders that they don't know anything about.
+* a `when_all(ops...)` algorithm should cancel other operations as soon as one
+    operation fails.
+* a `first_successful(ops...)` algorithm should cancel the other operations as
+    soon as one operation completes successfully.
+* a generic `timeout(src, duration)` algorithm needs to be able to cancel the
+    `src` operation after the timeout duration has elapsed.
+* a `stop_when(src, trigger)` algorithm should cancel `src` if `trigger`
+    completes first and cancel `trigger` if `src` completes first.
+
+The mechanism used for communicating cancellation-requests, or stop-requests,
+needs to have a uniform interface so that generic algorithms that compose
+sender-based operations, such as the ones listed above, are able to communicate
+these cancellation requests to senders that they don't know anything about.
+
+The design is intended to be composable so that cancellation of higher-level
+operations can propagate those cancellation requests through intermediate layers
+to lower-level operations that need to actually respond to the cancellation
+requests.
+
+For example, we can compose the algorithms mentioned above so that child
+operations are cancelled when any one of the multiple cancellation conditions
+occurs:
 
-The design is intended to be composable so that cancellation of higher-level operations can propagate
-those cancellation requests through intermediate layers to lower-level operations that need to actually
-respond to the cancellation requests.
-
-For example, we can compose the algorithms mentioned above so that child operations
-are cancelled when any one of the multiple cancellation conditions occurs:
 
 sender auto composed_cancellation_example(auto query) {
   return stop_when(
@@ -1894,55 +2324,78 @@ sender auto composed_cancellation_example(auto query) {
 }
 
-In this example, if we take the operation returned by `query_server_b(query)`, this operation will
-receive a stop-request when any of the following happens:
-* `first_successful` algorithm will send a stop-request if `query_server_a(query)` completes successfully
-* `when_all` algorithm will send a stop-request if the `load_file("some_file.jpg")` operation completes with an error or stopped result.
-* `timeout` algorithm will send a stop-request if the operation does not complete within 5 seconds.
-* `stop_when` algorithm will send a stop-request if the user clicks on the "Cancel" button in the user-interface.
-* The parent operation consuming the `composed_cancellation_example()` sends a stop-request
-
-
-Note that within this code there is no explicit mention of cancellation, stop-tokens, callbacks, etc.
-yet the example fully supports and responds to the various cancellation sources.
-
-The intent of the design is that the common usage of cancellation in sender/receiver-based code is
-primarily through use of concurrency algorithms that manage the detailed plumbing of cancellation
-for you. Much like algorithms that compose senders relieve the user from having to write their own
-receiver types, algorithms that introduce concurrency and provide higher-level cancellation semantics
-relieve the user from having to deal with low-level details of cancellation.
+In this example, if we take the operation returned by `query_server_b(query)`,
+this operation will receive a stop-request when any of the following happens:
+
+* the `first_successful` algorithm will send a stop-request if
+    `query_server_a(query)` completes successfully.
+* the `when_all` algorithm will send a stop-request if the
+    `load_file("some_file.jpg")` operation completes with an error or stopped
+    result.
+* the `timeout` algorithm will send a stop-request if the operation does not
+    complete within 5 seconds.
+* the `stop_when` algorithm will send a stop-request if the user clicks on the
+    "Cancel" button in the user-interface.
+* the parent operation consuming the `composed_cancellation_example()` sender
+    sends a stop-request.
+
+Note that within this code there is no explicit mention of cancellation,
+stop-tokens, callbacks, etc., yet the example fully supports and responds to
+the various cancellation sources.
+
+The intent of the design is that the common usage of cancellation in
+sender/receiver-based code is primarily through use of concurrency algorithms
+that manage the detailed plumbing of cancellation for you. Much like algorithms
+that compose senders relieve the user from having to write their own receiver
+types, algorithms that introduce concurrency and provide higher-level
+cancellation semantics relieve the user from having to deal with low-level
+details of cancellation.

 ### Cancellation design summary ### {#design-cancellation-summary}

-The design of cancellation described in this paper is built on top of and extends the `std::stop_token`-based
-cancellation facilities added in C++20, first proposed in [[P2175R0]].
-
-At a high-level, the facilities proposed by this paper for supporting cancellation include:
-* Add `std::stoppable_token` and `std::stoppable_token_for` concepts that generalise the interface of `std::stop_token` type to allow other types with different implementation strategies.
-* Add `std::unstoppable_token` concept for detecting whether a `stoppable_token` can never receive a stop-request.
-* Add `std::in_place_stop_token`, `std::in_place_stop_source` and `std::in_place_stop_callback` types that provide a more efficient implementation of a stop-token for use in structured concurrency situations.
-* Add `std::never_stop_token` for use in places where you never want to issue a stop-request
-* Add `std::execution::get_stop_token()` CPO for querying the stop-token to use for an operation from its receiver's execution environment.
-* Add `std::execution::stop_token_of_t` for querying the type of a stop-token returned from `get_stop_token()`
-
-In addition, there are requirements added to some of the algorithms to specify what their cancellation
-behaviour is and what the requirements of customisations of those algorithms are with respect to
-cancellation.
-
-The key component that enables generic cancellation within sender-based operations is the `execution::get_stop_token()` CPO.
-This CPO takes a single parameter, which is the execution environment of the receiver passed to `execution::connect`, and returns a `std::stoppable_token`
-that the operation can use to check for stop-requests for that operation.
+The design of cancellation described in this paper is built on top of and
+extends the `std::stop_token`-based cancellation facilities added in C++20,
+first proposed in [[P2175R0]].
+
+At a high-level, the facilities proposed by this paper for supporting
+cancellation include:
+
+* Add `std::stoppable_token` and `std::stoppable_token_for` concepts that
+    generalise the interface of the `std::stop_token` type to allow other types
+    with different implementation strategies.
+* Add an `std::unstoppable_token` concept for detecting whether a
+    `stoppable_token` can never receive a stop-request.
+* Add `std::in_place_stop_token`, `std::in_place_stop_source` and
+    `std::in_place_stop_callback` types that provide a more efficient
+    implementation of a stop-token for use in structured concurrency
+    situations.
+* Add `std::never_stop_token` for use in places where you never want to issue a
+    stop-request.
+* Add a `std::execution::get_stop_token()` CPO for querying the stop-token to
+    use for an operation from its receiver's execution environment.
+* Add `std::execution::stop_token_of_t` for querying the type of a stop-token
+    returned from `get_stop_token()`.
+
+In addition, there are requirements added to some of the algorithms to specify
+what their cancellation behaviour is and what the requirements of customisations
+of those algorithms are with respect to cancellation.
+
+The key component that enables generic cancellation within sender-based
+operations is the `execution::get_stop_token()` CPO. This CPO takes a single
+parameter, which is the execution environment of the receiver passed to
+`execution::connect`, and returns a `std::stoppable_token` that the operation
+can use to check for stop-requests for that operation.

 As the caller of `execution::connect` typically has control over the receiver
-type it passes, it is able to customise the `std::execution::get_env()` CPO for that
-receiver to return an execution environment that hooks the
+type it passes, it is able to customise the `std::execution::get_env()` CPO for
+that receiver to return an execution environment that hooks the
 `execution::get_stop_token()` CPO to return a stop-token that the receiver has
 control over and that it can use to communicate a stop-request to the operation
 once it has started.

 ### Support for cancellation is optional ### {#design-cancellation-optional}

-Support for cancellation is optional, both on part of the author of the receiver and on part of the author of the sender.
+Support for cancellation is optional, both on the part of the author of the
+receiver and on the part of the author of the sender.

 If the receiver's execution environment does not customise the
 `execution::get_stop_token()` CPO then invoking the CPO on that receiver's
@@ -1950,58 +2403,78 @@ environment will invoke the default implementation which returns
 `std::never_stop_token`. This is a special `stoppable_token` type that is
 statically known to always return `false` from the `stop_possible()` method.

-Sender code that tries to use this stop-token will in general result in code that handles stop-requests being
-compiled out and having little to no run-time overhead.
+Sender code that tries to use this stop-token will in general result in code
+that handles stop-requests being compiled out and having little to no run-time
+overhead.

-If the sender doesn't call `execution::get_stop_token()`, for example because the operation does not support
-cancellation, then it will simply not respond to stop-requests from the caller.
+If the sender doesn't call `execution::get_stop_token()`, for example because
+the operation does not support cancellation, then it will simply not respond to
+stop-requests from the caller.

-Note that stop-requests are generally racy in nature as there is often a race betwen an operation completing
-naturally and the stop-request being made. If the operation has already completed or past the point at which
-it can be cancelled when the stop-request is sent then the stop-request may just be ignored. An application
-will typically need to be able to cope with senders that might ignore a stop-request anyway.
+Note that stop-requests are generally racy in nature as there is often a race
+between an operation completing naturally and the stop-request being made. If
+the operation has already completed or is past the point at which it can be
+cancelled when the stop-request is sent then the stop-request may just be
+ignored. An application will typically need to be able to cope with senders
+that might ignore a stop-request anyway.

 ### Cancellation is inherently racy ### {#design-cancellation-racy}

-Usually, an operation will attach a stop-callback at some point inside the call to `execution::start()` so that
-a subsequent stop-request will interrupt the logic.
-
-A stop-request can be issued concurrently from another thread. This means the implementation of `execution::start()`
-needs to be careful to ensure that, once a stop-callback has been registered, that there are no data-races between
-a potentially concurrently-executing stop-callback and the rest of the `execution::start()` implementation.
-
-An implementation of `execution::start()` that supports cancellation will generally need to perform (at least)
-two separate steps: launch the operation, subscribe a stop-callback to the receiver's stop-token. Care needs
-to be taken depending on the order in which these two steps are performed.
-
-If the stop-callback is subscribed first and then the operation is launched, care needs to be taken to ensure
-that a stop-request that invokes the stop-callback on another thread after the stop-callback is registered
-but before the operation finishes launching does not either result in a missed cancellation request or a
-data-race. e.g. by performing an atomic write after the launch has finished executing
-
-If the operation is launched first and then the stop-callback is subscribed, care needs to be taken to ensure
-that if the launched operation completes concurrently on another thread that it does not destroy the operation-state
-until after the stop-callback has been registered. e.g. by having the `execution::start` implementation write to
-an atomic variable once it has finished registering the stop-callback and having the concurrent completion handler
-check that variable and either call the completion-signalling operation or store the result and defer calling the
-receiver's completion-signalling operation to the `execution::start()` call (which is still executing).
-
-For an example of an implementation strategy for solving these data-races see [[#example-async-windows-socket-recv]].
+Usually, an operation will attach a stop-callback at some point inside the call
+to `execution::start()` so that a subsequent stop-request will interrupt the
+logic.
+
+A stop-request can be issued concurrently from another thread. This means the
+implementation of `execution::start()` needs to be careful to ensure that, once
+a stop-callback has been registered, there are no data-races between a
+potentially concurrently-executing stop-callback and the rest of the
+`execution::start()` implementation.
+
+An implementation of `execution::start()` that supports cancellation will
+generally need to perform (at least) two separate steps: launch the operation,
+then subscribe a stop-callback to the receiver's stop-token. Care needs to be
+taken depending on the order in which these two steps are performed.
+
+If the stop-callback is subscribed first and then the operation is launched,
+care needs to be taken to ensure that a stop-request that invokes the
+stop-callback on another thread after the stop-callback is registered but
+before the operation finishes launching results in neither a missed
+cancellation request nor a data-race, e.g. by performing an atomic write after
+the launch has finished executing.
+
+If the operation is launched first and then the stop-callback is subscribed,
+care needs to be taken to ensure that if the launched operation completes
+concurrently on another thread, it does not destroy the operation-state until
+after the stop-callback has been registered, e.g. by having the
+`execution::start` implementation write to an atomic variable once it has
+finished registering the stop-callback and having the concurrent completion
+handler check that variable and either call the completion-signalling operation
+or store the result and defer calling the receiver's completion-signalling
+operation to the `execution::start()` call (which is still executing).
+
+For an example of an implementation strategy for solving these data-races see
+[[#example-async-windows-socket-recv]].
### Cancellation design status ### {#design-cancellation-status} This paper currently includes the design for cancellation as proposed in [[P2175R0]] - "Composable cancellation for sender-based async operations". -P2175R0 contains more details on the background motivation and prior-art and design rationale of this design. +P2175R0 contains more details on the background motivation and prior-art and +design rationale of this design. -It is important to note, however, that initial review of this design in the SG1 concurrency subgroup raised some concerns -related to runtime overhead of the design in single-threaded scenarios and these concerns are still being investigated. +It is important to note, however, that initial review of this design in the SG1 +concurrency subgroup raised some concerns related to runtime overhead of the +design in single-threaded scenarios and these concerns are still being +investigated. -The design of P2175R0 has been included in this paper for now, despite its potential to change, as we believe that -support for cancellation is a fundamental requirement for an async model and is required in some form to be able to -talk about the semantics of some of the algorithms proposed in this paper. +The design of P2175R0 has been included in this paper for now, despite its +potential to change, as we believe that support for cancellation is a +fundamental requirement for an async model and is required in some form to be +able to talk about the semantics of some of the algorithms proposed in this +paper. -This paper will be updated in the future with any changes that arise from the investigations into P2175R0. +This paper will be updated in the future with any changes that arise from the +investigations into P2175R0. ## Sender factories and adaptors are lazy ## {#design-lazy-algorithms} @@ -2010,24 +2483,26 @@ executing their logic eagerly; i.e., before the returned sender has been connected to a receiver and started. 
These algorithms were removed because eager execution has a number of negative
semantic and performance implications.

-We have originally included this functionality in the paper because of a long-standing
-belief that eager execution is a mandatory feature to be included in the standard Executors
-facility for that facility to be acceptable for accelerator vendors. A particular concern
-was that we must be able to write generic algorithms that can run either eagerly or lazily,
-depending on the kind of an input sender or scheduler that have been passed into them as
-arguments. We considered this a requirement, because the _latency_ of launching work on an
+We originally included this functionality in the paper because of a
+long-standing belief that eager execution is a mandatory feature to be included
+in the standard Executors facility for that facility to be acceptable for
+accelerator vendors. A particular concern was that we must be able to write
+generic algorithms that can run either eagerly or lazily, depending on the kind
+of input sender or scheduler that has been passed into them as arguments. We
+considered this a requirement, because the _latency_ of launching work on an
accelerator can sometimes be considerable.

-However, in the process of working on this paper and implementations of the features
-proposed within, our set of requirements has shifted, as we understood the different
-implementation strategies that are available for the feature set of this paper better,
-and, after weighting the earlier concerns against the points presented below, we
-have arrived at the conclusion that a purely lazy model is enough for most algorithms,
-and users who intend to launch work earlier may use an algorithm such as `ensure_started`
-to achieve that goal.
We have also come to deeply appreciate the fact that a purely
-lazy model allows both the implementation and the compiler to have a much better
-understanding of what the complete graph of tasks looks like, allowing them to better
-optimize the code - also when targetting accelerators.
+However, in the process of working on this paper and implementations of the
+features proposed within, our set of requirements has shifted, as we came to
+better understand the different implementation strategies that are available
+for the feature set of this paper, and, after weighing the earlier concerns
+against the points presented below, we have arrived at the conclusion that a
+purely lazy model is enough for most algorithms, and users who intend to launch
+work earlier may use an algorithm such as `ensure_started` to achieve that
+goal. We have also come to deeply appreciate the fact that a purely lazy model
+allows both the implementation and the compiler to have a much better
+understanding of what the complete graph of tasks looks like, allowing them to
+better optimize the code, including when targeting accelerators.

### Eager execution leads to detached work or worse ### {#design-lazy-algorithms-detached}

@@ -2173,23 +2648,36 @@ child operations, which may complete before a receiver is ever attached.

## Schedulers advertise their forward progress guarantees ## {#design-fpg}

-To decide whether a scheduler (and its associated execution resource) is sufficient for a specific task, it may be necessary to know what kind of forward progress guarantees it provides for the execution agents it creates. The C++ Standard defines the following
-forward progress guarantees:
+To decide whether a scheduler (and its associated execution resource) is
+sufficient for a specific task, it may be necessary to know what kind of forward
+progress guarantees it provides for the execution agents it creates.
The C++ +Standard defines the following forward progress guarantees: -* concurrent, which requires that a thread makes progress eventually; -* parallel, which requires that a thread makes progress once it executes a step; and +* concurrent, which requires that a thread makes progress + eventually; +* parallel, which requires that a thread makes progress once it executes + a step; and * weakly parallel, which does not require that the thread makes progress. -This paper introduces a scheduler query function, `get_forward_progress_guarantee`, which returns one of the enumerators of a new `enum` type, `forward_progress_guarantee`. Each enumerator of `forward_progress_guarantee` corresponds to one of the aforementioned +This paper introduces a scheduler query function, +`get_forward_progress_guarantee`, which returns one of the enumerators of a new +`enum` type, `forward_progress_guarantee`. Each enumerator of +`forward_progress_guarantee` corresponds to one of the aforementioned guarantees. ## Most sender adaptors are pipeable ## {#design-pipeable} -To facilitate an intuitive syntax for composition, most sender adaptors are pipeable; they can be composed (piped) together with `operator|`. -This mechanism is similar to the `operator|` composition that C++ range adaptors support and draws inspiration from piping in *nix shells. -Pipeable sender adaptors take a sender as their first parameter and have no other sender parameters. +To facilitate an intuitive syntax for composition, most sender adaptors are pipeable; they can be composed (piped) +together with `operator|`. This mechanism is similar to the `operator|` +composition that C++ range adaptors support and draws inspiration from piping in +*nix shells. +Pipeable sender adaptors take a sender as their first parameter and have no +other sender parameters. -`a | b` will pass the sender `a` as the first argument to the pipeable sender adaptor `b`. 
Pipeable sender adaptors support partial application of the parameters after the first. For example, all of the following are equivalent: +`a | b` will pass the sender `a` as the first argument to the pipeable sender +adaptor `b`. Pipeable sender adaptors support partial application of the +parameters after the first. For example, all of the following are equivalent:
 execution::bulk(snd, N, [] (std::size_t i, auto d) {});
@@ -2197,9 +2685,12 @@ execution::bulk(N, [] (std::size_t i, auto d) {})(snd);
 snd | execution::bulk(N, [] (std::size_t i, auto d) {});
 
-Piping enables you to compose together senders with a linear syntax. -Without it, you'd have to use either nested function call syntax, which would cause a syntactic inversion of the direction of control flow, or you'd have to introduce a temporary variable for each stage of the pipeline. -Consider the following example where we want to execute first on a CPU thread pool, then on a CUDA GPU, then back on the CPU thread pool: +Piping enables you to compose together senders with a linear syntax. Without it, +you'd have to use either nested function call syntax, which would cause a +syntactic inversion of the direction of control flow, or you'd have to introduce +a temporary variable for each stage of the pipeline. Consider the following +example where we want to execute first on a CPU thread pool, then on a CUDA GPU, +then back on the CPU thread pool: @@ -2248,20 +2739,35 @@ auto [result] = this_thread::sync_wait(snd).value();
-Certain sender adaptors are not pipeable, because using the pipeline syntax can result in confusion of the semantics of the adaptors involved. Specifically, the following sender adaptors are not pipeable.
+Certain sender adaptors are not pipeable, because using the pipeline syntax can
+result in confusion of the semantics of the adaptors involved. Specifically, the
+following sender adaptors are not pipeable.

-* `execution::when_all` and `execution::when_all_with_variant`: Since this sender adaptor takes a variadic pack of senders, a partially applied form would be ambiguous with a non partially applied form with an arity of one less.
-* `execution::on`: This sender adaptor changes how the sender passed to it is executed, not what happens to its result, but allowing it in a pipeline makes it read as if it performed a function more similar to `transfer`.
+* `execution::when_all` and `execution::when_all_with_variant`: Since these
+  sender adaptors take a variadic pack of senders, a partially applied form
+  would be ambiguous with a non-partially-applied form with an arity of one
+  less.
+* `execution::on`: This sender adaptor changes how the sender passed to it is
+  executed, not what happens to its result, but allowing it in a pipeline makes
+  it read as if it performed a function more similar to `transfer`.

Sender consumers could be made pipeable, but we have chosen to not do so.
-However, since these are terminal nodes in a pipeline and nothing can be piped after them, we believe a pipe syntax may be confusing as well as unnecessary, as consumers cannot be chained.
-We believe sender consumers read better with function call syntax.
+However, since these are terminal nodes in a pipeline and nothing can be piped
+after them, we believe a pipe syntax may be confusing as well as unnecessary, as
+consumers cannot be chained. We believe sender consumers read better with
+function call syntax.
## A range of senders represents an async sequence of data ## {#design-range-of-senders}

-Senders represent a single unit of asynchronous work. In many cases though, what is being modelled is a sequence of data arriving asynchronously, and you want computation to happen on demand, when each element arrives. This requires nothing more than what is in this paper and the range support in C++20. A range of senders would allow you to model such input as keystrikes, mouse movements, sensor readings, or network requests.
+Senders represent a single unit of asynchronous work. In many cases though, what
+is being modelled is a sequence of data arriving asynchronously, and you want
+computation to happen on demand, when each element arrives. This requires
+nothing more than what is in this paper and the range support in C++20. A range
+of senders would allow you to model inputs such as keystrokes, mouse movements,
+sensor readings, or network requests.

-Given some expression R that is a range of senders, consider the following in a coroutine that returns an async generator type:
+Given some expression R that is a range of senders, consider
+the following in a coroutine that returns an async generator type:
     for (auto snd : R) {
@@ -2272,17 +2778,35 @@ Given some expression R that is a range of senders, consider
     }
     
-This transforms each element of the asynchronous sequence R with the function `fn` on demand, as the data arrives. The result is a new asynchronous sequence of the transformed values.
+This transforms each element of the asynchronous sequence R
+with the function `fn` on demand, as the data arrives. The result is a new
+asynchronous sequence of the transformed values.

-Now imagine that R is the simple expression `views::iota(0) | views::transform(execution::just)`. This creates a lazy range of senders, each of which completes immediately with monotonically increasing integers. The above code churns through the range, generating a new infine asynchronous range of values [`fn(0)`, `fn(1)`, `fn(2)`, ...].
+Now imagine that R is the simple expression `views::iota(0)
+| views::transform(execution::just)`. This creates a lazy range of senders, each
+of which completes immediately with monotonically increasing integers. The above
+code churns through the range, generating a new infinite asynchronous range of
+values [`fn(0)`, `fn(1)`, `fn(2)`, ...].

-Far more interesting would be if R were a range of senders representing, say, user actions in a UI. The above code gives a simple way to respond to user actions on demand.
+Far more interesting would be if R were a range of senders
+representing, say, user actions in a UI. The above code gives a simple way to
+respond to user actions on demand.

## Senders can represent partial success ## {#design-partial-success}

-Receivers have three ways they can complete: with success, failure, or cancellation. This begs the question of how they can be used to represent async operations that *partially* succeed. For example, consider an API that reads from a socket. The connection could drop after the API has filled in some of the buffer. In cases like that, it makes sense to want to report both that the connection dropped and that some data has been successfully read.
+Receivers have three ways they can complete: with success, failure, or
+cancellation. This raises the question of how they can be used to represent
+async operations that *partially* succeed. For example, consider an API that
+reads from a socket. The connection could drop after the API has filled in some
+of the buffer. In cases like that, it makes sense to want to report both that
+the connection dropped and that some data has been successfully read.

-Often in the case of partial success, the error condition is not fatal nor does it mean the API has failed to satisfy its post-conditions. It is merely an extra piece of information about the nature of the completion. In those cases, "partial success" is another way of saying "success". As a result, it is sensible to pass both the error code and the result (if any) through the value channel, as shown below:
+Often in the case of partial success, the error condition is not fatal, nor does
+it mean the API has failed to satisfy its post-conditions. It is merely an extra
+piece of information about the nature of the completion. In those cases,
+"partial success" is another way of saying "success". As a result, it is
+sensible to pass both the error code and the result (if any) through the value
+channel, as shown below:
     // Capture a buffer for read_socket_async to fill in
@@ -2302,9 +2826,21 @@ Often in the case of partial success, the error condition is not fatal nor does
         })
     
-In other cases, the partial success is more of a partial *failure*. That happens when the error condition indicates that in some way the function failed to satisfy its post-conditions. In those cases, sending the error through the value channel loses valuable contextual information. It's possible that bundling the error and the incomplete results into an object and passing it through the error channel makes more sense. In that way, generic algorithms will not miss the fact that a post-condition has not been met and react inappropriately. - -Another possibility is for an async API to return a *range* of senders: if the API completes with full success, full error, or cancellation, the returned range contains just one sender with the result. Otherwise, if the API partially fails (doesn't satisfy its post-conditions, but some incomplete result is available), the returned range would have *two* senders: the first containing the partial result, and the second containing the error. Such an API might be used in a coroutine as follows: +In other cases, the partial success is more of a partial *failure*. That happens +when the error condition indicates that in some way the function failed to +satisfy its post-conditions. In those cases, sending the error through the value +channel loses valuable contextual information. It's possible that bundling the +error and the incomplete results into an object and passing it through the error +channel makes more sense. In that way, generic algorithms will not miss the fact +that a post-condition has not been met and react inappropriately. + +Another possibility is for an async API to return a *range* of senders: if the +API completes with full success, full error, or cancellation, the returned range +contains just one sender with the result. 
Otherwise, if the API partially fails +(doesn't satisfy its post-conditions, but some incomplete result is available), +the returned range would have *two* senders: the first containing the partial +result, and the second containing the error. Such an API might be used in a +coroutine as follows:
     // Declare a buffer for read_socket_async to fill in
@@ -2326,11 +2862,18 @@ Another possibility is for an async API to return a *range* of senders: if the A
     }
     
-Finally, it's possible to combine these two approaches when the API can both partially succeed (meeting its post-conditions) and partially fail (not meeting its post-conditions). +Finally, it's possible to combine these two approaches when the API can both +partially succeed (meeting its post-conditions) and partially fail (not meeting +its post-conditions). ## All awaitables are senders ## {#design-awaitables-are-senders} -Since C++20 added coroutines to the standard, we expect that coroutines and awaitables will be how a great many will choose to express their asynchronous code. However, in this paper, we are proposing to add a suite of asynchronous algorithms that accept senders, not awaitables. One might wonder whether and how these algorithms will be accessible to those who choose coroutines instead of senders. +Since C++20 added coroutines to the standard, we expect that coroutines and +awaitables will be how a great many will choose to express their asynchronous +code. However, in this paper, we are proposing to add a suite of asynchronous +algorithms that accept senders, not awaitables. One might wonder whether and how +these algorithms will be accessible to those who choose coroutines instead of +senders. In truth there will be no problem because all generally awaitable types automatically model the `sender` concept. The adaptation is transparent and @@ -2353,13 +2896,22 @@ int main() { } ``` -Since awaitables are senders, writing a sender-based asynchronous algorithm is trivial if you have a coroutine task type: implement the algorithm as a coroutine. If you are not bothered by the possibility of allocations and indirections as a result of using coroutines, then there is no need to ever write a sender, a receiver, or an operation state. +Since awaitables are senders, writing a sender-based asynchronous algorithm is +trivial if you have a coroutine task type: implement the algorithm as a +coroutine. 
If you are not bothered by the possibility of allocations and +indirections as a result of using coroutines, then there is no need to ever +write a sender, a receiver, or an operation state. ## Many senders can be trivially made awaitable ## {#design-senders-are-awaitable} -If you choose to implement your sender-based algorithms as coroutines, you'll run into the issue of how to retrieve results from a passed-in sender. This is not a problem. If the coroutine type opts in to sender support -- trivial with the `execution::with_awaitable_senders` utility -- then a large class of senders are transparently awaitable from within the coroutine. +If you choose to implement your sender-based algorithms as coroutines, you'll +run into the issue of how to retrieve results from a passed-in sender. This is +not a problem. If the coroutine type opts in to sender support -- trivial with +the `execution::with_awaitable_senders` utility -- then a large class of senders +are transparently awaitable from within the coroutine. -For example, consider the following trivial implementation of the sender-based `retry` algorithm: +For example, consider the following trivial implementation of the sender-based +`retry` algorithm:
 template<class S>
@@ -2374,15 +2926,41 @@ task<single-sender-value-type<S>> retry(S s) {
 }
 
-Only *some* senders can be made awaitable directly because of the fact that callbacks are more expressive than coroutines. An awaitable expression has a single type: the result value of the async operation. In contrast, a callback can accept multiple arguments as the result of an operation. What's more, the callback can have overloaded function call signatures that take different sets of arguments. There is no way to automatically map such senders into awaitables. The `with_awaitable_senders` utility recognizes as awaitables those senders that send a single value of a single type. To await another kind of sender, a user would have to first map its value channel into a single value of a single type -- say, with the `into_variant` sender algorithm -- before `co_await`-ing that sender.
+Only *some* senders can be made awaitable directly, because callbacks are more
+expressive than coroutines. An awaitable expression has a single type: the
+result value of the async operation. In contrast, a callback can accept multiple
+arguments as the result of an operation. What's more, the callback can have
+overloaded function call signatures that take different sets of arguments. There
+is no way to automatically map such senders into awaitables. The
+`with_awaitable_senders` utility recognizes as awaitables those senders that
+send a single value of a single type. To await another kind of sender, a user
+would have to first map its value channel into a single value of a single type
+-- say, with the `into_variant` sender algorithm -- before `co_await`-ing that
+sender.

## Cancellation of a sender can unwind a stack of coroutines ## {#design-native-coro-unwind}

-When looking at the sender-based `retry` algorithm in the previous section, we can see that the value and error cases are correctly handled. But what about cancellation? What happens to a coroutine that is suspended awaiting a sender that completes by calling `execution::set_stopped`?
- -When your task type's promise inherits from `with_awaitable_senders`, what happens is this: the coroutine behaves as if an *uncatchable exception* had been thrown from the `co_await` expression. (It is not really an exception, but it's helpful to think of it that way.) Provided that the promise types of the calling coroutines also inherit from `with_awaitable_senders`, or more generally implement a member function called `unhandled_stopped`, the exception unwinds the chain of coroutines as if an exception were thrown except that it bypasses `catch(...)` clauses. - -In order to "catch" this uncatchable stopped exception, one of the calling coroutines in the stack would have to await a sender that maps the stopped channel into either a value or an error. That is achievable with the `execution::let_stopped`, `execution::upon_stopped`, `execution::stopped_as_optional`, or `execution::stopped_as_error` sender adaptors. For instance, we can use `execution::stopped_as_optional` to "catch" the stopped signal and map it into an empty optional as shown below: +When looking at the sender-based `retry` algorithm in the previous section, we +can see that the value and error cases are correctly handled. But what about +cancellation? What happens to a coroutine that is suspended awaiting a sender +that completes by calling `execution::set_stopped`? + +When your task type's promise inherits from `with_awaitable_senders`, what +happens is this: the coroutine behaves as if an *uncatchable exception* had been +thrown from the `co_await` expression. (It is not really an exception, but it's +helpful to think of it that way.) Provided that the promise types of the calling +coroutines also inherit from `with_awaitable_senders`, or more generally +implement a member function called `unhandled_stopped`, the exception unwinds +the chain of coroutines as if an exception were thrown except that it bypasses +`catch(...)` clauses. 
+ +In order to "catch" this uncatchable stopped exception, one of the calling +coroutines in the stack would have to await a sender that maps the stopped +channel into either a value or an error. That is achievable with the +`execution::let_stopped`, `execution::upon_stopped`, +`execution::stopped_as_optional`, or `execution::stopped_as_error` sender +adaptors. For instance, we can use `execution::stopped_as_optional` to "catch" +the stopped signal and map it into an empty optional as shown below: ```c++ if (auto opt = co_await execution::stopped_as_optional(some_sender)) { @@ -2392,18 +2970,37 @@ if (auto opt = co_await execution::stopped_as_optional(some_sender)) { } ``` -As described in the section "All awaitables are senders", the sender customization points recognize awaitables and adapt them transparently to model the sender concept. When `connect`-ing an awaitable and a receiver, the adaptation layer awaits the awaitable within a coroutine that implements `unhandled_stopped` in its promise type. The effect of this is that an "uncatchable" stopped exception propagates seamlessly out of awaitables, causing `execution::set_stopped` to be called on the receiver. - -Obviously, `unhandled_stopped` is a library extension of the coroutine promise interface. Many promise types will not implement `unhandled_stopped`. When an uncatchable stopped exception tries to propagate through such a coroutine, it is treated as an unhandled exception and `terminate` is called. The solution, as described above, is to use a sender adaptor to handle the stopped exception before awaiting it. It goes without saying that any future Standard Library coroutine types ought to implement `unhandled_stopped`. The author of [[P1056R1]], which proposes a standard coroutine task type, is in agreement. +As described in the section "All +awaitables are senders", the sender customization points recognize +awaitables and adapt them transparently to model the sender concept. 
When +`connect`-ing an awaitable and a receiver, the adaptation layer awaits the +awaitable within a coroutine that implements `unhandled_stopped` in its promise +type. The effect of this is that an "uncatchable" stopped exception propagates +seamlessly out of awaitables, causing `execution::set_stopped` to be called on +the receiver. + +Obviously, `unhandled_stopped` is a library extension of the coroutine promise +interface. Many promise types will not implement `unhandled_stopped`. When an +uncatchable stopped exception tries to propagate through such a coroutine, it is +treated as an unhandled exception and `terminate` is called. The solution, as +described above, is to use a sender adaptor to handle the stopped exception +before awaiting it. It goes without saying that any future Standard Library +coroutine types ought to implement `unhandled_stopped`. The author of +[[P1056R1]], which proposes a standard coroutine task type, is in agreement. ## Composition with parallel algorithms ## {#design-parallel-algorithms} -The C++ Standard Library provides a large number of algorithms that offer the potential for non-sequential execution via the use of execution policies. The set of algorithms with execution policy overloads are often referred to as "parallel algorithms", although -additional policies are available. +The C++ Standard Library provides a large number of algorithms that offer the +potential for non-sequential execution via the use of execution policies. The +set of algorithms with execution policy overloads are often referred to as +"parallel algorithms", although additional policies are available. -Existing policies, such as `execution::par`, give the implementation permission to execute the algorithm in parallel. However, the choice of execution resources used to perform the work is left to the implementation. +Existing policies, such as `execution::par`, give the implementation permission +to execute the algorithm in parallel. 
However, the choice of execution resources +used to perform the work is left to the implementation. -We will propose a customization point for combining schedulers with policies in order to provide control over where work will execute. +We will propose a customization point for combining schedulers with policies in +order to provide control over where work will execute.
 template<class ExecutionPolicy>
@@ -2413,17 +3010,27 @@ template<class ExecutionPolicy>
 );
 
-This function would return an object of an unspecified type which can be used in place of an execution policy as the first argument to one of the parallel algorithms. The overload selected by that object should execute its computation as requested by
-`policy` while using `scheduler` to create any work to be run. The expression may be ill-formed if `scheduler` is not able to support the given policy.
+This function would return an object of an unspecified type which can be used in
+place of an execution policy as the first argument to one of the parallel
+algorithms. The overload selected by that object should execute its computation
+as requested by `policy` while using `scheduler` to create any work to be run.
+The expression may be ill-formed if `scheduler` is not able to support the given
+policy.

-The existing parallel algorithms are synchronous; all of the effects performed by the computation are complete before the algorithm returns to its caller. This remains unchanged with the `executing_on` customization point.
+The existing parallel algorithms are synchronous; all of the effects performed
+by the computation are complete before the algorithm returns to its caller. This
+remains unchanged with the `executing_on` customization point.

-In the future, we expect additional papers will propose asynchronous forms of the parallel algorithms which (1) return senders rather than values or `void` and (2) where a customization point pairing a sender with an execution policy would similarly be used to
-obtain an object of unspecified type to be provided as the first argument to the algorithm.
+In the future, we expect additional papers will propose asynchronous forms of
+the parallel algorithms which (1) return senders rather than values or `void`,
+and (2) for which a customization point pairing a sender with an execution
+policy would similarly be used to obtain an object of unspecified type to be
+provided as the first argument to the algorithm.
## User-facing sender factories ## {#design-sender-factories} -A [=sender factory=] is an algorithm that takes no senders as parameters and returns a sender. +A [=sender factory=] is an algorithm that takes no senders as parameters and +returns a sender. ### `execution::schedule` ### {#design-sender-factory-schedule} @@ -2433,7 +3040,8 @@ execution::sender auto schedule( );
-Returns a sender describing the start of a task graph on the provided scheduler. See [[#design-schedulers]]. +Returns a sender describing the start of a task graph on the provided scheduler. +See [[#design-schedulers]].
 execution::scheduler auto sch1 = get_system_thread_pool().scheduler();
@@ -2450,7 +3058,12 @@ execution::sender auto just(
 );
 
-Returns a sender with no [=completion scheduler|completion schedulers=], which [=send|sends=] the provided values. The input values are decay-copied into the returned sender. When the returned sender is connected to a receiver, the values are moved into the operation state if the sender is an rvalue; otherwise, they are copied. Then xvalues referencing the values in the operation state are passed to the receiver's `set_value`. +Returns a sender with no [=completion scheduler|completion schedulers=], which +[=send|sends=] the provided values. The input values are decay-copied into the +returned sender. When the returned sender is connected to a receiver, the values +are moved into the operation state if the sender is an rvalue; otherwise, they +are copied. Then xvalues referencing the values in the operation state are +passed to the receiver's `set_value`. ```c++ execution::sender auto snd1 = execution::just(3.14); @@ -2489,7 +3102,12 @@ execution::sender auto just_error( );
-Returns a sender with no [=completion scheduler|completion schedulers=], which completes with the specified error. If the provided error is an lvalue reference, a copy is made inside the returned sender and a non-const lvalue reference to the copy is sent to the receiver's `set_error`. If the provided value is an rvalue reference, it is moved into the returned sender and an rvalue reference to it is sent to the receiver's `set_error`.
+Returns a sender with no [=completion scheduler|completion schedulers=], which
+completes with the specified error. If the provided error is an lvalue
+reference, a copy is made inside the returned sender and a non-const lvalue
+reference to the copy is sent to the receiver's `set_error`. If the provided
+error is an rvalue reference, it is moved into the returned sender and an rvalue
+reference to it is sent to the receiver's `set_error`.

### `execution::just_stopped` ### {#design-sender-factory-just_stopped}

@@ -2497,7 +3115,8 @@ Returns a sender with no [=completion scheduler|completion schedulers=], which c
execution::sender auto just_stopped();
-Returns a sender with no [=completion scheduler|completion schedulers=], which completes immediately by calling the receiver's `set_stopped`.
+Returns a sender with no [=completion scheduler|completion schedulers=], which
+completes immediately by calling the receiver's `set_stopped`.

### `execution::read` ### {#design-sender-factory-read}

@@ -2518,9 +3137,16 @@ execution::sender auto get_stop_token() {
}

-Returns a sender that reaches into a receiver's environment and pulls out the current value associated with the customization point denoted by `Tag`. It then sends the value read back to the receiver through the value channel. For instance, `get_scheduler()` (with no arguments) is a sender that asks the receiver for the currently suggested `scheduler` and passes it to the receiver's `set_value` completion-signal.
+Returns a sender that reaches into a receiver's environment and pulls out the
+current value associated with the customization point denoted by `Tag`. It then
+sends the value read back to the receiver through the value channel. For
+instance, `get_scheduler()` (with no arguments) is a sender that asks the
+receiver for the currently suggested `scheduler` and passes it to the receiver's
+`set_value` completion-signal.

-This can be useful when scheduling nested dependent work. The following sender pulls the current schduler into the value channel and then schedules more work onto it.
+This can be useful when scheduling nested dependent work. The following sender
+pulls the current scheduler into the value channel and then schedules more work
+onto it.
     execution::sender auto task =
@@ -2532,16 +3158,27 @@ This can be useful when scheduling nested dependent work. The following sender p
     this_thread::sync_wait( std::move(task) ); // wait for it to finish
     
-This code uses the fact that `sync_wait` associates a scheduler with the receiver that it connects with `task`. `get_scheduler()` reads that scheduler out of the receiver, and passes it to `let_value`'s receiver's `set_value` function, which in turn passes it to the lambda. That lambda returns a new sender that uses the scheduler to schedule some nested work onto `sync_wait`'s scheduler. +This code uses the fact that `sync_wait` associates a scheduler with the +receiver that it connects with `task`. `get_scheduler()` reads that scheduler +out of the receiver, and passes it to `let_value`'s receiver's `set_value` +function, which in turn passes it to the lambda. That lambda returns a new +sender that uses the scheduler to schedule some nested work onto `sync_wait`'s +scheduler. ## User-facing sender adaptors ## {#design-sender-adaptors} -A [=sender adaptor=] is an algorithm that takes one or more senders, which it may `execution::connect`, as parameters, and returns a sender, whose completion is related to the sender arguments it has received. +A [=sender adaptor=] is an algorithm that takes one or more senders, which it +may `execution::connect`, as parameters, and returns a sender, whose completion +is related to the sender arguments it has received. -Sender adaptors are lazy, that is, they are never allowed to submit any work for execution prior to the returned sender being [=started=] later on, and are also guaranteed to not start any input senders passed into them. Sender consumers -such as [[#design-sender-consumer-start_detached]] and [[#design-sender-consumer-sync_wait]] start senders. +Sender adaptors are lazy, that is, they are never allowed to submit any +work for execution prior to the returned sender being [=started=] later on, and +are also guaranteed to not start any input senders passed into them. Sender +consumers such as [[#design-sender-consumer-start_detached]] and +[[#design-sender-consumer-sync_wait]] start senders. 
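For example (an illustrative fragment in the style of the other examples in this paper; it is a sketch of the proposed API, not code that compiles against any shipping implementation):

```c++
// Building a chain of adaptors performs no work by itself:
execution::sender auto pipeline =
    execution::then(execution::just(1), [](int i) { return i + 1; });
// Work is only submitted once a sender consumer starts the chain:
auto [two] = this_thread::sync_wait(std::move(pipeline)).value();
```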
-For more implementer-centric description of starting senders, see [[#design-laziness]].
+For a more implementer-centric description of starting senders, see
+[[#design-laziness]].

### `execution::transfer` ### {#design-sender-adaptor-transfer}

@@ -2552,7 +3189,9 @@ execution::sender auto transfer(
);

-Returns a sender describing the transition from the execution agent of the input sender to the execution agent of the target scheduler. See [[#design-transitions]].
+Returns a sender describing the transition from the execution agent of the input
+sender to the execution agent of the target scheduler. See
+[[#design-transitions]].
 execution::scheduler auto cpu_sched = get_system_thread_pool().scheduler();
@@ -2574,9 +3213,12 @@ execution::sender auto then(
 );
 
-`then` returns a sender describing the task graph described by the input sender, with an added node of invoking the provided function with the values [=send|sent=] by the input sender as arguments. +`then` returns a sender describing the task graph described by the input sender, +with an added node of invoking the provided function with the values +[=send|sent=] by the input sender as arguments. -`then` is **guaranteed** to not begin executing `function` until the returned sender is started. +`then` is **guaranteed** to not begin executing `function` until the returned +sender is started.
 execution::sender auto input = get_input();
@@ -2587,7 +3229,8 @@ execution::sender auto snd = execution::then(input, [](auto... args) {
 // followed by printing all of the values sent by pred
 
-This adaptor is included as it is necessary for writing any sender code that actually performs a useful function.
+This adaptor is included as it is necessary for writing any sender code that
+actually performs a useful function.

### `execution::upon_*` ### {#design-sender-adaptor-upon}

@@ -2603,7 +3246,9 @@ execution::sender auto upon_stopped(
);

-`upon_error` and `upon_stopped` are similar to `then`, but where `then` works with values sent by the input sender, `upon_error` works with errors, and `upon_stopped` is invoked when the "stopped" signal is sent.
+`upon_error` and `upon_stopped` are similar to `then`, but where `then` works
+with values sent by the input sender, `upon_error` works with errors, and
+`upon_stopped` is invoked when the "stopped" signal is sent.

### `execution::let_*` ### {#design-sender-adaptor-let}

@@ -2624,12 +3269,21 @@ execution::sender auto let_stopped(
);

-`let_value` is very similar to `then`: when it is started, it invokes the provided function with the values [=send|sent=] by the input sender as arguments. However, where the sender returned from `then` sends exactly what that function ends up returning -
-`let_value` requires that the function return a sender, and the sender returned by `let_value` sends the values sent by the sender returned from the callback. This is similar to the notion of "future unwrapping" in future/promise-based frameworks.
+`let_value` is very similar to `then`: when it is started, it invokes the
+provided function with the values [=send|sent=] by the input sender as
+arguments. However, where the sender returned from `then` sends exactly what
+that function ends up returning, `let_value` requires that the function return
+a sender; the sender returned by `let_value` then sends the values sent by the
+sender returned from the callback. This is similar to the notion of "future
+unwrapping" in future/promise-based frameworks.
-`let_value` is **guaranteed** to not begin executing `function` until the returned sender is started. +`let_value` is **guaranteed** to not begin executing `function` until the +returned sender is started. -`let_error` and `let_stopped` are similar to `let_value`, but where `let_value` works with values sent by the input sender, `let_error` works with errors, and `let_stopped` is invoked when the "stopped" signal is sent. +`let_error` and `let_stopped` are similar to `let_value`, but where `let_value` +works with values sent by the input sender, `let_error` works with errors, and +`let_stopped` is invoked when the "stopped" signal is sent. ### `execution::on` ### {#design-sender-adaptor-on} @@ -2640,7 +3294,10 @@ execution::sender auto on( ); -Returns a sender which, when started, will start the provided sender on an execution agent belonging to the execution resource associated with the provided scheduler. This returned sender has no [=completion scheduler|completion schedulers=]. +Returns a sender which, when started, will start the provided sender on an +execution agent belonging to the execution resource associated with the provided +scheduler. This returned sender has no [=completion scheduler|completion +schedulers=]. ### `execution::into_variant` ### {#design-sender-adaptor-into_variant} @@ -2650,7 +3307,10 @@ execution::sender auto into_variant( ); -Returns a sender which sends a variant of tuples of all the possible sets of types sent by the input sender. Senders can send multiple sets of values depending on runtime conditions; this is a helper function that turns them into a single variant value. +Returns a sender which sends a variant of tuples of all the possible sets of +types sent by the input sender. Senders can send multiple sets of values +depending on runtime conditions; this is a helper function that turns them into +a single variant value. 
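For example (illustrative only; `maybe_int_or_string()` stands in for a hypothetical sender that can complete with either an `int` or a `std::string` depending on runtime conditions):

```c++
execution::sender auto snd =
    execution::into_variant(maybe_int_or_string());
// snd sends a single value of type
// std::variant<std::tuple<int>, std::tuple<std::string>>
```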
### `execution::stopped_as_optional` ### {#design-sender-adaptor-stopped_as_optional}

@@ -2660,7 +3320,9 @@ execution::sender auto stopped_as_optional(
);

-Returns a sender that maps the value channel from a `T` to an `optional<decay_t<T>>`, and maps the stopped channel to a value of an empty `optional<decay_t<T>>`.
+Returns a sender that maps the value channel from a `T` to an
+`optional<decay_t<T>>`, and maps the stopped channel to a value of an empty
+`optional<decay_t<T>>`.

### `execution::stopped_as_error` ### {#design-sender-adaptor-stopped_as_error}

@@ -2684,14 +3346,23 @@ execution::sender auto bulk(
);

-Returns a sender describing the task of invoking the provided function with every index in the provided shape along with the values sent by the input sender. The returned sender completes once all invocations have completed, or an error has occurred. If it completes
-by sending values, they are equivalent to those sent by the input sender.
+Returns a sender describing the task of invoking the provided function with
+every index in the provided shape along with the values sent by the input
+sender. The returned sender completes once all invocations have completed, or an
+error has occurred. If it completes by sending values, they are equivalent to
+those sent by the input sender.

-No instance of `function` will begin executing until the returned sender is started. Each invocation of `function` runs in an execution agent whose forward progress guarantees are determined by the scheduler on which they are run.
+No instance of `function` will begin executing until the returned sender is
+started. Each invocation of `function` runs in an execution agent whose forward
+progress guarantees are determined by the scheduler on which they are run.
All +agents created by a single use of `bulk` execute with the same guarantee. The +number of execution agents used by `bulk` is not specified. This allows a +scheduler to execute some invocations of the `function` in parallel. -In this proposal, only integral types are used to specify the shape of the bulk section. We expect that future papers may wish to explore extensions of the interface to explore additional kinds of shapes, such as multi-dimensional grids, that are commonly used for -parallel computing tasks. +In this proposal, only integral types are used to specify the shape of the bulk +section. We expect that future papers may wish to explore extensions of the +interface to explore additional kinds of shapes, such as multi-dimensional +grids, that are commonly used for parallel computing tasks. ### `execution::split` ### {#design-sender-adaptor-split} @@ -2699,7 +3370,9 @@ parallel computing tasks. execution::sender auto split(execution::sender auto sender); -If the provided sender is a multi-shot sender, returns that sender. Otherwise, returns a multi-shot sender which sends values equivalent to the values sent by the provided sender. See [[#design-shot]]. +If the provided sender is a multi-shot sender, returns that sender. Otherwise, +returns a multi-shot sender which sends values equivalent to the values sent by +the provided sender. See [[#design-shot]]. ### `execution::when_all` ### {#design-sender-adaptor-when_all} @@ -2713,9 +3386,18 @@ execution::sender auto when_all_with_variant( ); -`when_all` returns a sender that completes once all of the input senders have completed. It is constrained to only accept senders that can complete with a single set of values (_i.e._, it only calls one overload of `set_value` on its receiver). The values sent by this sender are the values sent by each of the input senders, in order of the arguments passed to `when_all`. 
It completes inline on the execution resource on which the last input sender completes, unless stop is requested before `when_all` is started, in which case it completes inline within the call to `start`. +`when_all` returns a sender that completes once all of the input senders have +completed. It is constrained to only accept senders that can complete with a +single set of values (_i.e._, it only calls one overload of `set_value` on its +receiver). The values sent by this sender are the values sent by each of the +input senders, in order of the arguments passed to `when_all`. It completes +inline on the execution resource on which the last input sender completes, +unless stop is requested before `when_all` is started, in which case it +completes inline within the call to `start`. -`when_all_with_variant` does the same, but it adapts all the input senders using `into_variant`, and so it does not constrain the input arguments as `when_all` does. +`when_all_with_variant` does the same, but it adapts all the input senders using +`into_variant`, and so it does not constrain the input arguments as `when_all` +does. The returned sender has no [=completion scheduler|completion schedulers=]. @@ -2744,21 +3426,30 @@ execution::sender auto ensure_started( ); -Once `ensure_started` returns, it is known that the provided sender has been [=connect|connected=] and `start` has been called on the resulting operation state (see [[#design-states]]); in other words, the work described by the provided sender has been submitted -for execution on the appropriate execution resources. Returns a sender which completes when the provided sender completes and sends values equivalent to those of the provided sender. 
-
-If the returned sender is destroyed before `execution::connect()` is called, or if `execution::connect()` is called but the
-returned operation-state is destroyed before `execution::start()` is called, then a stop-request is sent to the eagerly launched
-operation and the operation is detached and will run to completion in the background. Its result will be discarded when it
-eventually completes.
-
-Note that the application will need to make sure that resources are kept alive in the case that the operation detaches.
-e.g. by holding a `std::shared_ptr` to those resources or otherwise having some out-of-band way to signal completion of
+Once `ensure_started` returns, it is known that the provided sender has been
+[=connect|connected=] and `start` has been called on the resulting operation
+state (see [[#design-states]]); in other words, the work described by the
+provided sender has been submitted for execution on the appropriate execution
+resources. Returns a sender which completes when the provided sender completes
+and sends values equivalent to those of the provided sender.
+
+If the returned sender is destroyed before `execution::connect()` is called, or
+if `execution::connect()` is called but the returned operation-state is
+destroyed before `execution::start()` is called, then a stop-request is sent to
+the eagerly launched operation and the operation is detached and will run to
+completion in the background. Its result will be discarded when it eventually
+completes.
+
+Note that the application will need to make sure that resources are kept alive
+in the case that the operation detaches, e.g. by holding a `std::shared_ptr` to
+those resources or otherwise having some out-of-band way to signal completion of
the operation so that resource release can be sequenced after the completion.
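A typical use looks like this (an illustrative sketch of the proposed API; `expensive_work()` and `do_other_work()` are hypothetical):

```c++
// Eagerly submit expensive_work() for execution...
execution::sender auto started =
    execution::ensure_started(expensive_work());
// ...overlap it with other work on this thread...
do_other_work();
// ...and block for its result only when it is actually needed.
auto [result] = this_thread::sync_wait(std::move(started)).value();
```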
## User-facing sender consumers ## {#design-sender-consumers} -A [=sender consumer=] is an algorithm that takes one or more senders, which it may `execution::connect`, as parameters, and does not return a sender. +A [=sender consumer=] is an algorithm that takes one or more senders, which it +may `execution::connect`, as parameters, and does not return a sender. ### `execution::start_detached` ### {#design-sender-consumer-start_detached} @@ -2768,7 +3459,8 @@ void start_detached( ); -Like `ensure_started`, but does not return a value; if the provided sender sends an error instead of a value, `std::terminate` is called. +Like `ensure_started`, but does not return a value; if the provided sender sends +an error instead of a value, `std::terminate` is called. ### `this_thread::sync_wait` ### {#design-sender-consumer-sync_wait} @@ -2779,24 +3471,44 @@ auto sync_wait( -> std::optional<std::tuple<values-sent-by(sender)>>; -`this_thread::sync_wait` is a sender consumer that submits the work described by the provided sender for execution, similarly to `ensure_started`, except that it blocks the current `std::thread` or thread of `main` until the work is completed, and returns -an optional tuple of values that were sent by the provided sender on its completion of work. Where [[#design-sender-factory-schedule]] and [[#design-sender-factory-just]] are meant to enter the domain of senders, `sync_wait` is meant to exit the domain of -senders, retrieving the result of the task graph. - -If the provided sender sends an error instead of values, `sync_wait` throws that error as an exception, or rethrows the original exception if the error is of type `std::exception_ptr`. - -If the provided sender sends the "stopped" signal instead of values, `sync_wait` returns an empty optional. - -For an explanation of the `requires` clause, see [[#design-typed]]. That clause also explains another sender consumer, built on top of `sync_wait`: `sync_wait_with_variant`. 
- -Note: This function is specified inside `std::this_thread`, and not inside `execution`. This is because `sync_wait` has to block the current execution agent, but determining what the current execution agent is is not reliable. Since the standard -does not specify any functions on the current execution agent other than those in `std::this_thread`, this is the flavor of this function that is being proposed. If C++ ever obtains fibers, for instance, we expect that a variant of this function called -`std::this_fiber::sync_wait` would be provided. We also expect that runtimes with execution agents that use different synchronization mechanisms than `std::thread`'s will provide their own flavors of `sync_wait` as well (assuming their execution agents have the means +`this_thread::sync_wait` is a sender consumer that submits the work described by +the provided sender for execution, similarly to `ensure_started`, except that it +blocks the current `std::thread` or thread of `main` until the work is +completed, and returns an optional tuple of values that were sent by the +provided sender on its completion of work. Where +[[#design-sender-factory-schedule]] and [[#design-sender-factory-just]] are +meant to enter the domain of senders, `sync_wait` is meant to exit +the domain of senders, retrieving the result of the task graph. + +If the provided sender sends an error instead of values, `sync_wait` throws that +error as an exception, or rethrows the original exception if the error is of +type `std::exception_ptr`. + +If the provided sender sends the "stopped" signal instead of values, `sync_wait` +returns an empty optional. + +For an explanation of the `requires` clause, see [[#design-typed]]. That clause +also explains another sender consumer, built on top of `sync_wait`: +`sync_wait_with_variant`. + +Note: This function is specified inside `std::this_thread`, and not inside +`execution`. 
This is because `sync_wait` has to block the current +execution agent, but determining what the current execution agent is is not +reliable. Since the standard does not specify any functions on the current +execution agent other than those in `std::this_thread`, this is the flavor of +this function that is being proposed. If C++ ever obtains fibers, for instance, +we expect that a variant of this function called `std::this_fiber::sync_wait` +would be provided. We also expect that runtimes with execution agents that use +different synchronization mechanisms than `std::thread`'s will provide their own +flavors of `sync_wait` as well (assuming their execution agents have the means to block in a non-deadlock manner). ## `execution::execute` ## {#design-execute} -In addition to the three categories of functions presented above, we also propose to include a convenience function for fire-and-forget eager one-way submission of an invocable to a scheduler, to fulfil the role of one-way executors from P0443. +In addition to the three categories of functions presented above, we also +propose to include a convenience function for fire-and-forget eager one-way +submission of an invocable to a scheduler, to fulfil the role of one-way +executors from P0443.
 void execution::execute(
@@ -2817,7 +3529,8 @@ execution::start_detached(work);
 
 ## Receivers serve as glue between senders ## {#design-receivers}
 
-A [=receiver=] is a callback that supports more than one channel. In fact, it supports three of them:
+A [=receiver=] is a callback that supports more than one channel. In fact, it
+supports three of them:
 
 * `set_value`, which is the moral equivalent of an `operator()` or a function
     call, which signals successful completion of the operation its execution
@@ -2830,31 +3543,44 @@ A [=receiver=] is a callback that supports more than one channel. In fact, it su
     to indicate that the operation stopped early, typically because it was asked
     to do so because the result is no longer needed.
 
-Once an async operation has been started exactly one of these functions must be invoked
-on a receiver before it is destroyed.
+Once an async operation has been started, exactly one of these functions must
+be invoked on a receiver before it is destroyed.
 
 While the receiver interface may look novel, it is in fact very similar to the
 interface of `std::promise`, which provides the first two signals as `set_value`
 and `set_exception`, and it's possible to emulate the third channel with
 lifetime management of the promise.
 
-Receivers are not a part of the end-user-facing API of this proposal; they are necessary to allow unrelated senders communicate with each other, but the only users who will interact with receivers directly are authors of senders.
+Receivers are not a part of the end-user-facing API of this proposal; they are
+necessary to allow unrelated senders to communicate with each other, but the only
+users who will interact with receivers directly are authors of senders.
 
 Receivers are what is passed as the second argument to [[#design-connect]].
 
 ## Operation states represent work ## {#design-states}
 
-An [=operation state=] is an object that represents work. Unlike senders, it is not a chaining mechanism; instead, it is a concrete object that packages the work described by a full sender chain, ready to be executed. An operation state is neither movable nor
-copyable, and its interface consists of a single algorithm: `start`, which serves as the submission point of the work represented by a given operation state.
+An [=operation state=] is an object that represents work. Unlike senders, it is
+not a chaining mechanism; instead, it is a concrete object that packages the
+work described by a full sender chain, ready to be executed. An operation state
+is neither movable nor copyable, and its interface consists of a single
+algorithm: `start`, which serves as the submission point of the work represented
+by a given operation state.
 
-Operation states are not a part of the user-facing API of this proposal; they are necessary for implementing sender consumers like `execution::ensure_started` and `this_thread::sync_wait`, and the knowledge of them is necessary to implement senders, so the only users who will
-interact with operation states directly are authors of senders and authors of sender algorithms.
+Operation states are not a part of the user-facing API of this proposal; they
+are necessary for implementing sender consumers like `execution::ensure_started`
+and `this_thread::sync_wait`, and the knowledge of them is necessary to
+implement senders, so the only users who will interact with operation states
+directly are authors of senders and authors of sender algorithms.
 
-The return value of [[#design-connect]] must satisfy the operation state concept.
+The return value of [[#design-connect]] must satisfy the operation state
+concept.
 
 ## `execution::connect` ## {#design-connect}
 
-`execution::connect` is a customization point which [=connects=] senders with receivers, resulting in an operation state that will ensure that if `start` is called that one of the completion operations will be called on the receiver passed to `connect`.
+`execution::connect` is a customization point which [=connects=] senders with
+receivers, resulting in an operation state that ensures that, if `start` is
+called, one of the completion operations will be called on the receiver passed
+to `connect`.
 
 
 execution::sender auto snd = some input sender;
@@ -2872,18 +3598,31 @@ execution::start(state);
 
 ## Sender algorithms are customizable ## {#design-customization}
 
-Senders being able to advertise what their [=completion schedulers=] are fulfills one of the promises of senders: that of being able to customize an implementation of a sender algorithm based on what scheduler any work it depends on will complete on.
+Senders being able to advertise what their [=completion schedulers=] are
+fulfills one of the promises of senders: that of being able to customize an
+implementation of a sender algorithm based on what scheduler any work it depends
+on will complete on.
 
-The simple way to provide customizations for functions like `then`, that is for [=sender adaptors=] and [=sender consumers=], is to follow the customization scheme that has been adopted for C++20 ranges library; to do that, we would define
-the expression `execution::then(sender, invocable)` to be equivalent to:
+The simple way to provide customizations for functions like `then`, that is for
+[=sender adaptors=] and [=sender consumers=], is to follow the customization
+scheme that has been adopted for the C++20 ranges library; to do that, we would
+define the expression `execution::then(sender, invocable)` to be equivalent to:
 
   1. `sender.then(invocable)`, if that expression is well-formed; otherwise
-  2. `then(sender, invocable)`, performed in a context where this call always performs ADL, if that expression is well-formed; otherwise
-  3. a default implementation of `then`, which returns a sender adaptor, and then define the exact semantics of said adaptor.
-
-However, this definition is problematic. Imagine another sender adaptor, `bulk`, which is a structured abstraction for a loop over an index space. Its default implementation is just a for loop. However, for accelerator runtimes like CUDA, we would like sender algorithms
-like `bulk` to have specialized behavior, which invokes a kernel of more than one thread (with its size defined by the call to `bulk`); therefore, we would like to customize `bulk` for CUDA senders to achieve this. However, there's no reason for CUDA kernels to
-necessarily customize the `then` sender adaptor, as the generic implementation is perfectly sufficient. This creates a problem, though; consider the following snippet:
+  2. `then(sender, invocable)`, performed in a context where this call always
+      performs ADL, if that expression is well-formed; otherwise
+  3. a default implementation of `then`, which returns a sender adaptor, and
+      then define the exact semantics of said adaptor.
+
+However, this definition is problematic. Imagine another sender adaptor, `bulk`,
+which is a structured abstraction for a loop over an index space. Its default
+implementation is just a for loop. However, for accelerator runtimes like CUDA,
+we would like sender algorithms like `bulk` to have specialized behavior, which
+invokes a kernel of more than one thread (with its size defined by the call to
+`bulk`); therefore, we would like to customize `bulk` for CUDA senders to
+achieve this. However, there's no reason for CUDA kernels to necessarily
+customize the `then` sender adaptor, as the generic implementation is perfectly
+sufficient. This creates a problem, though; consider the following snippet:
 
 
 execution::scheduler auto cuda_sch = cuda_scheduler{};
@@ -2900,8 +3639,10 @@ execution::sender auto next = execution::then(cuda_sch, []{ return 1; });
 execution::sender auto kernel_sender = execution::bulk(next, shape, [](int i){ ... });
 
-How can we specialize the `bulk` sender adaptor for our wrapped `schedule_sender`? Well, here's one possible approach, taking advantage of ADL (and the fact that the definition of "associated namespace" also recursively enumerates the associated namespaces of all template -parameters of a type): +How can we specialize the `bulk` sender adaptor for our wrapped +`schedule_sender`? Well, here's one possible approach, taking advantage of ADL +(and the fact that the definition of "associated namespace" also recursively +enumerates the associated namespaces of all template parameters of a type):
 namespace cuda::for_adl_purposes {
@@ -2921,210 +3662,234 @@ execution::sender auto bulk(
 } // namespace cuda::for_adl_purposes
 
-However, if the input sender is not just a `then_sender_adaptor` like in the example above, but another sender that overrides `bulk` by itself, as a member function, because its author believes they know an optimization for bulk - the specialization above will no
-longer be selected, because a member function of the first argument is a better match than the ADL-found overload.
-
-This means that well-meant specialization of sender algorithms that are entirely scheduler-agnostic can have negative consequences.
-The scheduler-specific specialization - which is essential for good performance on platforms providing specialized ways to launch certain sender algorithms - would not be selected in such cases.
-But it's really the scheduler that should control the behavior of sender algorithms when a non-default implementation exists, not the sender. Senders merely describe work; schedulers, however, are the handle to the
-runtime that will eventually execute said work, and should thus have the final say in *how* the work is going to be executed.
-
-Therefore, we are proposing the following customization scheme (also modified to take [[#design-dispatch]] into account): the expression `execution::<sender-algorithm>(sender, args...)`, for any given sender algorithm that accepts a sender as its first argument, should be
-equivalent to:
-
- 1. tag_invoke(<sender-algorithm>, get_completion_scheduler<Tag>(get_env(sender)), sender, args...), if that expression is well-formed; otherwise
- 2. `tag_invoke(<sender-algorithm>, sender, args...)`, if that expression is well-formed; otherwise
- 4. a default implementation, if there exists a default implementation of the given sender algorithm.
-
-where Tag is one of `set_value`, `set_error`, or `set_stopped`. For most sender algorithms, the completion scheduler for `set_value` would be used, but for some (like `upon_error` or `let_stopped`), one of the others would be used.
-
-For sender algorithms which accept concepts other than `sender` as their first argument, we propose that the customization scheme remains as it has been in [[P0443R14]] so far, except it should also use `tag_invoke`.
+However, if the input sender is not just a `then_sender_adaptor` like in the
+example above, but another sender that overrides `bulk` by itself, as a member
+function, because its author believes they know an optimization for bulk - the
+specialization above will no longer be selected, because a member function of
+the first argument is a better match than the ADL-found overload.
+
+This means that well-meant specialization of sender algorithms that are entirely
+scheduler-agnostic can have negative consequences. The scheduler-specific
+specialization - which is essential for good performance on platforms providing
+specialized ways to launch certain sender algorithms - would not be selected in
+such cases. But it's really the scheduler that should control the behavior of
+sender algorithms when a non-default implementation exists, not the sender.
+Senders merely describe work; schedulers, however, are the handle to the runtime
+that will eventually execute said work, and should thus have the final say in
+*how* the work is going to be executed.
+
+Therefore, we are proposing the following customization scheme: the expression
+`execution::<sender-algorithm>(sender, args...)`, for any given sender algorithm
+that accepts a sender as its first argument, should do the following:
+
+  1. Create a sender that implements the default implementation of the sender
+      algorithm. That sender is tuple-like; it can be destructured into its
+      constituent parts: algorithm tag, data, and child sender(s).
+
+  2. We query the child sender for its *domain*. A **domain** is a tag type
+      associated with the scheduler that the child sender will complete on.
+      If there are multiple child senders, we query all of them for their
+      domains and require that they all be the same.
+
+  3. 
-
-For sender algorithms which accept concepts other than `sender` as their first argument, we propose that the customization scheme remains as it has been in [[P0443R14]] so far, except it should also use `tag_invoke`.
+However, if the input sender is not just a `then_sender_adaptor` like in the
+example above, but another sender that overrides `bulk` by itself, as a member
+function, because its author believes they know an optimization for bulk - the
+specialization above will no longer be selected, because a member function of
+the first argument is a better match than the ADL-found overload.
+
+This means that well-meant specialization of sender algorithms that are entirely
+scheduler-agnostic can have negative consequences. The scheduler-specific
+specialization - which is essential for good performance on platforms providing
+specialized ways to launch certain sender algorithms - would not be selected in
+such cases. But it's really the scheduler that should control the behavior of
+sender algorithms when a non-default implementation exists, not the sender.
+Senders merely describe work; schedulers, however, are the handle to the runtime
+that will eventually execute said work, and should thus have the final say in
+*how* the work is going to be executed.
+
+Therefore, we are proposing the following customization scheme: the expression
+`execution::<sender-algorithm>(sender, args...)`, for any given sender algorithm
+that accepts a sender as its first argument, should do the following:
+
+ 1. Create a sender that implements the default implementation of the sender
+    algorithm. That sender is tuple-like; it can be destructured into its
+    constituent parts: algorithm tag, data, and child sender(s).
+
+ 2. We query the child sender for its *domain*. A **domain** is a tag type
+    associated with the scheduler that the child sender will complete on.
+    If there are multiple child senders, we query all of them for their
+    domains and require that they all be the same.
+
+ 3. 
We use the domain to dispatch to a `transform_sender` customization, which
+    accepts the sender and optionally performs a domain-specific
+    transformation on it. This customization is expected to return a new
+    sender, which will be returned from `<sender-algorithm>` in place of the
+    original sender.

## Sender adaptors are lazy ## {#design-laziness}

-Contrary to early revisions of this paper, we propose to make all sender adaptors perform strictly lazy submission, unless specified otherwise (the one notable exception in this paper is [[#design-sender-adaptor-ensure_started]], whose sole purpose is to start an
-input sender).
+Contrary to early revisions of this paper, we propose to make all sender
+adaptors perform strictly lazy submission, unless specified otherwise (the one
+notable exception in this paper is [[#design-sender-adaptor-ensure_started]],
+whose sole purpose is to start an input sender).

- Strictly lazy submission means that there is a guarantee that no work is submitted to an execution resource before a receiver is connected to a sender, and `execution::start` is called on the resulting operation state.
+Strictly lazy submission means that there is a guarantee
+that no work is submitted to an execution resource before a receiver is
+connected to a sender, and `execution::start` is called on the resulting
+operation state.

## Lazy senders provide optimization opportunities ## {#design-fusion}

-Because lazy senders fundamentally *describe* work, instead of describing or representing the submission of said work to an execution resource, and thanks to the flexibility of the customization of most sender algorithms, they provide an opportunity for fusing
-multiple algorithms in a sender chain together, into a single function that can later be submitted for execution by an execution resource. There are two ways this can happen.
- -The first (and most common) way for such optimizations to happen is thanks to the structure of the implementation: because all the work is done within callbacks invoked on the completion of an earlier sender, recursively up to the original source of computation, -the compiler is able to see a chain of work described using senders as a tree of tail calls, allowing for inlining and removal of most of the sender machinery. In fact, when work is not submitted to execution resources outside of the current thread of execution, -compilers are capable of removing the senders abstraction entirely, while still allowing for composition of functions across different parts of a program. - -The second way for this to occur is when a sender algorithm is specialized for a specific set of arguments. For instance, we expect that, for senders which are known to have been started already, [[#design-sender-adaptor-ensure_started]] will be an identity transformation, -because the sender algorithm will be specialized for such senders. Similarly, an implementation could recognize two subsequent [[#design-sender-adaptor-bulk]]s of compatible shapes, and merge them together into a single submission of a GPU kernel. +Because lazy senders fundamentally *describe* work, instead of describing or +representing the submission of said work to an execution resource, and thanks to +the flexibility of the customization of most sender algorithms, they provide an +opportunity for fusing multiple algorithms in a sender chain together, into a +single function that can later be submitted for execution by an execution +resource. There are two ways this can happen. 
+ +The first (and most common) way for such optimizations to happen is thanks to +the structure of the implementation: because all the work is done within +callbacks invoked on the completion of an earlier sender, recursively up to the +original source of computation, the compiler is able to see a chain of work +described using senders as a tree of tail calls, allowing for inlining and +removal of most of the sender machinery. In fact, when work is not submitted to +execution resources outside of the current thread of execution, compilers are +capable of removing the senders abstraction entirely, while still allowing for +composition of functions across different parts of a program. + +The second way for this to occur is when a sender algorithm is specialized for a +specific set of arguments. For instance, we expect that, for senders which are +known to have been started already, [[#design-sender-adaptor-ensure_started]] +will be an identity transformation, because the sender algorithm will be +specialized for such senders. Similarly, an implementation could recognize two +subsequent [[#design-sender-adaptor-bulk]]s of compatible shapes, and merge them +together into a single submission of a GPU kernel. ## Execution resource transitions are two-step ## {#design-transition-details} -Because `execution::transfer` takes a sender as its first argument, it is not actually directly customizable by the target scheduler. This is by design: the target scheduler may not know how to transition from a scheduler such as a CUDA scheduler; -transitioning away from a GPU in an efficient manner requires making runtime calls that are specific to the GPU in question, and the same is usually true for other kinds of accelerators too (or for scheduler running on remote systems). 
To avoid this problem, -specialized schedulers like the ones mentioned here can still hook into the transition mechanism, and inject a sender which will perform a transition to the regular CPU execution resource, so that any sender can be attached to it. - -This, however, is a problem: because customization of sender algorithms must be controlled by the scheduler they will run on (see [[#design-customization]]), the type of the sender returned from `transfer` must be controllable by the target scheduler. Besides, the target -scheduler may itself represent a specialized execution resource, which requires additional work to be performed to transition to it. GPUs and remote node schedulers are once again good examples of such schedulers: executing code on their execution resources -requires making runtime API calls for work submission, and quite possibly for the data movement of the values being sent by the input sender passed into `transfer`. - -To allow for such customization from both ends, we propose the inclusion of a secondary transitioning sender adaptor, called `schedule_from`. This adaptor is a form of `schedule`, but takes an additional, second argument: the input sender. This adaptor is not -meant to be invoked manually by the end users; they are always supposed to invoke `transfer`, to ensure that both schedulers have a say in how the transitions are made. Any scheduler that specializes `transfer(snd, sch)` shall ensure that the -return value of their customization is equivalent to `schedule_from(sch, snd2)`, where `snd2` is a successor of `snd` that sends values equivalent to those sent by `snd`. - -The default implementation of `transfer(snd, sched)` is `schedule_from(sched, snd)`. +Because `execution::transfer` takes a sender as its first argument, it is not +actually directly customizable by the target scheduler. 
This is by design: the
+target scheduler may not know how to transition from a scheduler such as
+a CUDA scheduler; transitioning away from a GPU in an efficient manner requires
+making runtime calls that are specific to the GPU in question, and the same is
+usually true for other kinds of accelerators too (or for schedulers running on
+remote systems). To avoid this problem, specialized schedulers like the ones
+mentioned here can still hook into the transition mechanism, and inject a sender
+which will perform a transition to the regular CPU execution resource, so that
+any sender can be attached to it.
+
+This, however, is a problem: because customization of sender algorithms must be
+controlled by the scheduler they will run on (see [[#design-customization]]),
+the type of the sender returned from `transfer` must be controllable by the
+target scheduler. Besides, the target scheduler may itself represent a
+specialized execution resource, which requires additional work to be performed
+to transition to it. GPUs and remote node schedulers are once again good
+examples of such schedulers: executing code on their execution resources
+requires making runtime API calls for work submission, and quite possibly for
+the data movement of the values being sent by the input sender passed into
+`transfer`.
+
+To allow for such customization from both ends, we propose the inclusion of a
+secondary transitioning sender adaptor, called `schedule_from`. This adaptor is
+a form of `schedule`, but takes an additional, second argument: the input
+sender. This adaptor is not meant to be invoked manually by end users; they
+are always supposed to invoke `transfer`, to ensure that both schedulers have a
+say in how the transitions are made. 
Any scheduler that specializes +`transfer(snd, sch)` shall ensure that the return value of their customization +is equivalent to `schedule_from(sch, snd2)`, where `snd2` is a successor of +`snd` that sends values equivalent to those sent by `snd`. + +The default implementation of `transfer(snd, sched)` is `schedule_from(sched, +snd)`. ## All senders are typed ## {#design-typed} -All senders must advertise the types they will [=send=] when they complete. -This is necessary for a number of features, and writing code in a way that's -agnostic of whether an input sender is typed or not in common sender adaptors -such as `execution::then` is hard. - -The mechanism for this advertisement is similar to the one in [[P0443R14]]; the -way to query the types is through `completion_signatures_of_t::value_types`. - -`completion_signatures_of_t::value_types` is a template that takes two -arguments: one is a tuple-like template, the other is a variant-like template. -The tuple-like argument is required to represent senders sending more than one -value (such as `when_all`). The variant-like argument is required to represent -senders that choose which specific values to send at runtime. - -There's a choice made in the specification of -[[#design-sender-consumer-sync_wait]]: it returns a tuple of values sent by the -sender passed to it, wrapped in `std::optional` to handle the `set_stopped` -signal. However, this assumes that those values can be represented as a tuple, -like here: +All senders must advertise the types they will send when they complete. There +are many sender adaptors that need this information. Even just transitioning +from one execution context to another requires temporarily storing the async +result data so it can be propagated in the new execution context. Doing that +efficiently requires knowing the type of the data. 
+
+The mechanism a sender uses to advertise its completions is the
+`get_completion_signatures` customization point, which takes the sender and an
+environment and must return a specialization of the
+`execution::completion_signatures` class template. The template parameters of
+`execution::completion_signatures` are a list of function types that represent
+the completion operations of the sender. For example, the type
+`execution::set_value_t(size_t, const char*)` indicates that the sender can
+complete successfully by passing a `size_t` and a `const char*` to the
+receiver's `set_value` function.
+
+This proposal includes utilities for parsing and manipulating the list of a
+sender's completion signatures. For instance, `value_types_of_t` is a template
+alias for accessing a sender's value completions. It takes a sender, an
+environment, and two variadic template template parameters: a tuple-like
+template and a variant-like template. You can get the value completions of a
+sender `S` in an environment `Env` with `value_types_of_t<S, Env, Tuple,
+Variant>`. For example, for a sender that can complete successfully with either
+`Ts...` or `Us...`, `value_types_of_t<S, Env, std::tuple, std::variant>` would
+name the type `std::variant<std::tuple<Ts...>, std::tuple<Us...>>`.
+
+## Customization points ## {#design-dispatch}
+
+Earlier versions of this paper used a dispatching technique known as
+`tag_invoke` (see [[P1895R0]]) to allow for customization of basis operations
+and sender algorithms. This technique used private friend functions named
+"`tag_invoke`" that are found by argument-dependent look-up. The `tag_invoke`
+overloads are distinguished from each other by their first argument, which is
+the type of the customization point object being customized. For instance, to
+customize the `execution::set_value` operation, a receiver type might do the
+following:
-execution::sender auto sends_1 = ...;
-execution::sender auto sends_2 = ...;
-execution::sender auto sends_3 = ...;
-
-auto [a, b, c] = this_thread::sync_wait(
-    execution::when_all(
-        sends_1,
-        sends_2,
-        sends_3)
-    | execution::transfer(
-        execution::get_completion_scheduler<execution::set_value_t>(get_env(sends_1))),
-    ).value();
-// a == 1
-// b == 2
-// c == 3
+struct my_receiver {
+  friend void tag_invoke(execution::set_value_t, my_receiver&& self, int value) noexcept {
+    std::cout << "received value: " << value;
+  }
+  //...
+};
 
-This works well for senders that always send the same set of arguments. If we ignore the possibility of having a sender that sends different sets of arguments into a receiver, we can specify the "canonical" (i.e. required to be followed by all senders) form of -`value_types` of a sender which sends `Types...` to be as follows: +The `tag_invoke` technique, although it had its strengths, has been replaced +with a new (or rather, a very old) technique that uses explicit concept opt-ins +and named member functions. For instance, the `execution::set_value` operation +is now customized by defining a member function named `set_value` in the +receiver type. This technique is more explicit and easier to understand than +`tag_invoke`. This is what a receiver author would do to customize +`execution::set_value` now:
-template<template<typename ...> typename TupleLike>
-using value_types = TupleLike;
-
- -If senders could only ever send one specific set of values, this would probably need to be the required form of `value_types` for all senders; defining it otherwise would cause very weird results and should be considered a bug. - -This matter is somewhat complicated by the fact that (1) `set_value` for receivers can be overloaded and accept different sets of arguments, and (2) senders are allowed to send multiple different sets of values, depending on runtime conditions, the data they -consumed, and so on. To accomodate this, [[P0443R14]] also includes a second template parameter to `value_types`, one that represents a variant-like type. If we permit such senders, we would almost certainly need to require that the canonical form of `value_types` -for *all* senders (to ensure consistency in how they are handled, and to avoid accidentally interpreting a user-provided variant as a sender-provided one) sending the different sets of arguments `Types1...`, `Types2...`, ..., `TypesN...` to be as follows: +struct my_receiver { + using receiver_concept = execution::receiver_t; -
-template<
-    template<typename ...> typename TupleLike,
-    template<typename ...> typename VariantLike
->
-using value_types = VariantLike<
-    TupleLike<Types1...>,
-    TupleLike<Types2...>,
-    ...,
-    TupleLike<Types3...>
->;
+  void set_value(int value) && noexcept {
+    std::cout << "received value: " << value;
+  }
+  //...
+};
 
-This, however, introduces a couple of complications:
-
-1. A `just(1)` sender would also need to follow this structure, so the correct type for storing the value sent by it would be `std::variant<std::tuple<int>>` or some such. This introduces a lot of compile time overhead for the simplest senders, and this overhead
-   effectively exists in all places in the code where `value_types` is queried, regardless of the tuple-like and variant-like templates passed to it. Such overhead does exist if only the tuple-like parameter exists, but is made much worse by adding this second
-   wrapping layer.
-2. As a consequence of (1): because `sync_wait` needs to store the above type, it can no longer return just a `std::tuple<int>` for `just(1)`; it has to return `std::variant<std::tuple<int>>`. C++ currently does not have an easy way to destructure this; it may get
-   less awkward with pattern matching, but even then it seems extremely heavyweight to involve variants in this API, and for the purpose of generic code, the kind of the return type of `sync_wait` must be the same across all sender types.
-
-One possible solution to (2) above is to place a requirement on `sync_wait` that it can only accept senders which send only a single set of values, therefore removing the need for `std::variant` to appear in its API; because of this, we propose to expose both
-`sync_wait`, which is a simple, user-friendly version of the sender consumer, but requires that `value_types` have only one possible variant, and `sync_wait_with_variant`, which accepts any sender, but returns an optional whose value type is the variant of all the
-possible tuples sent by the input sender:
+The only exception to this is the customization of queries. There is a need to
+build queryable adaptors that can forward an open and unknowable set of queries
+to some wrapped object. This is done by defining a member function named
+`query` in the adaptor type that takes the query CPO object as its first
+(and usually only) argument. 
A queryable adaptor might look like this:
-auto sync_wait_with_variant(
-    execution::sender auto sender
-) -> std::optional<std::variant<
-        std::tuple<values0-sent-by(sender)>,
-        std::tuple<values1-sent-by(sender)>,
-        ...,
-        std::tuple<valuesn-sent-by(sender)>
-    >>;
-
-auto sync_wait(
-    execution::sender auto sender
-) requires (always-sends-same-values(sender))
-    -> std::optional<std::tuple<values-sent-by(sender)>>;
-
- -## Ranges-style CPOs vs `tag_invoke` ## {#design-dispatch} - -The contemporary technique for customization in the Standard Library is customization point objects. A customization point object, will it look for member functions and then for nonmember functions with the same name as the customization point, and calls those if -they match. This is the technique used by the C++20 ranges library, and previous executors proposals ([[P0443R14]] and [[P1897R3]]) intended to use it as well. However, it has several unfortunate consequences: - -1. It does not allow for easy propagation of customization points unknown to the adaptor to a wrapped object, which makes writing universal adapter types much harder - and this proposal uses quite a lot of those. - -2. It effectively reserves names globally. Because neither member names nor ADL-found functions can be qualified with a namespace, every customization point object that uses the ranges scheme reserves the name for all types in all namespaces. This is unfortunate - due to the sheer number of customization points already in the paper, but also ones that we are envisioning in the future. It's also a big problem for one of the operations being proposed already: `sync_wait`. We imagine that if, in the future, C++ was to - gain fibers support, we would want to also have `std::this_fiber::sync_wait`, in addition to `std::this_thread::sync_wait`. However, because we would want the names to be the same in both cases, we would need to make the names of the customizations not match the - names of the customization points. This is undesirable. - -This paper proposes to instead use the mechanism described in [[P1895R0]]: `tag_invoke`; the wording for `tag_invoke` has been incorporated into the proposed specification in this paper. - -In short, instead of using globally reserved names, `tag_invoke` uses the type of the customization point object itself as the mechanism to find customizations. 
It globally reserves only a single name - `tag_invoke` - which itself is used the same way that -ranges-style customization points are used. All other customization points are defined in terms of `tag_invoke`. For example, the customization for `std::this_thread::sync_wait(s)` will call `tag_invoke(std::this_thread::sync_wait, s)`, instead of attempting -to invoke `s.sync_wait()`, and then `sync_wait(s)` if the member call is not valid. - -Using `tag_invoke` has the following benefits: - -1. It reserves only a single global name, instead of reserving a global name for every customization point object we define. +template <class Query, class Queryable, class... Args> +concept query_for = + execution::queryable<Queryable> && + requires (const Queryable& o, Args&&... args) { + o.query(Query(), (Args&&) args...); + }; -2. It is possible to propagate customizations to a subobject, because the information of which customization point is being resolved is in the type of an argument, and not in the name of the function: +template<class Allocator = std::allocator<>, + execution::queryable Base = execution::empty_env> +struct with_allocator { + Allocator alloc{}; + Base base{}; -
-    // forward most customizations to a subobject
-    template<typename Tag, typename ...Args>
-    friend auto tag_invoke(Tag && tag, wrapper & self, Args &&... args) {
-        return std::forward<Tag>(tag)(self.subobject, std::forward<Args>(args)...);
-    }
+  // Forward unknown queries to the wrapped object:
+  template<query_for<Base> Query>
+  decltype(auto) query(Query q) const {
+    return base.query(q);
+  }
 
-    // but override one of them with a specific value
-    friend auto tag_invoke(specific_customization_point_t, wrapper & self) {
-        return self.some_value;
-    }
-    
+ // Specialize the query for the allocator: + Allocator query(execution::get_allocator_t) const { + return alloc; + } +}; +
-3. It is possible to pass those as template arguments to types, because the information of which customization point is being resolved is in the type. Similarly to how [[P0443R14]] defines a polymorphic executor wrapper which accepts a list of properties it
-   supports, we can imagine scheduler and sender wrappers that accept a list of queries and operations they support. That list can contain the types of the customization point objects, and the polymorphic wrappers can then specialize those customization points on
-   themselves using `tag_invoke`, dispatching to manually constructed vtables containing pointers to specialized implementations for the wrapped objects. For an example of such a polymorphic wrapper, see
-   [unifex::any_unique](https://github.com/facebookexperimental/libunifex/blob/1a6fbfc9cc3829356ccbdcf9e8d1f3cc33a6d9e0/include/unifex/any_unique.hpp)
-   ([example](https://github.com/facebookexperimental/libunifex/blob/1a6fbfc9cc3829356ccbdcf9e8d1f3cc33a6d9e0/examples/any_unique.cpp)).
+Customizations of sender algorithms such as `execution::then` and
+`execution::bulk` are handled differently because they must dispatch based on
+where the sender is executing. See the section on [[#design-customization]] for
+more information.

# Specification # {#spec}

Much of this wording follows the wording of [[P0443R14]].

-[[#spec-library]] is meant to be a diff relative to the wording of the [library] clause of [[N4885]].
-
-[[#spec-utilities]] is meant to be a diff relative to the wording of the [utilities] clause of [[N4885]]. This diff applies changes from [[P1895R0]].
+[[#spec-utilities]] is meant to be a diff relative to the wording of the
+[utilities] clause of [[N4885]].

-[[#spec-thread]] is meant to be a diff relative to the wording of the [thread] clause of [[N4885]]. This diff applies changes from [[P2175R0]].
+[[#spec-thread]] is meant to be a diff relative to the wording of the
+[thread] clause of [[N4885]]. This diff applies changes from [[P2175R0]]. 
-[[#spec-execution]] is meant to be added as a new library clause to the working draft of C++. +[[#spec-execution]] is meant to be added as a new library clause to the working +draft of C++. # Exception handling [except] # {#spec-except} @@ -3134,15 +3899,18 @@ Much of this wording follows the wording of [[P0443R14]]. #### The `std::terminate` function [except.terminate] #### {#spec-except.terminate} -
At the end of the bulleted list in the Note in paragraph 1, add a new bullet as follows:
+
At the end of the bulleted list in the Note in paragraph 1, add +a new bullet as follows:
- - when a callback invocation exits via an exception when requesting stop on a `std::stop_source` - or a `std::in_place_stop_source` ([stopsource.mem], [stopsource.inplace.mem]), or in - the constructor of `std::stop_callback` or `std::in_place_stop_callback` - ([stopcallback.cons], [stopcallback.inplace.cons]) when a callback invocation exits - via an exception. + + - when a callback invocation exits via an exception when requesting stop on a + `std::stop_source` or a `std::in_place_stop_source` ([stopsource.mem], + [stopsource.inplace.mem]), or in the constructor of `std::stop_callback` or + `std::in_place_stop_callback` ([stopcallback.cons], + [stopcallback.inplace.cons]) when a callback invocation exits via an + exception.
@@ -3164,70 +3932,6 @@ headers [tab:headers.cpp] -
In subclause [conforming], after [lib.types.movedfrom], -add the following new subclause with suggested stable name [lib.tmpl-heads].
- - -
-**16.4.6.17 Class template-heads** - -1. If a class template's template-head is marked with "*arguments are not - associated entities*"", any template arguments do not contribute to the - associated entities ([basic.lookup.argdep]) of a function call where a - specialization of the class template is an associated entity. In such a case, - the class template can be implemented as an alias template referring to a - templated class, or as a class template where the template arguments - themselves are templated classes. - -2. [*Example:* - -
-    template<class T> // arguments are not associated entities
-    struct S {};
-
-    namespace N {
-      int f(auto);
-      struct A {};
-    }
-
-    int x = f(S<N::A>{});  // error: N::f not a candidate
-    
- - The template `S` specified above can be implemented as - -
-    template<class T>
-    struct s-impl {
-      struct type { };
-    };
-
-    template<class T>
-    using S = s-impl<T>::type;
-    
- - or as - -
-    template<class T>
-    struct hidden {
-      using type = struct _ {
-        using type = T;
-      };
-    };
-
-    template<class HiddenT>
-    struct s-impl {
-      using T = HiddenT::type;
-    };
-
-    template<class T>
-    using S = s-impl<typename hidden<T>::type>;
-    
- - -- end example] -
-
- # General utilities library [utilities] # {#spec-utilities} ## Function objects [function.objects] ## {#spec-function.objects} @@ -3239,88 +3943,27 @@ At the end of this subclause, insert the following declarations into the synopsi
-// expositon only:
 template<class Fn, class... Args>
-  concept callable =
+  concept callable =  // exposition only
     requires (Fn&& fn, Args&&... args) {
       std::forward<Fn>(fn)(std::forward<Args>(args)...);
     };
 template<class Fn, class... Args>
-  concept nothrow-callable =
+  concept nothrow-callable =   // exposition only
     callable<Fn, Args...> &&
     requires (Fn&& fn, Args&&... args) {
       { std::forward<Fn>(fn)(std::forward<Args>(args)...) } noexcept;
     };
+// exposition only:
 template<class Fn, class... Args>
   using call-result-t = decltype(declval<Fn>()(declval<Args>()...));
 
-// [func.tag_invoke], tag_invoke
-namespace tag-invoke { // exposition only
-  void tag_invoke();
-
-  template<class Tag, class... Args>
-    concept tag_invocable =
-      requires (Tag&& tag, Args&&... args) {
-        tag_invoke(std::forward<Tag>(tag), std::forward<Args>(args)...);
-      };
-
-  template<class Tag, class... Args>
-    concept nothrow_tag_invocable =
-      tag_invocable<Tag, Args...> &&
-      requires (Tag&& tag, Args&&... args) {
-        { tag_invoke(std::forward<Tag>(tag), std::forward<Args>(args)...) } noexcept;
-      };
-
-  template<class Tag, class... Args>
-    using tag_invoke_result_t =
-      decltype(tag_invoke(declval<Tag>(), declval<Args>()...));
-
-  template<class Tag, class... Args>
-    struct tag_invoke_result<Tag, Args...> {
-      using type =
-        tag_invoke_result_t<Tag, Args...>; // present if and only if tag_invocable<Tag, Args...> is true
-    };
-
-  struct tag; // exposition only
-}
-inline constexpr tag-invoke::tag tag_invoke {};
-using tag-invoke::tag_invocable;
-using tag-invoke::nothrow_tag_invocable;
-using tag-invoke::tag_invoke_result_t;
-using tag-invoke::tag_invoke_result;
-
 template<auto& Tag>
   using tag_t = decltype(auto(Tag));
 
-### `tag_invoke` [func.tag_invoke] ### {#spec-func.tag_invoke} - -Insert this subclause as a new subclause, between Searchers [func.search] and Class template `hash` [unord.hash]. - - -
- -1. Given a subexpression `E`, let REIFY(E) be expression-equivalent to - a glvalue with the same type and value as `E` as if by `identity()(E)`. - -2. The name `std::tag_invoke` denotes a customization point object [customization.point.object]. - Given subexpressions `T` and `A...`, the expression `std::tag_invoke(T, A...)` is - expression-equivalent [defns.expression-equivalent] to - tag_invoke(REIFY(T), REIFY(A)...) - with overload resolution performed in a context in which unqualified lookup for `tag_invoke` - finds only the declaration - - ```c++ - void tag_invoke(); - ``` - -2. [Note: Diagnosable ill-formed cases above result in substitution failure when `std::tag_invoke(T, A...)` appears in the immediate context of a template instantiation. —end note] - -
-
- # Thread support library [thread] # {#spec-thread} ## Stop tokens [thread.stoptoken] ## {#spec-thread.stoptoken} @@ -3373,7 +4016,8 @@ template<class T, class CB> ### Stop token concepts [thread.stoptoken.concepts] ### {#spec-thread.stoptoken.concepts} -Insert this subclause as a new subclause between Header `` synopsis [thread.stoptoken.syn] and Class `stop_token` [stoptoken]. +Insert this subclause as a new subclause between Header `` synopsis +[thread.stoptoken.syn] and Class `stop_token` [stoptoken].
@@ -3415,21 +4059,31 @@ template<class T> };
-
LWG directed me to replace `T::stop_possible()` with `t.stop_possible()` because -of the recent `constexpr` changes in [[P2280r2|P2280R2]]. However, even with those changes, a nested -requirement like `requires (!t.stop_possible())`, where `t` is an argument in the requirement-parameter-list, is ill-formed according to -[expr.prim.req.nested/p2]: +
LWG directed me to replace `T::stop_possible()` with +`t.stop_possible()` because of the recent `constexpr` changes in +[[P2280r2|P2280R2]]. However, even with those changes, a nested requirement like +`requires (!t.stop_possible())`, where `t` is an argument in the +requirement-parameter-list, is ill-formed according to [expr.prim.req.nested/p2]: -> A local parameter shall only appear as an unevaluated operand within the constraint-expression. +> A local parameter shall only appear as an unevaluated operand within the +> constraint-expression. This is the subject of core issue [[cwg2517|2517]].
-2. Let `t` and `u` be distinct, valid objects of type `T`. The type `T` models `stoppable_token` only if: +2. Let `t` and `u` be distinct, valid objects of type `T`. The type `T` models + `stoppable_token` only if: - 1. If `t.stop_possible()` evaluates to `false` then, if `t` and `u` reference the same logical shared stop state, `u.stop_possible()` shall also subsequently evaluate to `false` and `u.stop_requested()` shall also subsequently evaluate to `false`. + 1. If `t.stop_possible()` evaluates to `false` then, if `t` and `u` + reference the same logical shared stop state, `u.stop_possible()` shall + also subsequently evaluate to `false` and `u.stop_requested()` shall also + subsequently evaluate to `false`. - 2. If `t.stop_requested()` evaluates to `true` then, if `t` and `u` reference the same logical shared stop state, `u.stop_requested()` shall also subsequently evaluate to `true` and `u.stop_possible()` shall also subsequently evaluate to `true`. + 2. If `t.stop_requested()` evaluates to `true` then, if `t` and `u` + reference the same logical shared stop state, `u.stop_requested()` shall + also subsequently evaluate to `true` and `u.stop_possible()` shall also + subsequently evaluate to `true`. 3. Let `t` and `u` be distinct, valid objects of type `T` and let `init` be an object of type `Initializer`. Then for some type `CB`, the type `T` models @@ -3454,21 +4108,31 @@ This is the subject of core issue [[cwg2517|2517]]. registered then `callback` can be invoked on the thread executing `cb`'s constructor. - 2. If `callback` is invoked then, if `t` and `u` reference the same shared stop - state, an evaluation of `u.stop_requested()` will be `true` - if the beginning of the invocation of `callback` + 2. If `callback` is invoked then, if `t` and `u` reference the same + shared stop state, an evaluation of `u.stop_requested()` will be + `true` if the beginning of the invocation of `callback` strongly-happens-before the evaluation of `u.stop_requested()`. - 3. 
[*Note:* If `t.stop_possible()` evaluates to `false` then the construction of - `cb` is not required to construct and initialize `callback`. *--end note*] + 3. If `t.stop_possible()` evaluates to `false` + then the construction of `cb` is not required to construct and + initialize `callback`. - 3. Construction of a `T::callback_type` instance shall only throw exceptions thrown by the initialization of the `CB` instance from the value of type `Initializer`. + 3. Construction of a `T::callback_type` instance shall only throw + exceptions thrown by the initialization of the `CB` instance from the + value of type `Initializer`. - 4. Destruction of the `T::callback_type` object, `cb`, removes `callback` from the shared stop state such that `callback` will not be invoked after the destructor returns. + 4. Destruction of the `T::callback_type` object, `cb`, removes + `callback` from the shared stop state such that `callback` will not be + invoked after the destructor returns. - 1. If `callback` is currently being invoked on another thread then the destructor of `cb` will block until the invocation of `callback` returns such that the return from the invocation of `callback` strongly-happens-before the destruction of `callback`. + 1. If `callback` is currently being invoked on another thread then the + destructor of `cb` will block until the invocation of `callback` + returns such that the return from the invocation of `callback` + strongly-happens-before the destruction of `callback`. - 2. Destruction of a callback `cb` shall not block on the completion of the invocation of some other callback registered with the same shared stop state. + 2. Destruction of a callback `cb` shall not block on the completion of + the invocation of some other callback registered with the same shared + stop state. @@ -3477,7 +4141,8 @@ This is the subject of core issue [[cwg2517|2517]]. 
#### General [stoptoken.general] #### {#spec-stoptoken.general} -Modify the synopsis of class `stop_token` in subclause General [stoptoken.general] as follows: +Modify the synopsis of class `stop_token` in subclause General +[stoptoken.general] as follows:
 namespace std {
@@ -3494,11 +4159,15 @@ namespace std {
 
 ### Class `never_stop_token` [stoptoken.never] ### {#spec-stoptoken.never}
 
-Insert a new subclause, Class `never_stop_token` [stoptoken.never], after subclause Class template `stop_callback` [stopcallback], as a new subclause of Stop tokens [thread.stoptoken].
+Insert a new subclause, Class `never_stop_token` [stoptoken.never], after
+subclause Class template `stop_callback` [stopcallback], as a new
+subclause of Stop tokens [thread.stoptoken].
 
 #### General [stoptoken.never.general] #### {#spec-stoptoken.never.general}
 
-1. The class `never_stop_token` provides an implementation of the `unstoppable_token` concept. It provides a stop token interface, but also provides static information that a stop is never possible nor requested.
+1. The class `never_stop_token` provides an implementation of the
+    `unstoppable_token` concept. It provides a stop token interface while also
+    providing static information that a stop is never possible and will never
+    be requested.
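
The "static information" above can be illustrated with a toy model (a sketch
with invented names, not the specified class): because both queries are
`static constexpr` and always return `false`, callers can discard their stop
handling paths at compile time.

```c++
// Toy model of the never_stop_token semantics described above.
// The name toy_never_stop_token is invented for illustration.
struct toy_never_stop_token {
  // Exposition-only callback type: construction registers nothing,
  // because a stop can never be requested.
  template<class Callback>
  struct callback_type {
    explicit callback_type(toy_never_stop_token, Callback&&) noexcept {}
  };

  static constexpr bool stop_requested() noexcept { return false; }
  static constexpr bool stop_possible() noexcept { return false; }

  friend constexpr bool
  operator==(toy_never_stop_token, toy_never_stop_token) noexcept {
    return true;
  }
};

// Both answers are usable in constant expressions:
static_assert(!toy_never_stop_token::stop_possible());
static_assert(!toy_never_stop_token::stop_requested());
```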
 
 
 namespace std
@@ -3522,12 +4191,19 @@ namespace std
 
 ### Class `in_place_stop_token` [stoptoken.inplace] ### {#spec-stoptoken.inplace}
 
-Insert a new subclause, Class `in_place_stop_token` [stoptoken.inplace], after the subclause added above, as a new subclause of Stop tokens [thread.stoptoken].
+Insert a new subclause, Class `in_place_stop_token` [stoptoken.inplace],
+after the subclause added above, as a new subclause of Stop tokens
+[thread.stoptoken].
 
 #### General [stoptoken.inplace.general] #### {#spec-stoptoken.inplace.general}
 
-1. The class `in_place_stop_token` provides an interface for querying whether a stop request has been made (`stop_requested`) or can ever be made (`stop_possible`) using an associated `in_place_stop_source` object ([stopsource.inplace]).
-    An `in_place_stop_token` can also be passed to an `in_place_stop_callback` ([stopcallback.inplace]) constructor to register a callback to be called when a stop request has been made from an associated `in_place_stop_source`.
+1. The class `in_place_stop_token` provides an interface for querying whether a
+    stop request has been made (`stop_requested`) or can ever be made
+    (`stop_possible`) using an associated `in_place_stop_source` object
+    ([stopsource.inplace]). An `in_place_stop_token` can also be passed to an
+    `in_place_stop_callback` ([stopcallback.inplace]) constructor to register a
+    callback to be called when a stop request has been made from an associated
+    `in_place_stop_source`.
 
 
 namespace std {
@@ -3566,7 +4242,8 @@ in_place_stop_token() noexcept;
 void swap(in_place_stop_token& rhs) noexcept;
 
-2. *Effects*: Exchanges the values of source_ and rhs.source_. +2. *Effects*: Exchanges the values of source_ and + rhs.source_. #### Members [stoptoken.inplace.mem] #### {#spec-stoptoken.inplace.mem} @@ -3574,11 +4251,12 @@ void swap(stop_token& rhs) noexcept; [[nodiscard]] bool stop_requested() const noexcept;
-1. *Effects*: Equivalent to: return source_ != nullptr && source_->stop_requested(); +1. *Effects*: Equivalent to: return source_ != nullptr && + source_->stop_requested(); -2. [*Note*: The behavior of `stop_requested()` is undefined unless the call - strongly happens before the start of the destructor of the associated - `in_place_stop_source`, if any ([basic.life]). --*end note*] +2. The behavior of `stop_requested()` is undefined unless + the call strongly happens before the start of the destructor of the + associated `in_place_stop_source`, if any ([basic.life]).
 [[nodiscard]] bool stop_possible() const noexcept;
@@ -3586,9 +4264,10 @@ void swap(stop_token& rhs) noexcept;
 
 3. *Effects*: Equivalent to: return source_ != nullptr;
 
-4. [*Note*: The behavior of `stop_possible()` is implementation-defined unless
-    the call strongly happens before the end of the storage duration of the
-    associated `in_place_stop_source` object, if any ([basic.stc.general]). --*end note*]
+4. The behavior of `stop_possible()` is
+    implementation-defined unless the call strongly happens before the end of
+    the storage duration of the associated `in_place_stop_source` object, if any
+    ([basic.stc.general]).
 
 #### Non-member functions [stoptoken.inplace.nonmembers] #### {#spec-stoptoken.inplace.nonmembers}
 
@@ -3600,14 +4279,20 @@ friend void swap(in_place_stop_token& x, in_place_stop_token& y) noexcept;
 
 ### Class `in_place_stop_source` [stopsource.inplace] ### {#spec-stopsource.inplace}
 
-Insert a new subclause, Class `in_place_stop_source` [stopsource.inplace], after the subclause added above, as a new subclause of Stop tokens [thread.stoptoken].
+Insert a new subclause, Class `in_place_stop_source`
+[stopsource.inplace], after the subclause added above, as a new subclause
+of Stop tokens [thread.stoptoken].
 
 #### General [stopsource.inplace.general] #### {#spec-stopsource.inplace.general}
 
-1. The class `in_place_stop_source` implements the semantics of making a stop request, without the need for a dynamic allocation of a shared state.
-    A stop request made on a `in_place_stop_source` object is visible to all associated `in_place_stop_token` ([stoptoken.inplace]) objects.
-    Once a stop request has been made it cannot be withdrawn (a subsequent stop request has no effect).
-    All uses of `in_place_stop_token` objects associated with a given `in_place_stop_source` object must happen before the start of the destructor of that `in_place_stop_source` object.
+1. The class `in_place_stop_source` implements the semantics of making a stop
+    request, without the need for a dynamic allocation of a shared state. A stop
+    request made on an `in_place_stop_source` object is visible to all associated
+    `in_place_stop_token` ([stoptoken.inplace]) objects. Once a stop request has
+    been made it cannot be withdrawn (a subsequent stop request has no effect).
+    All uses of `in_place_stop_token` objects associated with a given
+    `in_place_stop_source` object must happen before the start of the destructor
+    of that `in_place_stop_source` object.
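
The semantics in the paragraph above (no dynamically allocated shared state,
visibility of a stop request to all associated tokens, and the idempotence of
`request_stop`) can be sketched with a single atomic flag stored in the source
object itself. The `toy_*` names are invented for illustration; this is not
the specified class, and it omits callback registration entirely.

```c++
#include <atomic>

class toy_stop_source;

class toy_stop_token {
  friend class toy_stop_source;
  // Null means "no associated source"; otherwise points into the source.
  const std::atomic<bool>* stop_ = nullptr;
  explicit toy_stop_token(const std::atomic<bool>* s) noexcept : stop_(s) {}
public:
  toy_stop_token() noexcept = default;
  bool stop_requested() const noexcept {
    return stop_ != nullptr && stop_->load(std::memory_order_acquire);
  }
  bool stop_possible() const noexcept { return stop_ != nullptr; }
};

class toy_stop_source {
  std::atomic<bool> stop_{false};  // the shared stop state lives in-place
public:
  toy_stop_token get_token() noexcept { return toy_stop_token(&stop_); }
  bool stop_requested() const noexcept {
    return stop_.load(std::memory_order_acquire);
  }
  // Atomic read-modify-write: returns true only for the request that
  // transitioned the state; subsequent requests have no effect.
  bool request_stop() noexcept {
    return !stop_.exchange(true, std::memory_order_acq_rel);
  }
};
```

Note that, exactly as required above, all uses of a `toy_stop_token` must
happen before the destruction of its `toy_stop_source`, since the token holds
a raw pointer into the source's storage.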
 
 
 namespace std {
@@ -3628,26 +4313,30 @@ namespace std {
 }
 
-2. An instance of `in_place_stop_source` maintains a list of registered callback invocations. - The registration of a callback invocation either succeeds or fails. When an invocation - of a callback is registered, the following happens atomically: +2. An instance of `in_place_stop_source` maintains a list of registered callback + invocations. The registration of a callback invocation either succeeds or + fails. When an invocation of a callback is registered, the following happens + atomically: - - The stop state is checked. If stop has not been requested, the callback invocation is - added to the list of registered callback invocations, and registration has succeeded. + - The stop state is checked. If stop has not been requested, the callback + invocation is added to the list of registered callback invocations, + and registration has succeeded. - - Otherwise, registration has failed. + - Otherwise, registration has failed. - When an invocation of a callback is unregistered, the invocation is atomically removed - from the list of registered callback invocations. The removal is not blocked by the concurrent - execution of another callback invocation in the list. If the callback invocation - being unregistered is currently executing, then: + When an invocation of a callback is unregistered, the invocation is + atomically removed from the list of registered callback invocations. The + removal is not blocked by the concurrent execution of another callback + invocation in the list. If the callback invocation being unregistered is + currently executing, then: - - If the execution of the callback invocation is happening concurrently on another thread, - the completion of the execution strongly happens before ([intro.races]) the end of the - callback's lifetime. + - If the execution of the callback invocation is happening concurrently on + another thread, the completion of the execution strongly happens + before ([intro.races]) the end of the callback's lifetime. 
- - Otherwise, the execution is happening on the current thread. Removal of the - callback invocation does not block waiting for the execution to complete. + - Otherwise, the execution is happening on the current thread. Removal of + the callback invocation does not block waiting for the execution to + complete. #### Constructors, copy, and assignment [stopsource.inplace.cons] #### {#spec-stopsource.inplace.cons} @@ -3671,16 +4360,20 @@ in_place_stop_source() noexcept; [[nodiscard]] bool stop_requested() const noexcept;
-3. *Returns*: `true` if the stop state inside `*this` has received a stop request; otherwise, `false`. +3. *Returns*: `true` if the stop state inside `*this` has received a stop + request; otherwise, `false`.
 bool request_stop() noexcept;
 
-4. *Effects*: Atomically determines whether the stop state inside `*this` has received a stop request, and if not, makes a stop request. - The determination and making of the stop request are an atomic read-modify-write operation ([intro.races]). - If the request was made, the registered invocations are executed and the evaluations of the invocations are indeterminately sequenced. - If an invocation of a callback exits via an exception then `terminate` is invoked ([except.terminate]). +4. *Effects*: Atomically determines whether the stop state inside `*this` has + received a stop request, and if not, makes a stop request. The determination + and making of the stop request are an atomic read-modify-write operation + ([intro.races]). If the request was made, the registered invocations are + executed and the evaluations of the invocations are indeterminately + sequenced. If an invocation of a callback exits via an exception then + `terminate` is invoked ([except.terminate]). 5. *Postconditions*: `stop_requested()` is `true`. @@ -3688,7 +4381,9 @@ bool request_stop() noexcept; ### Class template `in_place_stop_callback` [stopcallback.inplace] ### {#spec-stopcallback.inplace} -Insert a new subclause, Class template `in_place_stop_callback` [stopcallback.inplace], after the subclause added above, as a new subclause of Stop tokens [thread.stoptoken]. +Insert a new subclause, Class template `in_place_stop_callback` +[stopcallback.inplace], after the subclause added above, as a new +subclause of Stop tokens [thread.stoptoken]. #### General [stopcallback.inplace.general] #### {#spec-stopcallback.inplace.general} @@ -3719,11 +4414,17 @@ Insert a new subclause, Class template `in_place_stop_callback` [stopcallback }
-2. *Mandates*: `in_place_stop_callback` is instantiated with an argument for the template parameter `Callback` that satisfies both `invocable` and `destructible`. +2. *Mandates*: `in_place_stop_callback` is instantiated with an argument for the + template parameter `Callback` that satisfies both `invocable` and + `destructible`. -3. *Preconditions*: `in_place_stop_callback` is instantiated with an argument for the template parameter `Callback` that models both `invocable` and `destructible`. +3. *Preconditions*: `in_place_stop_callback` is instantiated with an argument + for the template parameter `Callback` that models both `invocable` and + `destructible`. -4. *Recommended practice*: Implementations should use the storage of the `in_place_stop_callback` objects to store the state necessary for their association with an `in_place_stop_source` object. +4. *Recommended practice*: Implementations should use the storage of the + `in_place_stop_callback` objects to store the state necessary for their + association with an `in_place_stop_source` object. #### Constructors and destructor [stopcallback.inplace.cons] #### {#spec-stopcallback.inplace.cons} @@ -3737,17 +4438,20 @@ template<class C> 2. *Preconditions*: `Callback` and `C` model `constructible_from`. -3. *Effects*: Initializes callback_ with `std::forward(cb)`. - Any `in_place_stop_source` associated with `st` becomes associated with `*this`. - Registers ([stopsource.inplace.general]) the callback invocation - std::forward<Callback>(callback_)() with the associated - `in_place_stop_source`, if any. If the registration fails, evaluates - the callback invocation. +3. *Effects*: Initializes callback_ with + `std::forward(cb)`. Any `in_place_stop_source` associated with `st` + becomes associated with `*this`. Registers ([stopsource.inplace.general]) + the callback invocation + std::forward<Callback>(callback_)() with the + associated `in_place_stop_source`, if any. 
If the registration fails, + evaluates the callback invocation. -4. *Throws*: Any exception thrown by the initialization of callback_. +4. *Throws*: Any exception thrown by the initialization of + callback_. -5. *Remarks*: If evaluating std::forward<Callback>(callback_)() - exits via an exception, then `terminate` is invoked ([except.terminate]). +5. *Remarks*: If evaluating + std::forward<Callback>(callback_)() exits via an + exception, then `terminate` is invoked ([except.terminate]).
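
The "if the registration fails, evaluates the callback invocation" clause
above can be sketched as follows. This is a toy with invented names, not the
specified class: it shows only the constructor's branch on an
already-stopped state, and omits linking the callback into the source's list
for later invocation by `request_stop()`.

```c++
#include <atomic>
#include <utility>

template<class Callback>
class toy_stop_callback {
  Callback callback_;
public:
  // stop_state stands in for the associated source's shared stop state.
  template<class C>
  explicit toy_stop_callback(const std::atomic<bool>& stop_state, C&& cb)
    : callback_(std::forward<C>(cb)) {
    if (stop_state.load(std::memory_order_acquire)) {
      // Registration failed (stop already requested):
      // evaluate the callback invocation immediately, in the constructor.
      std::forward<Callback>(callback_)();
    }
    // else: register with the associated source (omitted in this sketch).
  }
};

template<class C>
toy_stop_callback(const std::atomic<bool>&, C) -> toy_stop_callback<C>;
```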
 ~in_place_stop_callback();
@@ -3832,16 +4536,12 @@ template<class C>
 
 4. This clause makes use of the following exposition-only entities:
 
-    1. 
-        template<class Fn, class... Args>
-            requires callable<Fn, Args...>
-          constexpr auto mandate-nothrow-call(Fn&& fn, Args&&... args) noexcept
-            -> call-result-t<Fn, Args...> {
-            return std::forward<Fn>(fn)(std::forward<Args>(args)...);
-          }
-        
- - * Mandates: nothrow-callable<Fn, Args...> is `true`. + 1. For a subexpression expr, let + MANDATE-NOTHROW(expr) + be expression-equivalent to expr. + + * Mandates: noexcept(expr) is + true. 2.
         template<class T>
@@ -3851,9 +4551,10 @@ template<class C>
             (!is_array_v<remove_cvref_t<T>>);
         
- 3. For function types `F1` and `F2` denoting `R1(Args1...)` and `R2(Args2...)` - respectively, MATCHING-SIG(F1, F2) is `true` if and only if - `same_as` is `true`. + 3. For function types `F1` and `F2` denoting `R1(Args1...)` and + `R2(Args2...)` respectively, MATCHING-SIG(F1, F2) is + `true` if and only if `same_as` is + `true`. 4. For a subexpression `err`, let `Err` be `decltype((err))` and let AS-EXCEPT-PTR(err) be: @@ -3862,11 +4563,11 @@ template<class C> - *Mandates:* `err != exception_ptr()` is `true` - 2. Otherwise, `make_exception_ptr(system_error(err))` if `decay_t` denotes the type `error_code`, + 2. Otherwise, `make_exception_ptr(system_error(err))` if `decay_t` + denotes the type `error_code`, 3. Otherwise, `make_exception_ptr(err)`. - ## Queries and queryables [exec.queryable] ## {#spec-execution.queryable} ### General [exec.queryable.general] ### {#spec-execution.queryable.general} @@ -3892,9 +4593,8 @@ template<class C> ([concepts.equality]) and does not modify the function object or the arguments. -5. If tag_invoke(q, env, args...) is well-formed, then - q(env, args...) is expression-equivalent to - tag_invoke(q, env, args...). +5. If the expression env.query(q, args...) is well-formed, + then it is expression-equivalent to q(env, args...). 6. Unless otherwise specified, the value returned by the expression q(env, args...) is valid as long as `env` is valid. @@ -3909,11 +4609,11 @@ template<class C> 1. The `queryable` concept specifies the constraints on the types of queryable objects. -2. Let `env` be an object of type `Env`. The type `Env` models `queryable` if for each - callable object q and a pack of subexpressions `args`, - if requires { q(env, args...) } is `true` then - q(env, args...) meets any semantic requirements imposed by - q. +2. Let `env` be an object of type `Env`. The type `Env` models `queryable` if + for each callable object q and a pack of subexpressions + `args`, if requires { q(env, args...) 
} is `true` then + q(env, args...) meets any semantic requirements imposed + by q. ## Asynchronous operations [async.ops] ## {#spec-execution-async.ops} @@ -4111,17 +4811,13 @@ namespace std { // [exec.queryable], queryable objects template<class T> - concept queryable = destructible; + concept queryable = destructible<T>; // [exec.queries], queries - namespace queries { // exposition only - struct forwarding_query_t; - struct get_allocator_t; - struct get_stop_token_t; - } - using queries::forwarding_query_t; - using queries::get_allocator_t; - using queries::get_stop_token_t; + struct forwarding_query_t; + struct get_allocator_t; + struct get_stop_token_t; + inline constexpr forwarding_query_t forwarding_query{}; inline constexpr get_allocator_t get_allocator{}; inline constexpr get_stop_token_t get_stop_token{}; @@ -4138,19 +4834,13 @@ namespace std { namespace std::execution { // [exec.queries], queries enum class forward_progress_guarantee; - namespace queries { // exposition only - struct get_domain_t; - struct get_scheduler_t; - struct get_delegatee_scheduler_t; - struct get_forward_progress_guarantee_t; - template<class CPO> - struct get_completion_scheduler_t; - } - using queries::get_domain_t; - using queries::get_scheduler_t; - using queries::get_delegatee_scheduler_t; - using queries::get_forward_progress_guarantee_t; - using queries::get_completion_scheduler_t; + struct get_domain_t; + struct get_scheduler_t; + struct get_delegatee_scheduler_t; + struct get_forward_progress_guarantee_t; + template<class CPO> + struct get_completion_scheduler_t; + inline constexpr get_domain_t get_domain{}; inline constexpr get_scheduler_t get_scheduler{}; inline constexpr get_delegatee_scheduler_t get_delegatee_scheduler{}; @@ -4158,12 +4848,8 @@ namespace std::execution { template<class CPO> inline constexpr get_completion_scheduler_t<CPO> get_completion_scheduler{}; - namespace exec-envs { // exposition only - struct empty_env {}; - struct get_env_t; - } - using 
envs-envs::empty_env; - using envs-envs::get_env_t; + struct empty_env {}; + struct get_env_t; inline constexpr get_env_t get_env {}; template<class T> @@ -4173,49 +4859,40 @@ namespace std::execution { struct default_domain; // [exec.sched], schedulers + struct scheduler_t {}; + template<class Sch> concept scheduler = see below; // [exec.recv], receivers struct receiver_t {}; - template<class Rcvr> - inline constexpr bool enable_receiver = see below; - template<class Rcvr> concept receiver = see below; template<class Rcvr, class Completions> concept receiver_of = see below; - namespace receivers { // exposition only - struct set_value_t; - struct set_error_t; - struct set_stopped_t; - } - using receivers::set_value_t; - using receivers::set_error_t; - using receivers::set_stopped_t; + struct set_value_t; + struct set_error_t; + struct set_stopped_t; + inline constexpr set_value_t set_value{}; inline constexpr set_error_t set_error{}; inline constexpr set_stopped_t set_stopped{}; // [exec.opstate], operation states + struct operation_state_t {}; + template<class O> concept operation_state = see below; - namespace op-state { // exposition only - struct start_t; - } - using op-state::start_t; + struct start_t; inline constexpr start_t start{}; // [exec.snd], senders struct sender_t {}; - template<class Sndr> - inline constexpr bool enable_sender = see below; - template<class Sndr> concept sender = see below; @@ -4235,10 +4912,7 @@ namespace std::execution { concept single-sender = see below; // exposition only // [exec.getcomplsigs], completion signatures - namespace completion-signatures { // exposition only - struct get_completion_signatures_t; - } - using completion-signatures::get_completion_signatures_t; + struct get_completion_signatures_t; inline constexpr get_completion_signatures_t get_completion_signatures {}; template<class Sndr, class Env = empty_env> @@ -4272,40 +4946,38 @@ namespace std::execution { using tag_of_t = see below; // [exec.snd.transform], 
sender transformations - template<class Domain, sender Sndr> - constexpr sender decltype(auto) transform_sender(Domain dom, Sndr&& sndrv); - - template<class Domain, sender Sndr, queryable Env> - constexpr sender decltype(auto) transform_sender(Domain dom, Sndr&& sndr, const Env& env); + template<class Domain, sender Sndr, queryable... Env> + requires (sizeof...(Env) <= 1) + constexpr sender decltype(auto) transform_sender( + Domain dom, Sndr&& sndr, const Env&... env) noexcept(see below); + // [exec.snd.transform.env], environment transformations template<class Domain, sender Sndr, queryable Env> - constexpr decltype(auto) transform_env(Domain dom, Sndr&& sndr, Env&& env) noexcept; + constexpr queryable decltype(auto) transform_env( + Domain dom, Sndr&& sndr, Env&& env) noexcept; // [exec.snd.apply], sender algorithm application template<class Domain, class Tag, sender Sndr, class... Args> - constexpr decltype(auto) apply_sender(Domain dom, Tag, Sndr&& sndr, Args&&... args) noexcept(see below); + constexpr decltype(auto) apply_sender( + Domain dom, Tag, Sndr&& sndr, Args&&... 
args) noexcept(see below); // [exec.connect], the connect sender algorithm - namespace senders-connect { // exposition only - struct connect_t; - } - using senders-connect::connect_t; + struct connect_t; inline constexpr connect_t connect{}; template<class Sndr, class Rcvr> - using connect_result_t = decltype(connect(declval<Sndr>(), declval<Rcvr>())); + using connect_result_t = + decltype(connect(declval<Sndr>(), declval<Rcvr>())); // [exec.factories], sender factories - namespace sender-factories { // exposition only - struct just_t; - struct just_error_t; - struct just_stopped_t; - struct schedule_t; - } - inline constexpr just just{}; + struct just_t; + struct just_error_t; + struct just_stopped_t; + struct schedule_t; + + inline constexpr just_t just{}; inline constexpr just_error_t just_error{}; inline constexpr just_stopped_t just_stopped{}; - using sender-factories::schedule_t; inline constexpr schedule_t schedule{}; inline constexpr unspecified read{}; @@ -4313,109 +4985,62 @@ namespace std::execution { using schedule_result_t = decltype(schedule(declval<Sndr>())); // [exec.adapt], sender adaptors - namespace sender-adaptor-closure { // exposition only - template<class-type D> - struct sender_adaptor_closure { }; - } - using sender-adaptor-closure::sender_adaptor_closure; - - namespace sender-adaptors { // exposition only - struct on_t; - struct transfer_t; - struct schedule_from_t; - struct then_t; - struct upon_error_t; - struct upon_stopped_t; - struct let_value_t; - struct let_error_t; - struct let_stopped_t; - struct bulk_t; - struct split_t; - struct when_all_t; - struct when_all_with_variant_t; - struct into_variant_t; - struct stopped_as_optional_t; - struct stopped_as_error_t; - struct ensure_started_t; - } - using sender-adaptors::on_t; - using sender-adaptors::transfer_t; - using sender-adaptors::schedule_from_t; - using sender-adaptors::then_t; - using sender-adaptors::upon_error_t; - using sender-adaptors::upon_stopped_t; - using 
sender-adaptors::let_value_t;
-    using sender-adaptors::let_error_t;
-    using sender-adaptors::let_stopped_t;
-    using sender-adaptors::bulk_t;
-    using sender-adaptors::split_t;
-    using sender-adaptors::when_all_t;
-    using sender-adaptors::when_all_with_variant_t;
-    using sender-adaptors::into_variant_t;
-    using sender-adaptors::stopped_as_optional_t;
-    using sender-adaptors::stopped_as_error_t;
-    using sender-adaptors::ensure_started_t;
+    template<class-type D>
+      struct sender_adaptor_closure { };
+
+    struct on_t;
+    struct transfer_t;
+    struct schedule_from_t;
+    struct then_t;
+    struct upon_error_t;
+    struct upon_stopped_t;
+    struct let_value_t;
+    struct let_error_t;
+    struct let_stopped_t;
+    struct bulk_t;
+    struct split_t;
+    struct ensure_started_t;
+    struct when_all_t;
+    struct when_all_with_variant_t;
+    struct into_variant_t;
+    struct stopped_as_optional_t;
+    struct stopped_as_error_t;

     inline constexpr on_t on{};
     inline constexpr transfer_t transfer{};
     inline constexpr schedule_from_t schedule_from{};
-
     inline constexpr then_t then{};
     inline constexpr upon_error_t upon_error{};
     inline constexpr upon_stopped_t upon_stopped{};
-
     inline constexpr let_value_t let_value{};
     inline constexpr let_error_t let_error{};
     inline constexpr let_stopped_t let_stopped{};
-
     inline constexpr bulk_t bulk{};
-
     inline constexpr split_t split{};
+    inline constexpr ensure_started_t ensure_started{};
     inline constexpr when_all_t when_all{};
     inline constexpr when_all_with_variant_t when_all_with_variant{};
-
     inline constexpr into_variant_t into_variant{};
-
     inline constexpr stopped_as_optional_t stopped_as_optional;
-
     inline constexpr stopped_as_error_t stopped_as_error;
-    inline constexpr ensure_started_t ensure_started{};

-    // [exec.consumers], sender consumers
-    namespace sender-consumers { // exposition only
-      struct start_detached_t;
-    }
-    using sender-consumers::start_detached_t;
+    struct start_detached_t;
     inline constexpr start_detached_t start_detached{};

-    // [exec.utils], sender and receiver utilities
-    // [exec.utils.rcvr.adptr]
-    template<
-        class-type Derived,
-        receiver Base = unspecified> // arguments are not associated entities ([lib.tmpl-heads])
-      class receiver_adaptor;
-
+    // [exec.utils], sender and receiver utilities
+    // [exec.utils.cmplsigs]
     template<class Fn>
       concept completion-signature = // exposition only
         see below;

-    // [exec.utils.cmplsigs]
     template<completion-signature... Fns>
       struct completion_signatures {};

-    template<class... Args> // exposition only
-      using default-set-value =
-        completion_signatures<set_value_t(Args...)>;
-
-    template<class Err> // exposition only
-      using default-set-error =
-        completion_signatures<set_error_t(Err)>;
-
     template<class Sigs> // exposition only
       concept valid-completion-signatures = see below;

-    // [exec.utils.mkcmplsigs]
+    // [exec.utils.tfxcmplsigs]
     template<
       valid-completion-signatures InputSignatures,
       valid-completion-signatures AdditionalSignatures = completion_signatures<>,

@@ -4434,50 +5059,33 @@ namespace std::execution {
       requires sender_in<Sndr, Env>
     using transform_completion_signatures_of =
       transform_completion_signatures<
-        completion_signatures_of_t<Sndr, Env>, AdditionalSignatures, SetValue, SetError, SetStopped>;
+        completion_signatures_of_t<Sndr, Env>,
+        AdditionalSignatures, SetValue, SetError, SetStopped>;

     // [exec.ctx], execution resources
+    // [exec.run.loop], run_loop
     class run_loop;
 }

 namespace std::this_thread {
     // [exec.queries], queries
-    namespace queries { // exposition only
-      struct execute_may_block_caller_t;
-    }
-    using queries::execute_may_block_caller_t;
+    struct execute_may_block_caller_t;
     inline constexpr execute_may_block_caller_t execute_may_block_caller{};

-    namespace this-thread { // exposition only
-      struct sync-wait-env; // exposition only
-      template<class Sndr>
-          requires sender_in<Sndr, sync-wait-env>
-        using sync-wait-result-type = see below; // exposition only
-      template<class Sndr>
-        using sync-wait-with-variant-result-type = see below; // exposition only
+    struct sync_wait_t;
+    struct sync_wait_with_variant_t;

-      struct sync_wait_t;
-      struct sync_wait_with_variant_t;
-    }
-    using this-thread::sync_wait_t;
-    using this-thread::sync_wait_with_variant_t;
     inline constexpr sync_wait_t sync_wait{};
     inline constexpr sync_wait_with_variant_t sync_wait_with_variant{};
 }

 namespace std::execution {
     // [exec.execute], one-way execution
-    namespace execute { // exposition only
-      struct execute_t;
-    }
-    using execute::execute_t;
+    struct execute_t;
     inline constexpr execute_t execute{};

     // [exec.as.awaitable]
-    namespace coro-utils { // exposition only
-      struct as_awaitable_t;
-    }
-    using coro-utils::as_awaitable_t;
+    struct as_awaitable_t;
     inline constexpr as_awaitable_t as_awaitable;

     // [exec.with.awaitable.senders]

@@ -4514,8 +5122,8 @@ namespace std::execution {
    object `q` of type `Q`, `forwarding_query(q)` is expression-equivalent to:

-    1. mandate-nothrow-call(tag_invoke, forwarding_query,
-        q) if that expression is well-formed.
+    1. MANDATE-NOTHROW(q.query(forwarding_query)) if that
+        expression is well-formed.

        * Mandates: The expression above has type `bool` and is a core
          constant expression if `q` is a core constant expression.

@@ -4529,15 +5137,15 @@ namespace std::execution {

1. `get_allocator` asks an object for its associated allocator.

-2. The name `get_allocator` denotes a query object. For some subexpression `env`,
+2. The name `get_allocator` denotes a query object. For a subexpression `env`,
    `get_allocator(env)` is expression-equivalent to
-    mandate-nothrow-call(tag_invoke, get_allocator,
-    as_const(env)).
+    MANDATE-NOTHROW(as_const(env).query(get_allocator)).

-    * Mandates: The type of the expression above
-        satisfies Allocator.
+    * Mandates: If the expression above is well-formed, its type
+        satisfies Allocator.

-3. `forwarding_query(get_allocator)` is `true`.
+3. `forwarding_query(get_allocator)` is a core constant
+    expression and has value `true`.

4. `get_allocator()` (with no arguments) is expression-equivalent to
    `execution::read(get_allocator)` ([exec.read]).

@@ -4546,11 +5154,11 @@ namespace std::execution {

1. `get_stop_token` asks an object for an associated stop token.

-2. The name `get_stop_token` denotes a query object. For some subexpression `env`,
+2. The name `get_stop_token` denotes a query object. For a subexpression `env`,
    `get_stop_token(env)` is expression-equivalent to:

-    1. mandate-nothrow-call(tag_invoke, get_stop_token,
-        as_const(env)), if this expression is well-formed.
+    1. MANDATE-NOTHROW(as_const(env).query(get_stop_token))
+        if that expression is well-formed.

        * Mandates: The type of the expression above satisfies
          `stoppable_token`.

@@ -4565,10 +5173,10 @@ namespace std::execution {

### `execution::get_env` [exec.get.env] ### {#spec-execution.environment.get_env}

-1. `get_env` is a customization point object. For some subexpression `o` of type
-    `O`, `get_env(o)` is expression-equivalent to
+1. `execution::get_env` is a customization point object. For a subexpression
+    `o`, `execution::get_env(o)` is expression-equivalent to:

-    1. `tag_invoke(get_env, const_cast<const O&>(o))` if that expression is
+    1. `as_const(o).get_env()` if that expression is
        well-formed.

        * Mandates: The expression above is not potentially throwing, and

@@ -4578,17 +5186,17 @@ namespace std::execution {

2. The value of `get_env(o)` shall be valid while `o` is valid.

-3. When passed a sender object, `get_env` returns the sender's attributes. When
-    passed a receiver, `get_env` returns the receiver's environment.
+3. When passed a sender object, `get_env` returns the
+    sender's attributes. When passed a receiver, `get_env` returns the
+    receiver's environment.

### `execution::get_domain` [exec.get.domain] ### {#spec-execution.get_domain}

1. `get_domain` asks an object for an associated execution domain tag.

-2. The name `get_domain` denotes a query object. For some subexpression `env`,
+2. The name `get_domain` denotes a query object. For a subexpression `env`,
    `get_domain(env)` is expression-equivalent to
-    mandate-nothrow-call(tag_invoke, get_domain, as_const(env)),
-    if this expression is well-formed.
+    MANDATE-NOTHROW(as_const(env).query(get_domain)).

3. `forwarding_query(execution::get_domain)` is a core constant
    expression and has value `true`.

@@ -4600,31 +5208,36 @@ namespace std::execution {

1. `get_scheduler` asks an object for its associated scheduler.

-2. The name `get_scheduler` denotes a query object. For some
+2. The name `get_scheduler` denotes a query object. For a
    subexpression `env`, `get_scheduler(env)` is expression-equivalent to
-    mandate-nothrow-call(tag_invoke, get_scheduler, as_const(env)).
+    MANDATE-NOTHROW(as_const(env).query(get_scheduler)).

-    * Mandates: The type of the expression above satisfies `scheduler`.
+    * Mandates: If the expression above is well-formed, its type
+        satisfies `scheduler`.

3. `forwarding_query(execution::get_scheduler)` is a core constant
    expression and has value `true`.

-4. `get_scheduler()` (with no arguments) is expression-equivalent to `execution::read(get_scheduler)` ([exec.read]).
+4. `get_scheduler()` (with no arguments) is expression-equivalent to
+    `execution::read(get_scheduler)` ([exec.read]).

### `execution::get_delegatee_scheduler` [exec.get.delegatee.scheduler] ### {#spec-execution.get_delegatee_scheduler}

-1. `get_delegatee_scheduler` asks an object for a scheduler that can be used to delegate work to for the purpose of forward progress delegation.
+1. `get_delegatee_scheduler` asks an object for a scheduler that can be used to
+    delegate work to for the purpose of forward progress delegation.

-2. The name `get_delegatee_scheduler` denotes a query object. For some
+2. The name `get_delegatee_scheduler` denotes a query object. For a
    subexpression `env`, `get_delegatee_scheduler(env)` is expression-equivalent to
-    mandate-nothrow-call(tag_invoke, get_delegatee_scheduler, as_const(env)).
+    MANDATE-NOTHROW(as_const(env).query(get_delegatee_scheduler)).

-    * Mandates: The type of the expression above is satisfies `scheduler`.
+    * Mandates: If the expression above is well-formed, its type
+        satisfies `scheduler`.

3. `forwarding_query(execution::get_delegatee_scheduler)` is a core
    constant expression and has value `true`.

-4. `get_delegatee_scheduler()` (with no arguments) is expression-equivalent to `execution::read(get_delegatee_scheduler)` ([exec.read]).
+4. `get_delegatee_scheduler()` (with no arguments) is expression-equivalent to
+    `execution::read(get_delegatee_scheduler)` ([exec.read]).

### `execution::get_forward_progress_guarantee` [exec.get.forward.progress.guarantee] ### {#spec-execution.get_forward_progress_guarantee}

@@ -4636,35 +5249,52 @@ enum class forward_progress_guarantee {
};

-1. `get_forward_progress_guarantee` asks a scheduler about the forward progress guarantees of execution agents created by that scheduler.
+1. `get_forward_progress_guarantee` asks a scheduler about the forward progress
+    guarantee of execution agents created by that scheduler.

-2. The name `get_forward_progress_guarantee` denotes a query object. For some subexpression `sch`, let `Sch` be `decltype((sch))`. If `Sch` does not satisfy `scheduler`, `get_forward_progress_guarantee` is ill-formed.
-    Otherwise, `get_forward_progress_guarantee(sch)` is expression-equivalent to:
+2. The name `get_forward_progress_guarantee` denotes a query object. For a
+    subexpression `sch`, let `Sch` be `decltype((sch))`. If `Sch` does not
+    satisfy `scheduler`, `get_forward_progress_guarantee` is ill-formed.
+    Otherwise, `get_forward_progress_guarantee(sch)` is expression-equivalent
+    to:

-    1. mandate-nothrow-call(tag_invoke, get_forward_progress_guarantee, as_const(sch)), if this expression is well-formed.
+    1. MANDATE-NOTHROW(as_const(sch).query(get_forward_progress_guarantee)),
+        if this expression is well-formed.

        * Mandates: The type of the expression above is
          `forward_progress_guarantee`.

    2. Otherwise, `forward_progress_guarantee::weakly_parallel`.

-3. If `get_forward_progress_guarantee(sch)` for some scheduler `sch` returns `forward_progress_guarantee::concurrent`, all execution agents created by that scheduler shall provide the concurrent forward progress guarantee. If it returns
-    `forward_progress_guarantee::parallel`, all execution agents created by that scheduler shall provide at least the parallel forward progress guarantee.
+3. If `get_forward_progress_guarantee(sch)` for some scheduler `sch` returns
+    `forward_progress_guarantee::concurrent`, all execution agents created by
+    that scheduler shall provide the concurrent forward progress guarantee. If
+    it returns `forward_progress_guarantee::parallel`, all execution agents
+    created by that scheduler shall provide at least the parallel forward
+    progress guarantee.

### `this_thread::execute_may_block_caller` [exec.execute.may.block.caller] ### {#spec-execution.execute_may_block_caller}

-1. `this_thread::execute_may_block_caller` asks a scheduler `sch` whether a call `execute(sch, f)` with any invocable `f` may block the thread where such a call occurs.
+1. `this_thread::execute_may_block_caller` asks a scheduler `sch` whether a call
+    `execute(sch, f)` with any invocable `f` may block the thread where such a
+    call occurs.

-2. The name `this_thread::execute_may_block_caller` denotes a query object. For some subexpression `sch`, let `Sch` be `decltype((sch))`. If `Sch` does not satisfy `scheduler`, `this_thread::execute_may_block_caller` is ill-formed. Otherwise,
-    `this_thread::execute_may_block_caller(sch)` is expression-equivalent to:
+2. The name `this_thread::execute_may_block_caller` denotes a query object. For
+    a subexpression `sch`, let `Sch` be `decltype((sch))`. If `Sch` does not
+    satisfy `scheduler`, `this_thread::execute_may_block_caller` is ill-formed.
+    Otherwise, `this_thread::execute_may_block_caller(sch)` is
+    expression-equivalent to:

-    1. mandate-nothrow-call(tag_invoke, this_thread::execute_may_block_caller, as_const(sch)), if this expression is well-formed.
+    1. MANDATE-NOTHROW(as_const(sch).query(this_thread::execute_may_block_caller)),
+        if this expression is well-formed.

        * Mandates: The type of the expression above is `bool`.

    2. Otherwise, `true`.

-3. If `this_thread::execute_may_block_caller(sch)` for some scheduler `sch` returns `false`, no `execute(sch, f)` call with some invocable `f` shall block the calling thread.
+3. If `this_thread::execute_may_block_caller(sch)` for some scheduler `sch`
+    returns `false`, no `execute(sch, f)` call with some invocable `f` shall
+    block the calling thread.
### `execution::get_completion_scheduler` [exec.completion.scheduler] ### {#spec-execution.get_completion_scheduler}

@@ -4672,17 +5302,16 @@ enum class forward_progress_guarantee {
    completion scheduler associated with a completion tag from a sender's
    attributes.

-2. The name `get_completion_scheduler` denotes a query object template. For some
+2. The name `get_completion_scheduler` denotes a query object template. For a
    subexpression `q`, let `Q` be `decltype((q))`. If the template argument
    `Tag` in `get_completion_scheduler<Tag>(q)` is not one of `set_value_t`,
    `set_error_t`, or `set_stopped_t`, `get_completion_scheduler<Tag>(q)` is
    ill-formed. Otherwise, `get_completion_scheduler<Tag>(q)` is
-    expression-equivalent to mandate-nothrow-call(tag_invoke,
-    get_completion_scheduler<Tag>, as_const(q)) if this expression is
-    well-formed.
+    expression-equivalent to
+    MANDATE-NOTHROW(as_const(q).query(get_completion_scheduler<Tag>)).

-    * Mandates: The type of the expression above satisfies
-        `scheduler`.
+    * Mandates: If the expression above is well-formed, its type
+        satisfies `scheduler`.

3. If, for some sender `sndr` and completion function `C` that has an
    associated completion tag `Tag`, `get_completion_scheduler<Tag>(get_env(sndr))` is

@@ -4702,13 +5331,21 @@ enum class forward_progress_guarantee {
    scheduler. A valid invocation of `schedule` is a schedule-expression.
+    template<class Sch>
+      concept enable-scheduler = // exposition only
+        requires {
+          requires derived_from<typename Sch::scheduler_concept, scheduler_t>;
+        };
+
     template<class Sch>
       concept scheduler =
+        enable-scheduler<remove_cvref_t<Sch>> &&
         queryable<Sch> &&
-        requires(Sch&& sch, const get_completion_scheduler_t<set_value_t> tag) {
+        requires(Sch&& sch) {
           { schedule(std::forward<Sch>(sch)) } -> sender;
-          { tag_invoke(tag, get_env(
-              schedule(std::forward<Sch>(sch)))) } -> same_as<remove_cvref_t<Sch>>;
+          { get_completion_scheduler<set_value_t>(
+              get_env(schedule(std::forward<Sch>(sch)))) }
+                -> same_as<remove_cvref_t<Sch>>;
         } &&
         equality_comparable<remove_cvref_t<Sch>> &&
         copy_constructible<remove_cvref_t<Sch>>;
@@ -4756,15 +5393,14 @@ enum class forward_progress_guarantee {
 
     
     template<class Rcvr>
-      concept is-receiver = // exposition only
-        derived_from<typename Rcvr::receiver_concept, receiver_t>;
-
-    template<class Rcvr>
-      inline constexpr bool enable_receiver = is-receiver<Rcvr>;
+      concept enable-receiver = // exposition only
+        requires {
+          requires derived_from<typename Rcvr::receiver_concept, receiver_t>;
+        };
 
     template<class Rcvr>
       concept receiver =
-        enable_receiver<remove_cvref_t<Rcvr>> &&
+        enable-receiver<remove_cvref_t<Rcvr>> &&
         requires(const remove_cvref_t<Rcvr>& rcvr) {
           { get_env(rcvr) } -> queryable;
         } &&
@@ -4780,20 +5416,20 @@ enum class forward_progress_guarantee {
         };
 
     template<class Rcvr, class Completions>
-      concept receiver_of =
-        receiver<Rcvr> &&
+      concept has-completions = // exposition only
         requires (Completions* completions) {
           []<valid-completion-for<Rcvr>...Sigs>(completion_signatures<Sigs...>*)
           {}(completions);
         };
+
+    template<class Rcvr, class Completions>
+      concept receiver_of =
+        receiver<Rcvr> && has-completions<Rcvr, Completions>;
     
-3. Remarks: Pursuant to [namespace.std], users can specialize `enable_receiver` to
-    `true` for cv-unqualified program-defined types that model `receiver`, and `false`
-    for types that do not. Such specializations shall be usable in constant
-    expressions ([expr.const]) and have type `const bool`.
+2. Class types that are `final` do not model the `receiver` concept.

-4. Let `rcvr` be a receiver and let `op_state` be an operation state associated
+3. Let `rcvr` be a receiver and let `op_state` be an operation state associated
    with an asynchronous operation created by connecting `rcvr` with a sender. Let
    `token` be a stop token equal to `get_stop_token(get_env(rcvr))`. `token` shall
    remain valid for the duration of the asynchronous operation's lifetime

@@ -4807,25 +5443,23 @@ enum class forward_progress_guarantee {

1. `set_value` is a value completion function ([async.ops]). Its associated
    completion tag is `set_value_t`. The expression `set_value(rcvr, vs...)` for
-    some subexpression `rcvr` and pack of subexpressions `vs` is ill-formed if `rcvr`
+    a subexpression `rcvr` and pack of subexpressions `vs` is ill-formed if `rcvr`
    is an lvalue or a `const` rvalue. Otherwise, it is expression-equivalent to
-    mandate-nothrow-call(tag_invoke, set_value, rcvr, vs...).
+    MANDATE-NOTHROW(rcvr.set_value(vs...)).

### `execution::set_error` [exec.set.error] ### {#spec-execution.receivers.set_error}

1. `set_error` is an error completion function. Its associated completion tag is
    `set_error_t`. The expression `set_error(rcvr, err)` for some subexpressions
    `rcvr` and `err` is ill-formed if `rcvr` is an lvalue or a `const` rvalue. Otherwise, it is
-    expression-equivalent to mandate-nothrow-call(tag_invoke,
-    set_error, rcvr, err).
+    expression-equivalent to MANDATE-NOTHROW(rcvr.set_error(err)).

### `execution::set_stopped` [exec.set.stopped] ### {#spec-execution.receivers.set_stopped}

1. `set_stopped` is a stopped completion function. Its associated completion tag
-    is `set_stopped_t`. The expression `set_stopped(rcvr)` for some subexpression
+    is `set_stopped_t`. The expression `set_stopped(rcvr)` for a subexpression
    `rcvr` is ill-formed if `rcvr` is an lvalue or a `const` rvalue. Otherwise, it is
-    expression-equivalent to mandate-nothrow-call(tag_invoke,
-    set_stopped, rcvr).
+    expression-equivalent to MANDATE-NOTHROW(rcvr.set_stopped()).

## Operation states [exec.opstate] ## {#spec-execution.opstate}

@@ -4833,8 +5467,15 @@ enum class forward_progress_guarantee {
    type ([async.ops]).
+    template<class Rcvr>
+      concept enable-opstate = // exposition only
+        requires {
+          requires derived_from<typename Rcvr::operation_state_concept, operation_state_t>;
+        };
+
     template<class O>
       concept operation_state =
+        enable-opstate<remove_cvref_t<O>> &&
         queryable<O> &&
         is_object_v<O> &&
         requires (O& o) {
@@ -4851,16 +5492,15 @@ enum class forward_progress_guarantee {
 
 1. The name `start` denotes a customization point object that starts
     ([async.ops]) the asynchronous operation associated with the operation state
-    object. The expression `start(O)` for some subexpression `O` is ill-formed
-    if `O` is an rvalue. Otherwise, it is expression-equivalent to:
+    object. For a subexpression `op`, the expression `start(op)` is ill-formed
+    if `op` is an rvalue. Otherwise, it is expression-equivalent to:
 
     
-    mandate-nothrow-call(tag_invoke, start, O)
+    MANDATE-NOTHROW(op.start())
     
-2. If the function selected by `tag_invoke` does not start the asynchronous
-    operation associated with the operation state `O`, the behavior of calling
-    `start(O)` is undefined.
+2. If `op.start()` does not start the asynchronous operation associated with the
+    operation state `op`, the behavior of calling `start(op)` is undefined.

## Senders [exec.snd] ## {#spec-execution.senders}

@@ -4893,28 +5533,28 @@ enum class forward_progress_guarantee {

1. For a queryable object `env`, let FWD-ENV(env) be
    a queryable object such that for a query object `q` and a pack of
-    subexpressions `as`, the expression tag_invoke(q,
-    FWD-ENV(env), as...) is ill-formed if
+    subexpressions `as`, the expression
+    FWD-ENV(env).query(q, as...) is ill-formed if
    `forwarding_query(q)` is `false`;
-    otherwise, it is expression-equivalent to `tag_invoke(q, env, as...)`.
+    otherwise, it is expression-equivalent to `env.query(q, as...)`.

2. For a query object `q` and a subexpression `v`, let
    MAKE-ENV(q, v) be a queryable object `env` such that
-    the result of `tag_invoke(q, env)` has a value equal to `v`
+    the result of `env.query(q)` has a value equal to `v`
    ([concepts.equality]). Unless otherwise stated, the object to which
-    `tag_invoke(q, env)` refers remains valid while `env` remains valid.
+    `env.query(q)` refers remains valid while `env` remains valid.

3. For two queryable objects `env1` and `env2`, a query object `q` and a pack
    of subexpressions `as`, let JOIN-ENV(env1,
-    env2) be a queryable object `env3` such that `tag_invoke(q, env3,
-    as...)` is expression-equivalent to:
+    env2) be a queryable object `env3` such that
+    `env3.query(q, as...)` is expression-equivalent to:

-    - `tag_invoke(q, env1, as...)` if that expression is well-formed,
+    - `env1.query(q, as...)` if that expression is well-formed,

-    - otherwise, `tag_invoke(q, env2, as...)` if that expression is
+    - otherwise, `env2.query(q, as...)` if that expression is
       well-formed,

-    - otherwise, `tag_invoke(q, env3, as...)` is ill-formed.
+    - otherwise, `env3.query(q, as...)` is ill-formed.

4. The expansions of `FWD-ENV`, `MAKE-ENV`, and
    `JOIN-ENV` can be context-dependent; *i.e.*, they can expand to

@@ -4923,15 +5563,15 @@ enum class forward_progress_guarantee {

5. For a scheduler `sch`, let SCHED-ATTRS(sch) be a
    queryable object `o1` such that
-    tag_invoke(get_completion_scheduler<Tag>, o1) is a
+    o1.query(get_completion_scheduler<Tag>) is a
    prvalue with the same type and value as `sch` where `Tag` is one of
    `set_value_t` or `set_stopped_t`; and let
-    tag_invoke(get_domain, o1) be expression-equivalent to
-    tag_invoke(get_domain, sch). Let
+    o1.query(get_domain) be expression-equivalent to
+    sch.query(get_domain). Let
    SCHED-ENV(sch) be a queryable object `o2` such that
-    tag_invoke(get_scheduler, o2) is a prvalue with the same
-    type and value as `sch`, and let tag_invoke(get_domain, o2)
-    be expression-equivalent to tag_invoke(get_domain, sch).
+    o2.query(get_scheduler) is a prvalue with the same
+    type and value as `sch`, and let o2.query(get_domain)
+    be expression-equivalent to sch.query(get_domain).

6. For two subexpressions `rcvr` and `expr`, let
    SET-VALUE(rcvr, expr) be
    `(expr, set_value(rcvr))` if the type of `expr` is `void`;

@@ -4953,48 +5593,46 @@ enum class forward_progress_guarantee {

7.
         template<class Default = default_domain, class Sndr>
-        constexpr auto completion-domain(const Sndr& sndr) noexcept;
+          constexpr auto completion-domain(const Sndr& sndr) noexcept;
         
-    1. *Effects:* Let COMPL-DOMAIN(T) be the type of the expression
-        `get_domain(get_completion_scheduler<T>(get_env(sndr)))`. If
-        COMPL-DOMAIN(set_value_t),
+    1. *Effects:* Let COMPL-DOMAIN(T) be the type of the
+        expression `get_domain(get_completion_scheduler<T>(get_env(sndr)))`.
+        If COMPL-DOMAIN(set_value_t),
        COMPL-DOMAIN(set_error_t), and
-        COMPL-DOMAIN(set_stopped_t) all share a common type
-        [meta.trans.other] (ignoring those types that are ill-formed), then
-        completion-domain<Default>(sndr) is a default-constructed
-        prvalue of that type.
-        Otherwise, if all of those types are ill-formed,
-        completion-domain<Default>(sndr) is a default-constructed
-        prvalue of type `Default`.
-        Otherwise, completion-domain<Default>(sndr) is ill-formed.
+        COMPL-DOMAIN(set_stopped_t) all share a common
+        type [meta.trans.other] (ignoring those types that are ill-formed),
+        then completion-domain<Default>(sndr) is a
+        default-constructed prvalue of that type. Otherwise, if all of those
+        types are ill-formed,
+        completion-domain<Default>(sndr) is a
+        default-constructed prvalue of type `Default`. Otherwise,
+        completion-domain<Default>(sndr) is
+        ill-formed.

8.
         template<class Tag, class Env, class Default>
-        constexpr decltype(auto) query-with-default(Tag, const Env& env, Default&& value) noexcept(see below);
+          constexpr decltype(auto) query-with-default(
+            Tag, const Env& env, Default&& value) noexcept(see below);
         
-    1. Effects: Equivalent to:
-
-        - `return Tag()(env);` if that expression is well-formed,
+    1. Let e be the expression `Tag()(env)` if that
+        expression is well-formed; otherwise, it is
+        `static_cast<Default>(std::forward<Default>(value))`.

-        - `return static_cast<Default>(std::forward<Default>(value));` otherwise.
+    2. Returns: e.

-    2. Remarks: The expression in the `noexcept` clause is:
-
-                is_invocable_v<Tag, const Env&> ? is_nothrow_invocable_v<Tag, const Env&>
-                                                : is_nothrow_constructible_v<Default, Default>
-
+    3. Remarks: The expression in the `noexcept` clause is
+        noexcept(e).

9.
         template<class Sndr>
-        constexpr auto get-domain-early(const Sndr& sndr) noexcept;
+          constexpr auto get-domain-early(const Sndr& sndr) noexcept;
         
    1. Effects: Equivalent to return Domain();
-        where `Domain` is the decayed type of the first of the following
-        expressions that is well-formed:
+        where `Domain` is the decayed type of the first of the
+        following expressions that is well-formed:

        - `get_domain(get_env(sndr))`

@@ -5004,7 +5642,7 @@ enum class forward_progress_guarantee {

10.
         template<class Sndr, class Env>
-        constexpr auto get-domain-late(const Sndr& sndr, const Env& env) noexcept;
+          constexpr auto get-domain-late(const Sndr& sndr, const Env& env) noexcept;
         
    1. Effects: Equivalent to:

@@ -5034,9 +5672,10 @@ enum class forward_progress_guarantee {

        - `default_domain()`.

-        The `transfer` algorithm is unique in that it ignores the
-        execution domain of its predecessor, using only the domain of its
-        destination scheduler to select a customization.
+        The `transfer` algorithm is unique in that it
+        ignores the execution domain of its predecessor, using only the
+        domain of its destination scheduler to select a
+        customization.

11.
         template<callable Fun>
@@ -5059,15 +5698,15 @@ enum class forward_progress_guarantee {
             non-movable types into containers like `tuple`, `optional`, and `variant`.
 
     12. 
-        struct on-stop-request {
-          in_place_stop_source& stop_src;
-          void operator()() noexcept { stop_src.request_stop(); }
+        struct on-stop-request { // exposition only
+          in_place_stop_source& stop-src; // exposition only
+          void operator()() noexcept { stop-src.request_stop(); }
         };
         
13.
         template<class... T>
-        struct product-type {
+        struct product-type {  // exposition only
           using type0 = T0;      // exposition only
           using type1 = T1;      // exposition only
             ...
@@ -5086,18 +5725,17 @@ enum class forward_progress_guarantee {
 
     14. 
         template <semiregular Tag, movable-value Data = see below, sender... Child>
-        constexpr auto make-sender(Tag, Data&& data, Child&&... child);
+          constexpr auto make-sender(Tag, Data&& data, Child&&... child);
         
-    1. *Remarks:* The default template argument for the `Data` template parameter
-        denotes an unspecified empty trivial class type.
-
-    2. *Returns:* A prvalue of type basic-sender<Tag, decay_t<Data>, decay_t<Child>...>
-        where the tag member has been default-initialized and the
-        data and childn... members have
-        been direct initialized from their respective forwarded arguments, where
-        basic-sender is the following exposition-only class template
-        except as noted below:
+    1. *Returns:* A prvalue of type basic-sender<Tag,
+        decay_t<Data>, decay_t<Child>...> where the
+        tag member has been default-initialized and the
+        data and
+        childn... members have been direct
+        initialized from their respective forwarded arguments, where
+        basic-sender is the following exposition-only
+        class template except as noted below:
               template<class T, class... Us>
@@ -5136,25 +5774,38 @@ enum class forward_progress_guarantee {
                 impls-for<tag_of_t<Sndr>>::get-env, Index,
                 state-type<Sndr, Rcvr>&, const Rcvr&>>;
 
-              template<class Sndr, class Rcvr, class Index>  // arguments are not associated entities ([lib.tmpl-heads])
+              template<class Sndr, class Rcvr, class Index> 
                 requires well-formed<env-type, Index, Sndr, Rcvr>
               struct basic-receiver {  // exposition only
-                using tag_t = tag_of_t<Sndr>; // exposition only
+                using tag-type = tag_of_t<Sndr>; // exposition only
                 using receiver_concept = receiver_t;
 
-                template<completion-tag Tag, class... Args>
-                  requires cpo-callable<impls-for<tag_t>::complete,
-                    Index, state-type<Sndr, Rcvr>&, Rcvr&, Tag, Args...>
-                friend void tag_invoke(Tag, basic-receiver&& self, Args&&... args) noexcept {
-                  (void) impls-for<tag_t>::complete(
-                    Index(), self.op_->state_, self.op_->rcvr_, Tag(), std::forward<Args>(args)...);
+                template<class... Args>
+                  requires cpo-callable<impls-for<tag-type>::complete,
+                    Index, state-type<Sndr, Rcvr>&, Rcvr&, set_value_t, Args...>
+                void set_value(Args&&... args) && noexcept {
+                  (void) impls-for<tag-type>::complete(
+                    Index(), op_->state_, op_->rcvr_, set_value_t(), std::forward<Args>(args)...);
+                }
+
+                template<class Error>
+                  requires cpo-callable<impls-for<tag-type>::complete,
+                    Index, state-type<Sndr, Rcvr>&, Rcvr&, set_error_t, Error>
+                void set_error(Error&& err) && noexcept {
+                  (void) impls-for<tag-type>::complete(
+                    Index(), op_->state_, op_->rcvr_, set_error_t(), std::forward<Error>(err));
+                }
+
+                void set_stopped() && noexcept
+                  requires cpo-callable<impls-for<tag-type>::complete,
+                    Index, state-type<Sndr, Rcvr>&, Rcvr&, set_stopped_t> {
+                  (void) impls-for<tag-type>::complete(
+                    Index(), op_->state_, op_->rcvr_, set_stopped_t());
                 }
 
-                template<same_as<get_env_t> Tag>
-                friend auto tag_invoke(Tag, const basic-receiver& self) noexcept
-                  -> env-type<Index, Sndr, Rcvr> {
-                  const auto& rcvr = self.op_->rcvr_;
-                  return impls-for<tag_t>::get-env(Index(), self.op_->state_, rcvr);
+                auto get_env() const noexcept -> env-type<Index, Sndr, Rcvr> {
+                  const auto& rcvr = op_->rcvr_;
+                  return impls-for<tag-type>::get-env(Index(), op_->state_, rcvr);
                 }
 
                 basic-operation<Sndr, Rcvr>* op_; // exposition only
@@ -5178,11 +5829,12 @@ enum class forward_progress_guarantee {
                 cpo-result-t<connect-all, basic-operation<Sndr, Rcvr>*, Sndr,
                   indices-for<Sndr>>;
 
-              template<class Sndr, class Rcvr> // arguments are not associated entities ([lib.tmpl-heads])
+              template<class Sndr, class Rcvr>
                 requires well-formed<state-type, Sndr, Rcvr> &&
                   well-formed<inner-ops-tuple, Sndr, Rcvr>
               struct basic-operation {  // exposition only
-                using tag_t = tag_of_t<Sndr>; // exposition only
+                using operation_state_concept = operation_state_t;
+                using tag-type = tag_of_t<Sndr>; // exposition only
 
                 Rcvr rcvr_; // exposition only
                 state-type<Sndr, Rcvr> state_; // exposition only
@@ -5190,38 +5842,34 @@ enum class forward_progress_guarantee {
 
                 basic-operation(Sndr&& sndr, Rcvr rcvr)  // exposition only
                   : rcvr_(std::move(rcvr))
-                  , state_(impls-for<tag_t>::get-state(std::forward<Sndr>(sndr), rcvr_))
+                  , state_(impls-for<tag-type>::get-state(std::forward<Sndr>(sndr), rcvr_))
                   , inner_ops_(connect-all(this, std::forward<Sndr>(sndr), indices-for<Sndr>()))
                 {}
 
-                friend void tag_invoke(start_t, basic-operation& self) noexcept {
-                  auto& [...ops] = self.inner_ops_;
-                  impls-for<tag_t>::start(self.state_, self.rcvr_, ops...);
+                void start() & noexcept {
+                  auto& [...ops] = inner_ops_;
+                  impls-for<tag-type>::start(state_, rcvr_, ops...);
                 }
               };
 
               template<class Sndr, class Env>
               using completion-signatures-for =  see below; // exposition only
 
-              template<class Tag, class Data, class... Child> // arguments are not associated entities ([lib.tmpl-heads])
+              template<class Tag, class Data, class... Child>
               struct basic-sender {  // exposition only
                 using sender_concept = sender_t;
 
-                template<same_as<get_env_t> GetEnvTag>
-                friend decltype(auto) tag_invoke(GetEnvTag, const basic-sender& self) noexcept {
+                decltype(auto) get_env() const noexcept {
                   return impls-for<Tag>::get-attrs(data, child0, ... childn-1);
                 }
 
-                template<same_as<connect_t> ConnectTag,
-                         decays-to<basic-sender> Self, receiver Rcvr>
-                friend auto tag_invoke(ConnectTag, Self&& self, Rcvr rcvr)
-                  -> basic-operation<Self, Rcvr> {
+                template<decays-to<basic-sender> Self, receiver Rcvr>
+                auto connect(this Self&& self, Rcvr rcvr) -> basic-operation<Self, Rcvr> {
                   return {std::forward<Self>(self), std::move(rcvr)};
                 }
 
-                template<same_as<get_completion_signatures_t> GetComplSigsTag,
-                         decays-to<basic-sender> Self, class Env>
-                friend auto tag_invoke(GetComplSigsTag, Self&& self, Env&& env) noexcept
+                template<decays-to<basic-sender> Self, class Env>
+                auto get_completion_signatures(this Self&& self, Env&& env) noexcept
                   -> completion-signatures-for<Self, Env> {
                   return {};
                 }
@@ -5241,15 +5889,19 @@ enum class forward_progress_guarantee {
               using child-type = decltype((declval<Sndr>().childN)); // exposition only
               
+ 2. *Remarks:* The default template argument for the `Data` template parameter
+     denotes an unspecified empty trivial class type.
+
+ 3. It is unspecified whether instances of basic-sender can be
+     aggregate initialized.
+
- 4. An expression of type basic-sender is usable as the
-     initializer of a structured binding declaration
-     [dcl.struct.bind].
+ 4. An expression of type
+     basic-sender is usable as the initializer of a
+     structured binding declaration [dcl.struct.bind].
+
- 5. The member default-impls::get-attrs is initialized
-     with a callable object equivalent to the following lambda:
+ 5. The member default-impls::get-attrs is
+     initialized with a callable object equivalent to the following
+     lambda:
               [](const auto& data, const auto&... child) noexcept -> decltype(auto) {
@@ -5334,17 +5986,18 @@ enum class forward_progress_guarantee {
 
     template<class Sndr>
       concept is-sender = // exposition only
-        derived_from<typename Sndr::sender_concept, sender_t>;
+        requires {
+          requires derived_from<typename Sndr::sender_concept, sender_t>;
+        };
 
     template<class Sndr>
-      inline constexpr bool enable_sender = is-sender<Sndr>;
-
-    template<is-awaitable<env-promise<empty_env>> Sndr> // [exec.awaitables]
-      inline constexpr bool enable_sender<Sndr> = true;
+      concept enable-sender = // exposition only
+        is-sender<Sndr> ||
+        is-awaitable<Sndr, env-promise<empty_env>>;  // [exec.awaitables]
 
     template<class Sndr>
       concept sender =
-        enable_sender<remove_cvref_t<Sndr>> &&
+        bool(enable-sender<remove_cvref_t<Sndr>>) && // atomic constraint
         requires (const remove_cvref_t<Sndr>& sndr) {
           { get_env(sndr) } -> queryable;
         } &&
@@ -5356,8 +6009,8 @@ enum class forward_progress_guarantee {
         sender<Sndr> &&
         queryable<Env> &&
         requires (Sndr&& sndr, Env&& env) {
-          { get_completion_signatures(std::forward<Sndr>(sndr), std::forward<Env>(env)) } ->
-            valid-completion-signatures;
+          { get_completion_signatures(std::forward<Sndr>(sndr), std::forward<Env>(env)) }
+            -> valid-completion-signatures;
         };
 
     template<class Sndr, class Rcvr>
@@ -5384,12 +6037,7 @@ enum class forward_progress_guarantee {
     valid-completion-signatures if it denotes a specialization
     of the `completion_signatures` class template.
 
-4. Remarks: Pursuant to [namespace.std], users can specialize `enable_sender` to
-    `true` for cv-unqualified program-defined types that model `sender`, and `false`
-    for types that do not. Such specializations shall be usable in constant
-    expressions ([expr.const]) and have type `const bool`.
-
-5. The exposition-only concepts sender-of and
+4. The exposition-only concepts sender-of and
     sender-of-in define the requirements for a sender
     type that completes with a given unique set of value result types.
 
@@ -5408,7 +6056,7 @@ enum class forward_progress_guarantee {
       concept sender-of = sender-of-in<Sndr, empty_env, Values...>;
     
-6. Let `sndr` be an expression such that `decltype((sndr))` is `Sndr`. The type
+5. Let `sndr` be an expression such that `decltype((sndr))` is `Sndr`. The type
    `tag_of_t<Sndr>` is as follows:

    - If the declaration `auto&& [tag, data, ...children] = sndr;` would be

@@ -5423,7 +6071,7 @@ enum class forward_progress_guarantee {

    makes it possible to implement this purely in the library. P2141 has
    already been approved by EWG for C++26.
-7. Let sender-for be an exposition-only concept defined as follows:
+6. Let sender-for be an exposition-only concept defined as follows:
     template<class Sndr, class Tag>
@@ -5432,15 +6080,18 @@ enum class forward_progress_guarantee {
       same_as<tag_of_t<Sndr>, Tag>;
     
-8. For a type `T`, SET-VALUE-SIG(T) denotes the type
+7. For a type `T`, SET-VALUE-SIG(T) denotes the type
    `set_value_t()` if `T` is *cv* `void`; otherwise, it denotes the type
    `set_value_t(T)`.

-9. Library-provided sender types:
-
-    - Always expose an overload of a customization of `connect`
-      that accepts an rvalue sender.
-
-    - Only expose an overload of a customization of `connect` that
-      accepts an lvalue sender if they model `copy_constructible`.
+8. Library-provided sender types:
+
+    - Always expose an overload of a member `connect` that accepts an rvalue
+      sender.
+
+    - Only expose an overload of a member `connect` that accepts an lvalue
+      sender if they model `copy_constructible`.
+
+    - Model `copy_constructible` if they satisfy `copy_constructible`.

### Awaitable helpers [exec.awaitables] ### {#spec.exec-awaitables}

@@ -5493,31 +6144,36 @@ enum class forward_progress_guarantee {

    - `T` is `bool`, or
    - `T` is a specialization of `coroutine_handle`.

-3. For a subexpression `c` such that `decltype((c))` is type `C`, and
+4. For a subexpression `c` such that `decltype((c))` is type `C`, and
    an lvalue `p` of type `Promise`, await-result-type<C, Promise>
    denotes the type decltype(GET-AWAITER(c, p).await_resume()).

-4. Let with-await-transform be the exposition-only class template:
+5. Let with-await-transform be the exposition-only class template:
+    template<class T, class Promise>
+      concept has-as-awaitable = // exposition only
+        requires (T&& t, Promise& p) {
+          { std::forward<T>(t).as_awaitable(p) } -> is-awaitable<Promise&>;
+        };
+
     template<class Derived>
-    struct with-await-transform {
-      template<class T>
-      T&& await_transform(T&& value) noexcept {
-        return std::forward<T>(value);
-      }
+      struct with-await-transform {
+        template<class T>
+          T&& await_transform(T&& value) noexcept {
+            return std::forward<T>(value);
+          }
 
-      template<class T>
-        requires tag_invocable<as_awaitable_t, T, Derived&>
-      auto await_transform(T&& value)
-        noexcept(nothrow_tag_invocable<as_awaitable_t, T, Derived&>)
-        -> tag_invoke_result_t<as_awaitable_t, T, Derived&> {
-        return tag_invoke(as_awaitable, std::forward<T>(value), static_cast<Derived&>(*this));
-      }
-    };
+        template<has-as-awaitable<Derived> T>
+          auto await_transform(T&& value)
+            noexcept(noexcept(std::forward<T>(value).as_awaitable(declval<Derived&>())))
+            -> decltype(std::forward<T>(value).as_awaitable(declval<Derived&>())) {
+            return std::forward<T>(value).as_awaitable(static_cast<Derived&>(*this));
+          }
+      };
     
-5. Let env-promise be the exposition-only class template:
+6. Let env-promise be the exposition-only class template:
     template<class Env>
@@ -5529,7 +6185,7 @@ enum class forward_progress_guarantee {
       void return_void() noexcept;
       coroutine_handle<> unhandled_stopped() noexcept;
 
-      friend const Env& tag_invoke(get_env_t, const env-promise&) noexcept;
+      const Env& get_env() const noexcept;
     };
     
@@ -5543,13 +6199,15 @@ enum class forward_progress_guarantee {

struct default_domain {
  template <sender Sndr, queryable... Env>
      requires (sizeof...(Env) <= 1)
-  static constexpr sender decltype(auto) transform_sender(Sndr&& sndr, const Env&... env) noexcept(see below);
+  static constexpr sender decltype(auto) transform_sender(Sndr&& sndr, const Env&... env)
+    noexcept(see below);

  template <sender Sndr, queryable Env>
  static constexpr queryable decltype(auto) transform_env(Sndr&& sndr, Env&& env) noexcept;

  template<class Tag, sender Sndr, class... Args>
-  static constexpr decltype(auto) apply_sender(Tag, Sndr&& sndr, Args&&... args) noexcept(see below);
+  static constexpr decltype(auto) apply_sender(Tag, Sndr&& sndr, Args&&... args)
+    noexcept(see below);
};

@@ -5558,7 +6216,8 @@ struct default_domain {
 template <sender Sndr, queryable... Env>
     requires (sizeof...(Env) <= 1)
-  constexpr sender decltype(auto) transform_sender(Sndr&& sndr, const Env&... env) noexcept(see below);
+  constexpr sender decltype(auto) transform_sender(Sndr&& sndr, const Env&... env)
+    noexcept(see below);
 
1. Let e be the expression

@@ -5585,7 +6244,8 @@ template <sender Sndr, queryable Env>
 template<class Tag, sender Sndr, class... Args>
-  constexpr decltype(auto) apply_sender(Tag, Sndr&& sndr, Args&&... args) noexcept(see below);
+  constexpr decltype(auto) apply_sender(Tag, Sndr&& sndr, Args&&... args)
+    noexcept(see below);
 
7. Let e be the expression

@@ -5595,14 +6255,16 @@ template<class Tag, sender Sndr, class... Args>

9. Returns: e.

-10. Remarks: The exception specification is equivalent to noexcept(e).
+10. Remarks: The exception specification is equivalent to
+    noexcept(e).

### `execution::transform_sender` [exec.snd.transform] ### {#spec-execution.sender_transform}
 template<class Domain, sender Sndr, queryable... Env>
     requires (sizeof...(Env) <= 1)
-  constexpr sender decltype(auto) transform_sender(Domain dom, Sndr&& sndr, const Env&... env) noexcept(see below);
+  constexpr sender decltype(auto) transform_sender(Domain dom, Sndr&& sndr, const Env&... env)
+    noexcept(see below);
 
1. Let transformed-sndr be the expression

@@ -5641,7 +6303,8 @@ template<class Domain, sender Sndr, queryable Env>
 template<class Domain, class Tag, sender Sndr, class... Args>
-  constexpr decltype(auto) apply_sender(Domain dom, Tag, Sndr&& sndr, Args&&... args) noexcept(see below);
+  constexpr decltype(auto) apply_sender(Domain dom, Tag, Sndr&& sndr, Args&&... args)
+    noexcept(see below);
 
1. Let e be the expression `dom.apply_sender(Tag(),
    std::forward<Sndr>(sndr), std::forward<Args>(args)...)`

@@ -5659,11 +6322,11 @@ template<class Domain, class Tag, sender Sndr, class... Args>

### `execution::get_completion_signatures` [exec.getcomplsigs] ### {#spec-execution.getcomplsigs}

1. `get_completion_signatures` is a customization point object. Let `sndr` be an
-    expression such that `decltype((sndr))` is `Sndr`, and let `env` be an expression
-    such that `decltype((env))` is `Env`. Then `get_completion_signatures(sndr, env)` is
-    expression-equivalent to:
+    expression such that `decltype((sndr))` is `Sndr`, and let `env` be an
+    expression such that `decltype((env))` is `Env`. Then
+    `get_completion_signatures(sndr, env)` is expression-equivalent to:

-    1. `tag_invoke_result_t<get_completion_signatures_t, Sndr, Env>{}` if that
+    1. `decltype(sndr.get_completion_signatures(env)){}` if that
        expression is well-formed,

    2. Otherwise, `remove_cvref_t<Sndr>::completion_signatures{}` if that
        expression is well-formed,

@@ -5673,7 +6336,8 @@ template<class Domain, class Tag, sender Sndr, class... Args>
             completion_signatures<
-              SET-VALUE-SIG(await-result-type<Sndr, env-promise<Env>>), // see [exec.snd.concepts]
+              SET-VALUE-SIG(await-result-type<Sndr,
+                            env-promise<Env>>), // see [exec.snd.concepts]
               set_error_t(exception_ptr),
               set_stopped_t()>{}
             
@@ -5695,13 +6359,15 @@ template<class Domain, class Tag, sender Sndr, class... Args>

1. `connect` connects ([async.ops]) a sender with a receiver.

2. The name `connect` denotes a customization point object. For subexpressions
-    `sndr` and `rcvr`, let `Sndr` be `decltype((sndr))` and `Rcvr` be `decltype((rcvr))`, and let
-    `DS` and `DR` be the decayed types of `Sndr` and `Rcvr`, respectively.
+    `sndr` and `rcvr`, let `Sndr` be `decltype((sndr))` and `Rcvr` be
+    `decltype((rcvr))`, and let `DS` and `DR` be the decayed types of `Sndr` and
+    `Rcvr`, respectively.

3. Let connect-awaitable-promise be the following class:
-    struct connect-awaitable-promise : with-await-transform<connect-awaitable-promise> {
+    struct connect-awaitable-promise
+      : with-await-transform<connect-awaitable-promise> {
       DR& rcvr; // exposition only
 
       connect-awaitable-promise(DS&, DR& rcvr) noexcept : rcvr(rcvr) {}
@@ -5721,8 +6387,8 @@ template<class Domain, class Tag, sender Sndr, class... Args>
           coroutine_handle<connect-awaitable-promise>::from_promise(*this)};
       }
 
-      friend env_of_t<const DR&> tag_invoke(get_env_t, const connect-awaitable-promise& self) noexcept {
-        return get_env(self.rcvr);
+      env_of_t<const DR&> get_env() const noexcept {
+        return execution::get_env(rcvr);
       }
     };
     
@@ -5731,6 +6397,7 @@ template<class Domain, class Tag, sender Sndr, class... Args>
     struct operation-state-task {
+      using operation_state_concept = operation_state_t;
       using promise_type = connect-awaitable-promise;
       coroutine_handle<> coro; // exposition only
 
@@ -5739,8 +6406,8 @@ template<class Domain, class Tag, sender Sndr, class... Args>
         : coro(exchange(o.coro, {})) {}
       ~operation-state-task() { if (coro) coro.destroy(); }
 
-      friend void tag_invoke(start_t, operation-state-task& self) noexcept {
-        self.coro.resume();
+      void start() & noexcept {
+        coro.resume();
       }
     };
     
@@ -5793,11 +6460,10 @@ template<class Domain, class Tag, sender Sndr, class... Args>

    `connect(sndr, rcvr)` is ill-formed. Otherwise, the expression
    `connect(sndr, rcvr)` is expression-equivalent to:

-    1. `tag_invoke(connect, sndr, rcvr)` if
-        connectable-with-tag-invoke<Sndr, Rcvr> is modeled.
+    1. `sndr.connect(rcvr)` if that expression is well-formed.

-        * Mandates: The type of the `tag_invoke` expression above
-          satisfies `operation_state`.
+        * Mandates: The type of the expression above satisfies
+          `operation_state`.

    2. Otherwise, connect-awaitable(sndr, rcvr) if that
        expression is well-formed.

@@ -5810,16 +6476,14 @@ template<class Domain, class Tag, sender Sndr, class... Args>

1. `schedule` obtains a schedule-sender ([async.ops]) from a scheduler.

-2. The name `schedule` denotes a customization point object. For some
+2. The name `schedule` denotes a customization point object. For a
    subexpression `sch`, the expression `schedule(sch)` is expression-equivalent
    to:

-    1. `tag_invoke(schedule, sch)`, if that expression is valid. If the function
-        selected by `tag_invoke` does not return a sender whose `set_value`
-        completion scheduler is equivalent to `sch`, the behavior of calling
-        `schedule(sch)` is undefined.
+    1. `sch.schedule()` if that expression is valid. If `sch.schedule()` does
+        not return a sender whose `set_value` completion scheduler is equal
+        to `sch`, the behavior of calling `schedule(sch)` is undefined.

-        * Mandates: The type of the `tag_invoke` expression above
-          satisfies `sender`.
+        * Mandates: The type of `sch.schedule()` satisfies `sender`.

    2. Otherwise, `schedule(sch)` is ill-formed.

@@ -5917,35 +6581,7 @@ template<class Domain, class Tag, sender Sndr, class... Args>

    requirement applies to any sender returned from a function that is selected
    by the implementation of such sender adaptor.

-6. For any sender type, receiver type, operation state type, queryable type, or
-    coroutine promise type that is part of the implementation of any sender
-    adaptor in this subclause and that is a class template, the template
-    arguments do not contribute to the associated entities
-    ([basic.lookup.argdep]) of a function call where a specialization of the
-    class template is an associated entity.
-
-    [*Example:*
-
-    namespace sender-adaptors { // exposition only
-      template<class Sch, class Sndr> // arguments are not associated entities ([lib.tmpl-heads])
-      class on-sender {
-        // ...
-      };
-
-      struct on_t {
-        template<scheduler Sch, sender Sndr>
-        on-sender<Sch, Sndr> operator()(Sch&& sch, Sndr&& sndr) const {
-          // ...
-        }
-      };
-    }
-    inline constexpr sender-adaptors::on_t on{};
-    
-
-    -- end example]
-
-7. If a sender returned from a sender adaptor specified in this subclause is
+6. If a sender returned from a sender adaptor specified in this subclause is
    specified to include `set_error_t(Err)` among its set of completion signatures
    where `decay_t<Err>` denotes the type `exception_ptr`, but the implementation
    does not potentially evaluate an error completion operation with an

@@ -5965,43 +6601,65 @@ template<class Domain, class Tag, sender Sndr, class... Args>

    sndr | c

-    Given an additional pipeable sender adaptor closure object `d`, the expression `c | d` produces another pipeable sender adaptor closure object `e`:
+    Given an additional pipeable sender adaptor closure object `d`, the
+    expression `c | d` produces another pipeable sender adaptor closure object
+    `e`:

-    `e` is a perfect forwarding call wrapper ([func.require]) with the following properties:
+    `e` is a perfect forwarding call wrapper ([func.require]) with the following
+    properties:

-    - Its target object is an object `d2` of type `decay_t<D>` direct-non-list-initialized with `d`.
+    - Its target object is an object `d2` of type `decay_t<D>`
+      direct-non-list-initialized with `d`.

-    - It has one bound argument entity, an object `c2` of type `decay_t<C>` direct-non-list-initialized with `C`.
+    - It has one bound argument entity, an object `c2` of type
+      `decay_t<C>` direct-non-list-initialized with `C`.

-    - Its call pattern is `d2(c2(arg))`, where `arg` is the argument used in a function call expression of `e`.
+    - Its call pattern is `d2(c2(arg))`, where `arg` is the argument used in a
+      function call expression of `e`.

-    The expression `c | d` is well-formed if and only if the initializations of the state entities of `e` are all well-formed.
+    The expression `c | d` is well-formed if and only if the initializations of
+    the state entities of `e` are all well-formed.

-2. An object `t` of type `T` is a pipeable sender adaptor closure object if `T` models `derived_from<sender_adaptor_closure<T>>`, `T` has no other base
-    classes of type `sender_adaptor_closure<U>` for any other type `U`, and `T` does not model `sender`.
+2. An object `t` of type `T` is a pipeable sender adaptor closure object if `T`
+    models `derived_from<sender_adaptor_closure<T>>`, `T` has no other base
+    classes of type `sender_adaptor_closure<U>` for any other type `U`, and `T`
+    does not model `sender`.

-3. The template parameter `D` for `sender_adaptor_closure` can be an incomplete type. Before any expression of type cv D appears as
-    an operand to the `|` operator, `D` shall be complete and model `derived_from<sender_adaptor_closure<D>>`. The behavior of an expression involving an
-    object of type cv D as an operand to the `|` operator is undefined if overload resolution selects a program-defined `operator|`
-    function.
+3. The template parameter `D` for `sender_adaptor_closure` can be an incomplete
+    type. Before any expression of type cv D appears as an
+    operand to the `|` operator, `D` shall be complete and model
+    `derived_from<sender_adaptor_closure<D>>`. The behavior of an expression
+    involving an object of type cv D as an operand to the
+    `|` operator is undefined if overload resolution selects a program-defined
+    `operator|` function.

-4. A pipeable sender adaptor object is a customization point object that accepts a `sender` as its first argument and returns a `sender`.
+4. A pipeable sender adaptor object is a customization point object that
+    accepts a `sender` as its first argument and returns a `sender`.

-5. If a pipeable sender adaptor object accepts only one argument, then it is a pipeable sender adaptor closure object.
+5. If a pipeable sender adaptor object accepts only one argument, then it is a
+    pipeable sender adaptor closure object.

-6. If a pipeable sender adaptor object `adaptor` accepts more than one argument, then let `sndr` be an expression such that `decltype((sndr))` models `sender`,
-    let `args...` be arguments such that `adaptor(sndr, args...)` is a well-formed expression as specified in the rest of this subclause
-    ([exec.adapt.objects]), and let `BoundArgs` be a pack that denotes `decay_t<Args>...`. The expression `adaptor(args...)`
-    produces a pipeable sender adaptor closure object `f` that is a perfect forwarding call wrapper with the following properties:
+6. If a pipeable sender adaptor object `adaptor` accepts more than one argument,
+    then let `sndr` be an expression such that `decltype((sndr))` models
+    `sender`, let `args...` be arguments such that `adaptor(sndr, args...)` is a
+    well-formed expression as specified in the rest of this subclause
+    ([exec.adapt.objects]), and let `BoundArgs` be a pack that denotes
+    `decay_t<Args>...`. The expression `adaptor(args...)` produces a
+    pipeable sender adaptor closure object `f` that is a perfect forwarding call
+    wrapper with the following properties:

    - Its target object is a copy of `adaptor`.

-    - Its bound argument entities `bound_args` consist of objects of types `BoundArgs...` direct-non-list-initialized with `std::forward<Args>(args)...`, respectively.
+    - Its bound argument entities `bound_args` consist of objects of types
+      `BoundArgs...` direct-non-list-initialized with
+      `std::forward<Args>(args)...`, respectively.

-    - Its call pattern is `adaptor(rcvr, bound_args...)`, where `rcvr` is the argument used in a function call expression of `f`.
+    - Its call pattern is `adaptor(rcvr, bound_args...)`, where `rcvr` is the
+      argument used in a function call expression of `f`.

-    The expression `adaptor(args...)` is well-formed if and only if the initializations of the bound argument entities of the result, as specified above,
-    are all well-formed.
+    The expression `adaptor(args...)` is well-formed if and only if the
+    initializations of the bound argument entities of the result, as specified
+    above, are all well-formed.

#### `execution::on` [exec.on] #### {#spec-execution.senders.adapt.on}

@@ -6194,7 +6852,8 @@ template<class Domain, class Tag, sender Sndr, class... Args>

3. Let `receiver-type` denote the following class:
-                struct receiver-type : receiver_adaptor<receiver-type> {
+                struct receiver-type {
+                  using receiver_concept = receiver_t;
                   state-type* state; // exposition only
 
                   Rcvr&& base() && noexcept { return std::move(state->rcvr); }
@@ -6210,6 +6869,19 @@ template<class Domain, class Tag, sender Sndr, class... Args>
                       },
                       state->async-result);
                   }
+
+                  template<class Error>
+                  void set_error(Error&& err) && noexcept {
+                    execution::set_error(std::move(state->rcvr), std::forward<Error>(err));
+                  }
+
+                  void set_stopped() && noexcept {
+                    execution::set_stopped(std::move(state->rcvr));
+                  }
+
+                  decltype(auto) get_env() const noexcept {
+                    return FWD-ENV(execution::get_env(state->rcvr));
+                  }
                 };
                 
@@ -6356,12 +7028,13 @@ template<class Domain, class Tag, sender Sndr, class... Args>
             template<class Rcvr, class Env>
-            struct receiver2 : receiver_adaptor<receiver2<Rcvr, Env>, Rcvr> {
+            struct receiver2 : Rcvr {
               explicit receiver2(Rcvr rcvr, Env env)
-                : receiver2::receiver_adaptor{std::move(rcvr)}, env(std::move(env)) {}
+                : Rcvr(std::move(rcvr)), env(std::move(env)) {}
 
               auto get_env() const noexcept {
-                return JOIN-ENV(env, FWD-ENV(execution::get_env(this->base())));
+                const Rcvr& rcvr = *this;
+                return JOIN-ENV(env, FWD-ENV(execution::get_env(rcvr)));
               }
 
               Env env; // exposition only
@@ -6653,20 +7326,42 @@ template<class Domain, class Tag, sender Sndr, class... Args>
     struct shared-receiver {
       using receiver_concept = receiver_t;
 
-      template<completion-tag Tag, class... Args>
-      friend void tag_invoke(Tag, shared-receiver&& self, Args&&... args) noexcept {
+      template<class Tag, class... Args>
+      void complete(Tag, Args&&... args) noexcept { // exposition only
         try {
           using tuple_t = decayed-tuple<Tag, Args...>;
-          self.sh_state->result.template emplace<tuple_t>(Tag(), std::forward<Args>(args)...);
+          sh_state->result.template emplace<tuple_t>(Tag(), std::forward<Args>(args)...);
         } catch (...) {
           using tuple_t = tuple<set_error_t, exception_ptr>;
-          self.sh_state->result.template emplace<tuple_t>(set_error, current_exception());
+          sh_state->result.template emplace<tuple_t>(set_error, current_exception());
         }
-        self.sh_state->notify();
+        sh_state->notify();
+      }
+
+      template<class... Args>
+      void set_value(Args&&... args) && noexcept {
+        complete(execution::set_value, std::forward<Args>(args)...);
+      }
+
+      template<class Error>
+      void set_error(Error&& err) && noexcept {
+        complete(execution::set_error, std::forward<Error>(err));
+      }
+
+      void set_stopped() && noexcept {
+        complete(execution::set_stopped);
       }
 
-      friend decltype(auto) tag_invoke(get_env_t, const shared-receiver& self) noexcept {
-        return MAKE-ENV(get_stop_token, self.sh_state->stop_src.get_token());
+      struct env { // exposition only
+        shared-state<Sndr>* sh-state; // exposition only
+
+        in_place_stop_token query(get_stop_token_t) const noexcept {
+          return sh-state->stop_src.get_token();
+        }
+      };
+
+      env get_env() const noexcept {
+        return env{sh_state};
       }
 
       shared-state<Sndr>* sh_state;
@@ -7229,7 +7924,7 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 
 1. `stopped_as_optional` maps an input sender's stopped completion operation into the value completion operation as an empty optional. The input sender's value completion operation is also converted into an optional. The result is a sender that never completes with stopped, reporting cancellation by completing with an empty optional.
 
-2. The name `stopped_as_optional` denotes a customization point object. For some subexpression `sndr`, let `Sndr` be `decltype((sndr))`.
+2. The name `stopped_as_optional` denotes a customization point object. For a subexpression `sndr`, let `Sndr` be `decltype((sndr))`.
     The expression `stopped_as_optional(sndr)` is expression-equivalent to:
 
     
@@ -7284,7 +7979,7 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 1. `start_detached` eagerly starts a sender without the caller needing to manage
     the lifetimes of any objects.
 
-2. The name `start_detached` denotes a customization point object. For some
+2. The name `start_detached` denotes a customization point object. For a
     subexpression `sndr`, let `Sndr` be `decltype((sndr))`. If
     `sender_in` is `false`, `start_detached` is ill-formed.
     Otherwise, the expression `start_detached(sndr)` is expression-equivalent to
@@ -7305,32 +8000,34 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 3. Let `sndr` be a subexpression such that `Sndr` is `decltype((sndr))`, and let
     detached-receiver and
     detached-operation be the following exposition-only
-    class types:
+    class templates:
 
     
+    template<class Sndr>
     struct detached-receiver {
       using receiver_concept = receiver_t;
-      detached-operation* op; // exposition only
+      detached-operation<Sndr>* op; // exposition only
 
-      friend void tag_invoke(set_value_t, detached-receiver&& self) noexcept { delete self.op; }
-      friend void tag_invoke(set_error_t, detached-receiver&&, auto&&) noexcept { terminate(); }
-      friend void tag_invoke(set_stopped_t, detached-receiver&& self) noexcept { delete self.op; }
-      friend empty_env tag_invoke(get_env_t, const detached-receiver&) noexcept { return {}; }
+      void set_value() && noexcept { delete op; }
+      void set_error(auto&&) && noexcept { terminate(); }
+      void set_stopped() && noexcept { delete op; }
+      empty_env get_env() const noexcept { return {}; }
     };
 
+    template<class Sndr>
     struct detached-operation {
-      connect_result_t<Sndr, detached-receiver> op; // exposition only
+      connect_result_t<Sndr, detached-receiver<Sndr>> op; // exposition only
 
       explicit detached-operation(Sndr&& sndr)
-        : op(connect(std::forward<Sndr>(sndr), detached-receiver{this}))
+        : op(connect(std::forward<Sndr>(sndr), detached-receiver<Sndr>{this}))
       {}
     };
     
-4. If sender_to<Sndr, detached-receiver> is `false`, the
+4. If sender_to<Sndr, detached-receiver<Sndr>> is `false`, the
    expression `start_detached.apply_sender(sndr)` is ill-formed; otherwise, it
    is expression-equivalent to start((new
-    detached-operation(sndr))->op).
+    detached-operation<Sndr>(sndr))->op).

#### `this_thread::sync_wait` [exec.sync.wait] #### {#spec-execution.senders.consumers.sync_wait}

@@ -7343,16 +8040,11 @@ template<class Domain, class Tag, sender Sndr, class... Args>

    type:
-    template<class Tag>
-    concept get-sched-query = // exposition only
-      one-of<Tag, execution::get_scheduler_t, execution::get_delegatee_scheduler_t>;
-
     struct sync-wait-env {
       execution::run_loop* loop; // exposition only
 
-      friend auto tag_invoke(get-sched-query auto, sync-wait-env self) noexcept {
-        return self.loop->get_scheduler();
-      }
+      auto query(execution::get_scheduler_t) const noexcept { return loop->get_scheduler(); }
+      auto query(execution::get_delegatee_scheduler_t) const noexcept { return loop->get_scheduler(); }
     };
     
@@ -7407,40 +8099,53 @@ template<class Domain, class Tag, sender Sndr, class... Args>

      using receiver_concept = receiver_t;
      sync-wait-state<Sndr>* state; // exposition only

-      template<class Tag, class... Ts>
-      void complete(Tag, Ts&&... ts) noexcept; // exposition only
+      template<class... Args>
+      void set_value(Args&&... args) && noexcept;

-      template<completion-tag Tag, class... Ts>
-      friend void tag_invoke(Tag, sync-wait-receiver&& self, Ts&&... ts) noexcept {
-        self.complete(Tag(), std::forward<Ts>(ts)...);
-        self.state->loop.finish();
-      }
+      template<class Error>
+      void set_error(Error&& err) && noexcept;

-      friend sync-wait-env tag_invoke(get_env_t, const sync-wait-receiver& self) noexcept {
-        return {&self.state->loop};
-      }
+      void set_stopped() && noexcept;
+
+      sync-wait-env get_env() const noexcept { return {&state->loop}; }
    };
-    1. The member sync-wait-receiver::complete behaves as follows:
+    1.
+        template<class... Args>
+        void sync-wait-receiver::set_value(Args&&... args) && noexcept;
+        
-        1. If `Tag` is `set_value_t`, evaluates:
+        1. *Effects:* Equivalent to:
             try {
-              state->result.emplace(std::forward<Ts>(ts)...);
+              state->result.emplace(std::forward<Args>(args)...);
             } catch (...) {
               state->error = current_exception();
             }
+            state->loop.finish();
             
-        2. Otherwise, if `Tag` is `set_error_t`, evaluates:
+    2.
+        template<class Error>
+        void sync-wait-receiver::set_error(Error&& err) && noexcept;
+        
+
+        1. *Effects:* Equivalent to:
-            state->error = AS-EXCEPT-PTR(std::forward<Ts>(ts)...); // see [exec.general]
+            state->error = AS-EXCEPT-PTR(std::forward<Error>(err)); // see [exec.general]
+            state->loop.finish();
             
- 3. Otherwise, does nothing. + 3.
+        void sync-wait-receiver::set_stopped() && noexcept;
+        
+ + 1. *Effects:* Equivalent to state->loop.finish(). + 6. For a subexpression `sndr`, let `Sndr` be `decltype((sndr))`. If sender_to<Sndr, sync-wait-receiver<Sndr>> is `false`, the @@ -7486,9 +8191,9 @@ template<class Domain, class Tag, sender Sndr, class... Args> expression-equivalent to the following, except `sndr` is evaluated only once: -
-    apply_sender(get-domain-early(sndr), sync_wait_with_variant, sndr)
-    
+
+      apply_sender(get-domain-early(sndr), sync_wait_with_variant, sndr)
+      
Mandates: @@ -7552,174 +8257,6 @@ template<class Domain, class Tag, sender Sndr, class... Args> ## Sender/receiver utilities [exec.utils] ## {#spec-execution.snd_rec_utils} -1. This subclause makes use of the following exposition-only entities: - -
-    // [Editorial note: copy_cvref_t as in [[P1450R3]] -- end note]
-    // Mandates: is_base_of_v<T, remove_reference_t<U>> is true
-    template<class T, class U>
-      copy_cvref_t<U&&, T> c-style-cast(U&& u) noexcept requires decays-to<T, T> {
-        return (copy_cvref_t<U&&, T>) std::forward<U>(u);
-      }
-    
- -2. [Note: The C-style cast in c-style-cast is to disable accessibility checks. -- end note] - -### `execution::receiver_adaptor` [exec.utils.rcvr.adptr] ### {#spec-execution.snd_rec_utils.rcvr_adptr} - -
-    template<
-        class-type Derived,
-        receiver Base = unspecified> // arguments are not associated entities ([lib.tmpl-heads])
-      class receiver_adaptor;
-    
- -1. `receiver_adaptor` simplifies the implementation of one receiver type in terms of another. It defines `tag_invoke` overloads that forward to named members if they exist, and to the adapted receiver otherwise. - -2. If `Base` is an alias for the unspecified default template argument, then: - - - Let HAS-BASE be `false`, and - - Let GET-BASE(d) be `d.base()`. - - otherwise, let: - - - Let HAS-BASE be `true`, and - - Let GET-BASE(d) be c-style-cast<receiver_adaptor<Derived, Base>>(d).base(). - - Let BASE-TYPE(D) be the type of GET-BASE(declval<D>()). - -3. `receiver_adaptor` is equivalent to the following: - -
-    template<
-      class-type Derived,
-      receiver Base = unspecified> // arguments are not associated entities ([lib.tmpl-heads])
-    class receiver_adaptor {
-      friend Derived;
-     public:
-      using receiver_concept = receiver_t;
-
-      // Constructors
-      receiver_adaptor() = default;
-      template<class B>
-          requires HAS-BASE && constructible_from<Base, B>
-        explicit receiver_adaptor(B&& base) : base_(std::forward<B>(base)) {}
-
-     private:
-      using set_value = unspecified;
-      using set_error = unspecified;
-      using set_stopped = unspecified;
-      using get_env = unspecified;
-
-      // Member functions
-      template<class Self>
-        requires HAS-BASE
-      decltype(auto) base(this Self&& self) noexcept {
-        return (std::forward<Self>(self).base_);
-      }
-
-      // [exec.utils.rcvr.adptr.nonmembers] Non-member functions
-      template<class... As>
-        friend void tag_invoke(set_value_t, Derived&& self, As&&... as) noexcept;
-
-      template<class Err>
-        friend void tag_invoke(set_error_t, Derived&& self, Err&& err) noexcept;
-
-      friend void tag_invoke(set_stopped_t, Derived&& self) noexcept;
-
-      friend decltype(auto) tag_invoke(get_env_t, const Derived& self) noexcept;
-
-      [[no_unique_address]] Base base_; // present if and only if HAS-BASE is true
-    };
-    
- -4. [Note: `receiver_adaptor` provides `tag_invoke` overloads on behalf of - the derived class `Derived`, which is incomplete when `receiver_adaptor` is - instantiated.] - -5. [Example: -
-     using _int_completion =
-       completion_signatures<set_value_t(int)>;
-
-     template<receiver_of<_int_completion> Rcvr>
-       class my_receiver : receiver_adaptor<my_receiver<Rcvr>, Rcvr> {
-         friend receiver_adaptor<my_receiver, Rcvr>;
-         void set_value() && {
-           set_value(std::move(*this).base(), 42);
-         }
-        public:
-         using receiver_adaptor<my_receiver, Rcvr>::receiver_adaptor;
-       };
-     
- -- end example] - -#### Non-member functions [exec.utils.rcvr.adptr.nonmembers] #### {#spec-execution.snd_rec_utils.receiver_adaptor.nonmembers} - -
-    template<class... As>
-      friend void tag_invoke(set_value_t, Derived&& self, As&&... as) noexcept;
-    
- - 1. Let `SET-VALUE-MBR` be the expression `std::move(self).set_value(std::forward(as)...)`. - - 2. Constraints: Either `SET-VALUE-MBR` is a valid expression or `typename Derived::set_value` denotes a type and callable<set_value_t, BASE-TYPE(Derived), As...> is `true`. - - 3. Mandates: `SET-VALUE-MBR`, if that expression is valid, is not potentially-throwing. - - 4. Effects: Equivalent to: - - * If `SET-VALUE-MBR` is a valid expression, `SET-VALUE-MBR`; - - * Otherwise, set_value(GET-BASE(std::move(self)), std::forward<As>(as)...). - -
-    template<class Err>
-      friend void tag_invoke(set_error_t, Derived&& self, Err&& err) noexcept;
-    
- - 1. Let `SET-ERROR-MBR` be the expression `std::move(self).set_error(std::forward(err))`. - - 2. Constraints: Either `SET-ERROR-MBR` is a valid expression or `typename Derived::set_error` denotes a type and callable<set_error_t, BASE-TYPE(Derived), Err> is `true`. - - 3. Mandates: `SET-ERROR-MBR`, if that expression is valid, is not potentially-throwing. - - 4. Effects: Equivalent to: - - * If `SET-ERROR-MBR` is a valid expression, `SET-ERROR-MBR`; - - * Otherwise, set_error(GET-BASE(std::move(self)), std::forward<Err>(err)). - -
-    friend void tag_invoke(set_stopped_t, Derived&& self) noexcept;
-    
- - 1. Let `SET-STOPPED-MBR` be the expression `std::move(self).set_stopped()`. - - 2. Constraints: Either `SET-STOPPED-MBR` is a valid expression or `typename Derived::set_stopped` denotes a type and callable<set_stopped_t, BASE-TYPE(Derived)> is `true`. - - 3. Mandates: `SET-STOPPED-MBR`, if that expression is valid, is not potentially-throwing. - - 4. Effects: Equivalent to: - - * If `SET-STOPPED-MBR` is a valid expression, `SET-STOPPED-MBR`; - - * Otherwise, set_stopped(GET-BASE(std::move(self))). - -
-    friend decltype(auto) tag_invoke(get_env_t, const Derived& self) noexcept;
-    
- - 1. Constraints: Either `self.get_env()` is a valid expression or `typename Derived::get_env` denotes a type and callable<get_env_t, BASE-TYPE(const Derived&)> is `true`. - - 2. Mandates: `noexcept(self.get_env())` is `true` if it is a valid expression. - - 3. Effects: Equivalent to: - - * If `self.get_env()` is a valid expression, `self.get_env()`; - - * Otherwise, get_env(GET-BASE(self)). - ### `execution::completion_signatures` [exec.utils.cmplsigs] ### {#spec-execution.snd_rec_utils.completion_sigs} 1. `completion_signatures` is a type that encodes a set of completion signatures @@ -7763,10 +8300,13 @@ template<class Domain, class Tag, sender Sndr, class... Args> concept always-true = true; // exposition only
- 1. A type `Fn` satisfies completion-signature if and only if it is a function type with one of the following forms: + 1. A type `Fn` satisfies completion-signature if and + only if it is a function type with one of the following forms: - * set_value_t(Vs...), where Vs is an arbitrary parameter pack. - * set_error_t(Err), where Err is an arbitrary type. + * set_value_t(Vs...), where Vs + is an arbitrary parameter pack. + * set_error_t(Err), where Err is + an arbitrary type. * `set_stopped_t()`
@@ -7778,29 +8318,30 @@ template<class Domain, class Tag, sender Sndr, class... Args>
     
2. Let `Fns...` be a template parameter pack of the arguments of the - `completion_signatures` specialization named by - `Completions`, let TagFns be a - template parameter pack of the function types in `Fns` whose return types - are `Tag`, and let - Tsn be a template parameter - pack of the function argument types in the n-th type - in TagFns. Then, given two variadic templates + `completion_signatures` specialization named by `Completions`, let + TagFns be a template parameter pack of the function + types in `Fns` whose return types are `Tag`, and let + Tsn be a template parameter pack + of the function argument types in the n-th type in + TagFns. Then, given two variadic templates Tuple and Variant, the type - gather-signatures<Tag, Completions, Tuple, Variant> - names the type - META-APPLY(Variant, META-APPLY(Tuple, Ts0...), - META-APPLY(Tuple, Ts1...), ... - META-APPLY(Tuple, Tsm-1...)), where - m is the size of the parameter pack - TagFns and META-APPLY(T, As...) is - equivalent to: + gather-signatures<Tag, Completions, Tuple, + Variant> names the type + META-APPLY(Variant, META-APPLY(Tuple, + Ts0...), META-APPLY(Tuple, + Ts1...), ... META-APPLY(Tuple, + Tsm-1...)), where m + is the size of the parameter pack TagFns and + META-APPLY(T, As...) is equivalent to:
         typename indirect-meta-apply<always-true<As...>>::template meta-apply<T, As...>;
         
- 3. The purpose of META-APPLY is to make it - valid to use non-variadic templates as Variant and Tuple arguments to gather-signatures. + 3. The purpose of META-APPLY is + to make it valid to use non-variadic templates as + Variant and Tuple arguments to + gather-signatures. 4.
     template<completion-signature... Fns>
@@ -7880,53 +8421,70 @@ template<class Domain, class Tag, sender Sndr, class... Args>
       completion_signatures<see below>;
     
- * `SetValue` shall name an alias template such that for any template + 1. `SetValue` shall name an alias template such that for any template parameter pack `As...`, the type `SetValue` is either ill-formed - or else valid-completion-signatures<SetValue<As...>> + or else + valid-completion-signatures<SetValue<As...>> is satisfied. - * `SetError` shall name an alias template such that for any type `Err`, + 2. `SetError` shall name an alias template such that for any type `Err`, `SetError` is either ill-formed or else - valid-completion-signatures<SetError<Err>> - is satisfied. + valid-completion-signatures<SetError<Err>> is + satisfied. Then: - * Let `Vs...` be a pack of the types in the type-list named - by gather-signatures<set_value_t, InputSignatures, SetValue, type-list>. + 3. Let `Vs...` be a pack of the types in the type-list + named by gather-signatures<set_value_t, InputSignatures, + SetValue, type-list>. - * Let `Es...` be a pack of the types in the - type-list named by gather-signatures<set_error_t, InputSignatures, - type_identity_t, error-list>, where error-list is an - alias template such that error-list<Ts...> names - type-list<SetError<Ts>...>. + 4. Let `Es...` be a pack of the types in the type-list + named by gather-signatures<set_error_t, InputSignatures, + type_identity_t, error-list>, where + error-list is an alias template such that + error-list<Ts...> names + type-list<SetError<Ts>...>. - * Let `Ss` name the type `completion_signatures<>` if gather-signatures<set_stopped_t, InputSignatures, - type-list, type-list> is an alias for the type type-list<>; otherwise, `SetStopped`. + 5. Let `Ss` name the type `completion_signatures<>` if + gather-signatures<set_stopped_t, InputSignatures, + type-list, type-list> is an alias for the type + type-list<>; otherwise, `SetStopped`. - Then: + Then: - 1. If any of the above types are ill-formed, then - `transform_completion_signatures` is ill-formed, + 6. 
If any of the above types are ill-formed, then + `transform_completion_signatures` is ill-formed, - 2. Otherwise, `transform_completion_signatures` names the type `completion_signatures` - where `Sigs...` is the unique set of types in all the template arguments - of all the `completion_signatures` specializations in `[AdditionalSignatures, Vs..., Es..., Ss]`. + 7. Otherwise, `transform_completion_signatures` names the type + `completion_signatures` where `Sigs...` is the unique set of + types in all the template arguments of all the `completion_signatures` + specializations in `[AdditionalSignatures, Vs..., Es..., Ss]`. ## Execution contexts [exec.ctx] ## {#spec-execution.contexts} -1. This subclause specifies some execution resources on which work can be scheduled. +1. This subclause specifies some execution resources on which work can be + scheduled. ### `run_loop` [exec.run.loop] ### {#spec-execution.contexts.run_loop} -1. A `run_loop` is an execution resource on which work can be scheduled. It maintains a simple, thread-safe first-in-first-out queue of work. Its `run()` member function removes elements from the queue and executes them in a loop on whatever thread of execution calls `run()`. +1. A `run_loop` is an execution resource on which work can be scheduled. It + maintains a simple, thread-safe first-in-first-out queue of work. Its `run()` + member function removes elements from the queue and executes them in a loop + on whatever thread of execution calls `run()`. -2. A `run_loop` instance has an associated count that corresponds to the number of work items that are in its queue. Additionally, a `run_loop` has an associated state that can be one of starting, running, or finishing. +2. A `run_loop` instance has an associated count that corresponds to the + number of work items that are in its queue. Additionally, a `run_loop` has an + associated state that can be one of starting, running, + or finishing. -3. 
Concurrent invocations of the member functions of `run_loop`, other than `run` and its destructor, do not introduce data races. The member functions `pop_front`, `push_back`, and `finish` execute atomically. +3. Concurrent invocations of the member functions of `run_loop`, other than + `run` and its destructor, do not introduce data races. The member functions + `pop_front`, `push_back`, and `finish` execute atomically. -4. [Note: Implementations are encouraged to use an intrusive queue of operation states to hold the work units to make scheduling allocation-free. — end note] +4. Implementations are encouraged to use an intrusive + queue of operation states to hold the work units to make scheduling + allocation-free.
     class run_loop {
@@ -7960,52 +8518,78 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 
 #### Associated types [exec.run.loop.types] #### {#spec-execution.contexts.run_loop.types}
 
-    
-    class run-loop-scheduler;
-    
- - 1. run-loop-scheduler is an unspecified type that models the `scheduler` concept. - - 2. Instances of run-loop-scheduler remain valid until the end of the lifetime of the `run_loop` instance from which they were obtained. - - 3. Two instances of run-loop-scheduler compare equal if and only if they were obtained from the same `run_loop` instance. - - 4. Let sch be an expression of type run-loop-scheduler. The expression schedule(sch) is not potentially-throwing and has type run-loop-sender. +
+class run-loop-scheduler;
+
-
-  class run-loop-sender;
-  
+1. run-loop-scheduler is an unspecified type that models + the `scheduler` concept. - 1. run-loop-sender is an unspecified type such that - sender-of<run-loop-sender> is `true`. - Additionally, the types reported by its `error_types` associated type is - `exception_ptr`, and the value of its `sends_stopped` trait is `true`. +2. Instances of run-loop-scheduler remain valid until the + end of the lifetime of the `run_loop` instance from which they were + obtained. - 2. An instance of run-loop-sender remains valid until the - end of the lifetime of its associated `run_loop` instance. +3. Two instances of run-loop-scheduler compare equal if + and only if they were obtained from the same `run_loop` instance. - 3. Let sndr be an expression of type - run-loop-sender, let rcvr be an - expression such that decltype(rcvr) models the - `receiver_of` concept, and let `C` be either `set_value_t` or - `set_stopped_t`. Then: +4. Let sch be an expression of type + run-loop-scheduler. The expression + schedule(sch) is not potentially-throwing and has type + run-loop-sender. - * The expression connect(sndr, rcvr) has type run-loop-opstate<decay_t<decltype(rcvr)>> and is potentially-throwing if and only if the initialiation of decay_t<decltype(rcvr)> from rcvr is potentially-throwing. +
+class run-loop-sender;
+
- * The expression get_completion_scheduler<C>(get_env(sndr)) is not potentially-throwing, has type run-loop-scheduler, and compares equal to the run-loop-scheduler instance from which sndr was obtained. +1. run-loop-sender is an unspecified type such that + sender-of<run-loop-sender> is `true`. + Additionally, the type reported by its `error_types` associated type is + `exception_ptr`, and the value of its `sends_stopped` trait is `true`. + +2. An instance of run-loop-sender remains valid until the + end of the lifetime of its associated `run_loop` instance. + +3. Let sndr be an expression of type + run-loop-sender, let rcvr be an + expression such that decltype(rcvr) models the + `receiver_of` concept, and let `C` be either `set_value_t` or + `set_stopped_t`. Then: + + * The expression connect(sndr, rcvr) has type + run-loop-opstate<decay_t<decltype(rcvr)>> + and is potentially-throwing if and only if the initialization of + decay_t<decltype(rcvr)> from + rcvr is potentially-throwing. + + * The expression + get_completion_scheduler<C>(get_env(sndr)) is + not potentially-throwing, has type + run-loop-scheduler, and compares equal to the + run-loop-scheduler instance from which + sndr was obtained.
-  template<receiver_of<completion_signatures<set_value_t()>> Rcvr> // arguments are not associated entities ([lib.tmpl-heads])
-    struct run-loop-opstate;
-  
+
+template<receiver_of<completion_signatures<set_value_t()>> Rcvr>
+  struct run-loop-opstate;
+
- 1. run-loop-opstate<Rcvr> inherits unambiguously from run-loop-opstate-base. +1. run-loop-opstate<Rcvr> inherits unambiguously + from run-loop-opstate-base. - 2. Let o be a non-`const` lvalue of type run-loop-opstate<Rcvr>, and let REC(o) be a non-`const` lvalue reference to an instance of type Rcvr that was initialized with the expression rcvr passed to the invocation of `connect` that returned o. Then: +2. Let o be a non-`const` lvalue of type + run-loop-opstate<Rcvr>, and let + REC(o) be a non-`const` lvalue reference to an + instance of type Rcvr that was initialized with the + expression rcvr passed to the invocation of `connect` + that returned o. Then: - * The object to which REC(o) refers remains valid for the lifetime of the object to which o refers. + * The object to which REC(o) refers remains + valid for the lifetime of the object to which o + refers. - * The type run-loop-opstate<Rcvr> overrides run-loop-opstate-base::execute() such that o.execute() is equivalent to the following: + * The type run-loop-opstate<Rcvr> overrides + run-loop-opstate-base::execute() such that + o.execute() is equivalent to the following:
         if (get_stop_token(REC(o)).stop_requested()) {
@@ -8015,7 +8599,8 @@ template<class Domain, class Tag, sender Sndr, class... Args>
         }
         
- * The expression start(o) is equivalent to the following: + * The expression start(o) is equivalent to the + following:
         try {
@@ -8027,75 +8612,88 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 
 #### Constructor and destructor [exec.run.loop.ctor] #### {#spec-execution.contexts.run_loop.ctor}
 
-    
-    run_loop::run_loop() noexcept;
-    
+
+run_loop::run_loop() noexcept;
+
- 1. Postconditions: count is `0` and state is starting. +1. Postconditions: count is `0` and state is + starting. -
-    run_loop::~run_loop();
-    
+
+run_loop::~run_loop();
+
- 1. Effects: If count is not `0` or if state is running, invokes `terminate()`. Otherwise, has no effects. +1. Effects: If count is not `0` or if state is + running, invokes `terminate()`. Otherwise, has no effects. #### Member functions [exec.run.loop.members] #### {#spec-execution.contexts.run_loop.members} -
-    run-loop-opstate-base* run_loop::pop_front();
-    
+
+run-loop-opstate-base* run_loop::pop_front();
+
- 1. Effects: Blocks ([defns.block]) until one of the following conditions is `true`: +1. Effects: Blocks ([defns.block]) until one of the following conditions + is `true`: - * count is `0` and state is finishing, in which case `pop_front` returns `nullptr`; or + * count is `0` and state is finishing, in which case + `pop_front` returns `nullptr`; or - * count is greater than `0`, in which case an item is removed from the front of the queue, count is decremented by `1`, and the removed item is returned. + * count is greater than `0`, in which case an item is removed from + the front of the queue, count is decremented by `1`, and the + removed item is returned. -
-    void run_loop::push_back(run-loop-opstate-base* item);
-    
+
+void run_loop::push_back(run-loop-opstate-base* item);
+
- 1. Effects: Adds `item` to the back of the queue and increments count by `1`. +1. Effects: Adds `item` to the back of the queue and increments + count by `1`. - 2. Synchronization: This operation synchronizes with the `pop_front` operation that obtains `item`. +2. Synchronization: This operation synchronizes with the `pop_front` + operation that obtains `item`. -
-    run-loop-scheduler run_loop::get_scheduler();
-    
+
+run-loop-scheduler run_loop::get_scheduler();
+
- 1. Returns: an instance of run-loop-scheduler that can be used to schedule work onto this `run_loop` instance. +1. Returns: an instance of run-loop-scheduler that + can be used to schedule work onto this `run_loop` instance. -
-    void run_loop::run();
-    
+
+void run_loop::run();
+
- 1. Effects: Equivalent to: +1. Effects: Equivalent to: -
-        while (auto* op = pop_front()) {
-          op->execute();
-        }
-        
+
+    while (auto* op = pop_front()) {
+      op->execute();
+    }
+    
- 2. Precondition: state is starting. +2. Precondition: state is starting. - 3. Postcondition: state is finishing. +3. Postcondition: state is finishing. - 4. Remarks: While the loop is executing, state is running. When state changes, it does so without introducing data races. +4. Remarks: While the loop is executing, state is running. + When state changes, it does so without introducing data races. -
-    void run_loop::finish();
-    
+
+void run_loop::finish();
+
- 1. Effects: Changes state to finishing. +1. Effects: Changes state to finishing. - 2. Synchronization: This operation synchronizes with all `pop_front` operations on this object. +2. Synchronization: This operation synchronizes with all `pop_front` + operations on this object. ## Coroutine utilities [exec.coro.utils] ## {#spec-execution.coro_utils} ### `execution::as_awaitable` [exec.as.awaitable] ### {#spec-execution.coro_utils.as_awaitable} -1. `as_awaitable` transforms an object into one that is awaitable within a particular coroutine. This subclause makes use of the following exposition-only entities: +1. `as_awaitable` transforms an object into one that is awaitable within a + particular coroutine. This subclause makes use of the following + exposition-only entities:
     template<class Sndr, class Env>
@@ -8120,18 +8718,29 @@ template<class Domain, class Tag, sender Sndr, class... Args>
 
     1. Alias template single-sender-value-type is defined as follows:
 
-        1. If `value_types_of_t` would have the form `Variant>`, then single-sender-value-type<Sndr, Env> is an alias for type `decay_t`.
+        1. If `value_types_of_t<Sndr, Env, Tuple, Variant>` would have the form
+            `Variant<Tuple<T>>`, then
+            single-sender-value-type<Sndr, Env> is an
+            alias for type `decay_t<T>`.
 
-        2. Otherwise, if `value_types_of_t` would have the form `Variant>` or `Variant<>`, then single-sender-value-type<Sndr, Env> is an alias for type `void`.
+        2. Otherwise, if `value_types_of_t<Sndr, Env, Tuple, Variant>` would
+            have the form `Variant<Tuple<>>` or `Variant<>`, then
+            single-sender-value-type<Sndr, Env> is an
+            alias for type `void`.
 
-        3. Otherwise, if `value_types_of_t` would have the form `Variant>` where `Ts` is a parameter pack, then single-sender-value-type<Sndr, Env> is an alias for type `std::tuple...>`.
+        3. Otherwise, if `value_types_of_t<Sndr, Env, Tuple, Variant>` would
+            have the form `Variant<Tuple<Ts...>>` where `Ts` is a parameter pack,
+            then single-sender-value-type<Sndr, Env> is an
+            alias for type `std::tuple<decay_t<Ts>...>`.
 
-        4. Otherwise, single-sender-value-type<Sndr, Env> is ill-formed.
+        4. Otherwise, single-sender-value-type<Sndr, Env>
+            is ill-formed.
 
-    2. The type sender-awaitable<Sndr, Promise> is equivalent to the following:
+    2. The type sender-awaitable<Sndr, Promise> is
+        equivalent to the following:
 
         
-        template<class Sndr, class Promise> // arguments are not associated entities ([lib.tmpl-heads])
+        template<class Sndr, class Promise>
         class sender-awaitable {
           struct unit {};
           using value_t = single-sender-value-type<Sndr, env_of_t<Promise>>;
@@ -8160,39 +8769,50 @@ template<class Domain, class Tag, sender Sndr, class... Args>
             };
             
- Let `rcvr` be an rvalue expression of type awaitable-receiver, let `crcvr` be a `const` lvalue that refers to `rcvr`, let `vs` be a parameter pack of types `Vs...`, and let `err` be an arbitrary expression of type `Err`. Then: + Let `rcvr` be an rvalue expression of type + awaitable-receiver, let `crcvr` be a `const` + lvalue that refers to `rcvr`, let `vs` be a parameter pack of types + `Vs...`, and let `err` be an arbitrary expression of type `Err`. + Then: - 1. If `constructible_from` is satisfied, the expression `set_value(rcvr, vs...)` is equivalent to: + 1. If `constructible_from` is satisfied, the + expression `set_value(rcvr, vs...)` is equivalent to: -
-                  try {
-                    rcvr.result_ptr_->emplace<1>(vs...);
-                  } catch(...) {
-                    rcvr.result_ptr_->emplace<2>(current_exception());
-                  }
-                  rcvr.continuation_.resume();
-                  
+
+                try {
+                  rcvr.result_ptr_->emplace<1>(vs...);
+                } catch(...) {
+                  rcvr.result_ptr_->emplace<2>(current_exception());
+                }
+                rcvr.continuation_.resume();
+                
- Otherwise, `set_value(rcvr, vs...)` is ill-formed. + Otherwise, `set_value(rcvr, vs...)` is ill-formed. - 2. The expression `set_error(rcvr, err)` is equivalent to: + 2. The expression `set_error(rcvr, err)` is equivalent to: -
-                  rcvr.result_ptr_->emplace<2>(AS-EXCEPT-PTR(err)); // see [exec.general]
-                  rcvr.continuation_.resume();
-                  
+
+                rcvr.result_ptr_->emplace<2>(AS-EXCEPT-PTR(err)); // see [exec.general]
+                rcvr.continuation_.resume();
+                
- 3. The expression `set_stopped(rcvr)` is equivalent to + 3. The expression `set_stopped(rcvr)` is equivalent to static_cast<coroutine_handle<>>(rcvr.continuation_.promise().unhandled_stopped()).resume(). - 4. For any expression `tag` whose type satisfies forwarding-query - and for any pack of subexpressions `as`, `tag_invoke(tag, get_env(crcvr), as...)` - is expression-equivalent to tag(get_env(as_const(crcvr.continuation_.promise())), - as...) when that expression is well-formed. + 4. For any expression `tag` whose type satisfies + forwarding-query and for any pack of + subexpressions `as`, `get_env(crcvr).query(tag, as...)` is + expression-equivalent to + tag(get_env(as_const(crcvr.continuation_.promise())), + as...) when that expression is well-formed. - 2. sender-awaitable::sender-awaitable(Sndr&& sndr, Promise& p) + 2. sender-awaitable::sender-awaitable(Sndr&& + sndr, Promise& p) - - Effects: initializes `state_` with connect(std::forward<Sndr>(sndr), awaitable-receiver{&result_, coroutine_handle<Promise>::from_promise(p)}). + - Effects: initializes `state_` with + connect(std::forward<Sndr>(sndr), + awaitable-receiver{&result_, + coroutine_handle<Promise>::from_promise(p)}). 3. value_t sender-awaitable::await_resume() @@ -8205,32 +8825,43 @@ template<class Domain, class Tag, sender Sndr, class... Args> return std::forward<value_t>(get<1>(result_));
-2. `as_awaitable` is a customization point object. For some subexpressions `expr` and `p` where `p` is an lvalue, `Expr` names the type `decltype((expr))` and `Promise` names the type `decltype((p))`, `as_awaitable(expr, p)` is expression-equivalent to the following: +2. `as_awaitable` is a customization point object. For some subexpressions + `expr` and `p` where `p` is an lvalue, `Expr` names the type + `decltype((expr))` and `Promise` names the type `decltype((p))`, + `as_awaitable(expr, p)` is expression-equivalent to the following: - 1. `tag_invoke(as_awaitable, expr, p)` if that expression is well-formed. + 1. `expr.as_awaitable(p)` if that expression is well-formed. - * Mandates: is-awaitable<A, Promise> is `true`, where `A` is the type of the `tag_invoke` expression above. + * Mandates: is-awaitable<A, Promise> is + `true`, where `A` is the type of the expression above. - 2. Otherwise, `expr` if is-awaitable<Expr, U> is - `true`, where U is an unspecified class type that + 2. Otherwise, `expr` if is-awaitable<Expr, U> + is `true`, where U is an unspecified class type that lacks a member named `await_transform`. The - condition is not is-awaitable<Expr, Promise> as that - creates the potential for constraint recursion. + condition is not is-awaitable<Expr, Promise> as + that creates the potential for constraint recursion. - * Preconditions: is-awaitable<Expr, Promise> is - `true` and the expression `co_await expr` in a coroutine with promise - type U is expression-equivalent to the same - expression in a coroutine with promise type `Promise`. + * Preconditions: is-awaitable<Expr, + Promise> is `true` and the expression `co_await expr` in a + coroutine with promise type U is + expression-equivalent to the same expression in a coroutine with + promise type `Promise`. - 3. Otherwise, sender-awaitable{expr, p} if awaitable-sender<Expr, Promise> is `true`. + 3. Otherwise, sender-awaitable{expr, p} if + awaitable-sender<Expr, Promise> is `true`. 4. 
Otherwise, `expr`. ### `execution::with_awaitable_senders` [exec.with.awaitable.senders] ### {#spec-execution.coro_utils.with_awaitable_senders} - 1. `with_awaitable_senders`, when used as the base class of a coroutine promise type, makes senders awaitable in that coroutine type. +1. `with_awaitable_senders`, when used as the base class of a coroutine promise + type, makes senders awaitable in that coroutine type. - In addition, it provides a default implementation of `unhandled_stopped()` such that if a sender completes by calling `set_stopped`, it is treated as if an uncatchable "stopped" exception were thrown from the await-expression. In practice, the coroutine is never resumed, and the `unhandled_stopped` of the coroutine caller's promise type is called. + In addition, it provides a default implementation of `unhandled_stopped()` + such that if a sender completes by calling `set_stopped`, it is treated as + if an uncatchable "stopped" exception were thrown from the + await-expression. In practice, the coroutine is never resumed, and + the `unhandled_stopped` of the coroutine caller's promise type is called.
     template<class-type Promise>
@@ -8248,7 +8879,7 @@ template<class Domain, class Tag, sender Sndr, class... Args>
         template<class Value>
         see below await_transform(Value&& value);
 
        private:
         // exposition only
         [[noreturn]] static coroutine_handle<> default_unhandled_stopped(void*) noexcept {
           terminate();
@@ -8259,29 +8890,29 @@ template<class Domain, class Tag, sender Sndr, class... Args>
       };
     
- 2. `void set_continuation(coroutine_handle h) noexcept` +2. `void set_continuation(coroutine_handle h) noexcept` - - Effects: equivalent to: + - Effects: equivalent to: -
-        continuation_ = h;
-        if constexpr ( requires(OtherPromise& other) { other.unhandled_stopped(); } ) {
-          stopped_handler_ = [](void* p) noexcept -> coroutine_handle<> {
-            return coroutine_handle<OtherPromise>::from_address(p)
-              .promise().unhandled_stopped();
-          };
-        } else {
-          stopped_handler_ = default_unhandled_stopped;
-        }
-        
+
+      continuation_ = h;
+      if constexpr ( requires(OtherPromise& other) { other.unhandled_stopped(); } ) {
+        stopped_handler_ = [](void* p) noexcept -> coroutine_handle<> {
+          return coroutine_handle<OtherPromise>::from_address(p)
+            .promise().unhandled_stopped();
+        };
+      } else {
+        stopped_handler_ = default_unhandled_stopped;
+      }
+      
- 3. call-result-t<as_awaitable_t, Value, Promise&> await_transform(Value&& value) +3. call-result-t<as_awaitable_t, Value, Promise&> await_transform(Value&& value) - - Effects: equivalent to: + - Effects: equivalent to: -
-        return as_awaitable(std::forward<Value>(value), static_cast<Promise&>(*this));
-        
+
+      return as_awaitable(std::forward<Value>(value), static_cast<Promise&>(*this));
+      
 {