Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actor termination during ReceiveAsync may cause unexpected behavior #7485

Open
nkreipke opened this issue Jan 23, 2025 · 3 comments
Open

Actor termination during ReceiveAsync may cause unexpected behavior #7485

nkreipke opened this issue Jan 23, 2025 · 3 comments

Comments

@nkreipke
Copy link

Hi,

I've run into the same issue that was discussed in #3259. The issue in that ticket boils down to how task scheduling works in Akka, which is by sending instances of ActorTaskSchedulerMessage to itself to execute task continuations. The actor may handle system messages inbetween these messages including the Terminate message, which causes a ReceiveAsync message handler to stop executing at any await point.

In the original ticket discussion this was explained as a feature, as it would terminate actors that are waiting on a continuation which will never complete.

I'm creating this ticket to re-open a discussion about this, as there is an important consequence of this not mentioned in the original discussion, which is using and try-finally statements in async methods.

ReceiveAsync<Test>(async message =>
{
    using var file = File.OpenRead("...");

    await file.WriteAsync(message.Bytes);
});

Due to how the compiler generates the async state machine, file will not get disposed if the actor gets terminated during WriteAsync. Similarly, a finally block is not executed if a task is awaited in the corresponding try block. This can lead to all sorts of problems, including memory leaks. This behavior is unexpected and breaks the guarantees of using and try-finally so it leads to bugs which are almost impossible to diagnose unless the developer knows about it.

It is also not as difficult to reproduce as initially assumed ("just avoid using Context.Stop and send a PoisonPill instead"): While a PoisonPill is not a system message and therefore would not trigger this behavior, an actor receiving a PoisonPill will send Terminate to all its children. This is how I've fallen victim to this in production.

There are ways around this by sending custom messages instead of PoisonPill and manually terminating children, but I bet most users of Akka don't know about that. While it is probably impossible to fix Akka 1 without breaking other stuff, it might be worth considering adding a warning to the documentation or even discouraging using ReceiveAsync with using or try-finally altogether.

I've provided a repro for this here: https://github.com/nkreipke/AkkaUsingTest

@Aaronontheweb
Copy link
Member

There are ways around this by sending custom messages instead of PoisonPill and manually terminating children, but I bet most users of Akka don't know about that. While it is probably impossible to fix Akka 1 without breaking other stuff, it might be worth considering adding a warning to the documentation or even discouraging using ReceiveAsync with using or try-finally altogether.

So just as an explanation for why Akka.NET allows this - it's because users regularly hung their ActorSystems during termination while await-ing on Ask<T> operations (and other Tasks) if we gave an individual actor's await priority over the Terminate message. It also meant that functionality like Context.Watch couldn't work reliably when actors were await-ing either. Thus we re-designed the ReceiveAsync / RunTask behavior to allow system messages to still be processed while await-ing - this is a side effect of that.

Async Cleanup Without Custom Shutdowns

If you're worried about resource leaks stemming from an actor, the right way to do this is to probably:

  • have a CancellationToken as a private field on the actor
  • pass that CancellationToken into all async operations either directly or as a LinkedCancellationToken if you want to combine it with a per-request CancellationToken
  • Invoke that token in the actor's PostStop

The trouble you're going to have with resource clean-up et al is that if the await-able task hasn't finished yet, then there won't be a pending system message for the actor to process sitting in its mailbox, and one of the pending messages will be the task continuation function.

So the other option you have to addressing this is to run your Task as a detached Task and then deliver the results back to the actor via PipeTo - that will require implementing a bit of behavior-switching but if you need to do things like complete a using call to dispose of a file, you're no longer anchored to the actor's life-cycle if that's an issue.

I have some videos on the latter here:

Design Changes

I appreciate the problems this caused for you @nkreipke or the time and effort you took to report the bug. I'm not sure we're going to be able to automatically solve this though, because when faced with the choice between:

  1. Allowing await to be king and supersede everything else in actor messaging or
  2. Allowing system messages to still be king

Option 1 basically left us with all of the problems of option 2 plus some many additional ones, like cluster nodes never being able to finish termination. Actors have to be able to shutdown on short-notice.

@nkreipke
Copy link
Author

Appreciate the in-depth reply. It certainly is a difficult problem to solve and I get why it is done that way.

My issue with it is that it's just that easy for developers to shoot themselves in the foot with this, especially because vulnerable code looks completely fine and the available workarounds are non-trivial.

The CancellationToken idea works in my test case and it brings up a good point: Nobody expects the task to actually run to completion, they just expect the cleanup to work, which it would also do in case of cancellation. Maybe there is a way to mark all pending tasks as cancelled during termination? Or maybe throw an ExecutingActorTerminatedException? I'm not familiar enough with .NET task scheduling to say that's possible, should be worth looking into though.

@Aaronontheweb
Copy link
Member

I wonder if it's possible for us to offer a "last chance execution" on the ActorTaskScheduler as part of the shutdown routine. That might be possible, so that way at least the continuation function can exit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants