Data race during shutdown #26
Comments
Are you able to post the complete code for your client? Stripped down if possible, but something that still reproduces this data race report from tsan.
I've distilled this down as much as I can. Attached is the source for a trivial program. It implements a very simple server and client. The client is implemented using NTF. It initializes the interface and connects to the server, which works fine. The main() function instantiates the server and the client and sleeps for five seconds. Then it shuts down the NTF runtime. At this point, I'm getting complaints from thread sanitizer (attached). The code is compiled with If port 10001 is in use on your machine, change it to something unused at the top of the source. To keep things simple, I hardwired it. I also used assert() to bail out if there is an error, rather than confusing the issue with irrelevant error-handling noise.
Apologies, I closed this accidentally.
@michihenning I played around with this and the problem seems to be with the way the shutdown sequence is initiated in the Client destructor. You have:
Client::~Client()
{
    interface_->closeAll(); // <-- this one should go below everything
    interface_->shutdown();
    interface_->linger();
}
It looks like you want to call
It never occurred to me to do this. I find this highly surprising. After all, I want to tell it to tear everything down and then wait until it’s all finished. Seems really counter-intuitive. In particular, if I call closeAll() as the last thing, how do I know when it’s done? I don’t want to exit with threads still running, and I might want to shut down something else (not NTF-related) after NTF has finished shutting down. I had a look at InterfaceStopGuard. It calls shutdown() and linger(), but not closeAll(). The doc is really lacking here. There is no mention of what actually happens when I call closeAll() or shutdown(). Does shutdown() imply closeAll()? When would I call closeAll() instead of shutdown()? When I call shutdown(), will the completion handlers for everything get invoked with a special error? I would hope so, because I might have to do some cleanup of my own.
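For readers following along, the "tell it to stop, then wait until it has really finished" expectation described above can be sketched in plain standard C++. This is only an illustration of the general pattern, under the assumption that a blocking wait is what signals completion; the `Worker` class and its `requestStop()` and `waitUntilStopped()` methods are hypothetical names and are not part of NTF.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a runtime that owns one worker thread.
// None of these names come from NTF.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}

    // Ask the worker to stop. Returns immediately; the thread may still be running.
    void requestStop() { stop_.store(true); }

    // Block until the worker thread has exited. Once this returns, nothing
    // owned by the worker is touched by that thread any more.
    void waitUntilStopped() {
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    // True once the worker loop has run to completion.
    bool finished() const { return finished_.load(); }

    ~Worker() {
        requestStop();
        waitUntilStopped();
    }

private:
    void run() {
        while (!stop_.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        finished_.store(true);
    }

    std::atomic<bool> stop_{false};
    std::atomic<bool> finished_{false};
    std::thread thread_;  // declared last so both flags exist before the thread starts
};
```

In this sketch the answer to "how do I know when it's done?" is that `waitUntilStopped()` has returned; any unrelated teardown would go after that call. Whether the NTF calls discussed in this thread give the same guarantee is exactly the open question here.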
To clarify, I was just experimenting with various ways of shutting down. It may have gotten rid of the tsan complaints but certainly is not necessarily the right thing to do. Let's wait for the official word.
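The reordering from that experiment, with closeAll() moved below shutdown() and linger(), can be sketched as follows. Since the real interface cannot be reproduced here, `RecordingInterface` is a hypothetical stand-in that only records the order of calls; only the three method names are taken from the destructor quoted earlier in the thread. It shows the call order that was tried, not a confirmed recommendation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the interface object. It does no networking and
// only records which method was called, in order.
struct RecordingInterface {
    std::vector<std::string> calls;

    void shutdown() { calls.push_back("shutdown"); }
    void linger()   { calls.push_back("linger"); }
    void closeAll() { calls.push_back("closeAll"); }
};

class Client {
public:
    explicit Client(RecordingInterface& iface) : interface_(&iface) {}

    ~Client() {
        interface_->shutdown();
        interface_->linger();
        interface_->closeAll();  // moved to the end, as in the experiment
    }

private:
    RecordingInterface* interface_;  // not owned; must outlive the Client
};
```

Destroying a `Client` built on a `RecordingInterface` leaves `calls` holding shutdown, linger, closeAll in that order. What each real call does, and whether this order is actually correct, is not established anywhere in this thread.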
Hello @michihenning and @rdilipk. First of all, sorry that it took us so long to look at this issue. It may be that the issue is no longer there, as some time ago we did a major redesign of how sockets are shut down and closed in ntc. I plan to examine it further.
I don't speak for @michihenning but I think he gave up on NTF long ago. It may not matter at this point.
I have a client implemented with NTF that talks to a pre-existing server. Everything works fine. I can resolve, connect, and read and write messages. The code runs clean under address sanitizer. However, thread sanitizer complains about data races whenever I shut down the interface. The interface is created like so:
There is a single thread in the interface. I have assertions all through the code to verify that callbacks are invoked on the single interface thread. All callbacks are created using the appropriate createCallback functions, and they specify the single strand for the interface.
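The kind of thread-affinity assertion described above could look roughly like this. It is a minimal sketch in standard C++; `ThreadChecker` is a hypothetical helper, not part of NTF and not the code actually used in the client.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: remembers one thread and reports whether the caller
// is running on it.
class ThreadChecker {
public:
    // Record the calling thread as the expected one. Intended to be called
    // once, from the thread that will run all callbacks.
    void bind() { expected_ = std::this_thread::get_id(); }

    // True if the caller is on the thread recorded by bind().
    bool onExpectedThread() const {
        return std::this_thread::get_id() == expected_;
    }

private:
    std::thread::id expected_;  // default value represents "no thread"
};
```

Each callback would then start with `assert(checker.onExpectedThread());`. Note that `bind()` and the later checks have to be ordered by some synchronization of their own (for example, binding from the first callback on that thread), otherwise the checker itself introduces a race.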
To shut down the interface, I have this code:
At the time I call stop(), there is no activity, other than a single receive() call pending, whose completion handler has not popped yet because the server has not written anything. There are no pending send() calls or other async operations.
In case it matters, my reads specify a timeout:
Once I call my stop code, thread sanitizer complains. I have pasted some of the salient output here. T11 is the interface thread.