
Fix the race condition while calling rcl_shutdown #1353

Open: wants to merge 3 commits into base: rolling

Barry-Xu-2018 (Contributor)

Address #1352

Barry-Xu-2018 (Contributor, Author) commented Sep 3, 2024

@sloretz

Regarding the fix for issue #1352, I'd like to hear your thoughts.

Issue #1352 describes the following race.

One thread calls rclpy::shutdown():

```cpp
void shutdown_contexts()
{
  // graceful shutdown all contexts
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  for (auto * c : g_contexts) {
    rcl_ret_t ret = rcl_shutdown(c);
    (void)ret;
  }
  g_contexts.clear();
}
```

Another thread calls Context::shutdown():

```cpp
void Context::shutdown()
{
  {
    std::lock_guard<std::mutex> guard{g_contexts_mutex};
    auto iter = std::find(g_contexts.begin(), g_contexts.end(), rcl_context_.get());
    if (iter != g_contexts.end()) {
      g_contexts.erase(iter);
    }
  }
  rcl_ret_t ret = rcl_shutdown(rcl_context_.get());
  if (RCL_RET_OK != ret) {
    throw RCLError("failed to shutdown");
  }
}
```

As a result, rcl_shutdown() can be called twice on the same context, and the two calls may even overlap, because in Context::shutdown() the call to rcl_shutdown() is not protected by g_contexts_mutex.
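To make one bad interleaving concrete, here is a minimal single-threaded sketch that replays the case where shutdown_contexts() acquires g_contexts_mutex first and Context::shutdown() runs afterwards. It assumes a hypothetical fake_rcl_shutdown() that, like rcl_shutdown() in this discussion, fails on the second call for the same context; FakeRet, FakeRclContext, original_context_shutdown() and replay_shutdown_contexts_first() are illustrative stand-ins, not rcl or rclpy APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for rcl; NOT the real rcl types or return codes.
enum class FakeRet { kOk, kAlreadyShutdown };

struct FakeRclContext
{
  bool is_shutdown = false;
};

// Mimics the relevant rcl_shutdown() behaviour: a second call on the same
// context reports "already shut down" instead of succeeding.
FakeRet fake_rcl_shutdown(FakeRclContext * ctx)
{
  assert(ctx != nullptr);
  if (ctx->is_shutdown) {
    return FakeRet::kAlreadyShutdown;
  }
  ctx->is_shutdown = true;
  return FakeRet::kOk;
}

std::mutex g_contexts_mutex;
std::vector<FakeRclContext *> g_contexts;

// Same shape as shutdown_contexts(): shuts down every registered context.
void shutdown_contexts()
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  for (auto * c : g_contexts) {
    (void)fake_rcl_shutdown(c);
  }
  g_contexts.clear();
}

// Same shape as the original Context::shutdown(): the lock only covers the
// lookup, and fake_rcl_shutdown() is called even if the context was not found.
void original_context_shutdown(FakeRclContext * ctx)
{
  {
    std::lock_guard<std::mutex> guard{g_contexts_mutex};
    auto iter = std::find(g_contexts.begin(), g_contexts.end(), ctx);
    if (iter != g_contexts.end()) {
      g_contexts.erase(iter);
    }
  }
  if (fake_rcl_shutdown(ctx) != FakeRet::kOk) {
    throw std::runtime_error("failed to shutdown");
  }
}

// Replays the interleaving sequentially: shutdown_contexts() runs first,
// then Context::shutdown(). Returns true if the latter threw.
bool replay_shutdown_contexts_first()
{
  FakeRclContext ctx;
  g_contexts = {&ctx};
  shutdown_contexts();
  try {
    original_context_shutdown(&ctx);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

Because the context is no longer registered but rcl_shutdown() is still called, the second shutdown reports an error and Context::shutdown() throws.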
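I changed Context::shutdown() so that it holds g_contexts_mutex for the whole function and only calls rcl_shutdown() if the context is still in g_contexts.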
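A minimal sketch of that kind of change, assuming rcl is replaced by hypothetical stand-ins (FakeRclContext, fake_rcl_shutdown()); fixed_context_shutdown() is an illustrative name, not the code in this PR. Holding the lock across both the lookup and the shutdown means that exactly one caller performs the shutdown for a given context.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for rcl; NOT the real rcl types or return codes.
enum class FakeRet { kOk, kAlreadyShutdown };

struct FakeRclContext
{
  bool is_shutdown = false;
};

// A second call on the same context reports "already shut down".
FakeRet fake_rcl_shutdown(FakeRclContext * ctx)
{
  assert(ctx != nullptr);
  if (ctx->is_shutdown) {
    return FakeRet::kAlreadyShutdown;
  }
  ctx->is_shutdown = true;
  return FakeRet::kOk;
}

std::mutex g_contexts_mutex;
std::vector<FakeRclContext *> g_contexts;

// Same shape as shutdown_contexts(): shuts down every registered context.
void shutdown_contexts()
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  for (auto * c : g_contexts) {
    (void)fake_rcl_shutdown(c);
  }
  g_contexts.clear();
}

// Sketch of the changed Context::shutdown(): g_contexts_mutex is held for the
// whole function, and the context is only shut down if it is still registered.
void fixed_context_shutdown(FakeRclContext * ctx)
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  auto iter = std::find(g_contexts.begin(), g_contexts.end(), ctx);
  if (iter == g_contexts.end()) {
    return;  // Already shut down (e.g. by shutdown_contexts()): nothing to do.
  }
  g_contexts.erase(iter);
  if (fake_rcl_shutdown(ctx) != FakeRet::kOk) {
    throw std::runtime_error("failed to shutdown");
  }
}
```

A second fixed_context_shutdown() call on the same context simply returns, which is exactly why the existing double-shutdown test stops passing.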
However, this introduces a problem: repeated calls to Context::shutdown() no longer raise an exception, which makes the following existing test fail:

```python
import pytest

import rclpy
import rclpy.context


def test_double_shutdown():
    context = rclpy.context.Context()
    rclpy.init(context=context)
    assert context.ok()
    rclpy.shutdown(context=context)
    with pytest.raises(RuntimeError):
        rclpy.shutdown(context=context)
```

My question is: do we still need to guarantee that repeated calls to Context::shutdown() throw an exception? Maybe it would be more appropriate to display a warning in this scenario. What do you think?

Barry-Xu-2018 (Contributor, Author)

Friendly ping @sloretz?

fujitatomoya (Collaborator) left a comment

IMO this is the correct fix.

```python
    rclpy.init(context=context)
    assert context.ok()
    rclpy.shutdown(context=context)
    with pytest.raises(RuntimeError):
```
Collaborator

Do we need to remove this test?

I think we can keep this test, but expect an RCLError here, raised by the second rcl_shutdown() call (internally, rcl returns RCL_RET_ALREADY_SHUTDOWN)?

Contributor (Author)

Thanks for your comments.

> I think we can keep this test, but expect an RCLError here, raised by the second rcl_shutdown() call (internally, rcl returns RCL_RET_ALREADY_SHUTDOWN)?

With the current fix, if the context is not in g_contexts, rcl_shutdown will not be called.

As described in #1352, two threads can indeed call rcl_shutdown() on the same context, but one call comes from rclpy::shutdown() and the other from Context::shutdown(). In that situation we do not want an exception to be thrown.

The test calls Context::shutdown() twice, and it is reasonable to expect an exception on the second call. Therefore, I am considering adding a flag variable to the Context class that is set once Context::shutdown() has been called successfully. If Context::shutdown() is called again, it checks whether the flag is set and, if so, throws a CONTEXT_ALREADY_SHUTDOWN exception (not one coming from rcl). What do you think?
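The flag idea could look roughly like the following sketch. ContextSketch is a simplified, hypothetical stand-in for rclpy's Context, the proposed CONTEXT_ALREADY_SHUTDOWN error is modelled as a plain std::runtime_error, and rcl is replaced by a hypothetical fake_rcl_shutdown(); none of these names come from rclpy or rcl.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for rcl; NOT the real rcl types or return codes.
enum class FakeRet { kOk, kAlreadyShutdown };
struct FakeRclContext { bool is_shutdown = false; };

FakeRet fake_rcl_shutdown(FakeRclContext * ctx)
{
  assert(ctx != nullptr);
  if (ctx->is_shutdown) {
    return FakeRet::kAlreadyShutdown;
  }
  ctx->is_shutdown = true;
  return FakeRet::kOk;
}

std::mutex g_contexts_mutex;
std::vector<FakeRclContext *> g_contexts;

// Same shape as shutdown_contexts(): shuts down every registered context.
void shutdown_contexts()
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  for (auto * c : g_contexts) {
    (void)fake_rcl_shutdown(c);
  }
  g_contexts.clear();
}

// Simplified, hypothetical stand-in for rclpy's Context.
class ContextSketch
{
public:
  ContextSketch()
  {
    std::lock_guard<std::mutex> guard{g_contexts_mutex};
    g_contexts.push_back(&rcl_context_);  // registered on construction
  }
  ~ContextSketch()
  {
    // Never leave a dangling pointer behind in g_contexts.
    std::lock_guard<std::mutex> guard{g_contexts_mutex};
    g_contexts.erase(
      std::remove(g_contexts.begin(), g_contexts.end(), &rcl_context_), g_contexts.end());
  }
  ContextSketch(const ContextSketch &) = delete;
  ContextSketch & operator=(const ContextSketch &) = delete;

  void shutdown()
  {
    std::lock_guard<std::mutex> guard{g_contexts_mutex};
    if (already_shutdown_) {
      // Plays the role of the proposed CONTEXT_ALREADY_SHUTDOWN error.
      throw std::runtime_error("context already shut down");
    }
    auto iter = std::find(g_contexts.begin(), g_contexts.end(), &rcl_context_);
    if (iter != g_contexts.end()) {
      g_contexts.erase(iter);
      if (fake_rcl_shutdown(&rcl_context_) != FakeRet::kOk) {
        throw std::runtime_error("failed to shutdown");
      }
    }
    // Reached when this call succeeded, or shutdown_contexts() already did the work.
    already_shutdown_ = true;
  }

private:
  FakeRclContext rcl_context_;
  bool already_shutdown_ = false;  // guarded by g_contexts_mutex
};

// shutdown_contexts() runs first; the first ContextSketch::shutdown() is quiet,
// the second one throws. Returns true if the second call threw.
bool second_shutdown_throws()
{
  ContextSketch ctx;
  shutdown_contexts();
  ctx.shutdown();
  try {
    ctx.shutdown();
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

In this sketch the flag separates the two cases: losing the race against shutdown_contexts() is not an error, while calling shutdown() twice on the same context is.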

Contributor (Author)

> Therefore, I am considering setting a flag variable in the Context class. If Context::shutdown() has been successfully called, this variable will be set. If Context::shutdown() is called again, it will check if the flag has been set and, if so, will throw an exception CONTEXT_ALREADY_SHUTDOWN (not from RCL). What do you think?

Please review 0eb7ee5

```cpp
  rcl_ret_t ret = rcl_shutdown(rcl_context_.get());
  if (RCL_RET_OK != ret) {
    throw RCLError("failed to shutdown");
  auto iter = std::find(g_contexts.begin(), g_contexts.end(), rcl_context_.get());
```
Collaborator

So g_contexts is the collection of valid contexts, which means that if this shutdown call cannot find the context in g_contexts, the call is an invalid operation? Then can we generate the exception without introducing already_shutdown_?

Contributor (Author)

> So g_contexts is the collection of valid contexts, which means that if this shutdown call cannot find the context in g_contexts, the call is an invalid operation?

Yes.
rcl_context_ is added to g_contexts in the Context constructor, and it is removed from g_contexts before rcl_shutdown() is called.

> Then can we generate the exception without introducing already_shutdown_?

already_shutdown_ is only used to make repeated calls to Context::shutdown() throw an exception.

Issue #1352 describes a possible scenario in which two threads call rcl_shutdown() on the same context: one call comes from rclpy::shutdown() and the other from Context::shutdown(). In that situation we do not want Context::shutdown() to throw.

Collaborator

Isn't that exactly why the std::lock_guard<std::mutex> guard{g_contexts_mutex}; lock is escalated to the scope of this function? That way, when rclpy::shutdown_contexts is in progress, we will not find this context in g_contexts, so we generate the exception. I really do not understand why we need an internal already_shutdown_ flag...
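The flag-free variant suggested here might be sketched as follows, assuming hypothetical stand-ins for rcl (FakeRclContext, fake_rcl_shutdown()); context_shutdown_without_flag() and shutdown_throws() are illustrative names, not rclpy code. The lock covers the whole function, and "not registered" is treated as an invalid call.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for rcl; NOT the real rcl types or return codes.
enum class FakeRet { kOk, kAlreadyShutdown };
struct FakeRclContext { bool is_shutdown = false; };

FakeRet fake_rcl_shutdown(FakeRclContext * ctx)
{
  assert(ctx != nullptr);
  if (ctx->is_shutdown) {
    return FakeRet::kAlreadyShutdown;
  }
  ctx->is_shutdown = true;
  return FakeRet::kOk;
}

std::mutex g_contexts_mutex;
std::vector<FakeRclContext *> g_contexts;

// Same shape as shutdown_contexts(): shuts down every registered context.
void shutdown_contexts()
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  for (auto * c : g_contexts) {
    (void)fake_rcl_shutdown(c);
  }
  g_contexts.clear();
}

// Lock held for the whole function; a context missing from g_contexts is
// rejected, with no separate already_shutdown_ flag.
void context_shutdown_without_flag(FakeRclContext * ctx)
{
  std::lock_guard<std::mutex> guard{g_contexts_mutex};
  auto iter = std::find(g_contexts.begin(), g_contexts.end(), ctx);
  if (iter == g_contexts.end()) {
    throw std::runtime_error("context already shut down");
  }
  g_contexts.erase(iter);
  if (fake_rcl_shutdown(ctx) != FakeRet::kOk) {
    throw std::runtime_error("failed to shutdown");
  }
}

// Returns true if calling context_shutdown_without_flag() on ctx throws.
bool shutdown_throws(FakeRclContext * ctx)
{
  try {
    context_shutdown_without_flag(ctx);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

In this sketch a second call on the same context throws, but so does a Context::shutdown() that runs after shutdown_contexts(), because the context is no longer in g_contexts either way. That second case is the #1352 scenario in which the PR author did not want an exception, and telling the two apart is what the already_shutdown_ flag is for.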

Yadunund (Member)

@fujitatomoya do you mind doing another round of review here whenever you get a chance?

fujitatomoya (Collaborator)

@Barry-Xu-2018 is this still a draft? Are we waiting for someone or something to make it official?

Barry-Xu-2018 marked this pull request as ready for review on September 27, 2024 09:18

Barry-Xu-2018 (Contributor, Author)

> @Barry-Xu-2018 is this still a draft? Are we waiting for someone or something to make it official?

Oh, I forgot to change the status.
