Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle SIGINT and SIGTERM in rosbag2 recorder #1301

Merged
merged 5 commits into from
Jun 13, 2023

Conversation

MichaelOrlov
Copy link
Contributor

@MichaelOrlov MichaelOrlov commented Apr 20, 2023

Note: Will address Windows specific signals handling in a separate PR.

@delete-merged-branch delete-merged-branch bot deleted the branch rolling May 3, 2023 20:23
@MichaelOrlov MichaelOrlov force-pushed the morlov/gracefully_handle_sigint_in_recorder branch from ee6475f to 1f2c53a Compare May 16, 2023 23:19
@MichaelOrlov MichaelOrlov changed the base branch from morlov/add_recorder_close_api to rolling May 16, 2023 23:20
@MichaelOrlov MichaelOrlov marked this pull request as ready for review May 20, 2023 21:07
@MichaelOrlov MichaelOrlov requested a review from a team as a code owner May 20, 2023 21:07
@MichaelOrlov MichaelOrlov requested review from gbiggs, james-rms and emersonknapp and removed request for a team May 20, 2023 21:07
@MichaelOrlov
Copy link
Contributor Author

Gist: https://gist.githubusercontent.com/MichaelOrlov/95900a255259b630779ca19eff1f12d4/raw/db8ac815a5d0ab97586d75360a4a7b13d9f2786c/ros2.repos
BUILD args: --packages-above-and-dependencies rosbag2_py rosbag2_tests
TEST args: --packages-above rosbag2_py rosbag2_tests
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12090

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov
Copy link
Contributor Author

@emersonknapp @james-rms This PR is relatively small and ready for review

@MichaelOrlov
Copy link
Contributor Author

@clalancette @emersonknapp @james-rms Kindly ping for review.
This PR is blocking another #1342 PR

@@ -190,16 +190,23 @@ class Player
class Recorder
{
private:
std::unique_ptr<rclcpp::executors::SingleThreadedExecutor> exec_;
static std::unique_ptr<rclcpp::executors::SingleThreadedExecutor> exec_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this member static now? Because of the rclcpp::init/shutdown in here, it doesn't really make sense to have multiple rclpy::Recorder instances, but the semantics of this global variable don't really make sense to me. Would these be better as just a global variable, or a singleton object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emersonknapp Hmm, I didn't think about the situation when multiple instances of the rclpy::Recorder and one static executor. This is a good point.
I made the executor static to be able to call its static exec_->cancel(); method from signal handlers to interrupt execution.
In the case of the situation when multiple instances of the rclpy::Recorder exists it looks reasonable I would say even must have to make the executor as static to convey signal interruption event to all instances.

However, I agree that it would be better to wrap the static executor in the singleton pattern. I am sorry I missed it, - will fix.
Good catch!

@MichaelOrlov MichaelOrlov force-pushed the morlov/gracefully_handle_sigint_in_recorder branch from 48c870b to 3197b6f Compare June 3, 2023 18:45
@MichaelOrlov
Copy link
Contributor Author

The build_and_test job failed with exactly the same error messages as I have seen recently in #1342

FAIL: test_record.TestRecordAfterShutdown.test_exit_code[sqlite3]
2023-06-03T19:45:38.7299964Z ----------------------------------------------------------------------------------------------------------------------------------
2023-06-03T19:45:38.7300321Z Traceback (most recent call last):
2023-06-03T19:45:38.7300670Z   File "/__w/rosbag2/rosbag2/ros_ws/src/s1oktrgw0j/rosbag2/ros2bag/test/test_record.py", line 73, in test_exit_code
2023-06-03T19:45:38.7301027Z     launch_testing.asserts.assertExitCodes(
2023-06-03T19:45:38.7301467Z   File "/__w/rosbag2/rosbag2/ros_ws/build/launch_testing/launch_testing/asserts/assert_exit_codes.py", line 62, in assertExitCodes
2023-06-03T19:45:38.7301967Z     assert info.returncode in allowable_exit_codes, 'Proc {} exited with code {}'.format(
2023-06-03T19:45:38.7302418Z AssertionError: Proc ros2bag-cli-7 exited with code -9
2023-06-03T19:45:38.7302839Z ----------------------------- Captured stdout call -----------------------------
2023-06-03T19:45:38.7303047Z 
2023-06-03T19:45:38.7303143Z Starting test run test_record[sqlite3]
2023-06-03T19:45:38.7303591Z [INFO] [launch]: All log files can be found below /github/home/.ros/log/2023-06-03-19-45-17-529104-0dd7204ee0ae-126136
2023-06-03T19:45:38.7303940Z [INFO] [launch]: Default logging verbosity is set to INFO
2023-06-03T19:45:38.7304291Z [INFO] [ros2bag-cli-7]: process started with pid [126220]
2023-06-03T19:45:38.7304840Z [ros2bag-cli-7] stdin is not a terminal device. Keyboard handling disabled.[INFO] [1685821526.257381155] [rosbag2_recorder]: Press SPACE for pausing/resuming
2023-06-03T19:45:38.7305542Z [ros2bag-cli-7] [INFO] [1685821526.260584248] [rosbag2_storage]: Opened database '/tmp/tmpcx5wk9g_/ros2bag_test_record/ros2bag_test_record_0.db3' for READ_WRITE.
2023-06-03T19:45:38.7306069Z [ros2bag-cli-7] [INFO] [1685821526.264542138] [rosbag2_recorder]: Event publisher thread: Starting
2023-06-03T19:45:38.7306561Z [ros2bag-cli-7] [INFO] [1685821526.265532736] [rosbag2_recorder]: Listening for topics...
2023-06-03T19:45:38.7307011Z [ros2bag-cli-7] [INFO] [1685821526.270718724] [rosbag2_recorder]: Subscribed to topic '/rosout'
2023-06-03T19:45:38.7307436Z [INFO] [ros2bag-cli-7]: sending signal 'SIGINT' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7307965Z [ERROR] [ros2bag-cli-7]: process[ros2bag-cli-7] failed to terminate '5' seconds after receiving 'SIGINT', escalating to 'SIGTERM'
2023-06-03T19:45:38.7308438Z [INFO] [ros2bag-cli-7]: sending signal 'SIGTERM' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7308974Z [ERROR] [ros2bag-cli-7]: process[ros2bag-cli-7] failed to terminate '10.0' seconds after receiving 'SIGTERM', escalating to 'SIGKILL'
2023-06-03T19:45:38.7309436Z [INFO] [ros2bag-cli-7]: sending signal 'SIGKILL' to process[ros2bag-cli-7]
2023-06-03T19:45:38.7310002Z [ERROR] [ros2bag-cli-7]: process has died [pid 126220, exit code -9, cmd 'ros2 bag record -a -s sqlite3 --output /tmp/tmpcx5wk9g_/ros2bag_test_record'].
2023-06-03T19:45:38.7310263Z 
2023-06-03T19:45:38.7310369Z Starting test run test_record[mcap]
2023-06-03T19:45:38.7310818Z [INFO] [launch]: All log files can be found below /github/home/.ros/log/2023-06-03-19-45-17-529104-0dd7204ee0ae-126136
2023-06-03T19:45:38.7311165Z [INFO] [launch]: Default logging verbosity is set to INFO
2023-06-03T19:45:38.7311511Z [INFO] [ros2bag-cli-8]: process started with pid [126237]
2023-06-03T19:45:38.7312041Z [ros2bag-cli-8] stdin is not a terminal device. Keyboard handling disabled.[INFO] [1685821536.633422675] [rosbag2_recorder]: Press SPACE for pausing/resuming
2023-06-03T19:45:38.7312563Z [ros2bag-cli-8] [INFO] [1685821536.638780051] [rosbag2_recorder]: Event publisher thread: Starting
2023-06-03T19:45:38.7312999Z [ros2bag-cli-8] [INFO] [1685821536.640277344] [rosbag2_recorder]: Listening for topics...
2023-06-03T19:45:38.7313438Z [ros2bag-cli-8] [INFO] [1685821536.645916818] [rosbag2_recorder]: Subscribed to topic '/rosout'
2023-06-03T19:45:38.7313859Z [INFO] [ros2bag-cli-8]: sending signal 'SIGINT' to process[ros2bag-cli-8]
2023-06-03T19:45:38.7314332Z [ros2bag-cli-8] [INFO] [1685821536.649572001] [rosbag2_recorder]: Subscribed to topic '/events/write_split'
2023-06-03T19:45:38.7314847Z [ros2bag-cli-8] [INFO] [1685821536.650399197] [rosbag2_cpp]: Writing remaining messages from cache to the bag. It may take a while
2023-06-03T19:45:38.7315323Z [ros2bag-cli-8] [INFO] [1685821536.651652191] [rosbag2_recorder]: Event publisher thread: Exiting
2023-06-03T19:45:38.7315982Z [INFO] [ros2bag-cli-8]: process has finished cleanly [pid 126237]

It looks like something got changed recently in underlying sources which is leading to those error.

@MichaelOrlov
Copy link
Contributor Author

@emersonknapp Well after a more detailed analysis and debug session, I wasn't able to prove that freeze happening inside executor->cancel() call. However executor->cancel() can throw an exception.

I rolled back to the non-static executor. And the rationale for that is that it seems the previous implementation with static executor member looks a bit overengineering and doesn't align well with the design of the class since we have context init/shutdown in the constructor and destructor. But the existence of the executor doesn't make sense without a valid context.
In case when the signal will arrive we will trigger our internal static exit_ variable and wait while the current exec_>spin_all(10msec) will exit.

I think we can turn back to the version with static executor later on if we will find out that the current implementation doesn't satisfy our needs.

Thoughts?

Copy link
Collaborator

@emersonknapp emersonknapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now. If we do want to support multiple Recorders from python we will need to do some other changes anyways.

@MichaelOrlov MichaelOrlov force-pushed the morlov/gracefully_handle_sigint_in_recorder branch from 744b09a to 9819e5f Compare June 11, 2023 06:48
- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>
- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>
- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>
- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>
@MichaelOrlov MichaelOrlov force-pushed the morlov/gracefully_handle_sigint_in_recorder branch from 45303b8 to 896d8b8 Compare June 12, 2023 23:45
@MichaelOrlov
Copy link
Contributor Author

@emersonknapp I found a better way to handle executor spin() interruption via spinning it in a separate thread and wait on the conditional variable before triggering recorder->stop() method.
I realized that we need spin() executor until the end of the recorder->stop() call to avoid failure on publishing events about bag close or split. Also waiting on the conditional variable for exit will bring more deterministic behavior.
Please review my changes in the new commit c0b75eb

@MichaelOrlov
Copy link
Contributor Author

Re-run CI
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12218

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MichaelOrlov MichaelOrlov merged commit 46a23e9 into rolling Jun 13, 2023
@delete-merged-branch delete-merged-branch bot deleted the morlov/gracefully_handle_sigint_in_recorder branch June 13, 2023 06:21
@MichaelOrlov
Copy link
Contributor Author

https://github.com/Mergifyio backport iron

@mergify
Copy link

mergify bot commented Jun 13, 2023

backport iron

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 13, 2023
* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)
@MichaelOrlov
Copy link
Contributor Author

https://github.com/Mergifyio backport humble

@mergify
Copy link

mergify bot commented Jun 13, 2023

backport humble

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 13, 2023
* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)

# Conflicts:
#	rosbag2_py/src/rosbag2_py/_transport.cpp
#	rosbag2_py/test/test_transport.py
MichaelOrlov added a commit that referenced this pull request Jun 13, 2023
* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)

# Conflicts:
#	rosbag2_py/src/rosbag2_py/_transport.cpp
#	rosbag2_py/test/test_transport.py
MichaelOrlov added a commit that referenced this pull request Jun 14, 2023
* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)

Co-authored-by: Michael Orlov <[email protected]>
MichaelOrlov added a commit that referenced this pull request Jul 3, 2023
* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)

# Conflicts:
#	rosbag2_py/src/rosbag2_py/_transport.cpp
#	rosbag2_py/test/test_transport.py
MichaelOrlov added a commit that referenced this pull request Jul 4, 2023
…ckport #1301) (#1395)

* Gracefully handle SIGINT and SIGTERM in rosbag2 recorder (#1301)

* Gracefully handle SIGINT and SIGTERM signal for rosbag2 recorder

- Call recorder->stop() and exec_->cancel() instead of rclcpp::shutdown

Signed-off-by: Michael Orlov <[email protected]>

* Use singleton for static executor in the rosbag2_py::Recorder

Signed-off-by: Michael Orlov <[email protected]>

* Rollback to the non-static executor and don't call executor->cancel()

- In case when signal will arrive we will trigger our internal exit_
variable and wait while current exec_->spin_all(10msec) will exit.

Signed-off-by: Michael Orlov <[email protected]>

* Spin recorder node in a separate thread for better handling exit

- Run exec->spin() in a separate thread, because we need to call
exec->cancel() after recorder->stop() to be able to send notifications
about bag split and close.
- Wait on conditional variable for exit_ flag

Signed-off-by: Michael Orlov <[email protected]>

* Address race condition in rosbag2_py.test_record_cancel

- Add `record_thread.join()` before trying to parse metadata.

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
(cherry picked from commit 46a23e9)

# Conflicts:
#	rosbag2_py/src/rosbag2_py/_transport.cpp
#	rosbag2_py/test/test_transport.py

* Address merge conflicts

Signed-off-by: Michael Orlov <[email protected]>

* Workaround for segmentation fault in rclcpp::SignalHandler::uninstall()

Signed-off-by: Michael Orlov <[email protected]>

---------

Signed-off-by: Michael Orlov <[email protected]>
Co-authored-by: Michael Orlov <[email protected]>
@tonynajjar
Copy link

tonynajjar commented Sep 5, 2023

Hey, I'm on Ubuntu, Iron and this doesn't seem to work. You can easily reproduce it with:

import rclpy

from rclpy.node import Node
from rosbag2_py import Recorder

class RosBagRecorder(Node):
    def __init__(self):
        super().__init__("rosbag_recorder")
        self.recorder = Recorder()

def main():
    rclpy.init()
    node = RosBagRecorder()
    rclpy.spin(node)


if __name__ == "__main__":
    main()

The process never exits with CTRL-C

opened #1458

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants