Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tasks that raise an exception crash the entire executor, even when the exception is caught #1098

Open
haudren-woven opened this issue Mar 28, 2023 · 5 comments
Labels
question Further information is requested

Comments

@haudren-woven
Copy link

Bug report

Required Info:

  • Operating System:
    • Ubuntu 20.04
  • Installation type:
    • Binary
  • Version or commit hash:
    • Foxy
  • DDS implementation:
    • Cyclone
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

import rclpy


async def throwing_task():
    raise FileNotFoundError()


async def test_1():
    try:
        await throwing_task()
    except FileNotFoundError:
        pass
    return True


async def test_2(executor):
    task = executor.create_task(throwing_task)
    try:
        await task
    except FileNotFoundError:
        print("Error but should keep going")
    return True


rclpy.init()

node = rclpy.create_node("test_wait")

executor = rclpy.get_global_executor()
executor.add_node(node)

task = executor.create_task(test_1)

executor.spin_until_future_complete(task)

assert task.result()

task = executor.create_task(test_2, executor)

executor.spin_until_future_complete(task)

assert task.result()

rclpy.shutdown()

Expected behavior

The above code should complete and pass since the FileNotFoundError is caught in both cases

Actual behavior

The executor itself raises an exception on test_2 which is uncaught and brings down the entire test...

Additional information

I was trying to run a number of async callbacks in the rclpy executor, and I was raising exceptions in some cases. For some reason, those would bring down the entire node. I tracked it down to these lines:

if handler.exception() is not None:

I am not sure why the executor is raising the exception of the task 🤔 It might be due to the fact that subscribers, timers etc... are tasks that are never explicitly awaited? I don't have much knowledge about the internals of rclpy but this is very surprising coming from asyncio... It is also very surprising to see that the exception is only improperly raised when packaged as an asynchronous task, and not when awaited directly.

Thank you!

fujitatomoya added a commit to fujitatomoya/ros2_test_prover that referenced this issue Mar 29, 2023
@fujitatomoya
Copy link
Collaborator

I think task = executor.create_task(throwing_task) is called with another handler in the executor, which does not have try&catch statement at all.

@fujitatomoya fujitatomoya added the question Further information is requested label Mar 29, 2023
@haudren-woven
Copy link
Author

The stacktrace is fairly clear:

Traceback (most recent call last):
  File "/tmp/test_rclpy.py", line 40, in <module>
    executor.spin_until_future_complete(task)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 299, in spin_until_future_complete
    self.spin_once_until_future_complete(future, timeout_sec)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 722, in spin_once_until_future_complete
    self.spin_once(timeout_sec)
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/executors.py", line 719, in spin_once
    raise handler.exception()
  File "/opt/ros/foxy/lib/python3.8/site-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/tmp/test_rclpy.py", line 5, in throwing_task
    raise FileNotFoundError()
FileNotFoundError

The exception is first raised in the throwing task (as expected!), then re-raised inside the executor. I'm wondering why we do this? I have the feeling that exceptions raised by asynchronous tasks should only be raised in an await call 🤔

@clalancette
Copy link
Contributor

There is some possibility that this would be improved by #971 (though it needs a rebase). It might be worthwhile to test something like that.

@Deric-W
Copy link

Deric-W commented Nov 29, 2023

I encountered the same problem and created a workaround:

def schedule_task(coro: Coroutine[None, None, Any], executor: Executor) -> Future:
    """Create a task which can be canceled and does not crash the executor."""
    future = Future(executor=executor)
    # interrupt any pending spins
    future.add_done_callback(lambda _: executor.wake())
    executor.create_task(condom_task(future, coro))
    return future


@types.coroutine
def coroutine_yield() -> Generator[None, None, None]:
    """Yield control to the executor."""
    yield


async def condom_task(future: Future, coro: Coroutine[None, None, Any]) -> None:
    """Redirect exceptions to a future to prevent the executor from seeing them."""
    while not future.cancelled():
        try:
            coro.send(None)
        except StopIteration as result:
            future.set_result(result.value)
            return
        except Exception as error:
            future.set_exception(error)
            return
        await coroutine_yield()
    # run any cleanup code
    try:
        coro.close()
    except Exception as error:
        if isinstance(error.__context__, GeneratorExit):
            future.set_exception(error)
        else:
            raise

The workaround involves wrapping the Coroutine with a second one which acts as shield to redirect exceptions from the coroutine away from the executor to a future.

@sloretz
Copy link
Contributor

sloretz commented Feb 16, 2024

Maybe we could copy the behavior of asyncio by adding an api to set an exception handler on the executor, where the default is to log and move on?

@sloretz sloretz removed their assignment Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants