Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix durability of service introspection topics #1068

Closed
nirwester opened this issue May 11, 2023 · 9 comments
Closed

Fix durability of service introspection topics #1068

nirwester opened this issue May 11, 2023 · 9 comments

Comments

@nirwester
Copy link

Hi, while testing the new service introspection topic, I noticed an anomaly on the durability of the published service response.

If executing the following sequence of actions:

-starting a server
-starting a client
-wait some seconds to make sure that the execution is finished
-subscribe to the introspection topic (ex ros2 topic echo /add_two_ints/_service_event)
The last response sent by the server is received, even if the service is no more running. Ex:

info:
event_type: 2
stamp:
sec: 1683653877
nanosec: 751816926
client_gid:
...
sequence_number: 1
request: []
response:
sum: 3

I used this code, where the basic services example was modified to enable full intropsection:
https://github.com/nirwester/ros2_journey_examples/tree/master/homeworks/chap4/service_with_introspection
Running on this docker container (jammy-Iron):
osrf/docker_images#673

This would seem an issue in the durability of the corresponding publisher, probably set to transient_local. I don't know if there's any technical reason for this kind of implementation, but getting the last published results upon subscription seems confusing to me.

The problem was reproduced by @clalancette that proposed two possible solutions:

  • Leave it as TRANSIENT_LOCAL, but give it a large queue depth. That means that late-joining subscribers would be able to get all of the previous data, at the expense of using a lot of memory.
  • Change it to VOLATILE and just don't worry about late-joining subscribers.

The volatile choice sounds like the more reasonable, but we might be missing something.

@jacobperron
Copy link
Member

Volatile durability makes sense to me. I believe this is what is suggested in the design doc: ros-infrastructure/rep#360

The service event topics proposed in this REP shall use the default quality of service settings.

I'm pretty sure that the default durability QoS setting for topics is volatile. If someone wants to change the QoS for a specific service event topic, they can still do that with a runtime config file.

@fujitatomoya
Copy link
Collaborator

probably I might be mistaken...

currently configure_introspection allows user application to set QoS for service event publisher. that said user can change the QoS based on the use case, that leaves the flexibility for the user application. (example sets SystemDefaultsQoS that leads to the implementation specific setting)

why do we want to set a static QoS for service event publisher to remove the flexibility for user application?

@nirwester
Copy link
Author

nirwester commented May 12, 2023

@fujitatomoya It apparently can, yet doing something like:

service->configure_introspection(node->get_clock(),
                                   rclcpp::SystemDefaultsQoS(),
                                   RCL_SERVICE_INTROSPECTION_CONTENTS);

Results in a transient local durability on the published result (or at least on the published result being sent upon subscription), which looks strange to me.

@nirwester
Copy link
Author

For completeness, here are the details of the published topic when using SystemDefaultQoS on both service and client:

ros2 topic info -v /add_two_ints/_service_event
Type: example_interfaces/srv/AddTwoInts_Event

Publisher count: 1

Node name: add_two_ints_server
Node namespace: /
Topic type: example_interfaces/srv/AddTwoInts_Event
Topic type hash: UNKNOWN
Endpoint type: PUBLISHER
GID: 01.0f.a2.5c.c5.00.59.e0.00.00.00.00.00.00.14.03
QoS profile:
  Reliability: RELIABLE
  History (Depth): UNKNOWN
  Durability: TRANSIENT_LOCAL
  Lifespan: Infinite
  Deadline: Infinite
  Liveliness: AUTOMATIC
  Liveliness lease duration: Infinite

Subscription count: 0

@fujitatomoya
Copy link
Collaborator

@nirwester yes i understand, and that because application sets SystemDefaultQoS, right?

i mean, can user not do the following? that explicitly sets RMW_QOS_POLICY_DURABILITY_VOLATILE? (just one of example QoS setting to set RMW_QOS_POLICY_DURABILITY_VOLATILE, user actually have full control for any QoS setting.)

service->configure_introspection(node->get_clock(),
                                   rclcpp::ServicesQoS(),
                                   RCL_SERVICE_INTROSPECTION_CONTENTS);

https://github.com/ros2/rmw/blob/36003aaed8fd9718c20b19f4da6fb21929574058/rmw/include/rmw/qos_profiles.h#LL64C1-L75C3

@nirwester
Copy link
Author

@fujitatomoya Yes, I guess my confusion stems from the fact that RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT is behaving as TRANSIENT_LOCAL, while I expected it to behave as volatile. If I manually configure it to be something else that uses volatile, then it works as expected. There might be no issue then, but why is RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT behaving as transient local?

@iuhilnehc-ynos
Copy link
Collaborator

iuhilnehc-ynos commented May 12, 2023

why is RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT behaving as transient local?

Currently, The behavior of RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT depends on the rmw_implement(or DDS).

  switch (qos_policies.durability) {
    case RMW_QOS_POLICY_DURABILITY_TRANSIENT_LOCAL:
      entity_qos.durability().kind = eprosima::fastdds::dds::TRANSIENT_LOCAL_DURABILITY_QOS;
      break;
    case RMW_QOS_POLICY_DURABILITY_VOLATILE:
      entity_qos.durability().kind = eprosima::fastdds::dds::VOLATILE_DURABILITY_QOS;
      break;
    case RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT:
      break;
    default:
      RMW_SET_ERROR_MSG("Unknown QoS durability policy");
      return false;
  }

the default value for publisher(writer) is TRANSIENT_LOCAL_DURABILITY_QOS in FastDDS,
https://fast-dds.docs.eprosima.com/en/latest/fastdds/api_reference/dds_pim/core/policy/durabilityqospolicy.html#classeprosima_1_1fastdds_1_1dds_1_1DurabilityQosPolicy

By default the value for DataReaders: VOLATILE_DURABILITY_QOS, for DataWriters TRANSIENT_LOCAL_DURABILITY_QOS.
    case RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT:
    case RMW_QOS_POLICY_DURABILITY_VOLATILE:
      dds_qset_durability(qos, DDS_DURABILITY_VOLATILE);
  switch (qos_policies->durability) {
    case RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT:
      {
        break;
      }

I am not able to find out the doc about the default setting, but the source code.

/opt/rti.com/rti_connext_dds-6.0.1/include/ndds/dds_c/dds_c_infrastructure.h:1417

#define DDS_DURABILITY_QOS_POLICY_DEFAULT \
{ DDS_VOLATILE_DURABILITY_QOS, \
  RTI_TRUE }

If we use rmw_cyclonedds_cpp and rmw_connextdds to test the case, the demo can work as expected.
I think it's better not to use rclcpp::SystemDefaultsQoS with RMW_QOS_POLICY_DURABILITY_SYSTEM_DEFAULT .

@nirwester
Copy link
Author

Understood, thanks a lot for the clear explanation! Given these differences, that can deeply influence the behavior of the application, I'll stay away from rclcpp::SystemDefaultsQoS :)

@fujitatomoya
Copy link
Collaborator

fujitatomoya commented May 12, 2023

@iuhilnehc-ynos thanks for the follow-up.

@nirwester if you find any other issues, please feel free to open it.

btw, i was thinking that probably we can set the default QoS with durability VOLATILE for service event publisher, if user does not specify the QoS for that? i think that what happened here, user probably sets SystemDefaultsQoS and that leads to this question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants