Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sudden Unregistering of Remote Node (Unsync Node Register) in Multi-Machine Setup #201

Open
KhairulM opened this issue Sep 4, 2024 · 7 comments

Comments

@KhairulM
Copy link

KhairulM commented Sep 4, 2024

Setup

I have 2 machines running ubuntu 20.04 and ROS Noetic on the same network:

  • Machine 1 is for running a gazebo simulation (Alias: "FMS")
  • Machine 2 is for spawning the robot in the simulation and running the navigation stack

I've configured the /etc/hosts of both of the machine and multicasting have been enabled (/proc/sys/net/ipv4/ip_forward set to 1 and /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts set to 0)

This is my launch file for Machine 1:

<launch>
	<arg name="gui" default="false"/>
	
	<node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery" output="screen">
		<rosparam param="robot_hosts">['ROBOT1', 'ROBOT2']</rosparam>
	</node>
	<include file="$(find gazebo_ros)/launch/empty_world.launch">
	    <arg name="world_name" value="$(find turtlebot3_gazebo)/worlds/turtlebot3_house.world"/>
	    <arg name="paused" value="false"/>
	    <arg name="use_sim_time" value="true"/>
	    <arg name="gui" value="$(arg gui)"/>
	    <arg name="headless" value="false"/>
	    <arg name="debug" value="false"/>
            <arg name="verbose" value="true"/>
	</include>
</launch>

This is my launch file for Machine 2:

<launch>
        <arg name="model" default="waffle_pi" doc="model type [burger, waffle, waffle_pi]"/>
        <arg name="robot_name" default="robot1" doc="name of the robot (must be unique and does not contain whitespaces)"/>
        <arg name="x_pos" default="-4.0"/>
        <arg name="y_pos" default="1.0"/>
        <arg name="z_pos" default="0.0"/>
        
        <node name="master_discovery" pkg="fkie_master_discovery" type="master_discovery" output="screen">
		<!-- <rosparam param="robot_hosts">['FMS']</rosparam> -->
		<rosparam param="heartbeat_hz">2</rosparam>
		<!-- <rosparam param="send_mcast">False</rosparam> -->
		<!-- <rosparam param="listen_mcast">False</rosparam> -->
	</node>
        <node name="master_sync" pkg="fkie_master_sync" type="master_sync" output="screen">
                <rosparam param="ignore_hosts"> ['ROBOT2'] </rosparam>
                <rosparam param="sync_hosts"> ['FMS'] </rosparam>
		<!-- <rosparam param="ignore_topics"> ['*theora*', '*compressedDepth*'] </rosparam> -->
		<!-- <rosparam param="resync_on_reconnect">False</rosparam> -->
        </node>

        <param name="/use_sim_time" value="true"/>
        
        <group ns="$(arg robot_name)">
                <param name="robot_description" command="$(find xacro)/xacro --inorder '$(find turtlebot3_description)/urdf/turtlebot3_$(arg model).urdf.xacro' robot_name:=$(arg robot_name)" />
                <node name="spawn_urdf" pkg="gazebo_ros" type="spawn_model" args="-urdf -model $(arg robot_name)_turtlebot3 -x $(arg x_pos) -y $(arg y_pos) -z $(arg z_pos) -param robot_description -robot_namespace $(arg robot_name)" output="screen" />
                <node name="robot_state_publisher" pkg="robot_state_publisher" type="robot_state_publisher" output="screen">
                        <param name="publish_frequency" type="double" value="50.0"/>
                </node>
                <node name="base_link_to_laser" pkg="tf" type="static_transform_publisher" args="-0.032 0 0.1397 0 0 0 base_link base_scan 100"/>

                <!-- tranformations relay -->
                <node name="prefixed_odom_relay" pkg="tf" type="static_transform_publisher" args="0 0 0 0 0 0 odom $(arg robot_name)/odom 100"/>
                <node name="prefixed_base_footprint_relay" pkg="tf" type="static_transform_publisher" args="0 0 0 0 0 0 $(arg robot_name)/base_footprint base_footprint 100"/>
        </group>
	

	<!-- topic relay nodes -->
	<node name="$(arg robot_name)_cmd_vel_relay" pkg="topic_tools" type="relay" args="/cmd_vel /$(arg robot_name)/cmd_vel" output="screen" />
	<node name="$(arg robot_name)_imu_relay" pkg="topic_tools" type="relay" args="/$(arg robot_name)/imu /imu" output="screen" />
	<node name="$(arg robot_name)_joint_states_relay" pkg="topic_tools" type="relay" args="/$(arg robot_name)/joint_states /joint_states" output="screen"/>
	<node name="$(arg robot_name)_odom_relay" pkg="topic_tools" type="relay" args="/$(arg robot_name)/odom /odom" output="screen" />
	<node name="$(arg robot_name)_scan_relay" pkg="topic_tools" type="relay" args="/$(arg robot_name)/scan /scan" output="screen"/>
</launch>

I spawned a turtlebot3 model that I have edited so the topics and tf are prefixed properly and the local nodes on Machine 2 can communicate using regular topics/services name such as /odom, /scan, etc

Problem

When I run the navigation stack (this is a custom navigation stack we develop) in Machine 2, sometimes this log appears Unregistering /gazebo, its pretty inconsistent to when the log appears after running the navigation stack. But when I tried to get the info of the gazebo node (rosnode info /gazebo) in Machine 2 I got this
WhatsApp Image 2024-09-03 at 11 42 54 AM
The publications and services is empty on Machine 2, but on Machine 1 the gazebo node still has the correct publications and services and still publishing the correct topic

What could cause this error?

@atiderko
Copy link
Member

atiderko commented Sep 4, 2024

I don't see any configuration errors that could lead to this.

Normally a node (e.g. /gazebo) is defined via topics/services. Only if all are unregistered, you will not find any information about the node in roscore. That is why I am a bit surprised about the log message Unregistering /gazebo. I do not know where this comes from.

But the question remains why no publishers and services are displayed. This raises two questions for me:

  1. is it different before the log message , i.e. directly after the start?
  2. robot_hosts ROBOT1 and ROBOT2 were specified in the configuration. However, RNS3 is displayed in the screenshot. Question: are ROBOT1 and ROBOT2 correctly resolved to the IP addresses 10.10.10.4 and 10.10.10.15 on Machine1 (FMS)? If not, it can happen that the robot sees the FSM at the beginning via the multicast communication and then not because the names on the FSM are resolved to the wrong addresses.

Perhaps the log files of master_discovery and master_sync could provide more information.

@KhairulM
Copy link
Author

KhairulM commented Sep 4, 2024

Hi @atiderko thank you for the quick response, to answer your questions:

  1. yes, before the Unregistering /gazebo log, the publications and services of /gazebo if viewed from Machine 2 are not empty, this is what it looked like
    image

For reference, this is the output of rosnode info /gazebo on FMS (Machine 1), I copied this after I run another robot (RNS5)

--------------------------------------------------------------------------------
Node [/gazebo]
Publications: 
 * /clock [rosgraph_msgs/Clock]
 * /gazebo/link_states [gazebo_msgs/LinkStates]
 * /gazebo/model_states [gazebo_msgs/ModelStates]
 * /gazebo/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /gazebo/parameter_updates [dynamic_reconfigure/Config]
 * /gazebo/performance_metrics [gazebo_msgs/PerformanceMetrics]
 * /rns2/camera/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns2/camera/parameter_updates [dynamic_reconfigure/Config]
 * /rns2/camera/rgb/camera_info [sensor_msgs/CameraInfo]
 * /rns2/camera/rgb/image_raw [sensor_msgs/Image]
 * /rns2/camera/rgb/image_raw/compressed [sensor_msgs/CompressedImage]
 * /rns2/camera/rgb/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns2/camera/rgb/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
 * /rns2/camera/rgb/image_raw/compressedDepth [sensor_msgs/CompressedImage]
 * /rns2/camera/rgb/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns2/camera/rgb/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
 * /rns2/camera/rgb/image_raw/theora [theora_image_transport/Packet]
 * /rns2/camera/rgb/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns2/camera/rgb/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
 * /rns2/imu [sensor_msgs/Imu]
 * /rns2/joint_states [sensor_msgs/JointState]
 * /rns2/odom [nav_msgs/Odometry]
 * /rns2/scan [sensor_msgs/LaserScan]
 * /rns5/camera/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns5/camera/parameter_updates [dynamic_reconfigure/Config]
 * /rns5/camera/rgb/camera_info [sensor_msgs/CameraInfo]
 * /rns5/camera/rgb/image_raw [sensor_msgs/Image]
 * /rns5/camera/rgb/image_raw/compressed [sensor_msgs/CompressedImage]
 * /rns5/camera/rgb/image_raw/compressed/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns5/camera/rgb/image_raw/compressed/parameter_updates [dynamic_reconfigure/Config]
 * /rns5/camera/rgb/image_raw/compressedDepth [sensor_msgs/CompressedImage]
 * /rns5/camera/rgb/image_raw/compressedDepth/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns5/camera/rgb/image_raw/compressedDepth/parameter_updates [dynamic_reconfigure/Config]
 * /rns5/camera/rgb/image_raw/theora [theora_image_transport/Packet]
 * /rns5/camera/rgb/image_raw/theora/parameter_descriptions [dynamic_reconfigure/ConfigDescription]
 * /rns5/camera/rgb/image_raw/theora/parameter_updates [dynamic_reconfigure/Config]
 * /rns5/imu [sensor_msgs/Imu]
 * /rns5/joint_states [sensor_msgs/JointState]
 * /rns5/odom [nav_msgs/Odometry]
 * /rns5/scan [sensor_msgs/LaserScan]
 * /rosout [rosgraph_msgs/Log]
 * /tf [tf2_msgs/TFMessage]

Subscriptions: 
 * /clock [rosgraph_msgs/Clock]
 * /gazebo/set_link_state [unknown type]
 * /gazebo/set_model_state [unknown type]
 * /rns2/cmd_vel [unknown type]
 * /rns5/cmd_vel [unknown type]

Services: 
 * /gazebo/apply_body_wrench
 * /gazebo/apply_joint_effort
 * /gazebo/clear_body_wrenches
 * /gazebo/clear_joint_forces
 * /gazebo/delete_light
 * /gazebo/delete_model
 * /gazebo/get_joint_properties
 * /gazebo/get_light_properties
 * /gazebo/get_link_properties
 * /gazebo/get_link_state
 * /gazebo/get_loggers
 * /gazebo/get_model_properties
 * /gazebo/get_model_state
 * /gazebo/get_physics_properties
 * /gazebo/get_world_properties
 * /gazebo/pause_physics
 * /gazebo/reset_simulation
 * /gazebo/reset_world
 * /gazebo/set_joint_properties
 * /gazebo/set_light_properties
 * /gazebo/set_link_properties
 * /gazebo/set_link_state
 * /gazebo/set_logger_level
 * /gazebo/set_model_configuration
 * /gazebo/set_model_state
 * /gazebo/set_parameters
 * /gazebo/set_physics_properties
 * /gazebo/spawn_sdf_model
 * /gazebo/spawn_urdf_model
 * /gazebo/unpause_physics
 * /rns2/camera/rgb/image_raw/compressed/set_parameters
 * /rns2/camera/rgb/image_raw/compressedDepth/set_parameters
 * /rns2/camera/rgb/image_raw/theora/set_parameters
 * /rns2/camera/set_camera_info
 * /rns2/camera/set_parameters
 * /rns2/imu_service
 * /rns5/camera/rgb/image_raw/compressed/set_parameters
 * /rns5/camera/rgb/image_raw/compressedDepth/set_parameters
 * /rns5/camera/rgb/image_raw/theora/set_parameters
 * /rns5/camera/set_camera_info
 * /rns5/camera/set_parameters
 * /rns5/imu_service


contacting node http://FMS:36921/ ...
Pid: 3591519
Connections:
 * topic: /rosout
    * to: /rosout
    * direction: outbound (47647 - 127.0.0.1:55576) [21]
    * transport: TCPROS
 * topic: /clock
    * to: /gazebo
    * direction: outbound
    * transport: INTRAPROCESS
 * topic: /clock
    * to: /master_discovery
    * direction: outbound (47647 - 127.0.0.1:55614) [49]
    * transport: TCPROS
 * topic: /clock
    * to: /rosout
    * direction: outbound (47647 - 127.0.0.1:55906) [50]
    * transport: TCPROS
 * topic: /clock
    * to: /master_sync
    * direction: outbound (47647 - 10.0.0.105:41834) [61]
    * transport: TCPROS
 * topic: /clock
    * to: /master_discovery
    * direction: outbound (47647 - 10.0.0.105:41822) [62]
    * transport: TCPROS
 * topic: /clock
    * to: /task_master
    * direction: outbound (47647 - 10.0.0.105:39326) [101]
    * transport: TCPROS
 * topic: /clock
    * to: /rosapi
    * direction: outbound (47647 - 10.0.0.105:39356) [69]
    * transport: TCPROS
 * topic: /clock
    * to: /prohibition_layer_mongo
    * direction: outbound (47647 - 10.0.0.105:39368) [102]
    * transport: TCPROS
 * topic: /clock
    * to: /cpu_monitor
    * direction: outbound (47647 - 10.0.0.105:39378) [103]
    * transport: TCPROS
 * topic: /clock
    * to: /mongo_bridge
    * direction: outbound (47647 - 10.0.0.105:39390) [104]
    * transport: TCPROS
 * topic: /clock
    * to: /laptop_battery_monitor
    * direction: outbound (47647 - 10.0.0.105:48692) [105]
    * transport: TCPROS
 * topic: /clock
    * to: /launch_manager
    * direction: outbound (47647 - 10.0.0.105:48708) [107]
    * transport: TCPROS
 * topic: /clock
    * to: /rosbridge_websocket
    * direction: outbound (47647 - 10.0.0.105:48698) [106]
    * transport: TCPROS
 * topic: /clock
    * to: /map_server
    * direction: outbound (47647 - 10.0.0.105:60880) [55]
    * transport: TCPROS
 * topic: /clock
    * to: /map_server_nav
    * direction: outbound (47647 - 10.0.0.105:37550) [56]
    * transport: TCPROS
 * topic: /tf
    * to: /planner_utils_node
    * direction: outbound (47647 - 10.0.0.105:60830) [47]
    * transport: TCPROS
 * topic: /tf
    * to: /cart_footprint_publisher
    * direction: outbound (47647 - 10.0.0.105:60844) [53]
    * transport: TCPROS
 * topic: /tf
    * to: /amcl
    * direction: outbound (47647 - 10.0.0.105:60858) [54]
    * transport: TCPROS
 * topic: /tf
    * to: /zone_coverage_server
    * direction: outbound (47647 - 10.0.0.105:60864) [52]
    * transport: TCPROS
 * topic: /rns5/camera/rgb/image_raw/compressed
    * to: /rosbridge_websocket
    * direction: outbound (47647 - 10.0.0.105:44718) [109]
    * transport: TCPROS
 * topic: /clock
    * to: /gazebo (http://FMS:36921/)
    * direction: inbound
    * transport: INTRAPROCESS
 * topic: /rns2/cmd_vel
    * to: http://RNS2:39695/
    * direction: inbound (50092 - RNS2:41049) [57]
    * transport: TCPROS
  1. sorry, I used ROBOT1 and ROBOT2 as an example, there are actually 5 robots with hostnames RNS1, RNS2, RNS3, RNS4, and RNS5. In this case, Machine 2 is the robot machines running the navigation stack. The FMS (Machine 1) could correctly resolve all robot machines into its IP address because I have edited the /etc/hosts on FMS and all of the 5 robot machine

here is the log of master_discovery from FMS
master_discovery-2.log

[rospy.client][INFO] 2024-09-04 18:57:36,186: init_node, name[/master_discovery], pid[3591471]
[xmlrpc][INFO] 2024-09-04 18:57:36,187: XML-RPC server binding to 0.0.0.0:0
[xmlrpc][INFO] 2024-09-04 18:57:36,188: Started XML-RPC server [http://FMS:38295/]
[rospy.init][INFO] 2024-09-04 18:57:36,189: ROS Slave URI: [http://FMS:38295/]
[rospy.impl.masterslave][INFO] 2024-09-04 18:57:36,189: _ready: http://FMS:38295/
[rospy.registration][INFO] 2024-09-04 18:57:36,190: Registering with master node http://localhost:11311
[xmlrpc][INFO] 2024-09-04 18:57:36,191: xml rpc node: starting XML-RPC server
[rospy.init][INFO] 2024-09-04 18:57:36,290: registered with master
[rospy.rosout][INFO] 2024-09-04 18:57:36,290: initializing /rosout core topic
[rospy.rosout][INFO] 2024-09-04 18:57:36,294: connected to core topic /rosout
[rospy.simtime][INFO] 2024-09-04 18:57:36,301: initializing /clock core topic
[rospy.simtime][INFO] 2024-09-04 18:57:36,305: connected to core topic /clock
[rosout][INFO] 2024-09-04 18:57:36,322: Check the ROS Master[Hz]: 1
[rosout][INFO] 2024-09-04 18:57:36,324: Heart beat [Hz]: 0.02
[rosout][INFO] 2024-09-04 18:57:36,333: Active request after [sec]: 60
[rosout][INFO] 2024-09-04 18:57:36,335: Remove after [sec]: 300
[rosout][INFO] 2024-09-04 18:57:36,339: Robot hosts: ['RNS2', 'RNS1', 'RNS5', 'RNS4', 'RNS3']
[rosout][INFO] 2024-09-04 18:57:36,340: Approx. mininum avg. network load: 8.16 bytes/s
[rosout][INFO] 2024-09-04 18:57:36,348: Start RPC-XML Server at ('0.0.0.0', 11611)
[rosout][INFO] 2024-09-04 18:57:36,356: hide_nodes: []
[rosout][INFO] 2024-09-04 18:57:36,363: hide_topics: []
[rosout][INFO] 2024-09-04 18:57:36,368: hide_services: []
[rosout][INFO] 2024-09-04 18:57:36,369: Subscribe to parameter `/roslaunch/uris`
[rosout][INFO] 2024-09-04 18:57:36,524: Detected master discovery: http://10.2.23.10:11611
[rospy.internal][INFO] 2024-09-04 18:57:36,616: topic[/rosout] adding connection to [/rosout], count 0
[rosout][INFO] 2024-09-04 18:57:36,632: Added master with ROS_MASTER_URI=http://FMS:11311/
[rospy.internal][INFO] 2024-09-04 18:57:38,475: topic[/clock] adding connection to [http://FMS:36921/], count 0
[rosout][INFO] 2024-09-04 18:58:27,408: Detected master discovery: http://10.0.0.102:11611
[rosout][INFO] 2024-09-04 18:58:27,527: Added master with ROS_MASTER_URI=http://RNS2:11311/
[rosout][INFO] 2024-09-04 19:08:27,994: Set host to offline: rns2
[rosout][INFO] 2024-09-04 19:12:23,087: Remove master discovery: http://10.0.0.102:11611
[rospy.core][INFO] 2024-09-04 19:21:51,588: signal_shutdown [signal-2]
[rosout][INFO] 2024-09-04 19:21:51,589: Unsubscribe from parameter `/roslaunch/uris`
[rosout][INFO] 2024-09-04 19:21:51,594: shutdown own RPC server
[rospy.internal][INFO] 2024-09-04 19:21:51,702: topic[/clock] removing connection to http://FMS:36921/
[rospy.internal][INFO] 2024-09-04 19:21:51,979: topic[/rosout] removing connection to /rosout
[rospy.impl.masterslave][INFO] 2024-09-04 19:21:51,979: signal-2
[rospy.core][INFO] 2024-09-04 19:21:52,774: signal_shutdown [atexit]

here is the log of master_discovery and master_sync from RNS5
master_sync-3.log

[rospy.client][INFO] 2024-09-04 19:09:35,674: init_node, name[/master_sync], pid[389728]
[xmlrpc][INFO] 2024-09-04 19:09:35,674: XML-RPC server binding to 0.0.0.0:0
[xmlrpc][INFO] 2024-09-04 19:09:35,675: Started XML-RPC server [http://RNS5:33119/]
[rospy.impl.masterslave][INFO] 2024-09-04 19:09:35,675: _ready: http://RNS5:33119/
[rospy.init][INFO] 2024-09-04 19:09:35,675: ROS Slave URI: [http://RNS5:33119/]
[rospy.registration][INFO] 2024-09-04 19:09:35,675: Registering with master node http://localhost:11311
[xmlrpc][INFO] 2024-09-04 19:09:35,675: xml rpc node: starting XML-RPC server
[rospy.init][INFO] 2024-09-04 19:09:35,775: registered with master
[rospy.rosout][INFO] 2024-09-04 19:09:35,776: initializing /rosout core topic
[rospy.rosout][INFO] 2024-09-04 19:09:35,777: connected to core topic /rosout
[rospy.simtime][INFO] 2024-09-04 19:09:35,777: initializing /clock core topic
[rospy.simtime][INFO] 2024-09-04 19:09:35,779: connected to core topic /clock
[rosout][INFO] 2024-09-04 19:09:35,789: ignore_hosts: ['RNS3', 'RNS2', 'RNS1', 'RNS4']
[rosout][INFO] 2024-09-04 19:09:35,791: sync_hosts: ['FMS']
[rosout][INFO] 2024-09-04 19:09:35,793: sync_topics_on_demand: False
[rosout][INFO] 2024-09-04 19:09:35,795: resync_on_reconnect: True
[rosout][INFO] 2024-09-04 19:09:35,797: resync_on_reconnect_timeout: 0
[rosout][WARNING] 2024-09-04 19:09:35,804: master_discovery node appear not to running @rns5, only found on []. Wait for topic with type 'MasterState' @rns5.
[rospy.internal][INFO] 2024-09-04 19:09:36,147: topic[/rosout] adding connection to [/rosout], count 0
[rosout][INFO] 2024-09-04 19:09:36,812: listen for updates on /master_discovery/changes
[rospy.internal][INFO] 2024-09-04 19:09:36,819: topic[/master_discovery/changes] adding connection to [http://RNS5:32953/], count 0
[rosout][INFO] 2024-09-04 19:09:36,827: ROS masters obtained from '/master_discovery/list_masters': ['rns5']
[rosout][INFO] 2024-09-04 19:10:06,841: ROS masters obtained from '/master_discovery/list_masters': ['rns5']
[rosout][INFO] 2024-09-04 19:10:07,546: [fms] ignore_nodes: ['/master_discovery', '/node_manager', '/master_sync', '/rosout', '/node_manager_daemon', '/zeroconf', '/param_sync']
[rosout][INFO] 2024-09-04 19:10:07,551: [fms] sync_nodes: []
[rosout][INFO] 2024-09-04 19:10:07,555: [fms] ignore_topics: ['/master_discovery/*', '/master_sync/*', '/rosout', '/zeroconf/*', '/rosout_agg']
[rosout][INFO] 2024-09-04 19:10:07,559: [fms] sync_topics: []
[rosout][INFO] 2024-09-04 19:10:07,563: [fms] ignore_services: ['/master_discovery/*', '/*get_loggers', '/master_sync/*', '/*set_logger_level', '/zeroconf/*', '/node_manager_daemon/*']
[rosout][INFO] 2024-09-04 19:10:07,567: [fms] sync_services: []
[rosout][INFO] 2024-09-04 19:10:07,570: [fms] ignore_type: ['fkie_multimaster_msgs/MasterState', 'fkie_multimaster_msgs/SyncServiceInfo', 'bond/Status', 'fkie_multimaster_msgs/SyncMasterInfo', 'fkie_multimaster_msgs/SyncTopicInfo']
[rosout][INFO] 2024-09-04 19:10:07,574: [fms] ignore_publishers: []
[rosout][INFO] 2024-09-04 19:10:07,578: [fms] ignore_subscribers: []
[rosout][INFO] 2024-09-04 19:10:08,893: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:08,912: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:10:08,931: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rospy.internal][INFO] 2024-09-04 19:10:08,951: topic[/clock] adding connection to [http://FMS:36921/], count 0
[rosout][INFO] 2024-09-04 19:10:08,970: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:10,928: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:10,945: SyncThread[fms] Applying remote state...
[rosout][WARNING] 2024-09-04 19:10:10,948: Different topic types detected for topic: /rns5/scan, own type: None remote type: sensor_msgs/LaserScan, local host: localhost, remote host: fms
[rosout][WARNING] 2024-09-04 19:10:10,950: Different topic types detected for topic: /rns5/odom, own type: None remote type: nav_msgs/Odometry, local host: localhost, remote host: fms
[rosout][WARNING] 2024-09-04 19:10:10,952: Different topic types detected for topic: /rns5/imu, own type: None remote type: sensor_msgs/Imu, local host: localhost, remote host: fms
[rosout][INFO] 2024-09-04 19:10:10,967: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:10,974: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:13,218: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:13,238: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:10:13,248: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:13,255: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:29,689: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:29,706: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:10:29,719: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:29,725: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:36,888: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:10:51,171: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:51,191: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:10:51,205: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:51,220: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:53,461: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:53,507: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:10:53,542: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:10:53,580: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:10:57,721: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:11:04,898: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:11:04,907: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:11:04,922: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:11:06,921: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:11:36,957: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:12:06,994: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:12:37,019: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:13:07,039: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:13:37,062: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:14:07,089: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:14:37,109: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:15:07,127: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:15:37,157: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:15:50,735: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:50,758: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:15:50,762: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:50,768: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:15:52,202: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:52,217: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:15:52,224: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:52,231: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:15:56,177: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:56,198: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:15:56,204: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:15:56,210: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:16:03,956: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:16:03,971: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:16:03,975: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:16:03,981: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:16:05,802: SyncThread[fms] Requesting remote state from 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:16:05,814: SyncThread[fms] Applying remote state...
[rosout][INFO] 2024-09-04 19:16:05,818: SyncThread[fms] Requesting remote md5sums 'http://10.10.100.23:11611'
[rosout][INFO] 2024-09-04 19:16:05,824: SyncThread[fms] remote state applied.
[rosout][INFO] 2024-09-04 19:16:07,168: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rosout][INFO] 2024-09-04 19:16:37,180: ROS masters obtained from '/master_discovery/list_masters': ['rns5', 'fms']
[rospy.core][INFO] 2024-09-04 19:16:37,371: signal_shutdown [signal-2]
[rosout][INFO] 2024-09-04 19:16:37,371: Stop synchronization...
[rosout][INFO] 2024-09-04 19:16:37,373:   Stop timers...
[rosout][INFO] 2024-09-04 19:16:37,374:   Unregister from master discovery...
[rospy.internal][INFO] 2024-09-04 19:16:37,376: topic impl's ref count is zero, deleting topic /master_discovery/changes...
[rospy.internal][INFO] 2024-09-04 19:16:37,376: topic[/master_discovery/changes] removing connection to http://RNS5:32953/
[rosout][INFO] 2024-09-04 19:16:37,378:   Remove master: fms
[rosout][INFO] 2024-09-04 19:16:37,379:   Stop synchronization to `fms`
[rospy.internal][INFO] 2024-09-04 19:16:37,393: topic[/rosout] removing connection to /rosout
[rospy.internal][INFO] 2024-09-04 19:16:37,393: topic[/clock] removing connection to http://FMS:36921/
[rospy.internal][INFO] 2024-09-04 19:16:37,393: topic[/clock] removing connection to http://FMS:36921/
[rospy.impl.masterslave][INFO] 2024-09-04 19:16:37,394: signal-2
[rosout][INFO] 2024-09-04 19:16:37,408:   Finished synchronization to `fms`
[rospy.core][INFO] 2024-09-04 19:16:37,984: signal_shutdown [atexit]

master_discovery-2.log

[rospy.client][INFO] 2024-09-04 19:09:35,675: init_node, name[/master_discovery], pid[389727]
[xmlrpc][INFO] 2024-09-04 19:09:35,675: XML-RPC server binding to 0.0.0.0:0
[xmlrpc][INFO] 2024-09-04 19:09:35,675: Started XML-RPC server [http://RNS5:32953/]
[rospy.impl.masterslave][INFO] 2024-09-04 19:09:35,675: _ready: http://RNS5:32953/
[rospy.init][INFO] 2024-09-04 19:09:35,676: ROS Slave URI: [http://RNS5:32953/]
[rospy.registration][INFO] 2024-09-04 19:09:35,676: Registering with master node http://localhost:11311
[xmlrpc][INFO] 2024-09-04 19:09:35,676: xml rpc node: starting XML-RPC server
[rospy.init][INFO] 2024-09-04 19:09:35,776: registered with master
[rospy.rosout][INFO] 2024-09-04 19:09:35,776: initializing /rosout core topic
[rospy.rosout][INFO] 2024-09-04 19:09:35,778: connected to core topic /rosout
[rospy.simtime][INFO] 2024-09-04 19:09:35,778: initializing /clock core topic
[rospy.simtime][INFO] 2024-09-04 19:09:35,780: connected to core topic /clock
[rosout][INFO] 2024-09-04 19:09:35,795: Check the ROS Master[Hz]: 1
[rosout][INFO] 2024-09-04 19:09:35,796: Heart beat [Hz]: 0.02
[rosout][INFO] 2024-09-04 19:09:35,797: Active request after [sec]: 60
[rosout][INFO] 2024-09-04 19:09:35,798: Remove after [sec]: 300
[rosout][INFO] 2024-09-04 19:09:35,800: Robot hosts: []
[rosout][INFO] 2024-09-04 19:09:35,802: Approx. mininum avg. network load: 1.36 bytes/s
[rosout][INFO] 2024-09-04 19:09:35,811: Start RPC-XML Server at ('0.0.0.0', 11611)
[rosout][INFO] 2024-09-04 19:09:35,814: hide_nodes: []
[rosout][INFO] 2024-09-04 19:09:35,816: hide_topics: []
[rosout][INFO] 2024-09-04 19:09:35,819: hide_services: []
[rosout][INFO] 2024-09-04 19:09:35,820: Subscribe to parameter `/roslaunch/uris`
[rosout][INFO] 2024-09-04 19:09:35,962: Detected master discovery: http://10.0.0.105:11611
[rosout][INFO] 2024-09-04 19:09:36,066: Added master with ROS_MASTER_URI=http://RNS5:11311/
[rospy.internal][INFO] 2024-09-04 19:09:36,148: topic[/rosout] adding connection to [/rosout], count 0
[rospy.internal][INFO] 2024-09-04 19:09:36,821: topic[/master_discovery/changes] adding connection to [/master_sync], count 0
[rosout][INFO] 2024-09-04 19:10:07,428: Detected master discovery: http://10.10.100.23:11611
[rosout][INFO] 2024-09-04 19:10:07,539: Added master with ROS_MASTER_URI=http://FMS:11311/
[rospy.internal][INFO] 2024-09-04 19:10:08,951: topic[/clock] adding connection to [http://FMS:36921/], count 0
[rosout][WARNING] 2024-09-04 19:10:25,218: MasterMonitor[/velocity_smoother]: can't update contact information. ROS master responds with: unknown node [/velocity_smoother]
[rospy.core][INFO] 2024-09-04 19:16:37,566: signal_shutdown [signal-2]
[rosout][INFO] 2024-09-04 19:16:37,567: Unsubscribe from parameter `/roslaunch/uris`
[rosout][INFO] 2024-09-04 19:16:37,568: shutdown own RPC server
[rospy.internal][INFO] 2024-09-04 19:16:37,615: topic[/master_discovery/changes] removing connection to /master_sync
[rospy.internal][INFO] 2024-09-04 19:16:38,173: topic[/rosout] removing connection to /rosout
[rospy.internal][INFO] 2024-09-04 19:16:38,174: topic[/clock] removing connection to http://FMS:36921/
[rospy.impl.masterslave][INFO] 2024-09-04 19:16:38,175: signal-2
[rospy.core][INFO] 2024-09-04 19:16:38,964: signal_shutdown [atexit]

After I stop the navigation stack, but master_sync still up, the publications and services is still empty

@KhairulM
Copy link
Author

KhairulM commented Sep 4, 2024

And for additional information, the Unregistering /gazebo log was only printed in our navigation stack log, but not on the master_discovery or master_sync log

@atiderko
Copy link
Member

atiderko commented Sep 4, 2024

Two things that could be checked:

  1. The master_discovery log on the FMS says that the rns2 is no longer reporting and is therefore removed from the list.
[rosout][INFO] 2024-09-04 19:08:27,994: Set host to offline: rns2
[rosout][INFO] 2024-09-04 19:12:23,087: Remove master discovery: http://10.0.0.102:11611

This may also be the case on rns2. Maybe you can add this parameter to all master_discovery:
<rosparam param="heartbeat_hz">0.5</rosparam>
And also comment in the robot_hosts parameter again. This means that everyone proactively sends heartbeats as a unicast message. If the problem is caused by multicast, this would solve the problem.

  1. The nodes register with master node, which can be reached via localhost.
[rospy.registration][INFO] 2024-09-04 19:09:35,676: Registering with master node http://localhost:11311

The ROS wiki notes that localhost in combination with remote nodes can lead to problems (https://wiki.ros.org/ROS/EnvironmentVariables#ROS_MASTER_URI
Normally ROS resolves localhost to hostname, but this does not seem to be the case for you.
Although I don't think it will solve the original problem, it may lead to problems elsewhere that you don't have now.

@KhairulM
Copy link
Author

KhairulM commented Sep 6, 2024

I've added the heartbeat_hz param to all master discovery nodes and the robot_hosts param to all robot's master discovery, but I still get the same log Unregistering /gazebo when I run my navigation stack. When I check the /gazebo node info, the publications and services are still empty
image

@KhairulM
Copy link
Author

KhairulM commented Sep 6, 2024

After discussing with our team, this log was not printed by our code, so I guess it's from the ROS system itself

@atiderko
Copy link
Member

atiderko commented Sep 9, 2024

Have you tried to start gazebo and your navigation stack in the same launch file?

Otherwise you could subscribe to the topics /master_discovery/linkstats and /master_discovery/changes on the RNS3 and see if you can find any hints there.
I assume that all docker containers are running on the same PC or are connected to each other via ethernet, then the quality of /master_discovery/linkstats should be 100. At /master_discovery/changes and also the service /master_discovery/list_masters all masters should be online.

You can also see if the same thing happens with other nodes. Start a publisher on the FSM, e.g. rostopic pub -r 1 /test std_msgs/Bool “{data: true}” and on the RNS3 the subscriber rostopic echo /test.
If this also stops working after a certain time, something is wrong with the communication between FSM and RNS3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants