Describe the bug
Calling the SwitchController DBus method on bluechi-agent fails with a timeout if bluechi-agent has never connected to bluechi-controller. However, if bluechi-agent has connected to bluechi-controller at least once and is retrying the connection, the same SwitchController call succeeds.
To Reproduce
Prerequisite
Host 192.168.16.101 exists on the local network and is running bluechi-controller.
Host 192.168.16.111 does not exist on the local network.
Failure case
start bluechi-agent with host 192.168.16.111
root@42dot-ak7:~# journalctl -f -u bluechi-agent
Sep 20 15:02:53 42dot-ak7 systemd[1]: Started BlueChi systemd service controller agent daemon.
Sep 20 15:02:53 42dot-ak7 bluechi-agent[3062]: Starting bluechi-agent 0.8.0-1
Sep 20 15:02:53 42dot-ak7 bluechi-agent[3062]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:02:56 42dot-ak7 bluechi-agent[3062]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:02:56 42dot-ak7 bluechi-agent[3062]: Initial controller connection failed, retrying
Sep 20 15:02:56 42dot-ak7 bluechi-agent[3062]: Trying to connect to controller (try 1)
Sep 20 15:02:56 42dot-ak7 bluechi-agent[3062]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:02:59 42dot-ak7 bluechi-agent[3062]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:02:59 42dot-ak7 bluechi-agent[3062]: Trying to connect to controller (try 2)
Sep 20 15:02:59 42dot-ak7 bluechi-agent[3062]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:03:02 42dot-ak7 bluechi-agent[3062]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:03:02 42dot-ak7 bluechi-agent[3062]: Trying to connect to controller (try 3)
Sep 20 15:03:02 42dot-ak7 bluechi-agent[3062]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:03:05 42dot-ak7 bluechi-agent[3062]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
...
call SwitchController DBus method with host 192.168.16.101
root@42dot-ak7:~# dbus-send --system --dest=org.eclipse.bluechi.Agent --print-reply --type=method_call /org/eclipse/bluechi org.eclipse.bluechi.Agent.SwitchController string:'tcp:host=192.168.16.101,port=842'
Error org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
the busctl introspect command also fails
root@42dot-ak7:~# busctl introspect org.eclipse.bluechi.Agent /org/eclipse/bluechi
Failed to introspect object /org/eclipse/bluechi of service org.eclipse.bluechi.Agent: Connection timed out
Success case
start bluechi-agent with host 192.168.16.101
call SwitchController DBus method with host 192.168.16.111
call SwitchController DBus method with host 192.168.16.101
root@42dot-ak7:~# journalctl -f -u bluechi-agent
Sep 20 15:04:34 42dot-ak7 systemd[1]: Started BlueChi systemd service controller agent daemon.
Sep 20 15:04:34 42dot-ak7 bluechi-agent[3347]: Starting bluechi-agent 0.8.0-1
Sep 20 15:04:34 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.101,port=842
Sep 20 15:04:34 42dot-ak7 bluechi-agent[3347]: Connected to controller as 'ak7_master_main'
Sep 20 15:04:54 42dot-ak7 bluechi-agent[3347]: CONTROLLER ADDRESS changed to tcp:host=192.168.16.111,port=842
Sep 20 15:04:54 42dot-ak7 bluechi-agent[3347]: Disconnected from controller
Sep 20 15:04:54 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:04:57 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:04:57 42dot-ak7 bluechi-agent[3347]: Trying to connect to controller (try 1)
Sep 20 15:04:57 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:05:01 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:05:01 42dot-ak7 bluechi-agent[3347]: Trying to connect to controller (try 2)
Sep 20 15:05:01 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:05:04 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:05:04 42dot-ak7 bluechi-agent[3347]: Trying to connect to controller (try 3)
Sep 20 15:05:04 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:05:07 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:05:07 42dot-ak7 bluechi-agent[3347]: Trying to connect to controller (try 4)
Sep 20 15:05:07 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:05:10 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:05:10 42dot-ak7 bluechi-agent[3347]: Trying to connect to controller (try 5)
Sep 20 15:05:10 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.111,port=842
Sep 20 15:05:13 42dot-ak7 bluechi-agent[3347]: Registering as 'ak7_master_main' failed: Transport endpoint is not connected
Sep 20 15:05:13 42dot-ak7 bluechi-agent[3347]: CONTROLLER ADDRESS changed to tcp:host=192.168.16.101,port=842
Sep 20 15:05:13 42dot-ak7 bluechi-agent[3347]: Disconnected from controller
Sep 20 15:05:13 42dot-ak7 bluechi-agent[3347]: Connecting to controller on tcp:host=192.168.16.101,port=842
Sep 20 15:05:13 42dot-ak7 bluechi-agent[3347]: Connected to controller as 'ak7_master_main'
Expected behavior
When bluechi-agent has never connected to bluechi-controller and the SwitchController DBus method is called with host 192.168.16.101, bluechi-agent must connect to the bluechi-controller on host 192.168.16.101.
I was able to reproduce the bug.
One thing I have noticed is that it behaves this way only when host 192.168.16.111 does not exist on the local network. When the host exists but is not running bluechi-controller, there is no problem, and the bluechi-agent's DBus API works as expected.
The reason is that the reconnect mechanism sits on the heartbeat timer: as long as agent->connection_state == AGENT_CONNECTION_STATE_RETRY, the agent tries to reconnect on each heartbeat. The heartbeat timer is set to 2000ms by default.
On the other hand, when the host the agent is trying to connect to does not exist on the network, the Register method call fails with a timeout, which also takes around 2 seconds or a bit more. As a result, the event loop is always busy trying to reconnect and does not handle any incoming API calls. When the host exists but is not listening on port 842, the connection is refused immediately, so no problem occurs.
There are a couple of ways to mitigate this:
The easiest fix at the user level is to increase the HeartbeatInterval config value. Setting it to 5000, for example, fixed the issue, though I guess lower values such as 4000 or even 3000 would also do the trick.
Decreasing the priority of the heartbeat timer event also fixes the issue.
I set the priority to 50, and the SwitchController request then worked right away. The introspect call, though, took some time to return. mkemel@55e1457
I didn't find a way to increase the priority of API calls instead, though that would also work if it is possible. I wonder whether decreasing the heartbeat timer priority could have an impact when we are handling a lot of requests or something like that.
We can hardcode something in agent_heartbeat_timer_callback to mitigate that specific issue.
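To make the reasoning above easier to follow, here is a rough toy model of the event loop in Python. It is only a sketch, not the agent's code. The function name `simulate` and all its parameters are made up for illustration. It assumes that the heartbeat callback blocks for the whole connect timeout (3000ms, taken from the gaps in the failure log), that the timer is re-armed relative to the time the callback started, that a lower priority value is dispatched first, and that the timer wins when both events are pending with equal priority. The last assumption is chosen to match the observed behavior and is not a documented guarantee.

```python
def simulate(heartbeat_ms, connect_block_ms, timer_priority=0, api_priority=0,
             request_at_ms=1000, run_for_ms=30000):
    """Return the simulated time (ms) at which the API request is dispatched,
    or None if it is still waiting when the simulation ends."""
    now = 0
    next_timer = 0  # the first heartbeat is due immediately
    while now < run_for_ms:
        timer_pending = next_timer <= now
        api_pending = request_at_ms <= now
        if not timer_pending and not api_pending:
            # Idle loop: sleep until the next event is due.
            now = min(next_timer, request_at_ms)
            continue
        # The API call runs if it is the only pending event, or if it has a
        # strictly lower priority value than the timer (assumed tie: timer wins).
        if api_pending and (not timer_pending or api_priority < timer_priority):
            return now
        # Heartbeat callback in RETRY state: a blocking reconnect attempt.
        started = now
        now += connect_block_ms
        # Assumption: re-armed relative to when the callback started.
        next_timer = started + heartbeat_ms
    return None


if __name__ == "__main__":
    print(simulate(2000, 3000))                     # None: request is starved
    print(simulate(5000, 3000))                     # 3000: longer interval leaves idle time
    print(simulate(2000, 3000, timer_priority=50))  # 3000: API call dispatched before the timer
    print(simulate(2000, 0))                        # 1000: refused immediately, no starvation
```

In this model the request is starved only when the blocking time is at least as long as the heartbeat interval and the timer is not deprioritized, which matches the three cases described in this thread.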