mosquitto bridge connection deadlock waiting for message from Cumulocity IoT #3141
Current theory
After some research, the following mosquitto issue #1214 was found which says:
The description seems to match the symptoms and the collected evidence.
Likely root cause
FYI: The mosquitto message states are visible in the source code: lib/mosquitto_internal.h#L82
enum mosquitto_msg_direction {
	mosq_md_in = 0,
	mosq_md_out = 1
};
enum mosquitto_msg_state {
	mosq_ms_invalid = 0,
	mosq_ms_publish_qos0 = 1,
	mosq_ms_publish_qos1 = 2,
	mosq_ms_wait_for_puback = 3,
	mosq_ms_publish_qos2 = 4,
	mosq_ms_wait_for_pubrec = 5,
	mosq_ms_resend_pubrel = 6,
	mosq_ms_wait_for_pubrel = 7,
	mosq_ms_resend_pubcomp = 8,
	mosq_ms_wait_for_pubcomp = 9,
	mosq_ms_send_pubrec = 10,
	mosq_ms_queued = 11
};
enum mosquitto_client_state {
	mosq_cs_new = 0,
	mosq_cs_connected = 1,
	mosq_cs_disconnecting = 2,
	mosq_cs_active = 3,
	mosq_cs_connect_pending = 4,
	mosq_cs_connect_srv = 5,
	mosq_cs_disconnect_ws = 6,
	mosq_cs_disconnected = 7,
	mosq_cs_socks5_new = 8,
	mosq_cs_socks5_start = 9,
	mosq_cs_socks5_request = 10,
	mosq_cs_socks5_reply = 11,
	mosq_cs_socks5_auth_ok = 12,
	mosq_cs_socks5_userpass_reply = 13,
	mosq_cs_socks5_send_userpass = 14,
	mosq_cs_expiring = 15,
	mosq_cs_duplicate = 17, /* client that has been taken over by another with the same id */
	mosq_cs_disconnect_with_will = 18,
	mosq_cs_disused = 19, /* client that has been added to the disused list to be freed */
	mosq_cs_authenticating = 20, /* Client has sent CONNECT but is still undergoing extended authentication */
	mosq_cs_reauthenticating = 21, /* Client is undergoing reauthentication and shouldn't do anything else until complete */
};
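To make the numeric values in a database dump easier to read, the direction and message-state enums above can be turned into a small lookup. This is only a sketch: the tables are copied from the enum listing in this comment, the function name `describe_message` is made up for illustration, and the numbers may differ in other mosquitto versions.

```python
# Lookup tables copied from the enum listing above (lib/mosquitto_internal.h).
# Assumption: the values printed by a db dump use these same numbers.
MSG_DIRECTIONS = {
    0: "mosq_md_in",
    1: "mosq_md_out",
}

MSG_STATES = {
    0: "mosq_ms_invalid",
    1: "mosq_ms_publish_qos0",
    2: "mosq_ms_publish_qos1",
    3: "mosq_ms_wait_for_puback",
    4: "mosq_ms_publish_qos2",
    5: "mosq_ms_wait_for_pubrec",
    6: "mosq_ms_resend_pubrel",
    7: "mosq_ms_wait_for_pubrel",
    8: "mosq_ms_resend_pubcomp",
    9: "mosq_ms_wait_for_pubcomp",
    10: "mosq_ms_send_pubrec",
    11: "mosq_ms_queued",
}


def describe_message(direction: int, state: int) -> str:
    """Return a readable 'direction / state' label (hypothetical helper)."""
    d = MSG_DIRECTIONS.get(direction, f"unknown({direction})")
    s = MSG_STATES.get(state, f"unknown({state})")
    return f"{d} / {s}"


if __name__ == "__main__":
    # Example: an inbound message in state 7.
    print(describe_message(0, 7))
```

According to the listing, direction `0` is `mosq_md_in`, so a record with direction 0 would be an inbound message.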
Describe the bug
When using the mosquitto persistence (e.g. the mosquitto.db file), the mosquitto bridge function stops working and no messages are being received or published to the cloud.
Symptoms
The bridge health topic (te/device/main/service/mosquitto-c8y-bridge/status/health) toggles periodically between 0 and 1
The mapper's requests for a new Cumulocity JWT are not answered (a request on c8y/s/uat goes unanswered on c8y/s/dat)
Below shows the pattern where the bridge health is toggling and the requests by the mapper to get a new Cumulocity JWT.
$ tedge mqtt sub '#'
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[te/device/main/service/mosquitto-c8y-bridge/status/health] 0
[c8y/s/us] 102,rmi_cb001:device:main:service:mosquitto-c8y-bridge,service,mosquitto-c8y-bridge,down
[te/device/main/service/mosquitto-c8y-bridge/status/health] 1
[c8y/s/us] 102,rmi_cb001:device:main:service:mosquitto-c8y-bridge,service,mosquitto-c8y-bridge,up
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[c8y/s/uat]
[te/device/main/service/mosquitto-c8y-bridge/status/health] 0
[c8y/s/us] 102,rmi_cb001:device:main:service:mosquitto-c8y-bridge,service,mosquitto-c8y-bridge,down
[te/device/main/service/mosquitto-c8y-bridge/status/health] 1
[c8y/s/us] 102,rmi_cb001:device:main:service:mosquitto-c8y-bridge,service,mosquitto-c8y-bridge,up
[c8y/s/uat]
[c8y/s/uat]
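To check for the same pattern on another device, the subscriber output can be summarised with a short script. The sketch below assumes each message is printed on its own line as `[topic] payload`, as in the log above; the function name `summarize` and the result keys are made up for illustration.

```python
import re
from typing import Iterable

HEALTH_TOPIC = "te/device/main/service/mosquitto-c8y-bridge/status/health"
TOKEN_REQUEST_TOPIC = "c8y/s/uat"
TOKEN_RESPONSE_TOPIC = "c8y/s/dat"

# Assumption: one message per line, formatted as "[topic] payload".
LINE_RE = re.compile(r"^\[(?P<topic>[^\]]+)\]\s?(?P<payload>.*)$")


def summarize(lines: Iterable[str]) -> dict:
    """Count token requests/responses and bridge health changes."""
    requests = responses = toggles = 0
    last_health = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip the command line and anything unparseable
        topic = m["topic"]
        payload = m["payload"].strip()
        if topic == TOKEN_REQUEST_TOPIC:
            requests += 1
        elif topic == TOKEN_RESPONSE_TOPIC:
            responses += 1
        elif topic == HEALTH_TOPIC:
            # A toggle is any change of the health payload (0 -> 1 or 1 -> 0).
            if last_health is not None and payload != last_health:
                toggles += 1
            last_health = payload
    return {
        "token_requests": requests,
        "token_responses": responses,
        "health_toggles": toggles,
    }
```

Many token requests with zero responses, combined with a growing number of health toggles, would match the symptoms described in this issue.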
Workaround
The cloud connection can be restored with the following steps:
Disconnect tedge (which stops mosquitto and tedge-mapper-c8y)
Remove the mosquitto.db file
Connect tedge (which starts mosquitto and tedge-mapper-c8y)
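The workaround steps above could be scripted roughly as follows. This is a sketch under assumptions: the cloud profile is `c8y`, the persistence file is at `/var/lib/mosquitto/mosquitto.db` (the path used in the reproduction steps), and the script runs with enough privileges to delete it. The function name `apply_workaround` and its `run` parameter are made up for illustration. Removing the file discards everything the broker has persisted.

```python
import subprocess
from pathlib import Path

# Assumption: default persistence location, as used in the reproduction steps.
DB_PATH = Path("/var/lib/mosquitto/mosquitto.db")


def apply_workaround(db_path: Path = DB_PATH, cloud: str = "c8y",
                     run=subprocess.run) -> None:
    """Disconnect, remove the persistence file, reconnect (hypothetical helper)."""
    # 1. Disconnect tedge (per the issue, this stops mosquitto and tedge-mapper-c8y).
    run(["tedge", "disconnect", cloud], check=True)
    # 2. Remove the persistence file; ignore it if it is already gone.
    db_path.unlink(missing_ok=True)
    # 3. Connect tedge again (starts mosquitto and tedge-mapper-c8y).
    run(["tedge", "connect", cloud], check=True)


if __name__ == "__main__":
    apply_workaround()
```

The `run` parameter exists only so the command sequence can be exercised without touching a real device.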
To Reproduce
It is currently unknown how to reproduce the problem. It might be possible to use the attached mosquitto.db to reproduce the mosquitto bridge deadlock.
This procedure is not verified but you could try:
Stop mosquitto
Copy the mosquitto.db.tgz file, and decompress it to /var/lib/mosquitto/mosquitto.db
Change the permissions of the file
Enable the mosquitto persistence (assuming you haven't already added this setting to the mosquitto configuration)
Start mosquitto
Monitor the local MQTT broker looking at the mosquitto health etc.
tedge mqtt sub '#'
Expected behavior
Screenshots
Environment (please complete the following information):
Alpine Linux v3.18
docker
Linux 9bba511280c7 6.8.0-39-generic #39-Ubuntu SMP PREEMPT_DYNAMIC Sat Jul 6 02:50:39 UTC 2024 aarch64 Linux
tedge 1.3.0
mosquitto 2.0.18
Additional context
Logs from mosquitto and tedge-mapper-c8y
Output from mosquitto_db_dump tool
The following shows the output of the mosquitto_db_dump tool (built from https://github.com/eclipse/mosquitto/tree/master/apps/db_dump):
The following one-liner looks at the MQTT client ID of the Cumulocity IoT bridge, and gathers some statistics on
The bridge queue (e.g. how many are outbound and inbound) can be calculated using:
The above shows that there is one inbound message (assuming that Direction: 0 means inbound, and 1 outbound). Below shows the meta information of the pending inbound message:
The message states can also be aggregated to see how many messages are in which state:
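As a rough sketch of this kind of aggregation, the snippet below counts direction and state values in the text output of the dump tool. It assumes each client message record contains lines of the form `Direction: <n>` and `State: <n>`; only the `Direction:` label is confirmed by this issue, the `State:` label and the function name `aggregate` are assumptions.

```python
import re
from collections import Counter

# Assumption: fields are printed one per line as "Direction: <n>" / "State: <n>".
FIELD_RE = re.compile(r"^\s*(Direction|State):\s*(\d+)\s*$")


def aggregate(dump_text: str) -> dict:
    """Count how often each Direction and State value occurs in a dump."""
    counts = {"Direction": Counter(), "State": Counter()}
    for line in dump_text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            counts[m.group(1)][int(m.group(2))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage sketch: pipe the dump tool's output into this script.
    result = aggregate(sys.stdin.read())
    for field, counter in result.items():
        for value, n in sorted(counter.items()):
            print(f"{field} {value}: {n}")
```

This counts all client messages in the dump; restricting it to the bridge's client ID would need a further filter on the record layout, which is not shown here.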