Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User module unable to connect to Hub with 'Communication_error' #7293

Closed
chatnoir1982 opened this issue May 22, 2024 · 1 comment
Closed

Comments

@chatnoir1982
Copy link

chatnoir1982 commented May 22, 2024

We have a custom IoTEdge module that out of the blue has started experiencing loss of connectivity to the IoTHub, with ConnectionStatus.Disconnected+ConnectionStatusChangeReason.Retry_Expired and
ConnectionStatus.Disconnected+ConnectionStatusChangeReason.Communication_Error.
The loss of connection for custom modules is continuous, there are no intermittent connections to the hub.
The IoTEdgeMetricsCollector we use is still online and working properly.
This happens on only one gateway device, other devices are healthy.

One thing that stands out is that the EdgeHub module twin of the unhealthy device does not list the custom module in its reported properties' "clients" section, which seems to be the related to the outage. Only the IoTEdgeMetricsCollector module is listed, which is functioning normally.

}, "clients": { "$lastUpdated": "2024-05-22T06:41:40.9107753Z", "deviceId/IoTEdgeMetricsCollector": { "$lastUpdated": "2024-05-22T06:41:40.9107753Z", "status": { "$lastUpdated": "2024-05-22T06:41:40.9107753Z" }, "lastConnectedTimeUtc": { "$lastUpdated": "2024-05-22T06:41:40.9107753Z" } } }

From the EdgeHub module logs we see the following lines (our custom modules named "MyGatewayModule" and "MyHealthCheckModule":
image

There is also an exception of interest in the EdgeAgent logs "Newtonsoft.Json.JsonSerializationException: Required property 'message' not found in JSON. Path '', line 1, position 1533.". I suspect however that this is unrelated to the issue. See logs below.

Context

  1. IoTEdge version 1.4.27, EdgeAgent and EdgeHub both at 1.4.35.
  2. We are using AMQP-WS to connect our IoTEdge device to IoTHub.
  3. The device running the module in question is using a 4G connection to connect to the outside world; signal strength, latency and bandwidth are healthy, no DNS issue
  4. Removing and re-adding the custom module to the device from the Azure portal, and deployed it through layered deployment as well, but it didn't fix the issue
  5. The "iotedge check" command returns all green with a couple warnings on package versions, no errors.
  6. Rebooting the gateway, restarting iotedge modules, running "iotedge system restart" does not seem to help, edgeHub is not picking up the custom modules.
  7. Deploying a new module on the device e.g. SimulatedTemperatureSensor correctly registers the module with EdgeHub's clients and the new module functions correctly

Has anyone experienced this issue before?

Thanks

Expected Behavior

  • EdgeHub registers all indicated user modules in its "clients" property.
  • Messages from user modules to the outside world are routed normally to the IoThub
    • Messages between user modules on the device are correctly routed

Current Behavior

Our custom module is experiencing loss of connectivity to the IoTHub, with ConnectionStatus.Disconnected+ConnectionStatusChangeReason.Retry_Expired and
ConnectionStatus.Disconnected+ConnectionStatusChangeReason.Communication_Error.

Steps to Reproduce

Not sure if/how this can be reproduced. I suspect accidental edits in edgeHub module twin (via Azure IoTHub explorer tool). What we're looking for is some hints to allow us to quickly recover the device.

Context (Environment)

  1. IoTEdge version 1.4.27, EdgeAgent and EdgeHub both at 1.4.35.
  2. We are using AMQP-WS to connect our IoTEdge device to IoTHub.
  3. The device running the module in question is using a 4G connection to connect to the outside world; signal strength, latency and bandwidth are healthy, no DNS issue
  4. Removing and re-adding the custom module to the device from the Azure portal, and deployed it through layered deployment as well, but it didn't fix the issue
  5. The "iotedge check" command returns all green with a couple warnings on package versions, no errors.
  6. Rebooting the gateway, restarting iotedge modules, running "iotedge system restart" does not seem to help.
  7. Reprovisioned device with "iotedge system reprovision"

Output of iotedge check

Click here
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
‼ aziot-identity-service package is up-to-date - Warning
    Installed aziot-identity-service package has version 1.4.7 but 1.4.8 is the latest stable version available.
    Please see https://aka.ms/aziot-update-runtime for update instructions.
        caused by: Installed aziot-identity-service package has version 1.4.7 but 1.4.8 is the latest stable version available.
                   Please see https://aka.ms/aziot-update-runtime for update instructions.
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ tpmd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
√ host can connect to and perform TLS handshake with iothub AMQP port - OK
√ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with iothub MQTT port - OK
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
√ configuration has correct URIs for daemon mgmt endpoint - OK
‼ aziot-edge package is up-to-date - Warning
    Installed IoT Edge daemon has version 1.4.27 but 1.4.33 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
        caused by: Installed IoT Edge daemon has version 1.4.27 but 1.4.33 is the latest stable version available.
                   Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: logs policy - OK
√ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK
√ Agent image is valid and can be pulled from upstream - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
√ container on the default network can connect to upstream AMQP port - OK
√ container on the default network can connect to upstream HTTPS / WebSockets port - OK
√ container on the default network can connect to upstream MQTT port - OK
√ container on the IoT Edge module network can connect to upstream AMQP port - OK
√ container on the IoT Edge module network can connect to upstream HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to upstream MQTT port - OK
37 check(s) succeeded.
2 check(s) raised warnings.

Device Information

  • Host OS : Debian 11
  • Architecture: Arm64
  • Container OS: Linux containers

Runtime Versions

  • aziot-edged [run iotedge version]: 1.4.27
  • Edge Agent [image tag (e.g. 1.0.0)]: 1.4.35
  • Edge Hub [image tag (e.g. 1.0.0)]: 1.4.35
  • Docker/Moby [run docker version]: 24.0.9-1

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

aziot-edged logs

edge-agent logs

<6> 2024-05-22 06:20:21.560 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory] - Executing command: "Command Group: (\n  [Stop module edgeHub]\n  [Update module edgeHub]\n)"
<7> 2024-05-22 06:20:21.572 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Agent.Edgelet.ModuleManagementHttpClient] - Making a Http call to unix:///var/run/iotedge/mgmt.sock to stop module edgeHub
<7> 2024-05-22 06:20:21.635 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/mgmt.sock
<7> 2024-05-22 06:20:21.636 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/mgmt.sock
<7> 2024-05-22 06:20:21.638 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://mgmt.sock/modules/edgeHub/stop?api-version=2022-08-03
<7> 2024-05-22 06:20:22.737 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received NoContent
<7> 2024-05-22 06:20:22.751 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Agent.Edgelet.ModuleManagementHttpClient] - Received a valid Http response from unix:///var/run/iotedge/mgmt.sock for stop module edgeHub
<7> 2024-05-22 06:20:22.775 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Agent.Edgelet.ModuleManagementHttpClient] - Making a Http call to unix:///var/run/iotedge/mgmt.sock to update module edgeHub
<7> 2024-05-22 06:20:22.868 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connecting socket /var/run/iotedge/mgmt.sock
<7> 2024-05-22 06:20:22.870 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Connected socket /var/run/iotedge/mgmt.sock
<7> 2024-05-22 06:20:22.870 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Sending request http://mgmt.sock/modules/edgeHub?api-version=2022-08-03
<7> 2024-05-22 06:20:24.268 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Util.Uds.HttpUdsMessageHandler] - Response received Created
<7> 2024-05-22 06:20:24.338 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Agent.Edgelet.ModuleManagementHttpClient] - Error when getting an Http response from unix:///var/run/iotedge/mgmt.sock for update module edgeHub
HTTP Response: 



Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2022_08_03.GeneratedCode.SwaggerException: Could not deserialize the response body stream as Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2022_08_03.GeneratedCode.ErrorResponse.

Status: 201
Response: 

 ---> Newtonsoft.Json.JsonSerializationException: Required property 'message' not found in JSON. Path '', line 1, position 1533.
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EndProcessProperty(Object newObject, JsonReader reader, JsonObjectContract contract, Int32 initialDepth, JsonProperty property, PropertyPresence presence, Boolean setDefaultValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize[T](JsonReader reader)
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2022_08_03.GeneratedCode.EdgeletHttpClient.ReadObjectResponseAsync[T](HttpResponseMessage response, IReadOnlyDictionary`2 headers, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2022_08_03.GeneratedCode.EdgeletHttpClient.ReadObjectResponseAsync[T](HttpResponseMessage response, IReadOnlyDictionary`2 headers, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2022_08_03/generatedCode/EdgeletHttpClient.cs:line 1938
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2022_08_03.GeneratedCode.EdgeletHttpClient.UpdateModuleAsync(String api_version, String name, Nullable`1 start, ModuleSpec module, CancellationToken cancellationToken) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2022_08_03/generatedCode/EdgeletHttpClient.cs:line 458
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Versioning.ModuleManagementHttpClientVersioned.Execute[T](Func`1 func, String operation) in /mnt/vss/_work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/versioning/ModuleManagementHttpClientVersioned.cs:line 152
<7> 2024-05-22 06:20:24.472 +00:00 [DBG] [Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory] - Executing command for operation ["update"] succeeded.
<6> 2024-05-22 06:20:24.473 +00:00 [INF] [Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory] - Executing command: "Start module edgeHub"

edge-hub logs
[edgeHub_output (2).txt](https://github.com/Azure/iotedge/files/15414591/edgeHub_output.2.txt)

Additional Information

Please provide any additional information that may be helpful in understanding the issue.

@chatnoir1982
Copy link
Author

Solved. The issue was caused due to erroneously deleting the base deployment that puts the system modules on the device, then following up with layered/manual deployments. The edgeAgent and edgeHub modules were working, but would not see any newly deployed custom modules. Strangely enough, edgeHub would happily register other non-custom modules from the masrketplace, like Simulated Temperature Sensor. Restoring the base deployment brought everything back to working order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant