EFR32 MGM21 stops responding #166

Open
afirix opened this issue Nov 13, 2023 · 20 comments

Comments

@afirix

afirix commented Nov 13, 2023

Hello. My EFR32 MGM21 PoE device has recently started locking up and not responding every 10-12 hours. Sometimes I can ping it successfully, but attempting to telnet doesn't work, as if the port is not open. Sometimes even pings cannot get through. Disconnecting the device and plugging it back in helps, but only for the next few hours, and eventually it becomes unresponsive again.

It used to be rock solid for a few months, but a couple weeks ago this behavior started. I didn't do (knowingly at least) anything to the device that could cause this. Troubleshooting I've tried so far:

  1. Reflash over USB with ESPHome-flasher using the latest firmware from here. Flashing succeeded, but it didn't change anything regarding the erratic behavior.
  2. Update EFR32 to the latest stable firmware (7.2.3 I believe). Not sure if this is even related cause I thought that ethernet connectivity was handled by ESP32 itself.
  3. Tried switching between ZHA and Zigbee2Mqtt, no difference observed.
  4. Tried disabling both ZHA and Zigbee2Mqtt and leaving the device idling. That did increase the period of responsiveness, but eventually it locked up as well.

Not sure what else to try. The device in such shape is pretty much unusable. Open to any suggestions.
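
One way to narrow down the symptom described above (pings succeed while the telnet port appears closed) is to log TCP reachability over time and see exactly when the port stops answering. The sketch below is a minimal example, not something provided by the maintainer: the function names are hypothetical, and the host and port are placeholders for the gateway's real address and serial-over-TCP port.

```python
import socket
import time
from datetime import datetime


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def monitor(host: str, port: int, interval: float = 60.0, checks: int = 5) -> list:
    """Probe the port `checks` times, `interval` seconds apart, and log each result."""
    results = []
    for i in range(checks):
        ok = tcp_port_open(host, port)
        results.append(ok)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {'open' if ok else 'unreachable'}")
        if i < checks - 1:
            time.sleep(interval)
    return results
```

For example, `monitor("192.168.1.50", 6638, interval=60, checks=720)` would log one line per minute for 12 hours. Both the IP address and port 6638 are assumptions here, so substitute the values from your own configuration. Note that each probe opens a real connection, which may compete with ZHA or Zigbee2Mqtt if the gateway only accepts one client at a time.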

@tube0013
Owner

Hi, have you had any network infrastructure changes?

You've done most of the steps I would suggest. I have one other, though, which is to try a FW build based on the ESP-IDF platform and see if it makes a difference.

tubeszb-efr32-mgm210-poe-2023.bin.zip

^ This is built for the 2023 version. If you have the 2022, let me know and I can build one for that.

This will need to be flashed over USB, as it has a different partition table than the original FW.

If this doesn't help, get in touch with me at [email protected] and I'll help figure something out.

thanks!

@afirix
Author

afirix commented Nov 13, 2023

No network changes as far as I am aware. I'll double check the firewall logs to see whether anything gets blocked on the path to the device, though the fact that it works initially after a hard reboot and then stops makes me doubtful that the firewall is involved.

I believe I have the 2022 version. At least it shows up with the tubeszb-efr32-mgm210-poe-2022 name in ESPHome. If you can build it, I'll try it promptly. Thanks!

@tube0013
Owner

Here it is compiled for the 2022 version (different pins were used in 2023).

tubeszb-efr32-mgm210-poe-2022.bin.zip

@afirix
Author

afirix commented Nov 14, 2023

Thanks. I flashed it and the device seems to be responsive to pings and telnets so far, however, I can no longer connect to it with either ZHA or Zigbee2Mqtt. ZHA just says 'Failed to connect' without any further details.

And I double checked my firewall, nothing gets blocked there.

@tube0013
Owner

Ugh, sorry that didn't work. Were you able to go back to the FW from before, the one that was disconnecting, to at least get things partially working again? Anyway, reach out to my store email and I'll help get you sorted.

@LordNex

LordNex commented Nov 15, 2023

So far mine's been working like a champ. I have 22 devices connected so far and no dropouts!

Now that ESPHome has been updated to 11.0, do you think it's safe to update? Or should we wait and let you test it out first?

@LordNex

LordNex commented Nov 15, 2023

> No network changes as far as I am aware. I'll double check the firewall logs to see whether anything gets blocked on the path to the device, though the fact that it works initially after a hard reboot and then stops makes me doubtful that the firewall is involved.
>
> I believe I have the 2022 version. At least it shows up with the tubeszb-efr32-mgm210-poe-2022 name in ESPHome. If you can build it, I'll try it promptly. Thanks!

I can give you what mine adopted when I brought it into ESPHome. It mainly just loads some libraries and then points to the Tubez YAML config for it. Here it is, redacted. You'll need your own API key.

substitutions:
  name: tubeszb-efr32-mgm210-poe-2023
packages:
  tubeszb.efr32_mgm210_poe_2023: github://tube0013/tube_gateways/models/current/tubeszb-efr32-MGM210-poe/firmware/esphome/tubeszb-efr32-mgm210-poe-2023.yaml
esphome:
  name: ${name}
  name_add_mac_suffix: false
api:
  encryption:
    key: REDACTED

That's literally everything in mine, but I've been waiting for version 11 to come out, as TubeZ said there was a bug in the 10.x code.

@tube0013
Owner

All I say with new ESPHome updates is update at your own risk, as I'm not able to test every release. You can always move back to the known working binary from GitHub.

@afirix
Author

afirix commented Nov 15, 2023

> Ugh, sorry that didn't work. Were you able to go back to the FW from before, the one that was disconnecting, to at least get things partially working again? Anyway, reach out to my store email and I'll help get you sorted.

Yeah I reflashed back to the original firmware and the device is back to working for a few hours.

@LordNex

LordNex commented Nov 15, 2023

> All I say with new ESPHome updates is update at your own risk, as I'm not able to test every release. You can always move back to the known working binary from GitHub.

Only reason I say that is because I've had devices I forgot to update before, had them go offline, and basically had to reverse engineer what I had built because it had been so long. And these were just basic ESP8266 temp/motion sensors. I'll be the test subject and report my findings.

Currently I have a new ESP32 WROOM with a new BME680 doing its burn-in. This will be the main sensor for monitoring air quality in the house. I plan on building another one with some 18650 vape batteries and a charging board for outdoors. I figured 2 of them in series @ 35 peak amps each should give me about 6 months a charge, depending on temperature.

Have you had any experience building ESP devices with battery support?

@LordNex

LordNex commented Nov 15, 2023

> All I say with new ESPHome updates is update at your own risk, as I'm not able to test every release. You can always move back to the known working binary from GitHub.
>
> Only reason I say that is because I've had devices I forgot to update before, had them go offline, and basically had to reverse engineer what I had built because it had been so long. And these were just basic ESP8266 temp/motion sensors. I'll be the test subject and report my findings.
>
> Currently I have a new ESP32 WROOM with a new BME680 doing its burn-in. This will be the main sensor for monitoring air quality in the house. I plan on building another one with some 18650 vape batteries and a charging board for outdoors. I figured 2 of them in series @ 35 peak amps each should give me about 6 months a charge, depending on temperature.
>
> Have you had any experience building ESP devices with battery support?

Well, it started to go through its update process, then HA decided it wanted to refresh the page. But I gave it some time and watched the page, and it updated fine as far as I can tell. All devices are still connected.


@LordNex

LordNex commented Nov 15, 2023

> All I say with new ESPHome updates is update at your own risk, as I'm not able to test every release. You can always move back to the known working binary from GitHub.

Quick question. How many Zigbee devices can this hold, mains and battery based? I didn't know if you had an idea of what it can handle, because I plan on installing quite a few more. Along those same lines, is there a device or anything you know of out there that will extend the range or add an additional coordinator? I know mains-powered devices act as repeaters, but I thought I read somewhere that each of those is limited to 6 other connections. If you were going to make a Zigbee super network, what would you do?

@tube0013
Owner

Can you move this to a Discussion, please?

@LordNex

LordNex commented Nov 21, 2023

> Can you move this to a Discussion, please?

Done

@LordNex

LordNex commented Nov 23, 2023

Just as an update: I've fully tested this with the latest version of ESPHome and everything works perfectly. I'm not seeing any errors, and the mesh is strong and stable. This is for version 11.x of ESPHome itself.

@LordNex

LordNex commented Nov 29, 2023

> Thanks. I flashed it and the device seems to be responsive to pings and telnets so far, however, I can no longer connect to it with either ZHA or Zigbee2Mqtt. ZHA just says 'Failed to connect' without any further details.
>
> And I double checked my firewall, nothing gets blocked there.

What is the default account to telnet into it?

@LordNex

LordNex commented Jan 10, 2024

I've started to notice my adapter lose connection about once or twice a day. I have to restart the coordinator and Home Assistant before the devices all come back. In an effort to better cover my home with repeaters, I bought some Tradifi Zigbee plugs, the ones that measure the current, so everything's covered. I'm now up to 34 devices. The problem didn't appear until I started really trying to beef up the system by adding more repeaters. Is there something fundamental that I'm doing wrong in my setup? I usually try to pair any repeaters, routers, or other mains-powered devices directly to the coordinator. Then I usually pick the router that's closest to the end device to pair it with. But the topology constantly changes; it will work perfectly for a few hours, and then eventually one of my family will complain, I go to look, and every device is unavailable. Restarting the ZHA integration to the socket doesn't seem to matter. I turned on debug mode, so as soon as I can provide anything more, I will. Thanks again for all your help!

@LordNex

LordNex commented Jan 29, 2024

@tube0013 OK, so I'm still having this happen every 10-12 hours or so. I can't find any specific pattern that's causing it. I took your advice and copied and pasted your code directly into the YAML config of ESPHome, and even manually assigned it an IP address even though it has one reserved. I flashed it through ESPHome with your code and let it run. It seemed a little better, but the issue kept occurring.

Only way I've found to recover is to fully restart the coordinator, then restart Home Assistant as just reloading the ZHA Integration didn't seem to do the job. It's making life hell for me as my wife leaves for work early and our primary door lock "Wyze Original" is a Zigbee lock by nature and I've always used it that way. It works fine and had worked fine until this coordinator started messing up. I'm in the process of trying to dig through log files to see if I can find anything wrong.

Sometimes it'll go a full day or almost 2. But it eventually locks up somewhere along the line, and the above is the only solution. I've tried to create an automation that looks at the serial connection state and at whether anything else mains-connected, like my EFR32-based range extender, goes Unavailable, and then does those steps for me. But it doesn't always trigger right, and if I don't put in enough checks, it will annoyingly repeat over and over.
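
The repeat-triggering problem described above is commonly handled by requiring several consecutive failed checks before acting and by enforcing a cooldown between recovery attempts. The following is only a sketch of that logic under those assumptions; `RecoveryGuard` and its parameters are hypothetical names, not part of Home Assistant, ZHA, or ESPHome.

```python
import time
from typing import Callable, Optional


class RecoveryGuard:
    """Decide when a recovery action (e.g. restart) should run.

    Fires only after `threshold` consecutive failed health checks, and
    never more often than once per `cooldown` seconds.
    """

    def __init__(self, threshold: int = 3, cooldown: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.failures = 0
        self.last_fired: Optional[float] = None

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True if recovery should run now."""
        if healthy:
            # Any successful check resets the failure streak.
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.threshold:
            return False
        now = self.clock()
        if self.last_fired is not None and now - self.last_fired < self.cooldown:
            # Still inside the cooldown window: suppress the repeat.
            return False
        self.last_fired = now
        self.failures = 0
        return True
```

A caller would poll the coordinator's state on a fixed schedule, pass the result to `record()`, and run the restart steps only when it returns `True`. The threshold and cooldown values shown are placeholders to tune against how long a restart actually takes.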

My mesh is now 35 devices, soon to be 37, with the majority of them being good official routers and just a smattering of end devices on the periphery. I've even purchased a separate external antenna with a small magnet on its base that allows me to place it centrally in the main hall of our house on an air return vent. When working, the LQI and RSSI values for it and for most devices, except the very far or shielded ones, are well within range, and the coordinator isn't seeing a huge amount of noise either. But I'm at a loss to explain why this is occurring.

My past Zigbee stick, the Nortek HUBZ1, which is still being used for its Z-Wave side, never had this issue, although it did have range issues even with a USB extender to attempt to keep it away from interference.

So is there anything log-wise I should be looking at? I'm not finding anything especially related in the HA logs. One time I thought I did, because it came up saying that the channel was over-utilized, but then I found that my cats had knocked the antenna down and it had swung right behind a fairly large speaker. Once I placed it back where it should be, that went away. For the sake of it all, I'm going to try to reflash the coordinator with just your firmware, but I'm not sure if this is an issue with ESPHome or not. It's consistently happened since I purchased it, regardless of whether I let ESPHome adopt it and run with substitutions pointing to your YAML configs, or manually copy and paste the config from your files.

If you could let me know how to run this inside HA's ZHA without utilizing ESPHome, and that works, I'll be happy to leave it that way. It's just that I tend to use a lot of ESPHome devices and I know it well enough to usually fix any problems. But I'm not seeing any errors or anything, and when I go to the web front end when it's "locked up", it doesn't show anything out of the norm. As soon as I hit the restart button for the coordinator and then restart HA, all the devices come back and the mesh reforms and works fine again.

So I'm a bit lost. Would you mind helping me sort this out, please? I'm no programmer, but I do have a professional background in IT, so 99.9% of this I understand and get. But I'm obviously missing something somewhere. Again, thank you for your time and help!

@LordNex

LordNex commented Jan 29, 2024

@tube0013 After reboot I did find this in the log. If I find more I'll just edit this post and add the additional info. I also reached out to your store front email address as this is driving me and the wife nuts


Logger: zigpy.application
Source: /usr/local/lib/python3.11/site-packages/zigpy/application.py:667
First occurred: 2:35:47 AM (1 occurrences)
Last logged: 2:35:47 AM

Watchdog failure
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 665, in _watchdog_loop
    await self.watchdog_feed()
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 647, in watchdog_feed
    await self._watchdog_feed()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 999, in _watchdog_feed
    (res,) = await self._ezsp.readCounters()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 212, in _command
    raise EzspError("EZSP is not running")
bellows.exception.EzspError: EZSP is not running
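
To check whether failures like the one above really recur every 10-12 hours, the timestamps of each "Watchdog failure" entry can be pulled out of a log file and the gaps between them computed. This is a rough sketch with hypothetical function names. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the format of your log.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS" (anything after it is ignored).
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def failure_times(lines: Iterable[str],
                  needle: str = "Watchdog failure") -> List[datetime]:
    """Return the timestamps of all lines containing `needle`."""
    times = []
    for line in lines:
        if needle not in line:
            continue
        match = _TIMESTAMP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def gaps(times: List[datetime]) -> List[timedelta]:
    """Return the intervals between consecutive failures, oldest first."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]
```

Feeding it the lines of a log file, for instance `gaps(failure_times(open(path)))` with `path` pointing at your own log, gives a list of intervals. Roughly constant gaps would suggest a timer or lease of some kind, while irregular gaps would point more toward load or radio conditions.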

On an unrelated side note, I just built a new PC to replace my failed laptop, so at some point in the future I plan on wiping my main server and reloading Home Assistant and everything else. I haven't decided how I want to do that yet (i.e., from a backup file, pulling my config folder from GitHub, or just manually backing up files and doing a full build piece by piece from scratch). That really shouldn't make a difference concerning this problem, though, and is some time off from being done.

Please let me know if there are any other logs or info I can provide to assist. And thanks again, as always!

@tube0013
Owner

I'll try and catch up on this later today or tomorrow night; I'm in court for jury duty for most of the day.
