Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception (LoadProhibited) in run_ook_demods() #116

Open
dilmash opened this issue Nov 16, 2023 · 8 comments
Open

Exception (LoadProhibited) in run_ook_demods() #116

dilmash opened this issue Nov 16, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@dilmash
Copy link

dilmash commented Nov 16, 2023

Analysis

I am running a vanilla OpenMQTTGateway on a LilyGo RTL 433 Module. Sometime the firmware crashes with the information shown below. I can reproduce the exception when I power on an OregonScientifc Thermo/Hygro Sensor
image
Most likely a THGN132N

I compiled with:

build_type = debug
monitor_filters = esp32_exception_decoder

and I also added 2 debug printf's to identify the element that may have caused the exception as:

int run_ook_demods(list_t* r_devs, pulse_data_t* pulse_data) {
  int p_events = 0;

  unsigned next_priority = 0; // next smallest on each loop through decoders
  // run all decoders of each priority, stop if an event is produced
  int debug_count = 0;         // << Added for debugging
  logprintfLn(LOG_DEBUG, "\nr_api.c - Decoder start");
  for (unsigned priority = 0; !p_events && priority < UINT_MAX; priority = next_priority) {   
    next_priority = UINT_MAX;
    for (void** iter = r_devs->elems; iter && *iter; ++iter) {          // <== Trap is here
      printf(" %d", debug_count++);   // << Added for debugging
      r_device* r_dev = *iter;

As the counter indicates, the loop is at either position 87 or 88 (Last last digit of 88 may have been dropped by the exception)

      memcpy(&cfg->devices[87], &maverick_et73, sizeof(r_device));
      memcpy(&cfg->devices[88], &maverick_et73x, sizeof(r_device));
      memcpy(&cfg->devices[89], &mebus433, sizeof(r_device));

Most probably mebus433 ?

[.... ] 85 86 87 8Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x4013996b  PS      : 0x00060c30  A0      : 0x8010a714  A1      : 0x3ffe31f0  
A2      : 0x00000001  A3      : 0x3ffe09cc  A4      : 0xbaad5678  A5      : 0x00000000  
A6      : 0x00000059  A7      : 0x0000000a  A8      : 0x80139969  A9      : 0x3ffe31a0  
A10     : 0x00000003  A11     : 0x3ffdeac8  A12     : 0x000003e8  A13     : 0x000000e5  
A14     : 0x4009a1e0  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0xbaad56a4  LBEG    : 0x4008aba5  LEND    : 0x4008abb5  LCOUNT  : 0xffffffff  


Backtrace: 0x40139968:0x3ffe31f0 0x4010a711:0x3ffe3220

  #0  0x40139968:0x3ffe31f0 in run_ook_demods at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/rtl_433/r_api.c:523
  #1  0x4010a711:0x3ffe3220 in rtl_433_DecoderTask(void*) at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/signalDecoder.cpp:482

I can reproduce the issue almost imediately when activating the THGN132N

Expected Behavior

run_ook_demods() should not cause an exception when scanning the list of protocols

Steps To Reproduce

Activating a THGN132N temparature sensor will cause the exception almost immediatley

Logs

See above

Configuration

Compiling OpenMQTTGateway with
`default_envs = lilygo-rtl_433`
set in platform.ini

Environment

  • OS: Lilygo rl433
  • Software: OpenMQTTGateway
  • Node: N/A
  • npm: N/A

Process Supervisor

not applicable

Additional Context

No response

@dilmash dilmash added the bug Something isn't working label Nov 16, 2023
@NorthernMan54
Copy link
Owner

Thanks for identifying this

To confirm which decoder is triggering this, can you add this line back in ( you may need to fiddle with it a bit, but want to see r_dev->name )

logprintfLn(LOG_DEBUG, "demod(%d) - %s", r_dev->modulation, r_dev->name);

It will give the decoder name

I don't think it is mebus433 as it is listed as disabled in the code base.

Once you identify which one, you can add a disabled flag to the device decoder similar to mebus433 and recompile, that will remove it from being accessed.

@dilmash
Copy link
Author

dilmash commented Nov 17, 2023

Is added the printf into lines 523 und 524 as:

  for (unsigned priority = 0; !p_events && priority < UINT_MAX; priority = next_priority) {
    next_priority = UINT_MAX;
    for (void** iter = r_devs->elems; iter && *iter; ++iter) {
      
      r_device* r_dev = *iter;
      logprintfLn(LOG_DEBUG, "demod(%d) - %s", r_dev->modulation, r_dev->name);
      fflush(stdout);

      // Find next smallest priority

Result is shown below:

[ ... ]
rtl_433_ESP(7): demod(9) - Norgo NGE101
rtl_433_ESP(7): demod(4) - Oil Ultrasonic STANDARD ASK
rtl_433_ESP(7): demod(6) - Opus/Imagintronix XT300 Soil Moisture
rtl_433_ESP(7): demod(3) - Oregon Scientific Weather Sensor
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x40139944  PS      : 0x00060c30  A0      : 0x8010a714  A1      : 0x3ffe3170  
A2      : 0x00000001  A3      : 0x4009a1e0  A4      : 0x00000478  A5      : 0x3ffe0938  
A6      : 0x00000000  A7      : 0x0000000a  A8      : 0x80139944  A9      : 0x3ffe3120  
A10     : 0x00000010  A11     : 0x3ffdea58  A12     : 0x00000071  A13     : 0x00000001  
A14     : 0x3ffde5f8  A15     : 0x000001b8  SAR     : 0x00000019  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x0000047c  LBEG    : 0x4008aba5  LEND    : 0x4008abb5  LCOUNT  : 0xffffffff  


Backtrace: 0x40139941:0x3ffe3170 0x4010a711:0x3ffe31a0

  #0  0x40139941:0x3ffe3170 in run_ook_demods at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/rtl_433/r_api.c:523
  #1  0x4010a711:0x3ffe31a0 in rtl_433_DecoderTask(void*) at .pio/libdeps/lilygo-rtl_433/rtl_433_ESP/src/signalDecoder.cpp:482

I have deactivated the corresponding device and the exception has disappeared. But of course the "Oregon Scientific" sensor is no longer recognized. Is there any way to fix this? I would definitely try to help.

@NorthernMan54
Copy link
Owner

Two thumbs up on having the back trace enabled

Looking at the backtrace I thought it would have pointed to the line within Oregon_scientific.c, but it didn't.

As the device decoders are sourced from rtl_433, my approach would be to first refresh that decoder to see if the issue is already resolved ( hopefully no dramatic API changes where made), then do a code review to see if I can spot the issue, then worst case put a lot of debug statements in to see where it is failing. And once I find the issue, resolve it, and send the fix upstream to rtl_433. Not sure how much time you want to put into this.

@dilmash
Copy link
Author

dilmash commented Dec 1, 2023

On further investigation of this problem, I found that the error always seems to occur in malloc() .... However, the stack trace is also always different. The case shown in the following trace occurs most frequently.
The packet seems to be decoded properly, then an enqueue() occurs in the callback function;

In src/rtl_433/r_api.c:985

  // callback to external function that receives message from device (
  // rtl_433_ESPCallBack )
  (cfg->callback)(cfg->messageBuffer);

then to enqueueJsonObject in main/ZgatewayRTL_433.ino:261 and jsonQueue.push(bundle); in main/main.ino:436

    RFrtl_433_ESPdata["origin"] = (char*)topic.c_str();
    enqueueJsonObject(RFrtl_433_ESPdata);    // <===
    storeSignalValue(MQTTvalue);
    pubOled((char*)topic.c_str(), RFrtl_433_ESPdata);

----------
  JsonBundle bundle;
  bundle.doc = jsonDoc;
  jsonQueue.push(bundle);          // <===
  Log.trace(F("Queue length: %d" CR), jsonQueue.size());

and down to components/heap/multi_heap.c:200 where it finally crashes.
I don't understand the overall logic of the program so far, but since the decoding has obviously run without errors,

rtl_433_ESP(6): data_output {"model":"Oregon-THGR122N","id":112,"channel":1,"battery_ok":1,"temperature_C":22.7,"humidity":44,"protocol":"Oregon Scientific Weather Sensor","rssi":-41,"duration":186814}
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

I would suspect a storage overlay either in the decoder or another memory problem.
With some help I would have time to investigate further.

@NorthernMan54
Copy link
Owner

That section of code is from the openMQTTGateway package, and I believe it is queuing the message for sending via MQTT. I'm wondering if the object is larger than others or for some reason it is exceeding a buffer size. Visually it doesn't appear to be extremely large.

@dilmash
Copy link
Author

dilmash commented Dec 13, 2023

An annoying problem, a simple cause. After increasing the value of rtl_433_ReceiverTask_Stack in rtl_433_ESP.cpp by just 48 bytes, the error no longer occurred. To be on the safe side, I set the value to 3000. Decoding is now running smoothly.

@NorthernMan54
Copy link
Owner

@dilmash - If you watch the logs and status messages, the high water mark for that stack is logged so you can tune it further as needed. The log entry is RTL_HWM

alogprintf(LOG_INFO, ", RTL_HWM: %d", uxTaskGetStackHighWaterMark(rtl_433_ReceiverHandle));

@NorthernMan54
Copy link
Owner

I have added your 48 byte tweak to release 0.3.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants