SPI timeouts #13

Open
tlbtlbtlb opened this issue Oct 7, 2022 · 1 comment

tlbtlbtlb commented Oct 7, 2022

I think I've tracked down why I periodically get SPI timeouts reported. They're due to an arithmetic overflow in the timer update in the udriver firmware.

In dual_motor_torque_ctrl.c, the firmware checks for a timeout by comparing the timestamp of the last packet received from the hall sensors (gSPILastReceivedIqRef_stamp) with the current time (gTimer0_stamp):

  gErrors.bit.spi_recv_timeout = (
      gSPIReceiveIqRefTimeout != 0 // and timeout is enabled
      // check if one of the motors is enabled and has a IqRef != 0
      && ((gMotorVars[HAL_MTR1].Flag_Run_Identify
              && gMotorVars[HAL_MTR1].IqRef_A != 0)
          || (gMotorVars[HAL_MTR2].Flag_Run_Identify
              && gMotorVars[HAL_MTR2].IqRef_A != 0))
      // finally check if last message exceeds timeout
      && (gSPILastReceivedIqRef_stamp
          < gTimer0_stamp - gSPIReceiveIqRefTimeout)
  );

So far so good, but the way gTimer0_stamp is calculated makes it wrap around to zero in much less than 2^32 ticks. In timer0_ISR, it does this:

#define TIMER0_FREQ_Hz 4000
...
uint32_t gTimer0_cnt = 0;
uint32_t gTimer0_stamp = 0;
...
  ++gTimer0_cnt;
  gTimer0_stamp = 1000 * gTimer0_cnt / TIMER0_FREQ_Hz;

But C calculates 1000 * gTimer0_cnt in 32-bit arithmetic before dividing, so the product overflows and gTimer0_stamp rolls over every 2^32/4000/1000 seconds (about 1074 seconds, or roughly 17 minutes).
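
To make the rollover arithmetic concrete, here is a small host-side sketch (not firmware code). The helper names stamp_ms_32bit and last_safe_cnt are made up for illustration; only TIMER0_FREQ_Hz and the formula come from the snippet above.

```c
#include <stdint.h>

#define TIMER0_FREQ_Hz 4000u

/* Same arithmetic as the ISR line: the product is truncated to 32 bits
 * before the division happens. */
static uint32_t stamp_ms_32bit(uint32_t cnt)
{
    return (uint32_t)(1000u * cnt) / TIMER0_FREQ_Hz;
}

/* Largest tick count for which 1000 * cnt still fits in 32 bits. */
static uint32_t last_safe_cnt(void)
{
    return UINT32_MAX / 1000u;
}
```

With these definitions, last_safe_cnt() is 4294967 ticks, which at 4000 Hz is about 1073.7 seconds. One tick later the product wraps and the computed stamp drops from 1073741 back to 0.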
When gTimer0_stamp is close to zero just after the rollover, the unsigned subtraction gTimer0_stamp - gSPIReceiveIqRefTimeout wraps around to a huge number, so the comparison reports a timeout. If you happen to be controlling the robot at that moment, it shuts down and you have to power cycle it.
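
The unsigned wraparound in the comparison can be reproduced on a host machine. This is a sketch only: the function names and the example timeout value of 100 are assumptions for illustration, not taken from the firmware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Comparison in the same shape as the firmware check.
 * now - timeout wraps to a huge value whenever now < timeout. */
static bool timed_out_original(uint32_t last, uint32_t now, uint32_t timeout)
{
    return last < (uint32_t)(now - timeout);
}

/* Alternative shape: compare the elapsed time against the timeout.
 * This is only wrap-safe if `now` itself wraps at exactly 2^32. */
static bool timed_out_elapsed(uint32_t last, uint32_t now, uint32_t timeout)
{
    return (uint32_t)(now - last) > timeout;
}
```

For example, timed_out_original(1073741, 0, 100) returns true even though the last packet arrived only one step before the rollover. The elapsed-time form behaves well across a genuine 2^32 wrap, but it would not help on its own here, because gTimer0_stamp currently jumps back to zero long before reaching 2^32.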

As a workaround I can disable the timeout check, but I worry that it'll fry the electronics if the hall sensors actually stop reporting.

It's probably a 1-line fix to this statement in timer0_ISR, so that the multiplication can't overflow before the division:

gTimer0_stamp = 1000 * gTimer0_cnt / TIMER0_FREQ_Hz;
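
Two possible shapes for such a fix are sketched below. Neither has been tested on the udriver hardware; the function names are hypothetical, and whether uint64_t arithmetic is acceptable inside the ISR on the target compiler is an assumption.

```c
#include <stdint.h>

#define TIMER0_FREQ_Hz 4000u

/* Option A: widen the product to 64 bits before dividing
 * (assumes the target toolchain supports uint64_t). */
static uint32_t stamp_ms_wide(uint32_t cnt)
{
    return (uint32_t)(((uint64_t)cnt * 1000u) / TIMER0_FREQ_Hz);
}

/* Option B: divide by ticks per millisecond instead of multiplying.
 * Exact only because 4000 is a whole multiple of 1000. */
static uint32_t stamp_ms_div(uint32_t cnt)
{
    return cnt / (TIMER0_FREQ_Hz / 1000u);
}
```

Either form keeps the stamp increasing until gTimer0_cnt itself overflows at 2^32 ticks, which at 4000 Hz is about 12.4 days. At that point the stamp would still jump back to zero, so a long-running system might also need a wrap-tolerant comparison or a wider counter.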

luator (Member) commented Oct 10, 2022

Thanks a lot for tracking this down! We had already noticed that there is some issue always occurring after ~17 min but couldn't find the cause so far.

@thomasfla @jviereck FYI.
