Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Performance of setting the PWM duty cycles #356

Open
Candas1 opened this issue Dec 10, 2023 · 10 comments
Open

[FEATURE] Performance of setting the PWM duty cycles #356

Candas1 opened this issue Dec 10, 2023 · 10 comments
Labels
enhancement New feature or request

Comments

@Candas1
Copy link
Collaborator

Candas1 commented Dec 10, 2023

Hi,

A topic has been on my mind last few days, thought I would socialize it to see if it's worth exploring it.

A few months back I measured it takes 48us on STM32F1 to run the setpwm function for 6PWM.
Which is more than the inverse clarke/park and svpwm combined in my case.

I think we are converting to too many different value domains:

  1. we calculate Ua Ub and Uc voltages (float)
  2. Dc_a Dc_b and Dc_c are calculated here as percentages of the power supply [0,1] (float)
  3. duty cycles are multiplied here by pwm_range to scale it to the timer resolution [0,4095] (uint32_t)
  4. duty cycles are scaled to the Auto reload register value here which is for example 72Mhz/16Khz/2 so [0,2250] in my case (uint32_t)

Few ideas:

  • a non breaking change could be to scale from [0,1] to [0,ARR+1] directly, skipping [0,4095] in point (3). This math could be adapted
  • a more complicated change would be to directly scale to [0,ARR+1] in _setPwm but we would need a way to get ARR+1 in a consistent way for all drivers (check getOverflow )
  • the arduino bloat could be skipped by directly using the Low level code in _setPwm (e.g. this is calculated each loop but is not used in our case, this also seems unnecessary).

[UPDATE]

  • commenting the arduino bloat reduced by 4us only
  • SetPhaseState seems to take 8us for 3 executions, this could be used to check if channel is running before stopping/resuming
  • converting from [0,4095] to [0,2250] is integer math so it might not be taking a lot of time
@Candas1 Candas1 added the enhancement New feature or request label Dec 10, 2023
@runger1101001
Copy link
Member

An associated change would be to change the STM32 PWM initialisation to work in the same way as Renesas, RP2040, ESP32 or SAMD: to use the highest value for ARR supported by the timer - in other words, to adjust the pre-scaler according to the chosen frequency to use the highest resolution we can. At the moment we use a fixed 4096 on STM32, when some MCUs could do better :-)
Also, it would be intuitive that (like with the other drivers) I can trade off PWM frequency for increased resolution.

Another good change might be to use the registers directly when setting the PWM, rather than using the HAL functions? I haven't checked, but knowing HAL this might save a lot of cycles...

The drivers can report back the PWM range using the driver's data structure, so we can work with ARR+1 like you suggest.

  • I think it's good to keep the external API hardware independent and floating point based on voltages. So I would keep the API of BLDCDriver in terms of voltages.
  • We can think about changing the API of _writeDutyCycle() - I would have no problem with this. But to do so will require changing all the hardware drivers.
  • The simple change is the one that just affects the STM32 code

@runger1101001
Copy link
Member

Another point on the PWM topic:

the way the code is structured calculations are interleaved with the setting of the PWM to the different phases. From my point of view this increases the chance of a timer reset event occurring during the process of setting the PWM to the phases. This will result in a duty cycle where some phases are in the new state and some phases are in the previous state regarding the PWM settings.
While this isn't critical, it would of course be nicer to avoid it. Changing the code so all calculations happen first, and then all registers are updated in rapid succession doesn't eliminate, but reduces the chance of these events happening.

@Candas1
Copy link
Collaborator Author

Candas1 commented Dec 11, 2023

Thanks for your comments Richard.
This is a topic I started to explore, and so far I haven't found anything obviously wrong, and the changes I tried didn't greatly improve the performance.
I was looking mainly at hardware 6pwm because that's the most straight forward, low level should work well in this case as all the channels are from the same timer.

the way the code is structured calculations are interleaved with the setting of the PWM to the different phases. From my point of view this increases the chance of a timer reset event occurring during the process of setting the PWM to the phases. This will result in a duty cycle where some phases are in the new state and some phases are in the previous state regarding the PWM settings. While this isn't critical, it would of course be nicer to avoid it. Changing the code so all calculations happen first, and then all registers are updated in rapid succession doesn't eliminate, but reduces the chance of these events happening.

Good point, it could be that the current code is setting the pwm for each channel too far appart.

There is also something I took for granted but might not be enabled with the current code, it's the preload enable bit.
You usually don't want the pwm to be applied directly, this function uses shadow registers so that the changes are applied only at the end of the pwm period to have consistent switching.
[EDIT] It's ok preload is enabled.

@Candas1
Copy link
Collaborator Author

Candas1 commented Dec 11, 2023

I am in my bed tried to fall asleep, and I think I found one reason why it's slow.

We shouldn't do math in _constrain(), I think it does the math 3 times 😅

Need to test when I wake up.

@Candas1
Copy link
Collaborator Author

Candas1 commented Dec 12, 2023

This is what I meant:
image

But it takes same time even if I move the math out, it's probably optimized by the compiler.
It might be still worth moving the math out for all those, in case it's not optimized with some of the toolchains.

@Candas1
Copy link
Collaborator Author

Candas1 commented Dec 12, 2023

Surprisingly this takes 8us on the stm32f1:

dc_a = _constrain(dc_a, 0.0f , 1.0f );
dc_b = _constrain(dc_b, 0.0f , 1.0f );
dc_c = _constrain(dc_c, 0.0f , 1.0f );

But I am not even sure it's needed, as we do this before:

Ua = _constrain(Ua, 0, voltage_limit);
Ub = _constrain(Ub, 0, voltage_limit);
Uc = _constrain(Uc, 0, voltage_limit);

voltage_limit should be smaller or equal to voltage_power_supply (see init) unless it's wrongly set after the init.

Worst case we could constrain CCR1 CCR2 CCR3 between 0 and ARR+1, it will be faster as it's integer math.
But that also impacts all drivers then.

@runger1101001
Copy link
Member

runger1101001 commented Dec 14, 2023

Looking over the code and the different drivers, I think the situation is this:

  • we need to apply the driver limit, as it can be lower than the PSU voltage
  • we need to scale by the PSU voltage, and scale to the counter TOP value. Using the range 0.0-1.0 as an intermediary API between driver class and hardware functions isn't really needed, but it's convenient.
  • limiting to the 0.0 - 1.0 range is not needed if we assume the driver voltage limit is set below the PSU voltage, as it should be.
  • the various hardware drivers generally don't check the range any more, and assume the value is in the range 0-1, and just use it to multiply with the TOP value

So internally, we could change the way the driver and the hardware driver interact: the current limit - scale - limit - scale operation could be folded into just a limit - scale operation, if we know the counter range within the driver class already.
Doing this would be an internal API change, but affect all the drivers.

Longer term one could remove the driver limit and roll it into the motor limit, so you have only one voltage limit that you have to set for the system. It would be a simplification from some point of view, but it would be an external API change...

Hmmm - actually the driver limit defines the centre-point of the modulation? So we actually do need a separate driver and motor voltage limit?

@Candas1
Copy link
Collaborator Author

Candas1 commented Dec 14, 2023

Antun had shared this in the forum, yes we need both:
image

Those are equivalent, so it could be done in different places:
Voltage_power_supply = 1.0 = ARR +1

Again I am just analyzing, if it turns out changes require a lot of effort but bring little benefits, it's not worth it.

@Copper280z
Copy link
Contributor

Copper280z commented Jul 14, 2024

Timing on an stm32f401 using pin toggling, a back to back toggle takes ~500ns. All reported times are from falling edge to rising edge which I assume means it includes 1 period of this base latency (ie if toggles were instant, the reported duration would be shorter by 500ns). Total time for the higher level setPwm call is ~9.5-10.5us.

  digitalToggle(PC14);
  Ua = _constrain(Ua, 0, voltage_limit);
  Ub = _constrain(Ub, 0, voltage_limit);
  Uc = _constrain(Uc, 0, voltage_limit);
  // calculate duty cycle
  // limited in [0,1]
  float v_supply_inv = 1/voltage_power_supply;
  dc_a = _constrain(Ua * v_supply_inv, 0.0f , 1.0f );
  dc_b = _constrain(Ub * v_supply_inv, 0.0f , 1.0f );
  dc_c = _constrain(Uc * v_supply_inv, 0.0f , 1.0f );
  digitalToggle(PC14);

1.56us

      digitalToggle(PC14);
      _setSinglePhaseState(phase_state[0], ((STM32DriverParams*)params)->timers[0], ((STM32DriverParams*)params)->channels[0], ((STM32DriverParams*)params)->channels[1]);
      if(phase_state[0] == PhaseState::PHASE_OFF) dc_a = 0.0f;
      digitalToggle(PC14);

2.1us, this obviously happens 3 times, so it's a big contributor.

      digitalToggle(PC14);
      _setPwm(((STM32DriverParams*)params)->timers[0], ((STM32DriverParams*)params)->channels[0], _PWM_RANGE*dc_a, _PWM_RESOLUTION);
      digitalToggle(PC14);

1.8us, again 3 times

  TIM_HandleTypeDef *tim = HT->getHandle();
  uint32_t CCR_RegisterValue = ((__HAL_TIM_GET_AUTORELOAD(tim) + 1) * value) / ((1 << resolution) - 1) ;
  int chan = HT->getChannel(channel);

  __HAL_TIM_SET_COMPARE(tim, chan, CCR_RegisterValue);

Replacing the call to HT->setCaptureCompare in _setPwm with this saves roughly 1us per call, for a total of 3us.

@Candas1
Copy link
Collaborator Author

Candas1 commented Jul 17, 2024

I was checking the UDIS bit @Copper280z found.
LL_TIM_EnableUpdateEvent() and LL_TIM_DisableUpdateEvent() are available for all the STM32 families so we could use it.
But this bit is set at timer level and not channel level, it could be used when the whole timer is dedicated to a single driver or we have to loop through all the timers of a driver and it makes sense only if the timers are aligned.
I would still separate the math for calculating pwm values and setting CCRX registers to reduce the risk of setting the channels at different times, but it should be done in 2PWM/3PWM/4PWM/hardware6PWM/software6PWM.

Something like this for hardware6PWM:

TIM_HandleTypeDef *tim = (STM32DriverParams*)params)->timers[0]->getHandle();
uint32_t ARR = __HAL_TIM_GET_AUTORELOAD(tim) +1;

// directly scaling to ARR instead of _PWM_RESOLUTION
uint32_t CCRA =  ARR * dc_a;
uint32_t CCRB =  ARR * dc_b;
uint32_t CCRC =  ARR * dc_c;

LL_TIM_DisableUpdateEvent(tim.instance)
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[0], CCRA);
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[2], CCRB);
__HAL_TIM_SET_COMPARE(tim, ((STM32DriverParams*)params)->channels[4], CCRC);
LL_TIM_EnableUpdateEvent(tim.instance)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants