Possible race condition when stopping a resource #277

cochicde · 2024-11-19T06:55:31Z

One case where the problem occurs is starting a resource and stopping it right afterwards, because some events are still running when the FBs are changed to the stopped state.

The case I encountered was specifically in the BaseCommFB, where the INIT event was being executed, and the connection was therefore being opened, when the changeExecutionState(stop/kill) call arrived from another thread (usually the main thread). That call closes the connection, deleting the topComStack and causing issues.

I think the problem is in CResource::changeExecutionState, where mResourceEventExecution is stopped only after all the internal FBs are stopped. I believe we should change the states of the internal FBs only when we are sure that mResourceEventExecution is not running anymore. That means it should be stopped first, the ecet itself should join its thread before returning, and only then should the FBs be stopped.

So, since the resource is a composition of an ecet and internal FBs, its start order should be:

FBs
ecet

and its stop order should be the opposite:

ecet
FBs

Stopping (and probably starting too) the ecet needs to be synchronous to make sure that the ecet is actually in the desired state before continuing. Otherwise you end up in a transient state where you triggered the state change, but it has not been reached yet.
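
As a rough sketch of the ordering I have in mind (this is not the FORTE code: `Ecet`, `FunctionBlock` and `Resource` below are made-up stand-ins, and the real classes look different), the ecet's stop joins its thread before returning, and the resource stops the ecet before touching the FBs:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the event execution thread (ecet).
// start() and stop() are assumed to be called from one control thread.
class Ecet {
public:
  ~Ecet() { stop(); }

  void start() {
    if (mThread.joinable()) return;  // already started
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mRunning = true;
    }
    mThread = std::thread([this] { run(); });
  }

  // Synchronous stop: returns only after the worker thread has been joined,
  // so no event can still be executing afterwards. Pending events are dropped.
  void stop() {
    assert(std::this_thread::get_id() != mThread.get_id());  // no self-join
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mRunning = false;
    }
    mWakeUp.notify_all();
    if (mThread.joinable()) mThread.join();
  }

  void post(std::function<void()> event) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mEvents.push(std::move(event));
    }
    mWakeUp.notify_all();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (mRunning) {
      if (mEvents.empty()) {
        mWakeUp.wait(lock);
        continue;
      }
      auto event = std::move(mEvents.front());
      mEvents.pop();
      lock.unlock();
      event();  // executed without holding the lock
      lock.lock();
    }
  }

  std::mutex mMutex;
  std::condition_variable mWakeUp;
  std::queue<std::function<void()>> mEvents;
  bool mRunning = false;  // guarded by mMutex
  std::thread mThread;
};

// Hypothetical FB that only tracks its execution state.
struct FunctionBlock {
  std::atomic<bool> running{false};
  void start() { running = true; }
  void stop() { running = false; }
};

class Resource {
public:
  explicit Resource(std::size_t fbCount) : mFBs(fbCount) {}

  void start() {
    for (auto &fb : mFBs) fb.start();  // 1. FBs
    mEcet.start();                     // 2. ecet
  }

  void stop() {
    mEcet.stop();                      // 1. ecet (joined before returning)
    for (auto &fb : mFBs) fb.stop();   // 2. FBs
  }

  Ecet &ecet() { return mEcet; }
  std::vector<FunctionBlock> &fbs() { return mFBs; }

private:
  std::vector<FunctionBlock> mFBs;
  Ecet mEcet;
};
```

With this order, an event that is still executing when `Resource::stop()` is called finishes while all FBs are still in the running state. In this sketch, events that are still queued at that point are simply dropped.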

cochicde · 2024-11-19T07:21:25Z

I see now that the E_RESTART FB handles the Start/Stop state changes in close relationship with the ecet, which would also require some rethinking.

cochicde · 2024-11-19T17:33:58Z

After some thought, my doubts are:

What's the point of triggering the STOP event output of RESTART when the rest of the FBs in the resource will be in stopped mode anyway?
If the trigger makes sense only when RESTART alone is set to stop, maybe RESTART should first check whether the ecet is still alive before trying to trigger the STOP event, and otherwise just go into the STOP state and return.
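
To illustrate the second point, here is a minimal single-threaded sketch. The `Ecet` and `Restart` classes, `isAlive()` and `post()` are hypothetical names for this example and not the actual FORTE interfaces:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, single-threaded stand-in for the ecet.
class Ecet {
public:
  void setAlive(bool alive) { mAlive = alive; }
  bool isAlive() const { return mAlive; }
  void post(std::function<void()> event) { mPending.push_back(std::move(event)); }
  std::size_t pending() const { return mPending.size(); }

private:
  bool mAlive = false;
  std::vector<std::function<void()>> mPending;
};

enum class FBState { Running, Stopped };

// Hypothetical RESTART block: it emits STOP only if somebody can still run it.
class Restart {
public:
  explicit Restart(Ecet &ecet) : mEcet(ecet) {}

  void start() { mState = FBState::Running; }

  void stop() {
    if (mState == FBState::Running && mEcet.isAlive()) {
      // Placeholder for sending the STOP event output.
      mEcet.post([] {});
    }
    // Either way the block ends up in the stopped state.
    mState = FBState::Stopped;
  }

  FBState state() const { return mState; }

private:
  Ecet &mEcet;
  FBState mState = FBState::Stopped;
};
```

In a multithreaded runtime, the alive check and the post would have to be atomic with respect to the ecet shutting down, otherwise the check itself becomes another race.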

azoitl · 2024-11-19T21:35:20Z

The description of the E_RESTART block is a bit vague in the standard. For years we have tried different interpretations of how to implement it correctly. Your assessment helps us come to a better solution. This will also be important for the thing @MandKastner is about to start working on.

What I can do is share my understanding of E_RESTART: for me it is a means of informing the FB networks in this resource that the execution of this resource has started or is stopped. The events it sends out shall be used to trigger any initialization. Therefore I think that E_RESTART instances should be started and stopped separately from the other FBs, so that the triggered FBs can be informed and execute. For starting this is less critical, as any output events that an E_RESTART block sends are put into the ECET. So it is fine to first START all blocks and then the ECET.

For STOPPING I think we need to first STOP E_RESTART, wait until all execution caused by the STOP events completes, stop the ECET, and then STOP all the FBs. Or did I miss anything?
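
A minimal sketch of that stopping sequence, under the assumption that the ECET can report when it is idle (queue empty and no event executing). `Ecet`, `waitUntilIdle()`, `Resource` and `onStopEvent` are hypothetical names for this illustration, not existing FORTE code:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical ECET stand-in that can report when it has gone idle.
class Ecet {
public:
  ~Ecet() { stop(); }

  void start() {
    if (mThread.joinable()) return;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mRunning = true;
    }
    mThread = std::thread([this] { run(); });
  }

  void post(std::function<void()> event) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mEvents.push(std::move(event));
    }
    mChanged.notify_all();
  }

  // Blocks until the queue is empty and no event is executing.
  // Must only be called while the ECET is started.
  void waitUntilIdle() {
    std::unique_lock<std::mutex> lock(mMutex);
    mChanged.wait(lock, [this] { return mEvents.empty() && !mBusy; });
  }

  // Returns only after the worker thread has been joined.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mRunning = false;
    }
    mChanged.notify_all();
    if (mThread.joinable()) mThread.join();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mMutex);
    while (mRunning) {
      if (mEvents.empty()) {
        mChanged.wait(lock);
        continue;
      }
      auto event = std::move(mEvents.front());
      mEvents.pop();
      mBusy = true;
      lock.unlock();
      event();  // may post follow-up events
      lock.lock();
      mBusy = false;
      mChanged.notify_all();
    }
  }

  std::mutex mMutex;
  std::condition_variable mChanged;
  std::queue<std::function<void()>> mEvents;
  bool mRunning = false;  // guarded by mMutex
  bool mBusy = false;     // guarded by mMutex
  std::thread mThread;
};

struct Resource {
  Ecet ecet;
  std::atomic<bool> fbsRunning{false};
  // Stand-in for the FB network connected to E_RESTART.STOP.
  std::function<void()> onStopEvent;

  void start() {
    fbsRunning = true;  // START all blocks first
    ecet.start();       // then the ECET
  }

  void stop() {
    if (onStopEvent) ecet.post(onStopEvent);  // 1. E_RESTART emits STOP
    ecet.waitUntilIdle();                     // 2. wait for the STOP chain
    ecet.stop();                              // 3. stop the ECET
    fbsRunning = false;                       // 4. stop all the FBs
  }
};
```

In this sketch the wait in step 2 never returns if the STOP chain keeps generating events, so a real implementation would probably need a timeout or some other way out.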

cochicde · 2024-11-24T12:55:47Z

I see. I agree with the stopping procedure, but now I have more thoughts :)

How do we know when the STOP event chain has finished? Could we create a temporary ecet for this? (Maybe this would be even more problematic regarding race conditions.) What if there's somehow a loop in this STOP event chain? Should we see this as a user error which can't be handled by the runtime? Do we use the KILL command for an immediate stop of the ecet and FB activity? I'd then see KILL as a kind of forced STOP.
After the STOP event from E_RESTART is triggered, how do we handle the external events still coming in? Should we shut down the input from the system to the resource, meaning not allowing new event chains? (I'm not sure if this could be problematic for some use cases.)
I see some overlap between the Start/Stop states, which are triggered by the "system", and the init/deinit transitions, which are triggered by events (usually by the START/STOP from E_RESTART). From your explanation and what I see in the code, the Stopped state of a Function Block inhibits any logic being executed in it (it does not process any event and shuts down any connection to the system). However, after some thought, I don't actually see this as a problem, but as a good strategy.
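
To make the first two questions more concrete, here is one possible shape of an answer as a single-threaded sketch. Everything in it is an assumption for discussion (`ShutdownQueue`, `stopWithBudget` and the event budget are invented names and ideas, not FORTE code): external input is closed first, the STOP chain gets a bounded number of events, and if it still has not finished, the stop is escalated to a KILL that discards the rest.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical event queue used only while a resource is shutting down.
class ShutdownQueue {
public:
  // Events from outside the resource are rejected once input is closed.
  bool postExternal(std::function<void()> event) {
    if (!mAcceptExternal) return false;
    mEvents.push_back(std::move(event));
    return true;
  }

  // Events generated inside the STOP chain are always accepted.
  void postInternal(std::function<void()> event) {
    mEvents.push_back(std::move(event));
  }

  void closeExternalInput() { mAcceptExternal = false; }

  // Runs at most `budget` events on the calling thread. Returns true if the
  // queue is empty afterwards, false if events are still pending
  // (for example because of a loop in the chain).
  bool drain(std::size_t budget) {
    while (!mEvents.empty() && budget > 0) {
      auto event = std::move(mEvents.front());
      mEvents.pop_front();
      event();
      --budget;
    }
    return mEvents.empty();
  }

  // KILL as a forced stop: discard whatever is still pending.
  void kill() { mEvents.clear(); }

  std::size_t pending() const { return mEvents.size(); }

private:
  bool mAcceptExternal = true;
  std::deque<std::function<void()>> mEvents;
};

enum class StopResult { Clean, Killed };

StopResult stopWithBudget(ShutdownQueue &queue, std::function<void()> stopEvent,
                          std::size_t budget) {
  queue.closeExternalInput();                 // no new external event chains
  queue.postInternal(std::move(stopEvent));   // STOP from E_RESTART
  if (queue.drain(budget)) return StopResult::Clean;
  queue.kill();                               // chain did not finish: force it
  return StopResult::Killed;
}
```

A fixed event budget is only one way to bound the chain; a wall-clock timeout would be another, and whether a looping STOP chain should count as a user error stays an open question either way.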
