feat: workflow builtin actions honoring on_success handlers #3070

@didier-wenzek didier-wenzek commented Aug 20, 2024

Proposed changes

Improve the workflow builtin actions with explicit operation names and the ability to define specific state on success and on error, as well as for the executing state.

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

  • Distinguish BuiltIn from AwaitBuiltIn actions
  • Make the name of the operation explicit for BuiltIn and AwaitBuiltIn actions
  • Equip the BuiltIn action with an on_exec handler.
  • Equip the AwaitBuiltIn action with on_success and on_error handlers.
  • Let the user provide their own steps for the executing, successful and failed states
  • Update the documentation - deprecating the builtin keyword
  • Make sure the deprecated builtin keyword can still be used for backward compatibility
  • Add a system test checking that a builtin:<op> can be triggered on any state.
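
To make the distinction between the three spellings concrete, here is a minimal sketch of how an `action` value could be classified. It is not the code of this PR: `BuiltinAction`, `parse_action`, `is_builtin` and the `BUILTIN_OPERATIONS` list are hypothetical names, and the `await_` prefix is taken from the example above.

```rust
/// Hypothetical, simplified view of the three spellings of a builtin action.
#[derive(Debug, PartialEq)]
enum BuiltinAction {
    /// `action = "<op>"`: trigger the operation, then follow `on_exec`.
    Trigger(String),
    /// `action = "await_<op>"`: wait for the outcome, then follow `on_success` or `on_error`.
    Await(String),
    /// Deprecated `action = "builtin"`: operation and step come from the context.
    Legacy,
}

/// Operation names mentioned in this PR; the real set is an assumption here.
const BUILTIN_OPERATIONS: [&str; 3] = ["software_update", "config_update", "restart"];

/// Check a name against the assumed list of builtin operations.
fn is_builtin(name: &str) -> bool {
    BUILTIN_OPERATIONS.iter().any(|known| *known == name)
}

/// Classify the value of an `action` key; `None` means "not a builtin action".
fn parse_action(action: &str) -> Option<BuiltinAction> {
    if action == "builtin" {
        return Some(BuiltinAction::Legacy);
    }
    match action.strip_prefix("await_") {
        Some(op) if is_builtin(op) => Some(BuiltinAction::Await(op.to_string())),
        Some(_) => None,
        None if is_builtin(action) => Some(BuiltinAction::Trigger(action.to_string())),
        None => None,
    }
}

fn main() {
    for action in ["software_update", "await_software_update", "builtin", "await_unknown"] {
        println!("{action:?} -> {:?}", parse_action(action));
    }
}
```

Keeping the deprecated keyword as its own variant is one way to preserve backward compatibility while the explicit names are introduced.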

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

codecov bot commented Aug 20, 2024

github-actions bot commented Aug 20, 2024

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
|---|---|---|---|---|---|
| 499 | 0 | 2 | 499 | 100 | 1h28m15.949205999s |

@didier-wenzek didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from 3b151c3 to d3ee4ca Compare August 21, 2024 11:39
@didier-wenzek didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from 4ec89b9 to e7591bd Compare August 23, 2024 08:35
@didier-wenzek didier-wenzek marked this pull request as ready for review August 23, 2024 08:37

@gligorisaev gligorisaev left a comment

I checked this and ran flake-finder; it looks OK.

Comment on lines +73 to +79
/// Trigger a built-in operation
///
/// ```toml
/// action = "<builtin-operation-name>"
/// on_exec = "<state>"
/// ```
BuiltInAction(OperationName, ExecHandlers),
Contributor Author

On reflection, I see this as over-engineering.

The motivation was to be explicit:

  • With action = "builtin", both the operation and the step are given by the context.
    • the operation is the operation of the workflow itself
    • the step is derived from the current state of the command (i.e. scheduled or executing)
  • With action = "<builtin-operation-name>" and action = "await_<builtin-operation-name>", both the operation and the step are explicit.
    • action = "<builtin-operation-name>" triggers the scheduled step of the given operation
    • action = "await_<builtin-operation-name>" triggers the executing step

But this raises other issues:

  • It might then be confusing to be able to invoke an operation either directly (with action = software_update) or using a sub-workflow (with operation = software_update). These are meaningful and even useful differences, but they come with complications, notably when it comes to guiding users.
  • What if the user tries to invoke the builtin behavior of operation A in a workflow for B? This could work but would require extra work to handle the command payload, further blurring the line between builtin actions and sub-operations.
  • => In practice, having the operation implied by the workflow is a good thing: simple and effective.

Hence, I'm considering reverting this change, i.e. keeping only the action = "builtin" case.

Another appealing alternative is the following:

  • Keep the action = "builtin" as of now (honoring the user-provided handlers)
  • Introduce two slight variants to make the action clearer, since it is not obvious that what the builtin action does depends on the state name (scheduled or executing).
  • I propose: action = "execute" for the scheduled step (which can then be named differently).
  • and action = "await" for the executing step.
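
As a rough illustration of this alternative, the sketch below resolves which step of the workflow's own operation an action stands for. `Step` and `resolve_step` are hypothetical names, and the state name `prepare_update` is made up for the example; only the keywords `builtin`, `execute` and `await` come from the proposal.

```rust
/// The two steps a builtin operation can be asked to perform.
#[derive(Debug, PartialEq)]
enum Step {
    /// Start the operation (what the `scheduled` state does today).
    Trigger,
    /// Wait for the outcome (what the `executing` state does today).
    Await,
}

/// Resolve the step from the action keyword and, for `builtin` only, the state name.
fn resolve_step(action: &str, state_name: &str) -> Option<Step> {
    match (action, state_name) {
        // Proposed variants: the step is explicit, so the state can have any name.
        ("execute", _) => Some(Step::Trigger),
        ("await", _) => Some(Step::Await),
        // Current keyword: the step is implied by the state name.
        ("builtin", "scheduled") => Some(Step::Trigger),
        ("builtin", "executing") => Some(Step::Await),
        _ => None,
    }
}

fn main() {
    println!("{:?}", resolve_step("builtin", "scheduled"));
    println!("{:?}", resolve_step("execute", "prepare_update"));
    println!("{:?}", resolve_step("builtin", "prepare_update"));
}
```

In both cases the operation itself stays implied by the workflow, which is the property argued for above.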

Contributor

I see one additional problem with the approach of tying builtin actions to their respective states, which might make some future enhancements harder. It's problematic for certain operations like config_update, where the executing state does multiple things:

  1. Send the executing status update
  2. Download the config file
  3. Apply the updated config file

Now, with the proposed change, we can add additional logic before or after these 3 steps, but nothing in between. For example, a customer might want to validate the downloaded artefact before it is applied. So I'm actually in favour of providing multiple smaller, granular actions like download, file-copy, etc., which can be tied to any state as the user wants. Such reusable granular built-in actions could then be used from any state in any workflow.
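
A small sketch of the idea, under the assumption that each granular action is a separate step the user can order freely. `Step`, `run` and the `Validate` action are hypothetical; nothing here reflects existing thin-edge actions.

```rust
/// Hypothetical granular actions; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    Download,
    Validate,
    Apply,
}

/// Run the steps in order and stop at the first failing validation.
/// Returns the steps that completed, or an error describing where it stopped.
fn run(steps: &[Step], artefact_is_valid: bool) -> Result<Vec<Step>, String> {
    let mut done = Vec::new();
    for &step in steps {
        if step == Step::Validate && !artefact_is_valid {
            return Err(format!("validation failed after {:?}", done));
        }
        done.push(step);
    }
    Ok(done)
}

fn main() {
    // With granular actions, a check can sit between download and apply.
    let with_check = [Step::Download, Step::Validate, Step::Apply];
    println!("{:?}", run(&with_check, false));

    // With a single monolithic action there is no such hook: a bad artefact is applied.
    let without_check = [Step::Download, Step::Apply];
    println!("{:?}", run(&without_check, false));
}
```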

Contributor Author

Here is a new proposal: #3014 (comment).

I will address it in a different PR so that we are in a position to compare and choose.


@Bravo555 Bravo555 left a comment

A bit out of my depth on the design front (I should probably keep track of custom workflow developments more closely), but the Rust side overall looks alright, I just have one question regarding the interaction between await-... and executing status.

Comment on lines +421 to +437
/// Rewrite a command state before pushing it to a builtin operation actor
///
/// Depending the action is to trigger or await the operation,
/// set the status to schedule or executing.
///
/// Return the command state unchanged if there is no appropriate substitute.
pub fn adapt_builtin_request(&self, command_state: GenericCommandState) -> GenericCommandState {
    match self {
        OperationAction::BuiltInAction(_, _) => {
            command_state.update(GenericStateUpdate::scheduled())
        }
        OperationAction::AwaitBuiltInAction(_, _) => {
            command_state.update(GenericStateUpdate::executing())
        }
        _ => command_state,
    }
}
Contributor

thought: it's nonobvious how setting a scheduled state leads to an operation being spawned, execution state leads to it being awaited, and whether or not the state being set is for current command or a different one (sub-command)

At the beginning I was a bit confused why <builtin-operation-name> moves to scheduled and await-<builtin-operation-name> moves to executing, but, as the comment says, if the idea is for builtin-* to trigger an operation and await-builtin-* to await the result, then it's not clear to me how this is eventually achieved.

And particularly, when we're doing a sub-workflow execution, e.g. an example from the documentation:

[trigger_config_update]
operation = "config_update"
input.tedgeUrl = "http://127.0.0.1:8000/tedge/file-transfer/example/config_update/mosquitto-1234"
input.type = "mosquitto"
on_exec = "waiting_for_config_update"

[waiting_for_config_update]
action = "await-operation-completion"
timeout_second = 600
on_timeout = "timeout_config_update"
on_success = "successful_config_update"
on_error = { status = "failed", reason = "fail to update the config"}

it's relatively clear that action = "await-..." below is waiting for the entire config update to complete, but when applied to the lower level of states, I have trouble understanding why that's necessary for the executing state.

Contributor

it's relatively clear that action = "await-..." below is waiting for the entire config update to complete, but when applied to the lower level of states, I have trouble understanding why that's necessary for the executing state.

It seems like there is an implicit assumption that scheduling state is handled by other agents trivially and immediately, so we model it as moving immediately to executing state, and we also assume that executing is a potentially long-running process that we need to await the completion of, but the question is, is this distinction necessary.

In any case, I have now a bit better idea of the flow after looking at adapt_builtin_request and adapt_builtin_response methods:

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

  1. workflow is in scheduled state, builtin operation actor is notified of that
  2. builtin operation actor moves to scheduled state
  3. builtin operation actor moves to executing state, tells workflow to run on_exec handler
  4. workflow moves to executing state, builtin operation actor is notified of that
  5. builtin operation actor moves to successful or failed state, tells workflow to run on_success or on_error handlers
  6. workflow moves to successful or failed state
  7. workflow is terminated
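
The flow above can be sketched as a toy simulation. This is only a model of the handshake, not the agent's code: `Status`, `builtin_actor` and `run_workflow` are hypothetical, the actor is reduced to a pure function, and the target states `executing`, `postprocess` and `rollback` are the ones from the TOML example, so the workflow ends in the user-defined handler state.

```rust
/// Status of a command as seen by a builtin operation actor.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Scheduled,
    Executing,
    Successful,
    Failed,
}

/// Stand-in for a builtin operation actor: it advances the command by one step.
fn builtin_actor(request: Status, will_succeed: bool) -> Status {
    match request {
        Status::Scheduled => Status::Executing,
        Status::Executing if will_succeed => Status::Successful,
        Status::Executing => Status::Failed,
        other => other,
    }
}

/// Walk the two-state workflow from the example and return the visited workflow states.
fn run_workflow(will_succeed: bool) -> Vec<&'static str> {
    let mut trace = vec!["scheduled"];

    // [scheduled] action = "software_update", on_exec = "executing"
    if builtin_actor(Status::Scheduled, will_succeed) == Status::Executing {
        trace.push("executing");
    }

    // [executing] action = "await_software_update"
    match builtin_actor(Status::Executing, will_succeed) {
        Status::Successful => trace.push("postprocess"), // on_success
        Status::Failed => trace.push("rollback"),        // on_error
        _ => {}
    }
    trace
}

fn main() {
    println!("{:?}", run_workflow(true));
    println!("{:?}", run_workflow(false));
}
```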

Contributor Author

thought: it's nonobvious how setting a scheduled state leads to an operation being spawned, execution state leads to it being awaited, and whether or not the state being set is for current command or a different one (sub-command)

In both cases, the workflow engine delegates the action to the actor registered for that builtin operation.

The response is sent back by the operation-specific actor to the workflow actor for further processing.
This commit improves the method names, as they were not really clear.

It seems like there is an implicit assumption that scheduling state is handled by other agents trivially and immediately, so we model it as moving immediately to executing state, and we also assume that executing is a potentially long-running process that we need to await the completion of, but the question is, is this distinction necessary.

There is indeed an assumption that all the builtin operation actors follow the same workflow, transitioning from scheduled to executing and then either to successful or failed. However, there is no assumption on the timing. Technically, in both cases (scheduled and executing), the workflow engine waits for the transition to occur.

Is this distinction necessary? From a workflow perspective, yes, as this makes the transitions explicit (todo - doing - done) and clarifies error management. This is notably a key point for the restart operation, where the device reboots between the two states. Internally, for the builtin actors, no. This is a legacy of when these actors were directly exposed to MQTT and published all the state transitions themselves.

In any case, I have now a bit better idea of the flow after looking at adapt_builtin_request and adapt_builtin_response methods:

[scheduled]
 action = "software_update"
 on_exec = "executing"

[executing]
 action = "await_software_update"
 on_success = "postprocess"
 on_error = "rollback"

  1. workflow is in scheduled state, builtin operation actor is notified of that
  2. builtin operation actor moves to scheduled state
  3. builtin operation actor moves to executing state, tells workflow to run on_exec handler
  4. workflow moves to executing state, builtin operation actor is notified of that
  5. builtin operation actor moves to successful or failed state, tells workflow to run on_success or on_error handlers
  6. workflow moves to successful or failed state
  7. workflow is terminated

Your understanding is correct.

This commit introduces no changes to the user.
This is only a preparation step to properly use the exit handlers
provided by the user for a builtin command.

- Distinguish `BuiltIn` from `AwaitBuiltIn` actions
- Make the name of the operation explicit for `BuiltIn` and `AwaitBuiltIn` actions
- Equip the `BuiltIn` action with an `on_exec` handler.
- Equip the `AwaitBuiltIn` action with `on_success` and `on_error` handlers.

Signed-off-by: Didier Wenzek <[email protected]>
The new name is more appropriate, now that these handlers are used not
only for background scripts but also to trigger a sub operation or a
builtin action.

Signed-off-by: Didier Wenzek <[email protected]>
@didier-wenzek

Closing as superseded by #3105

@didier-wenzek didier-wenzek deleted the feat/improve-workflow-builtin-actions branch September 6, 2024 14:05