feat: workflow builtin actions honoring on_success handlers #3070

didier-wenzek · 2024-08-20T16:55:24Z

Proposed changes

Improve the workflow builtin actions with explicit operation names and the ability to define specific states on success and on error, as well as for the executing state.

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

Distinguish BuiltIn from AwaitBuiltIn actions
Make the name of the operation explicit for BuiltIn and AwaitBuiltIn actions
Equip the BuiltIn action with an on_exec handler.
Equip the AwaitBuiltIn action with on_success and on_error handlers.
Let the user provide their own steps for the executing, successful and failed states
Update the documentation - deprecating the builtin keyword
Make sure the deprecated builtin keyword can still be used for backward compatibility
Add a system test checking that a builtin:<op> can be triggered on any state.
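
A minimal sketch of how the action keywords listed above could be resolved, including the deprecated builtin keyword. This is not the code of this PR: the `Action` enum, the `parse_action` function and the `BUILTIN_OPERATIONS` list are hypothetical names introduced for illustration, and the `await_` prefix simply follows the `await_software_update` example in the description.

```rust
/// Hypothetical, simplified model of the two kinds of builtin actions.
#[derive(Debug, PartialEq)]
enum Action {
    /// `action = "<operation>"`: trigger the builtin operation.
    BuiltIn { operation: String },
    /// `action = "await_<operation>"`: wait for the builtin operation to finish.
    AwaitBuiltIn { operation: String },
}

/// Resolve the `action` keyword of a state in the workflow of `workflow_operation`.
///
/// Assumption: with the deprecated `builtin` keyword, the operation is implied
/// by the workflow and the step is implied by the state name.
fn parse_action(action: &str, workflow_operation: &str, state: &str) -> Option<Action> {
    // Illustrative subset of builtin operations, not an exhaustive list.
    const BUILTIN_OPERATIONS: [&str; 3] = ["software_update", "config_update", "restart"];

    if action == "builtin" {
        let operation = workflow_operation.to_string();
        return match state {
            "scheduled" => Some(Action::BuiltIn { operation }),
            "executing" => Some(Action::AwaitBuiltIn { operation }),
            _ => None,
        };
    }
    if let Some(operation) = action.strip_prefix("await_") {
        if BUILTIN_OPERATIONS.contains(&operation) {
            return Some(Action::AwaitBuiltIn { operation: operation.to_string() });
        }
        return None;
    }
    if BUILTIN_OPERATIONS.contains(&action) {
        return Some(Action::BuiltIn { operation: action.to_string() });
    }
    None
}
```

In this sketch the explicit keywords work from any state, while the deprecated keyword only resolves in the scheduled and executing states.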

Types of changes

Bugfix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
Documentation Update (if none of the other choices apply)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

allow users to add workflow states after a built-in executing action #3014

Checklist

I have read the CONTRIBUTING doc
I have signed the CLA (in all commits with git commit -s)
I ran cargo fmt as mentioned in CODING_GUIDELINES
I used cargo clippy as mentioned in CODING_GUIDELINES
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

Further comments

codecov · 2024-08-20T17:10:19Z

Codecov Report

Attention: Patch coverage is 36.51685% with 113 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
crates/core/tedge_api/src/workflow/toml_config.rs	0.00%	58 Missing ⚠️
crates/core/tedge_api/src/workflow/mod.rs	37.77%	28 Missing ⚠️
crates/core/tedge_api/src/workflow/state.rs	55.17%	13 Missing ⚠️
.../core/tedge_agent/src/operation_workflows/actor.rs	52.94%	2 Missing and 6 partials ⚠️
crates/core/tedge_api/src/workflow/supervisor.rs	64.28%	4 Missing and 1 partial ⚠️
crates/core/tedge_api/src/workflow/script.rs	92.85%	1 Missing ⚠️

github-actions · 2024-08-20T17:44:41Z

Robot Results

✅ Passed	❌ Failed	⏭️ Skipped	Total	Pass %	⏱️ Duration
499	0	2	499	100	1h28m15.949205999s

gligorisaev

I checked and ran flake-finder; it looks OK.

didier-wenzek · 2024-08-23T13:01:10Z

crates/core/tedge_api/src/workflow/mod.rs

+    /// Trigger a built-in operation
+    ///
+    /// ```toml
+    /// action = "<builtin-operation-name>"
+    /// on_exec = "<state>"
+    /// ```
+    BuiltInAction(OperationName, ExecHandlers),


On reflection, I see this as over-engineering.

The motivation was to be explicit:

With action = "builtin" both the operation and the step are given by the context.

the operation is the operation of the workflow itself

the step is derived from the current state of the command (i.e scheduled or executing)

With action = "<builtin-operation-name>" and action = "await ", both the operation and the step are explicit

action = "<builtin-operation-name>" triggers the scheduled step of the given operation

action = "await " triggers the execution step

But this raises other issues:

It might then be confusing to be able to invoke an operation either directly (with action = software_update) or using a sub-workflow (with operation = software_update). These are meaningful and even useful differences, but they come with complications, notably when guiding users.

What if the user tries to invoke the builtin behavior of operation A in a workflow for B? This could work but would require extra work to handle the command payload, making the confusion between builtin actions and sub-operations even greater.

=> In practice, having the operation implied by the workflow is a good thing, simple and effective.

Hence, I'm considering reverting this change, i.e. only keeping the action = "builtin" case.

Another appealing alternative is the following:

Keep the action = "builtin" as of now (honoring the user-provided handlers)

Introduce two slight variants to make the action clearer, as it is not obvious that what the builtin action does depends on the state name (scheduled or executing).

I propose: action = "execute" for the scheduled step (which can then be named differently).

and action = "await" for the executing step.

albinsuresh · 2024-08-28

I see one additional problem with the approach of tying builtin actions to their respective states: it might make some future enhancements harder. It's problematic for certain operations like config_update, where the executing state does multiple things, as follows:

Send the executing status update

Download the config file

Apply the updated config file

Now, with the proposed change, we can add additional logic before or after these 3 steps, but nothing in between. For example, some customers might want to validate the downloaded artefact before it is applied. So, I'm actually in favour of providing multiple smaller, granular actions like download, file-copy, etc., which can be tied to any state as the user wants. Having some reusable granular built-in actions would enable those to be used from any state in any workflow as well.
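
To illustrate the granular-actions idea above, here is a small sketch in which the steps of a config update are separate units, so a user-provided validation can sit between download and apply. The `Step` enum and the `run_steps` function are hypothetical; nothing here reflects an existing thin-edge API.

```rust
/// Hypothetical granular steps; the names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    SendExecutingStatus,
    Download,
    /// User-provided step inserted between the builtin ones.
    Validate,
    Apply,
}

/// Run the steps in order and stop at the first failure.
///
/// Returns the completed steps on success, or the failing step on error,
/// so that later steps (such as `Apply`) are never run after a failure.
fn run_steps(steps: &[Step], run: impl Fn(&Step) -> bool) -> Result<Vec<Step>, Step> {
    let mut done = Vec::new();
    for step in steps {
        if !run(step) {
            return Err(step.clone());
        }
        done.push(step.clone());
    }
    Ok(done)
}
```

With a single monolithic executing action, the `Validate` step would have no place to go; with granular steps, a failed validation prevents the downloaded file from being applied.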

didier-wenzek · 2024-08-29

Here is a new proposal: #3014 (comment).

I will address it in a different PR so we will be in position to compare and choose.

tests/RobotFramework/tests/tedge_agent/workflows/custom_operation.robot

Bravo555

A bit out of my depth on the design front (I should probably keep track of custom workflow developments more closely), but the Rust side overall looks alright, I just have one question regarding the interaction between await-... and executing status.

Bravo555 · 2024-08-30T14:14:56Z

crates/core/tedge_api/src/workflow/mod.rs

+    /// Rewrite a command state before pushing it to a builtin operation actor
+    ///
+    /// Depending the action is to trigger or await the operation,
+    /// set the status to schedule or executing.
+    ///
+    /// Return the command state unchanged if there is no appropriate substitute.
+    pub fn adapt_builtin_request(&self, command_state: GenericCommandState) -> GenericCommandState {
+        match self {
+            OperationAction::BuiltInAction(_, _) => {
+                command_state.update(GenericStateUpdate::scheduled())
+            }
+            OperationAction::AwaitBuiltInAction(_, _) => {
+                command_state.update(GenericStateUpdate::executing())
+            }
+            _ => command_state,
+        }
+    }


thought: it's nonobvious how setting a scheduled state leads to an operation being spawned, execution state leads to it being awaited, and whether or not the state being set is for current command or a different one (sub-command)

At the beginning I was a bit confused why <builtin-operation-name> moves to scheduled and await-<builtin-operation-name> moves to executing, but, as the comment says, if the idea is for builtin-* to trigger an operation and await-builtin-* to await the result, then it's not clear to me how this is eventually achieved.

And particularly, when we're doing a sub-workflow execution, e.g. an example from the documentation:

[trigger_config_update]
  operation = "config_update"
  input.tedgeUrl = "http://127.0.0.1:8000/tedge/file-transfer/example/config_update/mosquitto-1234"
  input.type = "mosquitto"
  on_exec = "waiting_for_config_update"

[waiting_for_config_update]
  action = "await-operation-completion"
  timeout_second = 600
  on_timeout = "timeout_config_update"
  on_success = "successful_config_update"
  on_error = { status = "failed", reason = "fail to update the config"}

It's relatively clear that action = "await-..." below is waiting for the entire config update to complete, but when applied to the lower level of states, I have trouble understanding why that's necessary for the executing state.

It seems like there is an implicit assumption that scheduling state is handled by other agents trivially and immediately, so we model it as moving immediately to executing state, and we also assume that executing is a potentially long-running process that we need to await the completion of, but the question is, is this distinction necessary.

In any case, I have now a bit better idea of the flow after looking at adapt_builtin_request and adapt_builtin_response methods:

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

workflow is in scheduled state, builtin operation actor is notified of that

builtin operation actor moves to scheduled state

builtin operation actor moves to executing state, tells workflow to run on_exec handler

workflow moves to executing state, builtin operation actor is notified of that

builtin operation actor moves to successful or failed state, tells workflow to run on_success or on_error handlers

workflow moves to successful or failed state

workflow is terminated

didier-wenzek · 2024-09-02

thought: it's nonobvious how setting a scheduled state leads to an operation being spawned, execution state leads to it being awaited, and whether or not the state being set is for current command or a different one (sub-command)

In both cases, the workflow engine delegates the action to the actor registered for that builtin operation.

The response is sent back by the operation-specific actor to the workflow actor for further processing.
This commit improves the method names, as they were not really clear.

It seems like there is an implicit assumption that scheduling state is handled by other agents trivially and immediately, so we model it as moving immediately to executing state, and we also assume that executing is a potentially long-running process that we need to await the completion of, but the question is, is this distinction necessary.

There is indeed an assumption that all the builtin operation actors follow the same workflow, transitioning from scheduled to executing and then to either successful or failed. However, there is no assumption about the timing. Technically, in both cases (scheduled and executing), the workflow engine waits for the transition to occur.

Is this distinction necessary? From a workflow perspective, yes, as this makes the transitions explicit (todo, doing, done) and clarifies error management. This is notably a key point for the restart operation, where the device reboots between the two states. Internally, for the builtin actors, no. This is a legacy of when these actors were directly exposed to MQTT, notifying all the state transitions themselves.

In any case, I have now a bit better idea of the flow after looking at adapt_builtin_request and adapt_builtin_response methods:

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

workflow is in scheduled state, builtin operation actor is notified of that

builtin operation actor moves to scheduled state

builtin operation actor moves to executing state, tells workflow to run on_exec handler

workflow moves to executing state, builtin operation actor is notified of that

builtin operation actor moves to successful or failed state, tells workflow to run on_success or on_error handlers

workflow moves to successful or failed state

workflow is terminated

Your understanding is correct.
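
The flow confirmed above can be sketched as a tiny simulation. This is only a sketch under stated assumptions: `Status`, `Handlers`, `builtin_actor`, `next_state` and `run_workflow` are hypothetical names, the builtin actor is reduced to a pure function, and the handler targets are the state names from the TOML example.

```rust
/// Status the builtin operation actor is notified with, or responds with.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Scheduled,
    Executing,
    Successful,
    Failed,
}

/// User-provided handlers, given as target state names.
struct Handlers<'a> {
    on_exec: &'a str,
    on_success: &'a str,
    on_error: &'a str,
}

/// Stand-in for a builtin operation actor: scheduled leads to executing,
/// executing leads to successful or failed depending on `outcome_ok`.
fn builtin_actor(request: Status, outcome_ok: bool) -> Status {
    match request {
        Status::Scheduled => Status::Executing,
        Status::Executing if outcome_ok => Status::Successful,
        Status::Executing => Status::Failed,
        other => other,
    }
}

/// Map the actor response to the next workflow state using the user handlers.
fn next_state<'a>(response: Status, handlers: &Handlers<'a>) -> Option<&'a str> {
    match response {
        Status::Executing => Some(handlers.on_exec),
        Status::Successful => Some(handlers.on_success),
        Status::Failed => Some(handlers.on_error),
        Status::Scheduled => None,
    }
}

/// Return the sequence of workflow states visited, starting from `scheduled`.
fn run_workflow(handlers: &Handlers, outcome_ok: bool) -> Vec<String> {
    let mut trace = vec!["scheduled".to_string()];
    // [scheduled] action = "software_update": the actor is asked to schedule.
    let response = builtin_actor(Status::Scheduled, outcome_ok);
    if let Some(state) = next_state(response, handlers) {
        trace.push(state.to_string());
    }
    // [executing] action = "await_software_update": the actor is asked to execute.
    let response = builtin_actor(Status::Executing, outcome_ok);
    if let Some(state) = next_state(response, handlers) {
        trace.push(state.to_string());
    }
    trace
}
```

With the handlers of the example, a successful update visits scheduled, executing and postprocess, while a failed one ends in rollback, which is the point of honoring user-provided handlers instead of jumping straight to a terminal state.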

This commit introduces no changes to the user. This is only a preparation step to properly use the exit handlers provided by the user for a builtin command.

- Distinguish `BuiltIn` from `AwaitBuiltIn` actions
- Make the name of the operation explicit for `BuiltIn` and `AwaitBuiltIn` actions
- Equip the `BuiltIn` action with an `on_exec` handler.
- Equip the `AwaitBuiltIn` action with `on_success` and `on_error` handlers.

Signed-off-by: Didier Wenzek <[email protected]>

didier-wenzek · 2024-09-06T14:04:38Z

Closing as superseded by #3105

didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from 3b151c3 to d3ee4ca Compare August 21, 2024 11:39

didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from 4ec89b9 to e7591bd Compare August 23, 2024 08:35

didier-wenzek marked this pull request as ready for review August 23, 2024 08:37

didier-wenzek requested review from albinsuresh, jarhodes314, rina23q, gligorisaev and a team as code owners August 23, 2024 08:37

didier-wenzek requested a review from reubenmiller as a code owner August 23, 2024 10:33

gligorisaev reviewed Aug 23, 2024

didier-wenzek commented Aug 23, 2024

didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from 17a57ff to de47052 Compare August 27, 2024 12:08

didier-wenzek commented Aug 27, 2024

didier-wenzek added the theme:workflows label Aug 29, 2024

Bravo555 reviewed Aug 30, 2024

didier-wenzek added 5 commits September 2, 2024 17:43

Clarify name and role of workflow processing methods

b0954ed

Signed-off-by: Didier Wenzek <[email protected]>

Use user provider exec and await handlers for builtin actions

b5d6f32

Signed-off-by: Didier Wenzek <[email protected]>

Rename BgExitHandlers -> ExecHandlers

0d93cec

The new name is more appropriate, now that these handlers are used not only for background scripts but also to trigger a sub operation or a builtin action. Signed-off-by: Didier Wenzek <[email protected]>

Improve documentation of builtin actions in a workflow

099f63a

Signed-off-by: Didier Wenzek <[email protected]>

didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from cdfdde9 to e891f6c Compare September 2, 2024 15:47

Simplify operation workflow test

08cd735

Signed-off-by: Didier Wenzek <[email protected]>

didier-wenzek force-pushed the feat/improve-workflow-builtin-actions branch from e891f6c to 08cd735 Compare September 2, 2024 15:56

didier-wenzek mentioned this pull request Sep 4, 2024

Feat: improve workflow builtin actions #3105

Merged

20 tasks

didier-wenzek closed this Sep 6, 2024

didier-wenzek deleted the feat/improve-workflow-builtin-actions branch September 6, 2024 14:05