…e hooks, and resource monitoring
amolnar-rh committed Feb 27, 2025
1 parent 493fcb9 commit 11d4528
Showing 4 changed files with 334 additions and 0 deletions.
122 changes: 122 additions & 0 deletions edge_manager/edge_mgr_device_lifecycle_hooks.adoc
@@ -0,0 +1,122 @@
[#device-lifecycle-hooks]
= Device lifecycle hooks

The {rhem} agent can run user-defined commands at specific points in the device lifecycle by using device lifecycle hooks.
For example, you can add a shell script to your operating system images that backs up your application data.
You can then specify that this script must run and complete successfully before the agent can start updating the operating system.

The following device lifecycle hooks are supported:

[%header,cols="1,3"]
|===
|Lifecycle Hook |Description
|`beforeUpdating` |The hook is called after the agent has finished preparing for the update but before it changes the operating system.
If an action in this hook returns with a failure, the agent cancels the update.

|`afterUpdating` |The hook is called after the agent has written the update to disk.
If an action in this hook returns with a failure, the agent cancels and rolls back the update.

|`beforeRebooting` |The hook is called before the system reboots. The agent blocks the reboot until the action completes or times out.
If any action in this hook returns with a failure, the agent cancels and rolls back the update.

|`afterRebooting` |The hook is called when the agent first starts after a reboot.
If any action in this hook returns with a failure, the agent reports the failure but continues starting up.
|===

//For a state diagram defining when each device lifecycle hook is called by the agent, see the ADD LINK[Device API statuses] section.
//API docs are deprecated in ACM so I can't include this. Do we need to add more info this doc?

[#rule-files]
== Rule files

You can define device lifecycle hooks by adding rule files to one of the following locations in the device filesystem:

* Rules in the `/usr/lib/flightctl/hooks.d/<lifecycle_hook_name>/` drop-in directory are read-only.
To add rules to the `/usr` directory, you must add them to the operating system image during image building.
* Rules in the `/etc/flightctl/hooks.d/<lifecycle_hook_name>/` drop-in directory are read-writable.
You can update the rules at runtime by using several methods.
//For more information, see Image building and The operating system configuration
//TODO add link and verify title of the section

When creating and placing the files, you must consider the following practices:

* The name of the rule must be all lowercase.
* If you define rules in both locations, the rules are merged.
* If you add more than one rule file to a lifecycle hook directory, the files are processed in lexical order of the file names.
* If you define files with identical file names in both locations, the file in the `/etc` folder takes precedence over the file of the same name in the `/usr` folder.

A rule file is written in YAML format and contains a list of one or more actions.
An action can be an instruction to run an external command.

When you specify multiple actions for a hook, the actions are performed in sequence, with each action finishing before the next one starts.
If an action returns with a failure, the following actions are skipped.

A `run` action takes the following parameters:

[%header,cols="1,3"]
|===
|Parameter |Description
|`Run` |The absolute path to the command to run, followed by any flags or arguments, for example `/usr/bin/nmcli connection reload`.
The command is not executed in a shell, so you cannot use shell variables, such as `$PATH` or `$HOME`, or chain commands by using operators such as `\|` or `;`.
However, if necessary, you can start a shell by specifying the shell as the command to run, for example `/usr/bin/bash -c 'echo $SHELL $HOME $USER'`.

|`EnvVars` |Optional. A list of key-value pairs to set as environment variables for the command.

|`WorkDir` |Optional. The directory the command is run from.

|`Timeout` |Optional. The maximum duration allowed for the action to complete.
Specify the duration as a single positive integer followed by a time unit.
The `s`, `m`, and `h` units are supported for seconds, minutes, and hours, respectively.

|`If` |Optional. A list of conditions that must be true for the action to be run.
If not provided, actions run unconditionally.
|===
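
As a minimal sketch of a rule file that uses these parameters, the following example defines two actions for the `beforeUpdating` hook. The file name, the script path, the directories, and the environment variable are hypothetical. The example also assumes that the YAML keys are the lower camel case forms of the parameter names in the table (`run`, `envVars`, `workDir`, `timeout`), that `envVars` is written as a mapping, and that the hook directory name is lowercase. Verify these details against the agent version that you use.

[source,yaml]
----
# Hypothetical file: /etc/flightctl/hooks.d/beforeupdating/50-backup.yaml
# Actions run in sequence. If the first action fails, the second is skipped.
- run: /usr/local/bin/backup-app-data.sh --target /var/backups/app
  envVars:
    BACKUP_MODE: full
  workDir: /var/lib/app
  timeout: 5m
# The command is not run in a shell, so a shell is started explicitly to use a pipe.
- run: /usr/bin/bash -c 'ls /var/backups/app | wc -l'
  timeout: 30s
----

Because the first action has a timeout and runs before the update starts, a slow or failing backup in this sketch would stop the update rather than let it continue without a backup.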

By default, actions are performed every time the hook is triggered.
However, for the `afterUpdating` hook, you can use the `If` parameter to add conditions that must be true for an action to be performed.
Otherwise the action is skipped.

For example, to run an action only if a given file or directory changes during the update, you can define a path condition that takes the following parameters:

[%header,cols="1,3"]
|===
|Parameter |Description
|`Path` |An absolute path to a file or directory that must change during the update as a condition for the action to be performed.
Specify paths by using forward slashes (`/`).
If the path is to a directory, it must end with a forward slash (`/`).
If you specify a path to a file, the file must have changed to satisfy the condition.
If you specify a path to a directory, a file in that directory or any of its subdirectories must have changed to satisfy the condition.
|`Op` |A list of file operations, such as `created`, `updated`, and `removed`, that limits which types of changes to the specified path satisfy the condition.
|===
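
For illustration, a path condition on a directory might look like the following sketch for the `afterUpdating` hook. The file name, the `my-app` service, and its configuration directory are hypothetical, and the lowercase keys `if`, `path`, `op`, and `run` are assumed to mirror the parameter names in the tables.

[source,yaml]
----
# Hypothetical file: /etc/flightctl/hooks.d/afterupdating/60-my-app.yaml
# Restart the service only if a file under /etc/my-app/ was created or updated.
- if:
  - path: /etc/my-app/    # directory paths end with a forward slash
    op: [created, updated]
  run: /usr/bin/systemctl restart my-app.service
----

In this sketch, removing a file from the directory does not trigger the action, because `removed` is not in the list of operations.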

If you specify a path condition for an action in the `afterUpdating` hook, you can include the following variables in the arguments to your command. The agent replaces each variable with the corresponding absolute path or paths:

[%header,cols="1,3"]
|===
|Variable |Description
|`{{ Path }}` |The absolute path to the file or directory specified in the path condition.

|`{{ Files }}` |A space-separated list of absolute paths of the files that changed during the update and are covered by the path condition.

|`{{ CreatedFiles }}` |A space-separated list of absolute paths of the files that were created during the update and are covered by the path condition.

|`{{ UpdatedFiles }}` |A space-separated list of absolute paths of the files that were updated during the update and are covered by the path condition.

|`{{ RemovedFiles }}` |A space-separated list of absolute paths of the files that were removed during the update and are covered by the path condition.
|===
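
The variables can be passed to a command as in the following sketch, which writes the list of changed files to the system log. The file name, the directory, and the log tag are hypothetical, and the lowercase YAML keys are an assumption based on the parameter names in the tables.

[source,yaml]
----
# Hypothetical file: /etc/flightctl/hooks.d/afterupdating/70-log-changes.yaml
- if:
  - path: /etc/my-app/conf.d/
    op: [created, updated, removed]
  # The Files variable expands to a space-separated list of the changed files.
  run: /usr/bin/logger -t my-app-hook changed {{ Files }}
----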

The {rhem} agent includes a built-in set of rules defined in `/usr/lib/flightctl/hooks.d/afterupdating/00-default.yaml`.
The following commands are executed if files in the listed paths change:

[%header,cols="2,2,4"]
|===
|File |Command|Description
|`/etc/systemd/system/` |`systemctl daemon-reload` |Changes to `systemd` units are activated by signaling the `systemd` daemon to reload the `systemd` manager configuration.
This reruns all generators, reloads all unit files, and re-creates the entire dependency tree.

|`/etc/NetworkManager/system-connections/` |`nmcli conn reload` |Changes to Network Manager system connections are activated by signaling Network Manager to reload all connections.
//TODO check if Network Manager has a different brand-approved name

|`/etc/firewalld/` |`firewall-cmd --reload` |Changes to the permanent configuration of `firewalld` are activated by signaling `firewalld` to reload firewall rules as new runtime configuration.
|===
136 changes: 136 additions & 0 deletions edge_manager/edge_mgr_manage_apps_devices.adoc
@@ -0,0 +1,136 @@
[#manage-apps]
= Managing applications

You can deploy, update, or remove applications on a device by updating the list of applications in the device specification.
When the {rhem} agent checks in and detects the change in the specification, the agent downloads any new or updated application packages and images from an OCI-compatible registry.
Then, the agent deploys the packages to the appropriate application runtime or removes them from that runtime.

{rhem} supports `podman-compose` as the application runtime and format.

[#prereqs]
== Prerequisites

* You must install the {rhem} CLI.
* You must log in to the {rhem} service.
* You must install Podman Compose. See link:https://podman-desktop.io/docs/compose/setting-up-compose[Setting up Compose].

[#create-apps]
== Creating applications

You can create an application package and store it in an Open Container Initiative (OCI) registry.
Complete the following steps:

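. Define the functionality of the application in a `podman-compose.yaml` file that follows the Podman Compose specification, and create a Containerfile that embeds the file in a container:
+
[source,dockerfile]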
----
FROM scratch <1>
COPY podman-compose.yaml /podman-compose.yaml
LABEL appType="compose" <2>
----
<1> Embed the compose file in a `scratch` container.
<2> Add the `appType=compose` label.

. Build and push the container to your OCI registry.

. Specify the application package in the `spec.applications` field of the `Device` resource:
+
[source,yaml]
----
apiVersion: flightctl.io/v1alpha1
kind: Device
metadata:
  name: <device_name>
spec:
[...]
  applications:
  - name: podman-compose.yaml
[...]
----
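
The `podman-compose.yaml` file that the package embeds is not shown in the preceding steps. As a minimal sketch, assuming a single hypothetical web service, the file might look like the following. The service name, image, and port mapping are placeholders and not values taken from the product.

[source,yaml]
----
# podman-compose.yaml (illustrative only)
version: "3.8"
services:
  web:
    image: docker.io/library/nginx:latest
    ports:
      - "8080:80"
    restart: always
----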

[#deploy-apps]
== Deploying applications on a device using the CLI

You can deploy an application package from an OCI registry to a device by using the CLI.
Complete the following step:

* Specify the application package that you want to deploy in the `spec.applications` field in the `Device` resource:
+
[source,yaml]
----
apiVersion: flightctl.io/v1alpha1
kind: Device
metadata:
  name: <device_name>
spec:
[...]
  applications:
  - name: wordpress <1>
    image: quay.io/rhem-demos/wordpress-app:latest <2>
    envVars: <3>
      WORDPRESS_DB_HOST: <database_host>
      WORDPRESS_DB_USER: <user_name>
      WORDPRESS_DB_PASSWORD: <password>
[...]
----
<1> A user-defined name for the application that is used when the web console and the CLI list applications.
<2> A reference to an application package in an OCI registry.
<3> Optional. A list of key-value pairs that are passed to the deployment tool as environment variables or command-line flags.

*Note:* For each application in the `applications` section of the device specification, you can find the corresponding device status information.
//Add verification?

////
Check if this is this relevant
* To deploy an unpackaged application from a Git repository, specify it in the device's `spec.applications[]` as follows:
+
[source,yaml]
----
apiVersion: flightctl.io/v1alpha1
kind: Device
metadata:
  name: some_device_name
spec:
[...]
  applications:
  - name: wordpress
    git:
      url: https://github.com/flightctl/flightctl-demos.git
      revision: v1.0
      path: /wordpress
    envVars:
      WORDPRESS_DB_HOST: "mysql"
      WORDPRESS_DB_USER: "user"
      WORDPRESS_DB_PASSWORD: "password"
[...]
----

* To deploy an unpackaged application inline with the device specification, specify it in the device's `spec.applications[]` as follows:
+
[source,yaml]
----
apiVersion: flightctl.io/v1alpha1
kind: Device
metadata:
  name: some_device_name
spec:
[...]
  applications:
  - name: wordpress
    inline:
      podman-compose.yaml: |
        version: "3.7"
        services:
          wordpress:
            image: "wordpress:latest"
            [...]
    envVars:
      WORDPRESS_DB_HOST: "mysql"
      WORDPRESS_DB_USER: "user"
      WORDPRESS_DB_PASSWORD: "password"
[...]
----
////
73 changes: 73 additions & 0 deletions edge_manager/edge_mgr_monitor_device_resources.adoc
@@ -0,0 +1,73 @@
[#device-resources]
= Monitoring device resources

You can set up resource monitors for device resources and create alerts when the use of resources crosses a defined threshold.
When the agent alerts the {rhem} service, the service sets the device status to `degraded` or `error`, depending on the severity level.
The service suspends the rollout of updates and alerts the user.

*Important:* Device resource monitoring does not replace observability solutions.
If your use case requires streaming logs and metrics from devices into an observability stack and the network bandwidth of the device allows this, see Adding Device Observability.
//TODO add link for observability in ACM

Resource monitors take the following parameters:

[%header,cols="1,4"]
|===
|Parameter |Description
|`MonitorType` |The resource to monitor.
The `CPU`, `Memory`, and `Disk` resources are currently supported.
|`SamplingInterval` |The interval at which the monitor samples resource use. Specified as a positive integer followed by a time unit: `s` for seconds, `m` for minutes, `h` for hours.
|`AlertRules` |A list of alert rules.
|`Path` |For `Disk` monitor only. The absolute path to the directory to monitor.
Utilization reflects the file system containing the path, even if the defined path is not a mount point.
|===

Alert rules take the following parameters:

[%header,cols="1,4"]
|===
|Parameter |Description
|`Severity` |The severity of the alert rule can be `Info`, `Warning`, or `Critical`.
Only one alert rule is allowed per severity level and monitor.
|`Duration` |The duration that resource use is measured and averaged over when sampling. Specified as a positive integer followed by a time unit: `s` for seconds, `m` for minutes, `h` for hours. The duration must be smaller than the sampling interval.
|`Percentage` |The usage threshold that triggers the alert, as a percentage value. The value ranges from 0 to 100 and is specified without the % sign.
|`Description` |A human-readable description of the alert. Add details about the alert to help with debugging.
By default, the alert description is `<resource> load is above <threshold>% for more than <duration>`.
|===

[#device-resources-cli]
== Monitoring device resources using the CLI

You can monitor the resources of your device by using the CLI, which provides the tools and commands to track performance and troubleshoot issues.
Complete the following step:

* Add resource monitors in the `spec.resources` section of the device specification. For example, add the following monitor for your disk:
+
[source,yaml]
----
apiVersion: flightctl.io/v1alpha1
kind: Device
metadata:
  name: <device_name>
spec:
[...]
  resources:
  - monitorType: Disk
    samplingInterval: 5s <1>
    path: /application_data <2>
    alertRules:
    - severity: Warning <3>
      duration: 30m
      percentage: 75
      description: Disk space for application data is >75% full for over 30m.
    - severity: Critical <4>
      duration: 10m
      percentage: 90
      description: Disk space for application data is >90% full for over 10m.
[...]
----
<1> Samples usage every 5 seconds.
<2> Checks disk use on the file system associated with the `/application_data` path.
<3> Triggers a warning if the average use exceeds 75% for more than 30 minutes.
<4> Triggers a critical alert if the average use exceeds 90% for over 10 minutes.
3 changes: 3 additions & 0 deletions edge_manager/main.adoc
@@ -10,3 +10,6 @@ include::edge_mgr_enroll_devices.adoc[leveloffset=+3]
include::edge_mgr_view_devices.adoc[leveloffset=+3]
include::edge_mgr_label_devices.adoc[leveloffset=+3]
include::edge_mgr_update_labels_on_devices.adoc[leveloffset=+3]
include::edge_mgr_device_lifecycle_hooks.adoc[leveloffset=+3]
include::edge_mgr_monitor_device_resources.adoc[leveloffset=+3]
include::edge_mgr_manage_apps_devices.adoc[leveloffset=+2]
