https://issues.redhat.com/browse/ACM-17505--Add Apps, device lifecycle hooks, and resource monitoring
1 parent
493fcb9
commit 11d4528
Showing 4 changed files with 334 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
[#device-lifecycle-hooks"] | ||
= Device lifecycle hooks | ||
|
||
The {rhem} agent can run user-defined commands at specific points in the device lifecycle by using device lifecycle hooks. | ||
For example, you can add a shell script to your operating system images that backs up your application data. | ||
You can then specify that this script must run and complete successfully before the agent can start updating the operating system. | ||
|
||
The following device lifecycle hooks are supported: | ||
|
||
[%header,cols="1,3"] | ||
|=== | ||
|Lifecycle Hook |Description | ||
|`beforeUpdating` |The hook is called after the agent has completed preparing for the update but before it changes the operating system. | ||
If an action in this hook returns with a failure, the agent cancels the update. | ||
|
||
|`afterUpdating` |The hook is called after the agent has written the update to disk. | ||
If an action in this hook returns with a failure, the agent cancels and rolls back the update. | ||
|
||
|`beforeRebooting` |The hook is called before the system reboots. The agent blocks the reboot until the action completes or times out. | ||
If any action in this hook returns with a failure, the agent cancels and rolls back the update. | ||
|
||
|`afterRebooting` |The hook is called when the agent first starts after a reboot. | ||
If any action in this hook returns with a failure, the agent reports the failure but continues starting up. | ||
|=== | ||
|
||
//For a state diagram defining when each device lifecycle hook is called by the agent, see the ADD LINK[Device API statuses] section. | ||
//API docs are deprecated in ACM so I can't include this. Do we need to add more info this doc? | ||
|
||
[#rule-files"] | ||
== Rule files | ||
|
||
You can define device lifecycle hooks by adding rule files to one of the following locations in the device filesystem: | ||
|
||
* Rules in the `/usr/lib/flightctl/hooks.d/<lifecycle_hook_name>/` drop-in directory are read-only. | ||
To add rules to the `/usr` directory, you must add them to the operating system image during image building. | ||
* Rules in the `/etc/flightctl/hooks.d/<lifecycle_hook_name>/` drop-in directory are read-writable. | ||
You can update the rules at runtime by using several methods. | ||
//For more information, see Image building and The operating system configuration | ||
//TODO add link and verify title of the section | ||
|
||
When creating and placing the files, you must consider the following practices: | ||
|
||
* The name of the rule must be all lower case. | ||
* If you define rules in both locations, the rules are merged. | ||
* If you add more than one rule file to a lifecycle hook directory, the files are processed in lexical order of the file names. | ||
* If you define files with identical file names in both locations, files in the `/etc` folder take precedence over files of the same name in the `/usr` folder. | ||
|
||
A rule file is written in YAML format and contains a list of one or more actions. | ||
An action can be an instruction to run an external command. | ||
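The merge and precedence rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the agent's implementation. The `merge_rule_files` function and the `10-custom.yaml` and `50-backup.yaml` file names are hypothetical.

```python
from pathlib import PurePosixPath


def merge_rule_files(usr_files, etc_files):
    """Return rule file paths in the order they would be processed.

    Files are keyed by file name. A file in /etc replaces a file of the
    same name in /usr, and the merged set is sorted lexically by file name.
    """
    by_name = {PurePosixPath(p).name: p for p in usr_files}
    by_name.update({PurePosixPath(p).name: p for p in etc_files})
    return [by_name[name] for name in sorted(by_name)]


# Hypothetical rule files for the afterUpdating hook.
usr = [
    "/usr/lib/flightctl/hooks.d/afterupdating/00-default.yaml",
    "/usr/lib/flightctl/hooks.d/afterupdating/50-backup.yaml",
]
etc = [
    "/etc/flightctl/hooks.d/afterupdating/50-backup.yaml",
    "/etc/flightctl/hooks.d/afterupdating/10-custom.yaml",
]
order = merge_rule_files(usr, etc)
for path in order:
    print(path)
```

In this sketch, the `/etc` copy of `50-backup.yaml` replaces the `/usr` copy, and the three remaining files are processed in file-name order.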
|
||
When you specify multiple actions for a hook, the actions are performed in sequence, finishing one action before starting the next. | ||
If an action returns with a failure, the following actions are skipped. | ||
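The sequencing rule can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the agent's code: the `run_actions` function is hypothetical, each action is already split into an argument list, and a timeout is treated the same as a failure.

```python
import subprocess
import sys


def run_actions(actions, timeout_seconds=10):
    """Run each action (an argv list) in order and stop at the first failure.

    Returns (completed, failed). `failed` is None if every action succeeded.
    Assumption: a timed-out action counts as a failed action.
    """
    completed = []
    for argv in actions:
        try:
            result = subprocess.run(argv, timeout=timeout_seconds)
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False
        if not ok:
            # The remaining actions are skipped.
            return completed, argv
        completed.append(argv)
    return completed, None


# Stand-in commands: one that succeeds and one that exits with status 1.
ok_cmd = [sys.executable, "-c", "pass"]
bad_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
completed, failed = run_actions([ok_cmd, bad_cmd, ok_cmd])
print(len(completed), failed == bad_cmd)
```

The third action never runs because the second action fails.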
|
||
A `run` action takes the following parameters: | ||
|
||
[%header,cols="1,3"] | ||
|=== | ||
|Parameter |Description | ||
|`Run` |The absolute path to the command to run, followed by any flags or arguments, for example `/usr/bin/nmcli connection reload`. | ||
The command is not executed in a shell, so you cannot use shell variables, such as `$PATH` or `$HOME`, or chain commands by using operators such as `\|` or `;`. | ||
However, if necessary, you can start a shell by specifying the shell as the command to run, for example `/usr/bin/bash -c 'echo $SHELL $HOME $USER'`. | ||
|
||
|`EnvVars` |Optional. A list of key-value pairs to set as environment variables for the command. | ||
|
||
|`WorkDir` |Optional. The directory the command is run from. | ||
|
||
|`Timeout` |Optional. The maximum duration allowed for the action to complete. | ||
Specify the duration as a single positive integer followed by a time unit. | ||
The `s`, `m`, and `h` units are supported for seconds, minutes, and hours, respectively. | ||
|
||
|`If` |Optional. A list of conditions that must be true for the action to be run. | ||
If not provided, actions run unconditionally. | ||
|=== | ||
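As a rough illustration of two of the parameters above, the following Python sketch validates a `Timeout` value and shows what running a command without a shell means. The `parse_duration` and `split_command` helpers are hypothetical, and the assumption that the agent splits the command line with shell-like quoting rules is not confirmed by this document.

```python
import re
import shlex

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert a value such as '30s', '5m', or '2h' to seconds.

    Accepts a single positive integer followed by one unit (s, m, or h).
    """
    match = re.fullmatch(r"([0-9]+)([smh])", text)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"invalid duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def split_command(command):
    """Split a command line into arguments without any shell expansion."""
    return shlex.split(command)


print(parse_duration("5m"))
# Variables stay literal unless a shell is started explicitly.
print(split_command("/usr/bin/bash -c 'echo $SHELL $HOME $USER'"))
```

In the second call, `$SHELL`, `$HOME`, and `$USER` are passed unchanged as part of a single argument to `bash`, which then expands them itself.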
|
||
By default, actions are performed every time the hook is triggered. | ||
However, for the `afterUpdating` hook, you can use the `If` parameter to add conditions that must be true for an action to be performed. | ||
Otherwise the action is skipped. | ||
|
||
For example, to run an action only if a given file or directory changes during the update, you can define a path condition that takes the following parameters: | ||
|
||
[%header,cols="1,3"] | ||
|=== | ||
|Parameter |Description | ||
|`Path` |An absolute path to a file or directory that must change during the update as a condition for the action to be performed. | ||
Specify paths by using forward slashes (`/`). | ||
If the path is to a directory, it must end with a forward slash (`/`). | ||
If you specify a path to a file, the file must have changed to satisfy the condition. | ||
If you specify a path to a directory, a file in that directory or any of its subdirectories must have changed to satisfy the condition. | ||
|`Op` |A list of file operations, such as `created`, `updated`, and `removed`, that limits the types of changes to the specified path that satisfy the condition. | ||
|=== | ||
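The matching rules for a path condition can be sketched as follows. This is a minimal model, not the agent's logic: the `condition_matches` function and the shape of the `changes` mapping are assumptions made for the example.

```python
def condition_matches(path, ops, changes):
    """Return the changed files that satisfy a path condition.

    `changes` maps an absolute file path to "created", "updated", or "removed".
    A path ending in "/" is a directory and covers every file below it.
    Any other path is a file and must match exactly.
    """
    def covered(file_path):
        if path.endswith("/"):
            return file_path.startswith(path)
        return file_path == path

    return sorted(f for f, op in changes.items() if covered(f) and op in ops)


# Hypothetical set of changes made by an update.
changes = {
    "/etc/systemd/system/app.service": "updated",
    "/etc/systemd/system/multi-user.target.wants/app.service": "created",
    "/etc/hosts": "updated",
}
matched = condition_matches(
    "/etc/systemd/system/", ["created", "updated", "removed"], changes
)
print(matched)
```

An action with this condition would run only if the returned list is not empty.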
|
||
If you specify a path condition for an action in the `afterUpdating` hook, you can include the following variables in the arguments to your command. Each variable is replaced with the corresponding absolute path or list of paths: | ||
|
||
[%header,cols="1,3"] | ||
|=== | ||
|Variable |Description | ||
|`{{ Path }}` |The absolute path to the file or directory specified in the path condition. | ||
|
||
|`{{ Files }}` |A space-separated list of absolute paths of the files that changed during the update and are covered by the path condition. | ||
|
||
|`{{ CreatedFiles }}` |A space-separated list of absolute paths of the files that were created during the update and are covered by the path condition. | ||
|
||
|`{{ UpdatedFiles }}` |A space-separated list of absolute paths of the files that were updated during the update and are covered by the path condition. | ||
|
||
|`{{ RemovedFiles }}` |A space-separated list of absolute paths of the files that were removed during the update and are covered by the path condition. | ||
|=== | ||
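The variable substitution described in this table can be approximated in Python. The sketch assumes a hypothetical `expand_variables` helper, a hypothetical `/etc/myapp/` path condition, and that unknown variable names are left untouched. The agent's actual templating might behave differently.

```python
import re


def expand_variables(command, path, changes):
    """Replace {{ Name }} placeholders in a command string.

    `changes` maps each changed file covered by the path condition to
    "created", "updated", or "removed".
    """
    def files_with(op):
        return " ".join(sorted(f for f, o in changes.items() if o == op))

    values = {
        "Path": path,
        "Files": " ".join(sorted(changes)),
        "CreatedFiles": files_with("created"),
        "UpdatedFiles": files_with("updated"),
        "RemovedFiles": files_with("removed"),
    }

    def replace(match):
        # Leave unknown placeholders as they are (an assumption).
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, command)


changes = {"/etc/myapp/a.conf": "updated", "/etc/myapp/b.conf": "created"}
expanded = expand_variables(
    "/usr/bin/logger changed: {{ Files }}", "/etc/myapp/", changes
)
print(expanded)
```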
|
||
The {rhem} agent includes a built-in set of rules defined in `/usr/lib/flightctl/hooks.d/afterupdating/00-default.yaml`. | ||
The following commands are executed if certain files change: | ||
|
||
[%header,cols="2,2,4"] | ||
|=== | ||
|File |Command|Description | ||
|`/etc/systemd/system/` |`systemctl daemon-reload` |Changes to `systemd` units are activated by signaling the `systemd` daemon to reload the `systemd` manager configuration. | ||
This reruns all generators, reloads all unit files, and re-creates the entire dependency tree. | ||
|
||
|`/etc/NetworkManager/system-connections/` |`nmcli conn reload` |Changes to Network Manager system connections are activated by signaling Network Manager to reload all connections. | ||
//TODO check if Network Manager has a different brand-approved name | ||
|
||
|`/etc/firewalld/` |`firewall-cmd --reload` |Changes to the permanent configuration of `firewalld` are activated by signaling `firewalld` to reload firewall rules as new runtime configuration. | ||
|=== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,136 @@ | ||
[#manage-apps"] | ||
= Managing applications | ||
|
||
You can deploy, update, or remove applications on a device by updating the list of applications in the device specification. | ||
When the {rhem} agent checks in and detects the change in the specification, the agent downloads any new or updated application packages and images from an OCI-compatible registry. | ||
Then, the agent deploys the packages to the appropriate application runtime or removes them from that runtime. | ||
|
||
{rhem} supports `podman-compose` as the application runtime and format. | ||
|
||
[#prereqs"] | ||
== Prerequisites | ||
|
||
* You must install the {rhem} CLI. | ||
* You must log in to the {rhem} service. | ||
* You must install Podman Compose. See link:https://podman-desktop.io/docs/compose/setting-up-compose[Setting up Compose]. | ||
|
||
[#create-apps"] | ||
== Creating applications | ||
|
||
You can create an Open Container Initiative (OCI) registry application package. | ||
Complete the following steps: | ||
|
||
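. Define the functionality of the application in a `podman-compose.yaml` file that follows the Podman Compose specification. Then create a Containerfile that embeds the file in a `scratch` container and adds the `appType=compose` label: | ||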
|
||
+ | ||
[source,dockerfile] | ||
---- | ||
FROM scratch <1> | ||
COPY podman-compose.yaml /podman-compose.yaml | ||
LABEL appType="compose" <2> | ||
---- | ||
<1> Embed the compose file in a `scratch` container. | ||
<2> Add the `appType=compose` label. | ||
|
||
. Build and push the container to your OCI registry. | ||
|
||
. Specify the application package in the `spec.applications` field of the `Device` resource: | ||
|
||
+ | ||
[source,yaml] | ||
---- | ||
apiVersion: flightctl.io/v1alpha1 | ||
kind: Device | ||
metadata: | ||
  name: <device_name> | ||
spec: | ||
  [...] | ||
  applications: | ||
  - name: podman-compose.yaml | ||
  [...] | ||
---- | ||
|
||
[#deploy-apps"] | ||
== Deploying applications on a device using the CLI | ||
|
||
Deploy an application package to a device from an OCI registry by using the CLI. | ||
Complete the following steps: | ||
|
||
* Specify the application package that you want to deploy in the `spec.applications` field in the `Device` resource: | ||
+ | ||
[source,yaml] | ||
---- | ||
apiVersion: flightctl.io/v1alpha1 | ||
kind: Device | ||
metadata: | ||
  name: <device_name> | ||
spec: | ||
  [...] | ||
  applications: | ||
  - name: wordpress <1> | ||
    image: quay.io/rhem-demos/wordpress-app:latest <2> | ||
    envVars: <3> | ||
      WORDPRESS_DB_HOST: <database_host> | ||
      WORDPRESS_DB_USER: <user_name> | ||
      WORDPRESS_DB_PASSWORD: <password> | ||
  [...] | ||
---- | ||
<1> A user-defined name for the application that is used when the web console and the CLI list applications. | ||
<2> A reference to an application package in an OCI registry. | ||
<3> Optional. A list of key-value pairs that are passed to the deployment tool as environment variables or command line flags. | ||
|
||
*Note:* For each application in the `applications` section of the device specification, you can find the corresponding device status information. | ||
//Add verification? | ||
|
||
//// | ||
Check if this is this relevant | ||
* To deploy an unpackaged application from a Git repository, specify it in the device's `spec.applications[]` as follows: | ||
+ | ||
[source,yaml] | ||
---- | ||
apiVersion: flightctl.io/v1alpha1 | ||
kind: Device | ||
metadata: | ||
  name: some_device_name | ||
spec: | ||
  [...] | ||
  applications: | ||
  - name: wordpress | ||
    git: | ||
      url: https://github.com/flightctl/flightctl-demos.git | ||
      revision: v1.0 | ||
      path: /wordpress | ||
    envVars: | ||
      WORDPRESS_DB_HOST: "mysql" | ||
      WORDPRESS_DB_USER: "user" | ||
      WORDPRESS_DB_PASSWORD: "password" | ||
  [...] | ||
---- | ||
|
||
* To deploy an unpackaged application inline with the device specification, specify it in the device's `spec.applications[]` as follows: | ||
+ | ||
[source,yaml] | ||
---- | ||
apiVersion: flightctl.io/v1alpha1 | ||
kind: Device | ||
metadata: | ||
  name: some_device_name | ||
spec: | ||
  [...] | ||
  applications: | ||
  - name: wordpress | ||
    inline: | ||
      podman-compose.yaml: | | ||
        version: "3.7" | ||
        services: | ||
          wordpress: | ||
            image: "wordpress:latest" | ||
            [...] | ||
    envVars: | ||
      WORDPRESS_DB_HOST: "mysql" | ||
      WORDPRESS_DB_USER: "user" | ||
      WORDPRESS_DB_PASSWORD: "password" | ||
  [...] | ||
---- | ||
//// |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
[#device-resources"] | ||
= Monitoring device resources | ||
|
||
You can set up resource monitors for device resources and create alerts when the use of resources crosses a defined threshold. | ||
When the agent alerts the {rhem} service, the service sets the device status to `degraded` or `error`, depending on the severity level. | ||
The service suspends the rollout of updates and alerts the user. | ||
|
||
*Important:* Device resource monitoring does not replace observability solutions. | ||
If your use case requires streaming logs and metrics from devices into an observability stack and the network bandwidth of the device allows this, see Adding Device Observability. | ||
//TODO add link for observability in ACM | ||
|
||
Resource monitors take the following parameters: | ||
|
||
[%header,cols="1,4"] | ||
|=== | ||
|Parameter |Description | ||
|`MonitorType` |The resource to monitor. | ||
The `CPU`, `Memory`, and `Disk` resources are currently supported. | ||
|`SamplingInterval` |The interval at which the monitor samples resource usage. Specified as a positive integer followed by a time unit: `s` for seconds, `m` for minutes, `h` for hours. | ||
|`AlertRules` |A list of alert rules. | ||
|`Path` |For `Disk` monitor only. The absolute path to the directory to monitor. | ||
Utilization reflects the file system containing the path, even if the defined path is not a mount point. | ||
|=== | ||
|
||
Alert rules take the following parameters: | ||
|
||
[%header,cols="1,4"] | ||
|=== | ||
|Parameter |Description | ||
|`Severity` |The severity of the alert rule can be `Info`, `Warning`, or `Critical`. | ||
Only one alert rule is allowed per severity level and monitor. | ||
|`Duration` |The duration that resource use is measured and averaged over when sampling. Specified as a positive integer followed by a time unit: `s` for seconds, `m` for minutes, `h` for hours. The duration must be smaller than the sampling interval. | ||
|`Percentage` |The usage threshold that triggers the alert, as a percentage value. The value ranges from 0 to 100 without the % sign. | ||
|`Description` |A human-readable description of the alert. Add details about the alert to help with debugging. | ||
By default, the alert description is `load is above >% for more than`. | ||
|=== | ||
|
||
[#device-resources-cli"] | ||
== Monitoring device resources using the CLI | ||
|
||
You can monitor the resources of your device by using the CLI, which gives you the tools and commands to track performance and troubleshoot issues. | ||
Complete the following steps: | ||
|
||
* Add resource monitors in the `spec.resources` section of the device specification. For example, add the following monitor for your disk: | ||
+ | ||
[source,yaml] | ||
---- | ||
apiVersion: flightctl.io/v1alpha1 | ||
kind: Device | ||
metadata: | ||
  name: <device_name> | ||
spec: | ||
  [...] | ||
  resources: | ||
  - monitorType: Disk | ||
    samplingInterval: 5s <1> | ||
    path: /application_data <2> | ||
    alertRules: | ||
    - severity: Warning <3> | ||
      duration: 30m | ||
      percentage: 75 | ||
      description: Disk space for application data is >75% full for over 30m. | ||
    - severity: Critical <4> | ||
      duration: 10m | ||
      percentage: 90 | ||
      description: Disk space for application data is >90% full for over 10m. | ||
  [...] | ||
---- | ||
<1> Samples usage every 5 seconds. | ||
<2> Checks disk use on the filesystem associated with the `/application_data` path. | ||
<3> Triggers a warning if the average use exceeds 75% for more than 30 minutes. | ||
<4> Triggers a critical alert if the average use exceeds 90% for over 10 minutes. |
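To make the alert rules in this example more concrete, the following Python sketch evaluates them against a series of usage samples. It is a simplified model under stated assumptions, not the agent's algorithm: the `firing_severities` function is hypothetical, usage is averaged over the trailing `duration` window, and the samples are spaced 5 minutes apart only to keep the example short.

```python
def firing_severities(samples, rules, now):
    """Return the severities of the alert rules that fire at time `now`.

    `samples` is a list of (timestamp_seconds, usage_percent) tuples.
    Each rule is a dict with `severity`, `duration_seconds`, and `percentage`.
    Assumption: a rule fires when the average usage over the trailing
    duration window is above the rule's percentage.
    """
    firing = []
    for rule in rules:
        start = now - rule["duration_seconds"]
        window = [usage for ts, usage in samples if start <= ts <= now]
        if window and sum(window) / len(window) > rule["percentage"]:
            firing.append(rule["severity"])
    return firing


# The two rules from the example: 75% over 30m and 90% over 10m.
rules = [
    {"severity": "Warning", "duration_seconds": 30 * 60, "percentage": 75},
    {"severity": "Critical", "duration_seconds": 10 * 60, "percentage": 90},
]
# One hour of hypothetical samples with the disk constantly 80% full.
samples = [(ts, 80.0) for ts in range(0, 3601, 300)]
fired = firing_severities(samples, rules, now=3600)
print(fired)
```

With a constant usage of 80%, only the warning rule fires, because the average stays below the 90% threshold of the critical rule.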