A set of scripts to run basic checks on an OpenShift cluster. PRs welcome!
⚠️ This is an unofficial tool, don't blame us if it breaks your cluster
$ ./openshift-checks.sh -h
Usage: openshift-checks.sh [-h]
This script will run a minimum set of checks to an OpenShift cluster
Available options:
-h, --help Print this help and exit
-v, --verbose Print script debug info
-l, --list Lists the available checks
-s <script>, --single <script> Executes only the provided script
--no-info Disable cluster info commands (default: enabled)
--no-checks Disable cluster check commands (default: enabled)
--no-ssh Disable ssh-based check commands (default: enabled)
--prechecks path/to/install-config.yaml Executes only prechecks (default: disabled)
--results-only Only shows pass/fail results from checks (default: disabled)
With no options, it will run all checks and info commands with no debug info
An automated container build of this repository's main branch is available at quay.io/rhsysdeseng/openshift-checks.
You can run it with your own kubeconfig file and the required parameters as follows:
$ podman run -it --rm -v /home/foobar/kubeconfig:/kubeconfig:Z -e KUBECONFIG=/kubeconfig quay.io/rhsysdeseng/openshift-checks:latest -h
You can even create a handy alias:
$ alias openshift-checks="podman run -it --rm -v /home/foobar/kubeconfig:/kubeconfig:Z -e KUBECONFIG=/kubeconfig quay.io/rhsysdeseng/openshift-checks:latest"
Then, simply run it as:
$ openshift-checks -s info/00-clusterversion
Using default/api-foobar-example-com:6443/system:admin context
...
Note: If your kubeconfig file doesn't have the proper permissions, you may get the error "KUBECONFIG not set". In that case, verify that the kubeconfig file is readable by the user that runs inside the container, or simply run chmod o+r kubeconfig on your host.
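The permission check described in the note can be scripted before starting the container. The following is a minimal sketch, not part of this repository: the function name kubeconfig_readable_by_others is hypothetical, and it assumes GNU stat is available on the host.

```shell
# Hypothetical helper, not part of openshift-checks.
# Succeeds (exit 0) when "others" have read permission on the given file.
# Assumes GNU coreutils stat, which prints the octal mode with -c '%a'.
kubeconfig_readable_by_others() {
  mode=$(stat -c '%a' "$1") || return 2
  # The last octal digit holds the permissions for "others";
  # the values 4, 5, 6 and 7 all include the read bit.
  case "$mode" in
    *[4567]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (commented out so the snippet has no side effects):
# kubeconfig_readable_by_others /home/foobar/kubeconfig || chmod o+r /home/foobar/kubeconfig
```

This only covers the chmod o+r case from the note; other causes of the error are not detected.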
You can build your own container with the included Containerfile:
$ podman build --tag foobar/openshiftchecks .
STEP 1: FROM registry.access.redhat.com/ubi8/ubi:latest
...
$ podman push foobar/openshiftchecks
...
Then, run it by replacing quay.io/rhsysdeseng/openshift-checks:latest with your own image, such as foobar/openshiftchecks:latest:
$ podman run -it --rm -v /home/foobar/kubeconfig:/kubeconfig:Z -e KUBECONFIG=/kubeconfig foobar/openshiftchecks:latest -h
Usage: openshift-checks.sh [-h]
...
The checks can be scheduled to run periodically in an OpenShift cluster by creating a CronJob.
Check the cronjob.yaml example.
The openshift-checks.sh script is just a wrapper around bash scripts located in the info, checks or ssh directories.
Each script and its description are listed in checks.md.
Note: This file is autogenerated when running: ./scripts/update-checksmd > checks.md
Environment variable | Default value | Description |
---|---|---|
INTEL_IDS | 8086:158b | Intel device IDs to check for firmware. Can be overridden for non-supported NICs. |
OCDEBUGIMAGE | registry.redhat.io/rhel8/support-tools:latest | Used by oc debug. |
OSETOOLSIMAGE | registry.redhat.io/openshift4/ose-tools-rhel8:latest | Used by oc debug in ethtool-firmware-version. |
RESTART_THRESHOLD | 10 | Used by the restarts script. |
THRASHING_THRESHOLD | 10 | Used by the port-thrashing script. |
PARALLELJOBS | 1 | By default, all the oc debug commands run serially; set this variable to a value greater than 1 to run them in parallel. |
OVN_MEMORY_LIMIT | 5000 | Used by the ovn-pods-memory-usage script to set the maximum memory LIMIT (in Mi) to trigger the warning. |
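Since these are ordinary environment variables, they can be set in the calling shell or on the command line, for example PARALLELJOBS=4 ./openshift-checks.sh. The sketch below illustrates the usual shell idiom for this kind of default. It is an assumption about how such defaults are typically declared, not a copy of the repository's code, and the show_settings name is hypothetical.

```shell
# Illustration only, not part of openshift-checks.
# Assign the documented default unless the caller already set a value.
# The ':' builtin does nothing except evaluate its arguments, which
# triggers the ':=' assignment. The function body runs in a subshell,
# so the caller's environment is left untouched.
show_settings() (
  : "${RESTART_THRESHOLD:=10}"
  : "${THRASHING_THRESHOLD:=10}"
  : "${PARALLELJOBS:=1}"
  : "${OVN_MEMORY_LIMIT:=5000}"
  printf 'RESTART_THRESHOLD=%s THRASHING_THRESHOLD=%s PARALLELJOBS=%s OVN_MEMORY_LIMIT=%s\n' \
    "$RESTART_THRESHOLD" "$THRASHING_THRESHOLD" "$PARALLELJOBS" "$OVN_MEMORY_LIMIT"
)

show_settings
```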
The current script checks only the firmware version of the NICs supported by the SR-IOV operator (in OpenShift 4.6).
You can add your own device ID if needed by modifying the script (hint: the variable is called IDS and the format is vendorID_A:deviceID_A vendorID_B:deviceID_B).
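Before editing the script, it can help to check that a candidate value matches the expected format. The following is a hedged sketch: the function name valid_ids is hypothetical, and it assumes each vendor and device ID is written as four hexadecimal digits, as in the default 8086:158b. The second ID in the example is only illustrative.

```shell
# Hypothetical helper, not part of openshift-checks.
# Succeeds when every space-separated entry looks like vendorID:deviceID,
# each written as four hexadecimal digits (assumption based on 8086:158b).
valid_ids() {
  hex='[0-9a-fA-F]'
  pattern="$hex$hex$hex$hex:$hex$hex$hex$hex"
  [ -n "$1" ] || return 1
  # Word splitting on the unquoted $1 is intentional: one entry per word.
  for entry in $1; do
    case "$entry" in
      $pattern) ;;          # entry matches, keep checking
      *) return 1 ;;        # malformed entry
    esac
  done
  return 0
}

valid_ids "8086:158b 8086:1572" && echo "format looks valid"
```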
Add a new script that gathers some information or performs some check in the proper folder, and create a pull request.
Make sure you include a # description: $TEXT line, which will later be used to populate the checks.md file with the description.
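As an illustration of the convention, the sketch below extracts the description line from a script. It only shows one plausible way to read the # description: header. The get_description name is hypothetical, and this is not necessarily how ./scripts/update-checksmd is implemented.

```shell
# Hypothetical helper, not part of openshift-checks.
# Prints the text of the first "# description: ..." line in a script.
get_description() {
  sed -n 's/^# description: //p' "$1" | head -n 1
}

# Build a throwaway example script to demonstrate the header.
example=$(mktemp)
cat > "$example" <<'EOF'
#!/usr/bin/env bash
# description: Checks that an example condition holds
echo "check body goes here"
EOF

get_description "$example"
rm -f "$example"
```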
You can pipe the script to mail so that, if there are any errors, an email is sent.
First, you can configure postfix (already included in RHEL 8) as a relay host (see https://access.redhat.com/solutions/217503). As an example:
- Append the following settings in /etc/postfix/main.cf:
myhostname = kni1-bootstrap.example.com
relayhost = smtp.example.com
- Restart the postfix service:
sudo systemctl restart postfix
- Test it:
echo "Hola" | mail -s 'Subject' [email protected]
Then, run the script as:
/openshift-checks.sh > /tmp/oc-errors 2>&1 || mail -s "Something has failed" [email protected] < /tmp/oc-errors
As a bonus, you can include this in a cronjob for periodic checks.
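The redirect-and-mail one-liner above can also be wrapped in a small function, which is easier to call from a cron entry. This is a sketch under assumptions: run_or_report, MAIL_CMD and MAIL_TO are hypothetical names introduced here, and mail is assumed to accept -s as in the test command above.

```shell
# Hypothetical wrapper, not part of openshift-checks.
# Usage: run_or_report LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stdout and stderr captured in LOGFILE. If COMMAND
# fails, the log is sent with the mail command and 1 is returned.
run_or_report() {
  log=$1
  shift
  if "$@" > "$log" 2>&1; then
    return 0
  fi
  # MAIL_CMD and MAIL_TO are assumptions of this sketch; MAIL_TO must
  # be set by the caller to the recipient address.
  "${MAIL_CMD:-mail}" -s "Something has failed" "$MAIL_TO" < "$log"
  return 1
}

# Example (commented out because it needs a cluster and a mail relay):
# MAIL_TO=<recipient> run_or_report /tmp/oc-errors ./openshift-checks.sh
```

Keeping the mail command overridable through MAIL_CMD makes it possible to try the wrapper with a stub before a relay host is configured.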
This requires installing the Python requirements listed in the requirements.txt file, preferably within a virtual environment. Once those are installed, execute:
./risu.py -l
This automatically executes the tests against the current environment and generates two output files:
osc.json
osc.html
When served by a web server, the HTML file pulls the JSON file over AJAX and presents the results of the tests graphically.