Ansible remediation with one run #10479

rumch-se · 2023-04-21T08:55:48Z

We (at SUSE) found that it is not possible to remediate a test system by executing the command:

ansible-playbook -i "localhost", -c local /usr/share/scap-security-guide/ansible/sle15-playbook-stig.yml

where sle15-playbook-stig.yml comes from the latest official release or from earlier ones.

The main reason is that whenever a task (rule) in Ansible fails, Ansible terminates the play, whereas the Bash remediation does not behave this way. We found, for example, that the rules package_pam_apparmor_installed and dir_system_commands_root_owned terminate the execution of the whole Ansible playbook. There may be similar rules for other vendors.

Because of that, the Ansible playbook has to be executed in portions: either rule by rule, or in an artificial loop that executes 2 or 3 rules per step. Executing it this way makes the remediation process very long, for example about 2.5 to 3 hours.

I am concerned about running the Ansible remediation in places where many hosts have to be remediated, for example in a data center, because it will be a time-consuming task.

I have the following questions:

Has anybody executed the Bash remediation, as shipped in the release, on a test machine to see how the remediation is applied, and then repeated the same exercise with the Ansible playbook on another test machine?
Does anybody know how to apply the Ansible remediation in one shot/run?
Do you think we have to add ignore_errors=true to ALL Ansible snippets for ALL rules to get behavior similar to Bash?
Do you think it would be a good idea to add integration tests to the repo? These tests would execute the whole Ansible/Bash remediation scripts on test machines.

Rumen

marcusburghardt · 2023-04-25T09:07:54Z

Maintainer

Hello @rumch-se

> We (at SUSE) found that it is not possible to remediate a test system by executing the command:
>
> ansible-playbook -i "localhost", -c local /usr/share/scap-security-guide/ansible/sle15-playbook-stig.yml
>
> where sle15-playbook-stig.yml comes from the latest official release or from earlier ones.
>
> The main reason is that whenever a task (rule) in Ansible fails, Ansible terminates the play, whereas the Bash remediation does not behave this way. We found, for example, that the rules package_pam_apparmor_installed and dir_system_commands_root_owned terminate the execution of the whole Ansible playbook. There may be similar rules for other vendors.

Ansible is expected to terminate a Playbook if any task in it fails. This usually means the Playbook is not robust enough to handle all the cases it is exposed to. When this happens, the correct approach is to investigate the failure and its cause, and to propose a PR that updates the tasks to fix the behavior. A Playbook should run to completion without any unexpected failure (expected failures can be handled in the Playbook itself if necessary). Task errors should never be ignored; they usually reveal opportunities for improvement.
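To illustrate what handling a known case inside the Playbook can look like, here is a minimal sketch. It assumes, purely for illustration, that a task fails because a directory is missing on the host; the actual failure cause for dir_system_commands_root_owned has not been established in this thread, and the path and the variable name cmd_dir are hypothetical.

```yaml
---
# Hypothetical sketch; not taken from sle15-playbook-stig.yml.
- name: Guard a task against a case that is known to occur
  hosts: localhost
  connection: local
  become: true
  tasks:
    - name: Check whether the directory exists on this host
      ansible.builtin.stat:
        path: /usr/local/sbin   # assumed example path
      register: cmd_dir

    - name: Ensure the directory is owned by root
      ansible.builtin.file:
        path: /usr/local/sbin
        state: directory
        owner: root
      # Skipped (not failed) when the directory is absent.
      when: cmd_dir.stat.exists
```

With a guard like this, the task is reported as skipped on hosts without the directory, so the play continues to the next rule, while real errors such as permission problems still stop the run and stay visible.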

> Because of that, the Ansible playbook has to be executed in portions: either rule by rule, or in an artificial loop that executes 2 or 3 rules per step. Executing it this way makes the remediation process very long, for example about 2.5 to 3 hours.

The Ansible remediation is not expected to be used in this painful way. We should fix the errors instead.

> I am concerned about running the Ansible remediation in places where many hosts have to be remediated, for example in a data center, because it will be a time-consuming task.
>
> I have the following questions:
>
> Has anybody executed the Bash remediation, as shipped in the release, on a test machine to see how the remediation is applied, and then repeated the same exercise with the Ansible playbook on another test machine?

Yes. We always do that during the review of each PR. In some cases, testing a rule alone shows all green, but testing it in a profile context that includes many rules can reveal issues. So, in our CI tests we also test different profiles for all PRs. You can also do that locally via automatus.

> Does anybody know how to apply the Ansible remediation in one shot/run?

This is the common way to execute it. A good Playbook should finish properly without unexpected failures.

> Do you think we have to add ignore_errors=true to ALL Ansible snippets for ALL rules to get behavior similar to Bash?

I don't. This would break the logic of many tasks and would make Playbooks totally useless. ignore_errors: true or failed_when: false should be used consciously and only in very specific cases to treat expected failures.
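As a sketch of what using failed_when only for an expected failure might look like, the task below accepts exactly one known non-zero exit code instead of silencing every error. The file, the pattern, and the variable name grep_result are illustrative assumptions, not part of the generated Playbooks.

```yaml
---
# Hypothetical sketch; not taken from the generated playbooks.
- name: Treat one expected failure explicitly
  hosts: localhost
  connection: local
  tasks:
    - name: Check whether PASS_MAX_DAYS is set (grep exits 1 when nothing matches)
      ansible.builtin.command: grep -q '^PASS_MAX_DAYS' /etc/login.defs
      register: grep_result
      changed_when: false   # read-only check, never reports "changed"
      # rc 0 = match, rc 1 = no match (expected); anything else is a real error.
      failed_when: grep_result.rc not in [0, 1]
```

By contrast, failed_when: false or ignore_errors: true on the same task would also hide exit codes greater than 1, which grep uses for real errors such as an unreadable file.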

> Do you think it would be a good idea to add integration tests to the repo? These tests would execute the whole Ansible/Bash remediation scripts on test machines.

We already have this in our CI tests, but not for all profiles and distros. You can take a look at the testing-farm tests, for example.
The rule you mentioned at the beginning seems to be specific to SUSE and possibly Ubuntu, so it is likely not executed in the tests that use CentOS or Fedora; otherwise any error would have been caught. It would be interesting if you proposed extending the existing tests to SUSE. I am not sure whether this can be done via PRs, but we can check if you are willing to work on extending the existing CI tests to other distros. That would be nice for the project.

> Rumen
