[proposal] Add a hostvar to event_data #1249

Open
AlanCoding opened this issue May 31, 2023 · 0 comments
Labels
needs_triage New item that needs to be triaged

In AWX, we receive events from ansible-runner and save them to the database. When applicable, each event is linked to its corresponding Host record. This creates a problem because a relational database field requires the primary key of the related object, and ansible-runner only provides the host name.

Current solution

Right now we build a host_map variable locally when we write the inventory file (before we start the job).

https://github.com/ansible/awx/blob/d89cad0d9edd2baaa01f668d8ed12eca62ee1a48/awx/main/tasks/jobs.py#L318

Why is this non-ideal? Because the host_map variable is potentially very large, as we expect inventories of ~50,000 hosts in real-world situations. More importantly, this variable must stay alive for as long as events are being generated (until the end of the job).
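To make the current approach concrete, here is a simplified sketch. It is not the AWX code linked above; the function names build_host_map and attach_host_id and the host_id key are made up for illustration. The point is only that the name-to-primary-key dictionary has to be built before the job and kept around for every event.

```python
def build_host_map(hosts):
    """Build a host name -> primary key mapping before the job starts.

    `hosts` is an iterable of (primary_key, name) pairs. This shape is an
    assumption for the sketch, not the real AWX query result.
    """
    return {name: pk for pk, name in hosts}


def attach_host_id(event, host_map):
    """Return a copy of the event with the host name resolved to a primary key.

    The map must still be in memory whenever an event arrives, which is
    the lifetime problem described above.
    """
    host_name = event.get("event_data", {}).get("host")
    resolved = dict(event)
    resolved["host_id"] = host_map.get(host_name)  # None if the name is unknown
    return resolved


# Hypothetical data: two hosts and one event that names one of them.
hosts = [(101, "web1"), (102, "db1")]
host_map = build_host_map(hosts)
event = {"event": "runner_on_ok", "event_data": {"host": "web1"}}
resolved = attach_host_id(event, host_map)
```

With ~50,000 hosts, host_map holds ~50,000 entries for the full duration of the job, even if only a handful of hosts ever produce events.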

Why now? Because several architectural changes we want to make require running the ansible-runner process step independently of the other steps (like transmit). This means that we want to avoid keeping large, long-lived variables in memory as we consume these events.

Proposed solution

The host id is not particularly challenging to find, and we already set it automatically on every host in the inventory using the remote_host_id variable.

https://github.com/ansible/awx/blob/d89cad0d9edd2baaa01f668d8ed12eca62ee1a48/awx/main/models/inventory.py#L373

If you look in the callback, there are many places where we add the host name to the event data using result._host.get_name().

host_start = self._host_start.get(result._host.get_name())

This proposal is to allow an additional configuration option that adds a host variable to the event_data. Then, everywhere the callback obtains the host name, it would also obtain that variable from the host variables (if it exists). This would let us replay job events much more easily starting from some non-zero line number, which is needed to recover from restarts without losing jobs.
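A minimal sketch of what this could look like, under stated assumptions: FakeHost is a stand-in for the object the callback sees as result._host (only get_name() is taken from the callback code quoted above; the vars attribute is assumed), and host_event_data and its hostvar_name parameter are hypothetical names, not an existing ansible-runner setting.

```python
class FakeHost:
    """Stand-in for result._host, used here only to keep the sketch runnable."""

    def __init__(self, name, variables):
        self._name = name
        self.vars = variables  # assumed to hold the host's inventory variables

    def get_name(self):
        return self._name


def host_event_data(host, hostvar_name=None):
    """Collect the host-related fields to place in event_data.

    `hostvar_name` is the hypothetical configuration value naming the host
    variable to copy. If it is unset, or the host lacks that variable,
    only the host name is emitted, matching today's behavior.
    """
    data = {"host": host.get_name()}
    if hostvar_name is not None and hostvar_name in host.vars:
        data[hostvar_name] = host.vars[hostvar_name]
    return data


# Hypothetical usage: AWX would configure the variable it already sets.
host = FakeHost("web1", {"remote_host_id": 101})
with_id = host_event_data(host, hostvar_name="remote_host_id")
without_config = host_event_data(host)
```

With the id carried in each event, the consumer could resolve the Host record from the event alone, without holding a host_map for the lifetime of the job.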

@github-actions github-actions bot added the needs_triage New item that needs to be triaged label May 31, 2023