What are the advantages of gathering facts beforehand ? #805
Comments
Thank you for writing this up @julienlavergne - this has recently been on my mind also. I'm going to document the context of why this is the case below (this will be long I think :)), and give my thoughts. Firstly, let's split the problem into two distinct parts:
Historical Context
Prior to v2, pyinfra relied on line-number ordering and pre-execution fact gathering to achieve its high performance. The reason for this is that operations were generated on hosts sequentially, rather than in parallel. As facts were required, they were gathered in parallel on all hosts (whether or not they needed the specific fact). For example:
# inventory.py
web_servers = ["web-01", "web-02"]
db_servers = ["db-01", "db-02"]
# deploy.py
if host in inventory.get_group("web_servers"):
    files.file(path="web-file")
else:
    files.file(path="db-file")
To generate the commands to execute the following would happen (v0.x, v1.x):
Because facts were loaded in parallel (2.a), each iteration of 2 got quicker and quicker as most/all facts were pre-cached for the current host. This is why facts were (and still are) gathered before execution. The above example also highlights why operation order is generated from line numbers - because the same code (
By taking operation order as they are called this would not be possible. Note: this does not affect deploys against a single host target, where operation call order would work.
What can we do now?
Back to v2 and your points above, I'll split my thoughts into the two problems above:
Operation ordering
Unfortunately I don't think we can avoid this without breaking operation execution flow, particularly where there are multiple code paths for different hosts involved in a deploy. The line/stack ordering enforces "correct" ordering - except loops and context processors. The general assumption being that deploy files are generally "simplified Python" consisting of operation calls, conditional statements and functions.
I'm not a fan of this gotcha and would be keen to investigate alternatives! While I don't see a way to remove the line ordering mechanism, I would like to have it automatically handle loops and context processors if possible. In v0.x pyinfra would modify the
Fact gathering
Because of the operation ordering issue, it's still not possible to provide output from an operation immediately. The deploy code must be run once before any operations are actually executed to generate the order, which unfortunately makes it impossible to have the output included.
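To make the two-stage flow described above a bit more concrete, here is a minimal sketch in plain Python. It is not pyinfra's actual implementation: `DEPLOY`, `prepare` and `execute` are hypothetical names, and the deploy file is reduced to a list of (line number, operation, host predicate) entries. It only illustrates why the deploy code has to run once per host before anything executes: that first pass is what produces a single line-ordered plan shared by all hosts.

```python
from collections import defaultdict

# Hypothetical stand-in for a deploy file: each entry is
# (line_number, operation_name, predicate deciding which hosts call it).
DEPLOY = [
    (2, "files.file(path='web-file')", lambda host: host.startswith("web")),
    (4, "files.file(path='db-file')", lambda host: host.startswith("db")),
]


def prepare(hosts):
    """Stage 1: run the deploy once per host, recording which lines it hits."""
    hosts_by_line = defaultdict(list)
    names = {}
    for host in hosts:  # hosts are processed one after another
        for line, name, applies in DEPLOY:
            if applies(host):
                hosts_by_line[line].append(host)
                names[line] = name
    # Sorting by line number yields one global order shared by every host.
    return [(names[line], hosts_by_line[line]) for line in sorted(hosts_by_line)]


def execute(plan):
    """Stage 2: walk the global order, one operation at a time."""
    return [f"{name} on {', '.join(hosts)}" for name, hosts in plan]


plan = prepare(["web-01", "db-01", "web-02", "db-02"])
for entry in execute(plan):
    print(entry)
```

Because nothing runs on a host until `prepare` has finished, no operation output can exist yet while the deploy code is being evaluated, which is the limitation discussed here.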
I would absolutely love to remove this, it's a real pain and a massive gotcha. v2 makes it entirely possible to do by having operations (re)collect facts at execution time. The only drawback is that the list of changes pre-execution may not be correct; i.e. if you do a dry run deploy first you expect the number of commands proposed to match those executed, and collecting facts at execution may break this. One option could be to display "up to X" commands per operation, because we can make reasonable assumptions that certain facts will change (files) and others will not (system OS).
Thoughts
Collecting some thoughts below on the more general philosophy of pyinfra and how it works.
I do think the "dry run" pyinfra offers is a powerful tool that has a lot of unused potential. On a basic level pyinfra could support terraform style approval steps. Even more interesting would be the idea of creating a diff file that can then be moved somewhere else for execution - pyinfra needn't even be the tool doing the execution.
The whole two-stage deploy mechanism has consistently provided complexity over the last 7(!) years, but has also enabled writing almost-normal Python code to generate operations that execute in a similar way to tools like Ansible. I've yet to encounter something that wasn't possible (but have seen things not possible in other tools). Examples & documentation would help a lot here I think.
Today pyinfra seems to be a hybrid of an Ansible/SaltStack-like mostly-state-based deployment tool and a Fabric/Parallel-SSH command execution tool. This is definitely both an advantage in terms of high flexibility but also a disadvantage because it comes with some gotchas that make it "almost like Python" at times.
I hope this provides some context, please let me know if anything doesn't make sense and would love to hear thoughts from any pyinfra users on the above. Ultimately I think any changes to these systems are on the table assuming enough support and technical possibility :) |
Coming back on your example:
The flow I am referring to is the following: In parallel on all 4 servers:
If I take the different points you mention, I think they can be supported with this flow:
|
The problem with doing this is that it breaks a number of scenarios in which the order across multiple hosts is essential. Really anything involving a control / worker node setup. For example bootstrapping an Elasticsearch cluster might look like:
There are many similar examples of this where operations on one node must complete before operations on another. The operation order as it currently exists handles this. |
Pyinfra is not the right tool for that. I am doing similar things on my side, and there are better ways to do it than pyinfra. Setup/configuration and actually deploying/running an application are two different things that come with their own challenges, and pyinfra is not equipped at all to deal with the challenges of bringing a cluster of applications up. Typically, bringing up an ES cluster needs to deal with scenarios where you add/remove nodes to the cluster, handle failure cases if the master fails, roll back to a previous working state, etc. Regardless, if such a case happens, pyinfra could expose an interface to synchronize the execution across all hosts at any time during execution. asyncio does provide the necessary synchronization mechanisms for that. |
I can absolutely say pyinfra works incredibly well for the ES use case, it's one of the places it was first used at scale within a company environment to manage tens of large (300+ node) clusters. I am aware of its use in production today for all sorts of setups including ES, MariaDB and Kubernetes clusters. The operation ordering is relied upon to provide consistent idempotent deployment of these services. Because of this ops like adding/removing nodes can be achieved simply by updating the inventory and re-running. |
To come back to the original question. What can be done when splitting the execution in 2 steps that cannot be done without this split ?
|
I think at least the following would be difficult with single-run execution:
The 'same'[1] operation cannot be grouped in the output (a)
We could use some command line ANSI escape code magic to update previously printed operations for hosts that have now also hit them. This might look something like:
...2 seconds later...
The 'same'[1] operation cannot be interactively approved/disapproved as a group (b)
While not ideal, this could be solved with a synchronization key:
burn_server_op_name = 'Set server on fire'
if host in inventory.get_group("flammable_nodes"):
    pyinfra.operations.server.burn_server(name=burn_server_op_name, approval_needed=True)
else:
    pyinfra.operations.skip(name=burn_server_op_name)
The key could just be
Inter-host dependent operations (c)
As @julienlavergne pointed out this could be solved using a synchronization API:
if host in inventory.get_group("control_nodes"):
    # control stuff (1)
else:
    # data stuff
pyinfra.wait(key="Wait for control-1")
if host not in inventory.get_group("control_nodes"):
    # data stuff that depends on (1) ("start ES on the three data nodes")
asyncio.Event doesn't need
I think using a key is fine, but we could move per-host code into the same context like so:
control_done_event = asyncio.Event()
async def deploy(host):
    if host in inventory.get_group("control_nodes"):
        # control stuff (1)
        control_done_event.set()
    else:
        # data stuff
        await control_done_event.wait()
        # data stuff that depends on (1) ("start ES on the three data nodes")
Where this file is executed once and then
await asyncio.gather(*(deploy(host) for host in hosts_to_execute_magic_variable))
Although I don't know how pyinfra would be able to figure out which host an operation is being executed for.
Generating a diff file for later execution (d)
For this one I cannot really think of a solution.
[1] I guess operations called with different arguments are considered the same.
P.S. Amazing project, thanks for all the hard work! ❤️ |
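For anyone who wants to try the asyncio.Event idea from (c) in isolation, below is a small self-contained sketch. It does not touch pyinfra at all: the host names, the `log` list and the extra arguments to `deploy` are assumptions made purely for illustration, and the "operations" are just strings appended to a list.

```python
import asyncio


async def deploy(host, control_nodes, control_done, log):
    """Per-host deploy: data nodes wait until the control node has finished."""
    if host in control_nodes:
        log.append(f"{host}: control stuff")
        control_done.set()  # release every host waiting on the event
    else:
        log.append(f"{host}: data stuff")
        await control_done.wait()  # blocks until the control node is done
        log.append(f"{host}: data stuff that depends on control")


async def main():
    control_done = asyncio.Event()
    log = []
    hosts = ["data-1", "control-1", "data-2"]
    # gather() takes awaitables as positional arguments, hence the unpacking.
    await asyncio.gather(
        *(deploy(host, {"control-1"}, control_done, log) for host in hosts)
    )
    return log


log = asyncio.run(main())
print("\n".join(log))
```

Note that a single Event only works as-is with one control node; with several, the first one to finish would release the data nodes, so something like a counter or one event per control node would be needed.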
Just ran into the limitation of the two-phase deployment as well trying to use a custom fact which relies on a tool being installed in the same deployment here (https://github.com/mvgijssel/setup/pull/288/files#r1205699420). Also was expecting
If you are contemplating changing the execution strategy maybe you need to take a look at how Pulumi does it. It's similar to Terraform, but then also writeable in Python 🎉. Basically turning all operations into promises and executing stuff once all dependencies are resolved, giving you an N-phase deployment! |
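As a rough illustration of the "operations as promises" idea, here is a minimal asyncio sketch. It is neither how Pulumi nor how pyinfra works internally: `run_op`, `run_all` and the operation names in `graph` are hypothetical, and each operation simply records its name once everything it depends on has completed.

```python
import asyncio


async def run_op(name, deps, tasks, order):
    """Run one operation once all of its dependencies have completed."""
    await asyncio.gather(*(tasks[dep] for dep in deps))
    order.append(name)  # stand-in for actually executing the operation


async def run_all(graph):
    order = []
    tasks = {}
    # Tasks only start running at the next await, so the dict is fully
    # populated before any run_op body looks up its dependencies.
    for name, deps in graph.items():
        tasks[name] = asyncio.create_task(run_op(name, deps, tasks, order))
    await asyncio.gather(*tasks.values())
    return order


# Hypothetical operations mirroring the custom-fact case described above.
graph = {
    "use custom fact": ["install tool"],
    "install tool": [],
    "configure service": ["install tool", "use custom fact"],
}
order = asyncio.run(run_all(graph))
print(order)
```

There is no cycle detection in this sketch; a circular dependency would simply hang, so a real implementation would need to validate the graph first.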
Just filed an issue about the fact not being updated properly by the |
…cution This is specifically to catch cases where there is no current remote state but it is likely that this may change due to hidden side effects from other operations. See: #805 for more context.
I’m convinced, it has become clear that the disadvantages and confusion of the current execution strategy outweigh the advantages. To remedy this I am proposing that pyinfra v3 will switch to executing operations and facts live. It will also be able to retain diff functionality simply by collecting facts twice if so desired. I’ve started working on this in #996, which is working, but tests etc. are not. |
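A toy sketch of what "collecting facts twice" could mean, using a simulated host rather than anything from pyinfra itself. All names here (`file_exists`, `ensure_file`, `apply_commands`, `deploy`) are hypothetical. The dry run gathers facts without changing anything, while the live run gathers them again immediately before each operation, which is also why the proposed command count is better read as "up to X".

```python
def file_exists(state, path):
    """Hypothetical fact: inspect the current state of the simulated host."""
    return path in state["files"]


def ensure_file(state, path):
    """Hypothetical operation: return the commands needed right now."""
    return [] if file_exists(state, path) else [f"touch {path}"]


def apply_commands(state, commands):
    """Pretend to run the commands by updating the simulated host."""
    for command in commands:
        state["files"].add(command.split(" ", 1)[1])


def deploy(state, dry_run):
    produced = []
    # The same file is ensured twice to show facts going stale in a dry run.
    for path in ["/etc/app.conf", "/etc/app.conf"]:
        commands = ensure_file(state, path)  # fact gathered right here
        produced.extend(commands)
        if not dry_run:
            apply_commands(state, commands)
    return produced


host = {"files": set()}
proposed = deploy(host, dry_run=True)   # first fact collection: the diff
executed = deploy(host, dry_run=False)  # second collection: live execution
print(len(proposed), len(executed))
```

In the dry run both calls propose a command because the host never changes; in the live run the second call sees the file created by the first and produces nothing.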
v3 implements this, beta is out and pending full release imminently. |
Is your feature request related to a problem? Please describe
I have been modifying a local version of pyinfra in order to solve different issues (some of them posted as GitHub issues here), and I came to the conclusion that there are not many advantages in gathering facts before operations.
Actually, most of the limitations and weird behaviors I see would be solved by gathering facts along the way.
I would be interested to discuss the pros and cons and contribute to modifying the way pyinfra works if necessary.
Describe the solution you'd like
I have a small list of the advantages of gathering facts before operations that require them:
preserve_loop_order magic anymore. It was anyway very counter-intuitive to have operations in loops not executed in the expected order.
assume_present arguments become unnecessary since facts will reflect the correct state of the machine right before an operation is performed.
server.shell.