Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow init decorator #645

Merged
merged 17 commits into from
Sep 24, 2024
Merged

Workflow init decorator #645

merged 17 commits into from
Sep 24, 2024

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Sep 18, 2024

Closes #600
Supersedes #634

This PR makes it possible to define an __init__ method on a workflow class such that the workflow input is passed to __init__ (in exactly the same way as it is passed to the @workflow.run method, i.e. as one unnamed positional argument per workflow input payload).

Users make use of this feature by using a decorator. Here's a simple example:

@workflow.defn
class WorkflowRunSeesWorkflowInitWorkflow:
    @workflow.init
    def __init__(self, arg: str) -> None:
        self.value = arg

    @workflow.run
    async def run(self, arg: str):
        return f"hello, {self.value}"

However, that example does not show the real purpose of this feature, which is to allow update and signal handlers to access the workflow input, even if they happen to execute in the first workflow task (in which case they will start executing before the main workflow method). Previously, this would have required the handlers to use workflow.wait_condition. See the test test_update_in_first_wft_sees_workflow_init added in this PR.

This PR supersedes #634. That PR attempted to introduce this feature without introducing a decorator, i.e., by inspection of the method signature. However, we have decided that that approach could lead to surprising behavior, especially in situations involving workflow class inheritance.

@dandavison dandavison requested a review from a team as a code owner September 18, 2024 15:14
@dandavison dandavison mentioned this pull request Sep 18, 2024
) -> None:
self._random.seed(job.randomness_seed)

def _make_workflow_input(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there value in separating out this single-use method? I don't mind if just for clarity, just confirming there is not another reason

Copy link
Contributor Author

@dandavison dandavison Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I strongly object to code styles that fail to break out helper functions. It doesn't matter at all to me whether they are used once or multiple times at this point in the codebase. Some reasons (could write much more...):

  • Abstraction: if there are (say) 10 or more lines within a function that create their own independent subset of local function state with the sole aim of computing a value, then it is much easier to read and reason about if that is broken out as a helper function. There is no need for the reader to understand the computation if they don't want to; it's one of the main reasons functions exist. As an extra benefit, the helper function is an opportunity to choose a good descriptive name for the unit of functionality. It also gives a target for unit testing.

  • Even if it were desirable as an alternative to a helper functions (I don't think it is) Python doesn't have block-level scope; it's not possible to determine how the local state interacts with other parts of the function, without reading all the implementation detail closely and mentally tracking scope / lifetime of each variable.

  • Python has a strong and long tradition of being writing as a collection of small functions; I think this is due to its long history of having no static typing; even moreso without static typing, functions have to be small to give readers and writers a chance of avoiding bugs.

https://martinfowler.com/bliki/FunctionLength.html is somewhat similar to my view.

Reasons against:

  • Some people prefer reading large amounts of implementation detail in a single sequential pass and don't mind paying for that by tracking very complex function state in their brains, state that can trivially be decomposed into independent modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, no strong preference. For the same reason people have a hard time reading many lines in a method, they have a hard time reading many methods in a class. I just want to make sure we strike a good balance and not have a ton of small, single-use methods littering the class (as opposed to normal reusable methods).

Copy link
Contributor Author

@dandavison dandavison Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's the great often forgotten-about religious war in programming :)

For the same reason people have a hard time reading many lines in a method, they have a hard time reading many methods in a class.

I don't agree that there's an equivalence, but I agree that the scope of helper functions has to be tastefully chosen, and that the author has to be very good at choosing function names. With those two things in hand, helper functions basically only make it easier to read. The point is that you can choose how deep into implementation detail you wish to go, whereas without helper functions you cannot choose.

Copy link
Member

@cretz cretz Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For myself, I struggle hopping around amongst disparately placed logic compared to scrolling (but still don't want to scroll too much of course)

Copy link
Contributor Author

@dandavison dandavison Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so the question is: why do you feel the need to hop to the implementation? A helper function has a return type, and a well-chosen name. That should be all that you need to understand the implementation at the call site: you don't need to see the implementation detail of the helper function. Is the issue that you don't trust methods to not mutate instance state? If so, that implies that you'd be happier with helper functions (with immutable parameters), just not so much with methods (in Python).

But the issue of mutating state is where helper functions bring a big advantage: if you include unrolled implementation logic at the call site then in most languages it's very hard to tell whether the implementation is mutating local state, whereas a helper function makes this clearer or even guaranteed (i.e. in Python, pass immutable arguments; in other languages pass non-writable references, etc).

In general, IMO, complex logic should be written as a sequence of function calls. For example, this should be preferred over an unrolled version, even if the functions are called once only.

    artifact_globs = cli.options.artifact_globs or ["*"]
    remote_artifacts = _list_remote_artifacts(repos, artifact_globs)
    artifacts_to_download = _get_artifacts_not_in_db(db, remote_artifacts)
    downloaded_artifacts = _download_zip_artifacts(artifacts_to_download)
    rows = _parse_xml_from_zip_artifacts(downloaded_artifacts)
    rows = _fetch_pr_info(async_iterator_to_list(rows))
    db.insert_rows(rows)


Note that the ``@workflow.run`` method is always called as
``my_workflow.my_run_method(**workflow_input_args)``. Therefore, if you use
the ``@workflow.init`` decorator, the parameter list of your __init__ and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parameter list of your init and @workflow.run methods will usually be identical.

I think we should require/enforce the params are identical if this decorator is present and the run method is present

Copy link
Contributor Author

@dandavison dandavison Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Yes, I was tentatively thinking of not having a contract without any static (decorator-time) constraints:

If you use the decorator, then it is your responsibility to ensure that your init signature is such that it will accept the arguments passed at workflow-run time. As long as that is true, the parameter signatures of the two methods can be whatever Python allows.

But I can't think of a clear counter-example where the stricter contract would make desirable code invalid, whereas enforcing identical params at decorator-eval time has the benefit of catching incorrect code at import time, and also that it can be relaxed in the future if we want to (whereas the looser contract cannot easily be made stricter).

I do think we want to allow names to differ, e.g. for this scenario where an underscore (perhaps required by a linter or styleguide) is used to mark an unused parameter.

@workflow.defn
class MyWorkflow:
    @workflow.init
    def __init__(self, arg: str) -> None:
        self.value = arg

    @workflow.run
    async def run(self, _arg: str) -> str:
        return self.value

d8b5dff

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Would recommend updating the README (ideally we would get rid of README and only have docs, but that's a separate project I think).

@dandavison dandavison merged commit 0995ae0 into main Sep 24, 2024
12 checks passed
@dandavison dandavison deleted the workflow-init-decorator branch September 24, 2024 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Workflow-init support
2 participants