Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

post_configure: ensure asyncio tasks created by plugins are awaited #5868

Merged
merged 2 commits into from
Feb 22, 2024

Conversation

oliver-sanders
Copy link
Member

@oliver-sanders oliver-sanders commented Dec 6, 2023

This is one of a triplet of PRs which closes cylc/cylc-rose#274



Reviewers, please check:

  • It does what it says on the tin.
  • (re)Install logging works as expected.
  • Errors in file installation are reported as expected.

Testing:

This diff will make file installation take longer:

diff --git a/metomi/rose/config_processors/fileinstall.py b/metomi/rose/config_processors/fileinstall.py
index 937ed746..9797c042 100644
--- a/metomi/rose/config_processors/fileinstall.py
+++ b/metomi/rose/config_processors/fileinstall.py
@@ -399,6 +399,8 @@ class ConfigProcessorForFile(ConfigProcessorBase):
 
     async def process_job(self, job, conf_tree, loc_dao, work_dir):
         """Process a job, helper for "process"."""
+        import asyncio
+        await asyncio.sleep(1)
         for key, method in [
             (Loc.A_INSTALL, self._target_install),
             (Loc.A_SOURCE, self._source_pull),

This diff will disable the new guard:

diff --git a/cylc/flow/async_util.py b/cylc/flow/async_util.py
index 478c41dc29..14ddc26a80 100644
--- a/cylc/flow/async_util.py
+++ b/cylc/flow/async_util.py
@@ -519,8 +519,8 @@ async def async_block():
         # any tasks two() started will have been awaited by now
     """
     # make a list of all tasks running before we enter the context manager
-    tasks_before = asyncio.all_tasks()
+    # tasks_before = asyncio.all_tasks()
     # run the user code
     yield
     # await any new tasks
-    await asyncio.gather(*(asyncio.all_tasks() - tasks_before))
+    # await asyncio.gather(*(asyncio.all_tasks() - tasks_before))

Try installing a workflow with cylc install --debug, it will now log the time taken by each plugin.

With the async_block it should take >1s (due to the sleep) without the async_block (i.e. with the above diff applied) it should take <1s because the sleep will not be awaited.


CI:

Note that the cylc-rose and metomi-rose tests will fail until both the cylc-flow and metomi-rose branches are merged.

I.E. please run the tests manually due review.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md no - unreleased issue
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

entry_point.name,
exc
) from None
async with async_block():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that any async tasks started by the plugins are awaited before we move on.

Rose fileinstallation will create async tasks but cannot await them (because it's called synchronously) so we have to do it here, otherwise cylc vip would start running the workflow before local file installation has finished.

@@ -97,8 +97,8 @@ def src_run_dirs(
return tmp_src_path, tmp_run_path


def test_install_scan_no_ping(
src_run_dirs: Callable,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, mypy cannot validate type hints for fixtures so they can be completely wrong as in this case.

@staticmethod
def raiser(*_, **__):
import cylc.flow.flags
if cylc.flow.flags.cylc7_back_compat:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was intended to check that the back-compat flag was set correctly, however, it achieved this by monkeypatching the back-compat flag (i.e. manually setting it to what it expected it to be).

@@ -54,7 +55,7 @@ run_fail "${TEST_NAME_BASE}-runs" cylc vr "${WORKFLOW_NAME}"
grep_ok "on elephantshrew." "${TEST_NAME_BASE}-runs.stderr"

# Clean Up:
sed -i "s@CYLC_WORKFLOW_HOST=elephantshrew@CYLC_WORKFLOW_HOST=$HOSTNAME@" "${CONTACTFILE}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this sed was working (workflow will eventually time out and the test will pass), switched to restoring the original contact file.

@oliver-sanders oliver-sanders self-assigned this Dec 6, 2023
@oliver-sanders oliver-sanders added this to the cylc-8.3.0 milestone Dec 6, 2023
@oliver-sanders oliver-sanders added the non-cylc bug This is a bug, but not in Cylc label Dec 11, 2023


@asynccontextmanager
async def async_block():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crude but effective way to allow us to call async -> sync -> async.

This makes a list of async tasks before the plugin is called in order to await any new tasks created by the plugin.

This means that plugins can go async like so:

def plugin(opts):
    asyncio.create_task(run(opts))
    # the plugin will return now, but Cylc will await the "run" for us

async def run(opts):
    ...

I think it's a reasonable solution.

Sadly, directly converting the plugin interfaces to async isn't an easy option due to the use of plugins within parsec. Making the plugin async would make loading the global config async which would make global config value retrieval (e.g. the line below) async:

glbl_cfg().get(['install', 'symlink dirs'])

Which would basically require the entire cylc-flow codebase to be async and prohibit config caching at the module level.

Comment on lines +75 to +71
async with _async_block():
plugin_result = meth(*args, **kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that any async tasks started by the plugins are awaited before we move on.

Rose fileinstallation will create async tasks but cannot await them (because it's called synchronously) so we have to do it here, otherwise cylc vip would start running the workflow before local file installation has finished.

@oliver-sanders oliver-sanders force-pushed the cylc-rose-274 branch 2 times, most recently from 9c45021 to b4c7e44 Compare January 26, 2024 10:56
@oliver-sanders oliver-sanders marked this pull request as ready for review January 26, 2024 10:56
@oliver-sanders oliver-sanders requested a review from wxtim January 26, 2024 10:56
cylc/flow/async_util.py Show resolved Hide resolved
@wxtim wxtim self-requested a review January 26, 2024 13:25
@oliver-sanders oliver-sanders requested a review from wxtim February 5, 2024 10:28
@wxtim
Copy link
Member

wxtim commented Feb 5, 2024

  • Read code
  • Tried out locally
  • Assured myself that test failures were unrelated to this change:
    • Rose stem failure caused by a local change
    • MyPy Failure (metomi/rose) caused by MyPy becoming annoying around Python 3.7.
    • Test Tutorials.

cylc/flow/plugins.py Outdated Show resolved Hide resolved
cylc/flow/plugins.py Show resolved Hide resolved
Comment on lines +377 to +388
monkeypatch.setattr(
'cylc.flow.plugins._async_block',
async_block,
)

# this is what it would have done without async block
(one_src.path / 'b').touch() # give it something else to install
my_install_plugin.clear()
assert my_install_plugin == []
await reinstall_cli(opts=ReInstallOptions(), workflow_id=one_run.id)
# the absence of "end" means that the task was not awaited
assert my_install_plugin == ['start', 'return']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why this bit is being tested?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tldr; the second part of this test confirms that the first part is a valid test.

It's easy to write a test for this sort of thing that passes, even if the related functionality doesn't actually work.

To make the that this functionality does work, I wrote two tests:

  1. One checks that there would be a problem if we didn't use the fix.
  2. The second checks that the fix solves the problem.

By writing them together using the same approach you reduce the likelyhood of the test being broken and producing false negatives in the future.

@MetRonnie
Copy link
Member

Unfortunately functional tests / py-3-latest and test-tutorial-workflow / test (3) hang at some point

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Feb 9, 2024

Lucky catch, this PR appears to have broken daemonization, which is a major weak spot in our testing (due to its slippery nature). I have no idea why just that one test fails, but you can easily replicate the issue by configuring run hosts.

The cause isn't obvious, even uninstalling cylc-rose and commenting out the async_block code doesn't change it.

[edit] Daemonization is fine via cylc play, it's just the cylc vip script that has issues.

[edit] Can't replicate this any more 😕

* Await any background tasks started by a plugin before continuing.
* Addresses cylc/cylc-rose#274
* Centralise the plugin loading/running/reporting logic.
* Fix the installation test for back-compat-mode.
Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Read changes
  • Run tests locally
  • Tried to break various components of this

* The "cylc vip" command combines three individual coroutines, usually
  run in their own event loops.
* For unknown reasons, the cylc.flow.network.scan.scan coroutine pipe, called
  as part of the the "install" coroutine, interacts with the daemonization
  routine called as part of the "play" coroutine.
* Keeping these event loops separate avoids the conflict.
@oliver-sanders
Copy link
Member Author

oliver-sanders commented Feb 21, 2024

I think I have a solution to the hanging problem, see d173c2a for details.

The issue could be replicated in the simplest way:

# [scheduler][run hosts]available=
$ cylc vip

Before d173c2a, the scheduler would startup, but the cylc vip command would not exit.

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll test this out tomorrow

Comment on lines +96 to +98
# NOTE: We call each of the stages in its own event loop because there is a
# strange interaction whereby the cylc.flow.network.scan coroutine pipe can
# cause sys.exit to hang on the original process post-daemonization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@wxtim wxtim self-requested a review February 22, 2024 09:41
Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested

@MetRonnie MetRonnie merged commit 4ac6f3d into cylc:master Feb 22, 2024
27 checks passed
@oliver-sanders oliver-sanders deleted the cylc-rose-274 branch February 23, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-cylc bug This is a bug, but not in Cylc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reinstall + fileinstall: async clash
3 participants