-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-38041: rewrite pre-exec-init logic to work without QGs and respect storage class differences #444
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #444 +/- ##
==========================================
+ Coverage 82.30% 83.52% +1.22%
==========================================
Files 96 97 +1
Lines 10971 11426 +455
Branches 2092 2192 +100
==========================================
+ Hits 9030 9544 +514
+ Misses 1590 1528 -62
- Partials 351 354 +3 ☔ View full report in Codecov by Sentry. |
d787880
to
048018f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, few minor questions. I only looked superficially at the unit tests.
Whether this task can run without any datasets for the given | ||
connection. | ||
""" | ||
return getattr(self.get_connections(), "minimum", 1) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm confused here, should it be getattr(getattr(self.get_connections(), connection_name), "minimum", 1)
? Or maybe even a bit more type-safe with isinstance(..., BaseInput)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, I've switched to a version that uses isinstance(..., BaseInput)
as that's a lot more readable than nested getattr
calls, too.
Previously recorded package versions. Updated in place to include | ||
any new packages that weren't present before. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add that it is only updated if existing versions are consistent (no exception raised)?
if (ref := butler.find_dataset(self.packages_dataset_type)) is not None: | ||
packages = butler.get(ref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we never have a situation with multiple concurrent clients trying to execute this code, it is full of races 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 I think we've been pretty clear about pre-exec-init not being something you can parallelize.
# We do writes at the end to minimize the mess we leave behind when we | ||
# raise an exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to delay all writes, including packages and init outputs, in case one of these three methods raises?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about that too, but I think the extra complexity involved just isn't worth it, since this kind of mess hasn't really been a problem in practice that I'm aware of.
tests/test_init_output_run.py
Outdated
@contextmanager | ||
def make_butler(self) -> Iterator[Butler]: | ||
"""Wrap a temporary local butler repository in a context manager.""" | ||
with tempfile.TemporaryDirectory() as root: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth adding ignore_cleanup_errors=False
, I think we saw random cleanup failures that resulted in exceptions which you probably want to ignore in tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder. I actually intentionally went with tempfile.TemporaryDirectory
over lsst.utils.tests.temporaryDirectory
because I knew the standard library could now do that (and the only reason we invented our own was that the standard library couldn't before), and then I forgot to actually use that flag 🤦♂️.
This does not use TaskFactory (which is in ctrl_mpexec anyway) because it takes a lot more care with storage class conversions and components, and it saves all writes until after it's done all checks, in order to reduce the chance that we leave a repository messy when we encounter an error.
The old ones that take TaskDef will be deprecated on DM-40442.
This code was moved here from ctrl.mpexec.CmdLineFwk.preExecInitQBB with minimal changes.
This reimplements functionality previously in ctrl_mpexec's PreExecInit by delegating to the new PipelineGraph.instantiate_tasks instead. PreExecInit.saveInitOutputs now delegates to this method.
0634922
to
17ca159
Compare
7609498
to
9a3e471
Compare
Checklist
doc/changes