Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature gst checkpointing #347

Merged
merged 22 commits into from
Sep 19, 2023
Merged

Feature gst checkpointing #347

merged 22 commits into from
Sep 19, 2023

Conversation

coreyostrove
Copy link
Contributor

This PR gives an initial implementation for checkpointing iterative GST fits and introduces some basic and extensible infrastructure for checkpointing protocols more broadly going forward. For this initial implementation there is currently only full support for checkpointing ModelTest, GateSetTomography and 'StandardGST' protocols. Some highlights:

  1. Adds ProtocolCheckpoint, GateSetTomographyCheckpoint, ModelTestCheckpoint and StandardGSTCheckpoint objects, each of which can be serialized natively to json.
  2. Protocol run methods (and related driver functions) now take as input kwargs allowing one to give a previously generated checkpoint object if one exists, a path to write checkpoint files to during the course of the fit and finally an option to disable checkpointing entirely (checkpointing is on by default).

This should hopefully reduce one of the common pain points users encounter wherein their fits fail for some mundane reason (like running out of memory or cluster allocation time etc.) and then they need to restart an expensive calculation from scratch.

I have added some unit tests (these currently live in the test_drivers module in test_packages for convenience), which all seem to be passing at present. We do have some tests failing in the extras, but these look to be unrelated to new code beyond the fact that the new code made these bugs visible (discussed further in #346). So given that I'd advocate for merging this in despite those failing tests. Ultimately we'll want to disable checkpointing in these particular failing unit tests---I had done a pass through the tests to disable most of these to avoid needing to deal with file hygiene on the runners for tests that aren't related to checkpointing but missed these by accident---but for now I'd argue is useful to keep it enabled since that is what made these serialization bugs apparent.

Unrelatedly this includes a deprecation of the comma-separated string syntax for modes in StandardGST and some fixes to unit tests to account for that.

Feedback and questions welcome.

Corey Ostrove added 16 commits August 15, 2023 23:02
Initial implementation of checkpointing for GST. This initially implementation targets the GateSetTomography protocol specifically and future updates will add support for additional protocols. To support the checkpointing functionality a new generator function version of the iterative GST algorithm has been implemented to enable lazy evaluation. This commit also introduces a new ProtcolCheckpoint parent class and the GateSetTomographyCheckpoint class. Each protocol requires a somewhat different set of information to enable it to checkpoint properly during runtime, so the current vision is that each protocol will have a specialized checkpoint object associated with it.
Minor revision to restructure some of the control logic for the checkpointing.
An initial implementation of checkpointing for model testing. This really only saves intermediate objective function values, as that was all I identified that was either a) computationally worth checkpointing or b) safe to checkpoint given concerns about being independent of layout specific configuration details in resource alloc etc.
This commit adds an initial implementation of the StandardGSTCheckpointing object and it's use in the StandardGST protocol. Also includes a few bug fixes for the implementations of GateSetTomographyCheckpoint and ModelTestCheckpoint. The idea behind the implementation of the StandardGSTCheckpoint is to treat it as a container for some number of checkpoint objects for each of the sub-protocols run during StandardGST. To facilitate this I have also added a 'parent' attribute to the base ProtocolCheckpoint object to enable a link between child objects and their parents. This is primarily used at present to swap the child's write method with that of the parent's (which invokes the parent's serialization instead) when a parent checkpoint is present. Bugfixes were related to a mistake in using mutable objects as default kwarg values.
The old behavior for the modes argument is kind of wonky and doesn't have any analogues in the code that I can recall. This switches the argument to take either an iterable or a single string (i.e. for specifying a single-mode), and adds a deprecation warning about the comma-separated version.
Properly manage file access for MPI compatibility.
Updates the docstrings for new checkpointing functionality.
Adds in the plumbing for passing in checkpoint objects to the driver function based interface.
When this is set to True checkpoint objects are neither created nor written to disk. If there are values passed in for the checkpoint and checkpoint_path kwarg these are ignored entirely.
This adds some unit tests for the checkpointing functionality. Decided to add this to the drivers test package for simplicity. Only test we may want to revisit is for the StandardGSTCheckpoint, which uses log-likelihoods temporarily due to unexpected sign behavior with CPTPLND parameterized models. May also want to add tests for when checkpoints are disabled and such, but then that is already largely covered already. Also refactored a few other tests to bring them in line with the recent deprecations of comma-separated strings for modes on StandardGST.
Update tutorial notebooks to reflect new StandardGST syntax.
I was inadvertently using a method that is only in pathlib for 3.9+, switch to a call that is compatible with older versions.
Pretty sure this is a false positive in flake8, not sure why else there would be a syntax error...
Remove errant print statement from test file (which incidentally breaks on older versions of python).
I seem to have added this file by accident...
Corey Ostrove added 4 commits September 6, 2023 23:22
Add sections on checkpointing an its usage to the tutorials on GST and model testing protocols. Also makes a minor change to one of the unit tests that is inexplicably failing on older python versions.
Remove some of the disable_checkpointing flags I had added to the earlier parts of the of updated tutorials. I am still concerned we might run into file access collisions since we're running the notebook test suite with multiple processes, but I am more worried about accidentally shaping user behavior negatively. Most of our users will copy and paste directly from the tutorials and assume the disable_checkpointing flags are important/needed, and that is worse than weird testing behavior to debug.
Bugfix and refactoring for deprecated mode labels.
Fix minor typo in updated label
Copy link
Contributor

@sserita sserita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks really good. I appreciate both your commitment to updating examples and tests to avoid deprecation warnings and the new unit tests. Only have one comment for us to touch base on before approving.

pygsti/algorithms/core.py Show resolved Hide resolved
Corey Ostrove added 2 commits September 14, 2023 19:32
Refactor run_iterative_gst to use the new iterative_gst_generator generator function under the hood to avoid code duplication.
Add a new test for warmstarting of iterative_gst_generator.
Copy link
Contributor

@sserita sserita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Corey! Looks great.

@sserita sserita merged commit aaedfe0 into develop Sep 19, 2023
13 checks passed
@sserita sserita deleted the feature-gst-checkpointing branch September 19, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants