DM-46799: refactor DatasetRecordStorageManager #1095

TallJimbo · 2024-10-11T19:29:37Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes
(if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

codecov · 2024-10-11T19:51:44Z

Codecov Report

Attention: Patch coverage is 86.06965% with 84 lines in your changes missing coverage. Please review.

Project coverage is 89.69%. Comparing base (3abf212) to head (79f0f65).
Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
.../butler/registry/datasets/byDimensions/_manager.py	86.43%	31 Missing and 28 partials ⚠️
...on/lsst/daf/butler/registry/_dataset_type_cache.py	78.12%	6 Missing and 1 partial ⚠️
...af/butler/registry/datasets/byDimensions/tables.py	89.13%	2 Missing and 3 partials ⚠️
...n/lsst/daf/butler/registry/interfaces/_datasets.py	90.24%	2 Missing and 2 partials ⚠️
python/lsst/daf/butler/registry/sql_registry.py	81.81%	4 Missing ⚠️
.../daf/butler/registry/queries/_sql_query_backend.py	50.00%	3 Missing ⚠️
python/lsst/daf/butler/queries/result_specs.py	0.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1095      +/-   ##
==========================================
- Coverage   89.74%   89.69%   -0.05%     
==========================================
  Files         361      360       -1     
  Lines       47393    47394       +1     
  Branches     5723     5730       +7     
==========================================
- Hits        42532    42512      -20     
- Misses       3508     3520      +12     
- Partials     1353     1362       +9


andy-slac

Looks great, just a few docstring fixes. Thanks for fixing GeneralQueryResults!

python/lsst/daf/butler/registry/interfaces/_datasets.py

andy-slac · 2024-10-11T21:01:49Z


+
+    @abstractmethod
+    def disassociate(
+        self, dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]


Do you have to specify DatasetType here? DatasetRef should already know its DatasetType.

TallJimbo · 2024-10-14

This is one of the cases I just didn't want to add to the scope of the ticket: you're right, but there was so much of it that it would have been a distraction.
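A minimal sketch of the reviewer's alternative, assuming each ref carries its dataset type as a `dataset_type` attribute. The `DatasetType` and `DatasetRef` classes below are hypothetical stand-ins, not the daf_butler API, and `infer_dataset_type` is an illustrative helper: the manager derives the dataset type from the refs and checks that they agree, instead of taking it as a separate argument.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetType:
    # Hypothetical stand-in for lsst.daf.butler.DatasetType.
    name: str


@dataclass(frozen=True)
class DatasetRef:
    # Hypothetical stand-in: each ref knows its own dataset type.
    dataset_type: DatasetType
    id: int


def infer_dataset_type(datasets: Iterable[DatasetRef]) -> tuple[DatasetType, list[DatasetRef]]:
    """Derive the single dataset type shared by ``datasets``.

    Raises if the refs are empty or span more than one dataset type,
    which is the situation an explicit ``dataset_type`` argument avoids.
    """
    refs = list(datasets)
    types = {ref.dataset_type for ref in refs}
    if len(types) != 1:
        raise ValueError(f"Expected exactly one dataset type, got {len(types)}.")
    (dataset_type,) = types
    return dataset_type, refs
```

One practical wrinkle this exposes: an empty iterable has no dataset type to infer, so an explicit argument is still the simplest way to keep that case well defined.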


andy-slac · 2024-10-11T21:30:49Z

python/lsst/daf/butler/registry/_dataset_type_cache.py


    def clear(self) -> None:
        """Remove everything from the cache."""
-        self._cache = {}
+        self._by_name_cache = {}
+        self._by_dimensions_cache = {}
        self._full = False


Maybe also self._dimensions_full = False?
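To illustrate the reviewer's point, here is a hypothetical simplification of a cache indexed two ways. The attribute names `_by_name_cache`, `_by_dimensions_cache`, `_full` and `_dimensions_full` come from the diff and comment above; the class name, setter methods and properties are assumptions made for this sketch. Each index has its own completeness flag, so `clear()` has to reset both.

```python
from typing import Any


class DatasetTypeCacheSketch:
    """Hypothetical cache indexed by name and by dimensions."""

    def __init__(self) -> None:
        self._by_name_cache: dict[str, Any] = {}
        self._by_dimensions_cache: dict[frozenset[str], Any] = {}
        # True once the by-name index is known to hold every entry.
        self._full = False
        # True once the by-dimensions index is known to be complete.
        self._dimensions_full = False

    def set_all_by_name(self, items: dict[str, Any]) -> None:
        self._by_name_cache = dict(items)
        self._full = True

    def set_all_by_dimensions(self, items: dict[frozenset[str], Any]) -> None:
        self._by_dimensions_cache = dict(items)
        self._dimensions_full = True

    @property
    def full(self) -> bool:
        return self._full

    @property
    def dimensions_full(self) -> bool:
        return self._dimensions_full

    def clear(self) -> None:
        """Remove everything from the cache."""
        self._by_name_cache = {}
        self._by_dimensions_cache = {}
        self._full = False
        # Without this reset, the emptied by-dimensions index would still
        # claim to be complete, and lookups trusting the flag would miss.
        self._dimensions_full = False
```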


The lone concrete dataset storage manager class and its intermediate base class have been merged, and many function parameters (all internal, of course) that plumb the SQL dataset ID type through many layers have been dropped.

This is a mostly mechanical change. For each method:
- move the method to the manager class;
- give it a dataset type argument, and have it look up the storage object internally;
- change the calling code (SqlRegistry) to stop looking up the storage object before calling it.

The ultimate goal is to get rid of DatasetRecordStorage, at least as a public interface. The steps after this one won't be nearly as mechanical, so it's useful to separate them.
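The per-method recipe can be sketched as follows. Everything here is hypothetical (`ManagerSketch`, `_StorageSketch`, `insert`, `_find_storage`), and dataset types are reduced to plain name strings for brevity. The point is only the shape of the change: the method moves onto the manager, takes the dataset type, and performs the storage lookup itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class _StorageSketch:
    """Hypothetical per-dataset-type storage object (the thing being phased out)."""

    dataset_type_name: str
    rows: list[int] = field(default_factory=list)

    def insert(self, ids: list[int]) -> None:
        self.rows.extend(ids)


@dataclass
class ManagerSketch:
    """Hypothetical manager that owns the storage lookup after the refactor."""

    _storage: dict[str, _StorageSketch] = field(default_factory=dict, init=False)

    def register(self, dataset_type_name: str) -> None:
        self._storage.setdefault(dataset_type_name, _StorageSketch(dataset_type_name))

    def _find_storage(self, dataset_type_name: str) -> _StorageSketch:
        try:
            return self._storage[dataset_type_name]
        except KeyError:
            raise LookupError(f"Dataset type {dataset_type_name!r} is not registered.") from None

    # After: the method lives on the manager, takes the dataset type,
    # and looks up the storage object internally.
    def insert(self, dataset_type_name: str, ids: list[int]) -> None:
        self._find_storage(dataset_type_name).insert(ids)


# Before, calling code did the lookup itself:
#     storage = manager.find(dataset_type_name)
#     storage.insert(ids)
# After, calling code passes the dataset type straight through:
#     manager.insert(dataset_type_name, ids)
```

Once every caller goes through the manager, the storage object is no longer visible outside it, which is what makes removing it as a public interface possible.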

The dataset type cache now holds just the dataset type definition and dataset type ID by name, while the SQLAlchemy table objects are instead cached by DimensionGroup so they can be shared by multiple dataset types. DatasetRecordStorage has been removed (both the base class and its sole implementation): its methods had already been moved to DatasetRecordStorageManager, and it no longer works as the opaque thing to put in the cache. Instead, there is a new subpackage-private DynamicTables class that is cached by DimensionGroup (this is where the lazy loading of SQLAlchemy table objects now happens), and a module-private _DatasetRecordStorage struct that just combines that with the dataset type and its ID, to make it more convenient to pass all of that around.

I also threw in some changes to the insert/import implementations, because I started trying to reduce the degree to which DatasetRecordStorage was being passed things that were either totally unused or assumed (without checking) to have some value. I quickly realized that this problem is ubiquitous (especially with storage classes) and should be a separate ticket, but I've kept what I already did since I think it's a step in the right direction.
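A rough sketch of the caching layout described above, under loudly stated assumptions: `DynamicTablesSketch`, `DatasetRecordStorageSketch` and `CacheSketch` are hypothetical analogues of DynamicTables, _DatasetRecordStorage and the dataset type cache, a DimensionGroup is modeled as a frozenset of dimension names, and the SQLAlchemy table is replaced by a string so the example runs without a database. It shows two things: dataset types with the same dimensions share one tables object, and the table is built lazily on first access.

```python
from __future__ import annotations

import dataclasses
from functools import cached_property

# Hypothetical stand-in: a DimensionGroup modeled as a frozenset of names.
DimensionGroupKey = frozenset[str]


class DynamicTablesSketch:
    """Per-DimensionGroup table handles, built lazily and shared."""

    def __init__(self, dimensions: DimensionGroupKey) -> None:
        self.dimensions = dimensions
        self.loads = 0  # how many times the "table" was actually built

    @cached_property
    def tags(self) -> str:
        # Stand-in for constructing or reflecting a SQLAlchemy Table.
        self.loads += 1
        return "tags_" + "_".join(sorted(self.dimensions))


@dataclasses.dataclass(frozen=True)
class DatasetRecordStorageSketch:
    """Bundles a dataset type, its ID, and its shared tables object."""

    dataset_type_name: str
    dataset_type_id: int
    dynamic_tables: DynamicTablesSketch


class CacheSketch:
    def __init__(self) -> None:
        # By name: only the definition (here, just dimensions) and the ID.
        self._by_name: dict[str, tuple[DimensionGroupKey, int]] = {}
        # By dimensions: the table objects, shared across dataset types.
        self._by_dimensions: dict[DimensionGroupKey, DynamicTablesSketch] = {}

    def add(self, name: str, dataset_type_id: int, dimensions: DimensionGroupKey) -> None:
        self._by_name[name] = (dimensions, dataset_type_id)
        self._by_dimensions.setdefault(dimensions, DynamicTablesSketch(dimensions))

    def get(self, name: str) -> DatasetRecordStorageSketch:
        dimensions, dataset_type_id = self._by_name[name]
        return DatasetRecordStorageSketch(name, dataset_type_id, self._by_dimensions[dimensions])
```

Keying the table cache by dimensions rather than by dataset type name means adding another dataset type with existing dimensions costs only a name entry, not another set of table objects.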


TallJimbo force-pushed the tickets/DM-46799 branch from e49b8c9 to a85d17f October 11, 2024 19:37

andy-slac approved these changes Oct 11, 2024


TallJimbo force-pushed the tickets/DM-46799 branch 3 times, most recently from c578da3 to 152988e October 15, 2024 17:05

TallJimbo added 10 commits October 16, 2024 18:01

Fix variable-binding bug in GeneralQueryResults.

d10be56

Minor doc fixes.

22c558c

Clarify save_dimension_graph behavior in docs.

832e843

Fix old DimensionGraph nomenclature reference.

8b861aa

Don't pass dataset types when only their dimensions are needed.

781bb32

Ban multiple-dataset-type results in general find-first queries.

0c3e4ca

Supporting these would be extra complexity I don't think we need.
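A minimal sketch of the kind of up-front validation this restriction implies. The actual check in daf_butler is not shown on this page, so `check_find_first` and `InvalidQueryError` are hypothetical names, and dataset types are reduced to name strings. The idea is to reject the unsupported combination early rather than carry the extra complexity of ranking results across several dataset types.

```python
from collections.abc import Sequence


class InvalidQueryError(ValueError):
    """Hypothetical error type for this sketch."""


def check_find_first(dataset_types: Sequence[str], find_first: bool) -> None:
    """Reject find-first general queries over more than one dataset type."""
    distinct = sorted(set(dataset_types))
    if find_first and len(distinct) > 1:
        raise InvalidQueryError(
            f"Find-first general queries support only one dataset type; got {distinct}."
        )
```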

Switch back to returning given DatasetType in registry imports.

79f0f65

When the given dataset type differs from the registered dataset type in imports, it's not clear what the ideal behavior is, but the right choice for *this* ticket is clearly to not change that behavior.

TallJimbo force-pushed the tickets/DM-46799 branch from 152988e to 79f0f65 October 16, 2024 22:01

TallJimbo merged commit 4ff6cc6 into main Oct 16, 2024
16 of 18 checks passed

TallJimbo deleted the tickets/DM-46799 branch October 16, 2024 22:31

TallJimbo added a commit that referenced this pull request Oct 17, 2024

Revert "Merge pull request #1095 from lsst/tickets/DM-46799"

fd37c27

This reverts commit 4ff6cc6, reversing changes made to 3abf212.

TallJimbo added a commit that referenced this pull request Oct 17, 2024

Merge pull request #1101 from lsst/u/jbosch/DM-46799/revert

be031d3

Revert "Merge pull request #1095 from lsst/tickets/DM-46799"

TallJimbo restored the tickets/DM-46799 branch October 17, 2024 15:00

TallJimbo added a commit that referenced this pull request Oct 17, 2024

Reapply "Merge pull request #1095 from lsst/tickets/DM-46799"

2f1a196

This reverts commit fd37c27.

TallJimbo deleted the tickets/DM-46799 branch October 17, 2024 15:03

TallJimbo added a commit that referenced this pull request Oct 18, 2024

Reapply "Merge pull request #1095 from lsst/tickets/DM-46799"

c82414a

This reverts commit fd37c27.
