DM-46799: refactor DatasetRecordStorageManager #1095

Merged
merged 10 commits into from
Oct 16, 2024

TallJimbo
Member

@TallJimbo TallJimbo commented Oct 11, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

codecov bot commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 86.06965% with 84 lines in your changes missing coverage. Please review.

Project coverage is 89.69%. Comparing base (3abf212) to head (79f0f65).
Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../butler/registry/datasets/byDimensions/_manager.py | 86.43% | 31 Missing and 28 partials ⚠️ |
| ...on/lsst/daf/butler/registry/_dataset_type_cache.py | 78.12% | 6 Missing and 1 partial ⚠️ |
| ...af/butler/registry/datasets/byDimensions/tables.py | 89.13% | 2 Missing and 3 partials ⚠️ |
| ...n/lsst/daf/butler/registry/interfaces/_datasets.py | 90.24% | 2 Missing and 2 partials ⚠️ |
| python/lsst/daf/butler/registry/sql_registry.py | 81.81% | 4 Missing ⚠️ |
| .../daf/butler/registry/queries/_sql_query_backend.py | 50.00% | 3 Missing ⚠️ |
| python/lsst/daf/butler/queries/result_specs.py | 0.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1095      +/-   ##
==========================================
- Coverage   89.74%   89.69%   -0.05%     
==========================================
  Files         361      360       -1     
  Lines       47393    47394       +1     
  Branches     5723     5730       +7     
==========================================
- Hits        42532    42512      -20     
- Misses       3508     3520      +12     
- Partials     1353     1362       +9     

Contributor

@andy-slac andy-slac left a comment

Looks great, few docstring fixes. Thanks for fixing GeneralQueryResult!

python/lsst/daf/butler/registry/interfaces/_datasets.py (outdated, resolved)

    @abstractmethod
    def disassociate(
        self, dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]
Contributor

Do you have to specify DatasetType here? DatasetRef should already know its DatasetType.

Member Author

This is one of the cases I just didn't want to add to the scope of the ticket, because while you're right, there was just so much of it that it would have been a distraction.

python/lsst/daf/butler/registry/interfaces/_datasets.py (outdated, resolved)
python/lsst/daf/butler/registry/interfaces/_datasets.py (outdated, resolved)
python/lsst/daf/butler/registry/interfaces/_datasets.py (outdated, resolved)
python/lsst/daf/butler/registry/_caching_context.py (outdated, resolved)

    def clear(self) -> None:
        """Remove everything from the cache."""
        self._cache = {}
        self._by_name_cache = {}
        self._by_dimensions_cache = {}
        self._full = False
Contributor

Maybe also self._dimensions_full = False?
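To illustrate the concern, here is a minimal sketch of a cache that tracks two separate "fully loaded" flags. The class name `SketchDatasetTypeCache` and the `set_all` method are hypothetical and are not the real daf_butler API; only the attribute names come from the snippet and the suggestion above. If `clear()` forgot to reset the second flag, the cache would claim its by-dimensions contents were complete while actually being empty.

```python
class SketchDatasetTypeCache:
    """Hypothetical stand-in for the cache under review (not the real class)."""

    def __init__(self) -> None:
        self._by_name_cache: dict[str, int] = {}
        self._by_dimensions_cache: dict[frozenset[str], str] = {}
        # Each flag says whether the matching dict is known to be complete.
        self._full = False
        self._dimensions_full = False

    def set_all(
        self, by_name: dict[str, int], by_dimensions: dict[frozenset[str], str]
    ) -> None:
        """Replace the cache contents and mark both views as complete."""
        self._by_name_cache = dict(by_name)
        self._by_dimensions_cache = dict(by_dimensions)
        self._full = True
        self._dimensions_full = True

    def clear(self) -> None:
        """Remove everything from the cache and reset both completeness flags."""
        self._by_name_cache = {}
        self._by_dimensions_cache = {}
        self._full = False
        self._dimensions_full = False


cache = SketchDatasetTypeCache()
cache.set_all({"raw": 1}, {frozenset({"instrument", "detector"}): "tables"})
cache.clear()
```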

python/lsst/daf/butler/registry/sql_registry.py (outdated, resolved)
python/lsst/daf/butler/registry/sql_registry.py (outdated, resolved)
@TallJimbo TallJimbo force-pushed the tickets/DM-46799 branch 3 times, most recently from c578da3 to 152988e Compare October 15, 2024 17:05
The lone concrete dataset storage manager class and its intermediate
base class have been merged, and many function parameters (all
internal, of course) that plumb the SQL dataset ID type through many
layers have been dropped.
This is a mostly-mechanical change; for each method:

- move a method to the manager class;

- give it a dataset type argument, and have it look up the storage
  object internally;

- change calling code (SqlRegistry) to stop looking up the storage
  object before calling it.
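The three steps above can be sketched roughly as follows. This is a toy illustration only: `SketchStorage`, `SketchManager`, and their methods are hypothetical names, and dataset types are reduced to plain strings. The point is that the caller passes the dataset type to the manager, which finds the storage object itself.

```python
from dataclasses import dataclass, field


@dataclass
class SketchStorage:
    """Hypothetical per-dataset-type storage object."""

    dataset_type_name: str
    rows: list[str] = field(default_factory=list)


class SketchManager:
    """Hypothetical manager that owns the storage objects."""

    def __init__(self) -> None:
        self._storages: dict[str, SketchStorage] = {}

    def register(self, dataset_type_name: str) -> None:
        self._storages.setdefault(dataset_type_name, SketchStorage(dataset_type_name))

    def _find_storage(self, dataset_type_name: str) -> SketchStorage:
        try:
            return self._storages[dataset_type_name]
        except KeyError:
            raise LookupError(
                f"Dataset type {dataset_type_name!r} is not registered."
            ) from None

    def insert(self, dataset_type_name: str, data_ids: list[str]) -> int:
        """Method moved onto the manager: it takes the dataset type and
        looks up the storage object internally."""
        storage = self._find_storage(dataset_type_name)
        storage.rows.extend(data_ids)
        return len(storage.rows)


manager = SketchManager()
manager.register("raw")
# Calling code no longer fetches a storage object before calling insert.
n_rows = manager.insert("raw", ["visit=1", "visit=2"])
```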

The ultimate goal is to get rid of DimensionRecordStorageManager, at
least as a public interface.  But the steps after this one won't be
nearly as mechanical, so it's useful to separate them.
The dataset type cache now holds just the dataset type definition and
dataset ID by name, with the SQLAlchemy table objects instead cached
by DimensionGroup so they can be shared by multiple dataset types.

DatasetRecordStorage has been removed (both the base class and its
sole implementation) - its methods had already been moved to
DatasetRecordStorageManager, and it no longer works as the opaque
thing to put in the cache.  Instead there's a new subpackage-private
DynamicTables class that is cached by DimensionGroup (this is where
the lazy loading of SQLAlchemy table objects now happens), and a
module-private _DatasetRecordStorage struct that just combines that
with the dataset type and its ID, to make it more convenient to pass
all of that around.
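A rough sketch of that caching layout, under stated assumptions: all class names here are hypothetical, a DimensionGroup is modeled as a `frozenset` of dimension names, and a table is modeled as a string. It shows two dataset types with the same dimensions sharing one lazily loaded tables object, plus a small struct bundling the name, ID, and tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDynamicTables:
    """Hypothetical stand-in for tables cached per dimension group."""

    dimensions: frozenset
    _tags: Optional[str] = None
    load_count: int = 0

    def tags(self) -> str:
        # Lazy loading: build the (pretend) table object on first use only.
        if self._tags is None:
            self.load_count += 1
            self._tags = "dataset_tags_" + "_".join(sorted(self.dimensions))
        return self._tags


@dataclass(frozen=True)
class SketchDatasetRecordStorage:
    """Hypothetical struct bundling a dataset type, its ID, and its tables."""

    dataset_type_name: str
    dataset_type_id: int
    dynamic_tables: SketchDynamicTables


class SketchCache:
    def __init__(self) -> None:
        self._by_name: dict = {}
        self._by_dimensions: dict = {}

    def add(self, name: str, dataset_type_id: int, dimensions: list) -> None:
        dims = frozenset(dimensions)
        self._by_name[name] = (dataset_type_id, dims)
        # Reuse the existing tables object if these dimensions were seen before.
        self._by_dimensions.setdefault(dims, SketchDynamicTables(dims))

    def get(self, name: str) -> SketchDatasetRecordStorage:
        dataset_type_id, dims = self._by_name[name]
        return SketchDatasetRecordStorage(name, dataset_type_id, self._by_dimensions[dims])


cache = SketchCache()
cache.add("raw", 1, ["instrument", "exposure", "detector"])
cache.add("postISRCCD", 2, ["detector", "exposure", "instrument"])
raw = cache.get("raw")
post = cache.get("postISRCCD")
```

Because both entries resolve to the same tables object, calling `raw.dynamic_tables.tags()` and then `post.dynamic_tables.tags()` builds the table only once.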

I also threw in some changes to the insert/import implementations
because I started trying to reduce the degree to which
DatasetRecordStorage was being passed things that were either totally
unused or assumed (without checking) to have some value.  I quickly
realized that this problem is ubiquitous (especially with storage
classes) and should be a separate ticket, but I've kept what I already
did since I think it's a step in the right direction.
Supporting these would be extra complexity I don't think we need.
When the given dataset type differs from the registered dataset type
in imports, it's not clear what the ideal behavior is, but the right
choice for *this* ticket is clearly to not change that behavior.
@TallJimbo TallJimbo merged commit 4ff6cc6 into main Oct 16, 2024
16 of 18 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-46799 branch October 16, 2024 22:31
TallJimbo added a commit that referenced this pull request Oct 17, 2024
This reverts commit 4ff6cc6, reversing
changes made to 3abf212.
TallJimbo added a commit that referenced this pull request Oct 17, 2024
Revert "Merge pull request #1095 from lsst/tickets/DM-46799"
@TallJimbo TallJimbo restored the tickets/DM-46799 branch October 17, 2024 15:00
TallJimbo added a commit that referenced this pull request Oct 17, 2024
@TallJimbo TallJimbo deleted the tickets/DM-46799 branch October 17, 2024 15:03
TallJimbo added a commit that referenced this pull request Oct 18, 2024
2 participants