Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38498: improve follow-up query support for QG generation use cases #876

Merged
merged 8 commits into from
Aug 24, 2023

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Aug 8, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Aug 8, 2023

Codecov Report

Patch coverage: 88.61% and project coverage change: +0.01% 🎉

Comparison is base (7c9a229) 87.65% compared to head (fd2474a) 87.67%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #876      +/-   ##
==========================================
+ Coverage   87.65%   87.67%   +0.01%     
==========================================
  Files         272      272              
  Lines       36005    36107     +102     
  Branches     7529     7552      +23     
==========================================
+ Hits        31560    31656      +96     
  Misses       3270     3270              
- Partials     1175     1181       +6     
Files Changed Coverage Δ
python/lsst/daf/butler/core/persistenceContext.py 59.25% <ø> (ø)
python/lsst/daf/butler/core/dimensions/_records.py 82.14% <50.00%> (-1.71%) ⬇️
.../butler/registry/datasets/byDimensions/_storage.py 91.83% <80.00%> (-0.54%) ⬇️
...ython/lsst/daf/butler/registry/queries/_results.py 89.72% <83.33%> (-0.22%) ⬇️
python/lsst/daf/butler/registry/queries/_query.py 79.25% <86.66%> (+0.82%) ⬆️
...lsst/daf/butler/registry/queries/_query_backend.py 92.45% <94.44%> (+0.14%) ⬆️
...hon/lsst/daf/butler/core/dimensions/_coordinate.py 88.73% <100.00%> (+0.28%) ⬆️
.../daf/butler/registry/queries/_sql_query_backend.py 86.53% <100.00%> (+0.13%) ⬆️
python/lsst/daf/butler/registry/tests/_registry.py 98.21% <100.00%> (+0.02%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TallJimbo TallJimbo force-pushed the tickets/DM-38498 branch 2 times, most recently from c2ecaa9 to 9570f7c Compare August 10, 2023 06:04
@TallJimbo TallJimbo force-pushed the tickets/DM-38498 branch 6 times, most recently from 4d5a0bb to 1cf1130 Compare August 21, 2023 19:51
@TallJimbo TallJimbo marked this pull request as ready for review August 21, 2023 21:08
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few minor questions.

Comment on lines 767 to 769
raise ConflictingDefinitionError(
f"Existing dataset type or run do not match new dataset: {row._asdict()}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unreachable code, all branches above raise exceptions.

Comment on lines 839 to 841
raise ConflictingDefinitionError(
f"Existing dataset type and dataId does not match new dataset: {row._asdict()}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

def iter_data_ids_and_dataset_refs(
self, dataset_type: DatasetType, dimensions: DimensionGraph | None = None
) -> Iterator[tuple[DataCoordinate, DatasetRef]]:
"""Iterate over pairs of data IDs and dataset refs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't dataset ref include its data ID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but these data IDs may have different dimensions from the dataset type. I'll point that out in the docs.


Returns
-------
pairs : `~collections.abc.Iterable` [ `tuple` [ `DataCoordinate`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterator instead of Iterable?

initial_join_max_columns=frozenset(self._relation.columns),
governor_constraints=self._governor_constraints,
spatial_joins=spatial_joins,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd need another day to understand the details in the code above, so I just trust it is OK. 🙂

TallJimbo and others added 7 commits August 24, 2023 10:59
This is the butler functionality that will finally let us avoid the
death-by-many-tiny-queries performance problem in QuantumGraph
generation for (in particular) ISR.
This will allow reference catalog queries in QG generation to be
vectorized as long as they use the common skypix system.
Just had to debug some of these, and found the integer primary keys
embedded in the old messages hard to use.  Unfortunately I couldn't get
rid of all of them: this code has no access to the general mapping
from dataset type ID to dataset type name.  But I think that's the
least likely field to be causing the conflict here, so it's not a huge
loss.
Make use of the serialization caches when calling `to_simple` on
dimension records.
@TallJimbo TallJimbo merged commit 1585a23 into main Aug 24, 2023
16 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-38498 branch August 24, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants