-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esheldon fork Tickets/dm 40513 #24
base: tickets/DM-40513
Are you sure you want to change the base?
Conversation
MultipleCellCoadd isn't iterable, but its .cells attribute is a mapping.
The Input name is set correctly to cellCoadd temporarily commented object schema "cT.InitOutput" because this doesn't work yet temporarilyl set required_bands to r, need to learn how to set with config
Currently the real cell coadds don't have everything we need
The metadetect code expects MultiBandExposure
Short term todo items:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jim and I did a quick first pass and will do a more careful one separately. We will refrain from commenting on column names in this PR.
nullable=False, | ||
metadata={ | ||
"doc": "admom wmom T (<x^2> + <y^2>) measurement for PSF.", | ||
"unit": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add actual units here and wherever applicable. So this would be "pixels", I guess?
}, | ||
), | ||
pa.field( | ||
"wmom_band_flux_1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the _1
here signify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_1 means flux in the first filter
We will be measuring the flux in N filters and the filters will be configurable.
We can translate these to filter names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should translate to filter names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do this using the required_bands
config option list
single_cell_tables: list[pa.Table] = [] | ||
for single_cell_coadds in zip( | ||
for cell_coadds in zip( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should stay as single_cell_coadds
to be evident what class it is. In metadetect/DESC code base, it should be okay to simplify it as cell_coadds
if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the type is documented
@@ -117,14 +124,20 @@ class MetadetectionShearConfig( | |||
): | |||
"""Configuration definition for MetadetectionShearTask.""" | |||
|
|||
from lsst.meas.base import SkyMapIdGeneratorConfig | |||
|
|||
required_bands = ListField[str]( | |||
"Bands expected to be present. Cells with one or more of these bands " | |||
"missing will be skipped. Bands other than those listed here will " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this docstring is consistent with what happens. If g
is specified by the coadd data doesn't exist, I think a NoWorkFound
gets raised and nothing gets processed. I think that's the correct behavior and the docstring needs to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jim wrote that doc string, maybe he can comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote what I expect to happen. Happy to have it changed to what will happen.
I've created DM-40698 to look into why the command-line config overrides are not being applied; I'm considering that a middleware bug until I can prove otherwise. |
('cell_y', 'u1'), | ||
('shear_type', 'U2'), | ||
('mask_frac', 'f4'), | ||
('primary', bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these items in the schema above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohhh nvm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note also that the code that creates the final output on disk seems to ignore fields that are not in the schema.
Jim your link is wrong, should be https://jira.lsstcorp.org/browse/DM-40698 |
We currently run with a config restricting to r for real data, as other bands don't yet exist for the testbed Note we will want to process g separately, this is still TODO
currently the sims cannot generate the bright objects because the catalog is not available Also when using real data we need to pass in the info for bright objects to be masked
Is there a way to have an input that can be used as a placeholder for the star catalog? It does not have to represent a real catalog. |
I can add a connection that will load Gaia or Pan-STARRS (when running on real data) or the DC2 refcat. I'm not super familiar with that part of the codebase but it shouldn't take me long to figure out. Should have something later today or maybe Thursday. |
Thanks Jim. I just want to point out the practical issue of memory limitations, and the possible need of loading a subset of that data (either columns or rows/spatially) |
Something like this might work (change astromRefCat = cT.PrerequisiteInput(
doc="Reference catalog to use for astrometry",
name="gaia_dr3_20230707",
storageClass="SimpleCatalog",
dimensions=("skypix",),
deferLoad=True,
multiple=True,
)
photoRefCat = cT.PrerequisiteInput(
doc="Reference catalog to use for photometric calibration",
name="ps1_pv3_3pi_20170110",
storageClass="SimpleCatalog",
dimensions=("skypix",),
deferLoad=True,
multiple=True
) |
Yes, that's what we'd want for the connections, and it'd need to remain a prerequisite (I don't think that's a problem; we'll always have these in hand before we start DR processing). The part I don't remember is how to use the objects that do the spatial filtering and concatenation to get just the relevant shards for a particular data ID. Erin, the spatial filtering is taken care of - these catalogs are sharded by HTM across the sky and the middleware that runs the tasks has information to grab just the shards it needs for each patch. And then we have some in-memory filtering (the stuff I need to look up) to shrink it down further. I don't know what sort of column-filtering will be possible, at least on read, since these are all stored in FITS binary tables, but they are all preprocessed to have a similar set of columns across all reference catalogs and that set of columns is typically much smaller than the upstream catalog, so there may not be a need. |
a25beb1
to
bbe1965
Compare
Ok, I've (fast-forward) merged this branch into tickets/DM-40513, rebased that on |
Jim, I don't understand. All I see in the other PR is a linter fail |
You should be able to find rebased versions of all of your commits on PR #22 now, too, as well as three new ones by me; two are mechanical linter things along with bbe1965. If you pull those changes to your fork and either to a |
on PR #22 it doesn't mention a conflict. What is it we are trying to fix? |
I was just talking about the |
Yes, we are using However when I try
Here is my command line pipetask run \
-b /sdf/data/rubin/repo/dc2 \
-i u/kannawad/DM-39243 \
-o u/$USER/mdetTest \
--task lsst.drp.tasks.metadetection_shear.MetadetectionShearTask \
-d "tract=3828 AND patch=42 AND band='r' AND skymap='DC2_cells_v1'" \
-C metadetectionShear:metadetection_shear_config.py |
Trying to reproduce this now. Can you post the full traceback? |
|
Any changes from the currently-pushed branch in your local copy? That |
No changes other than putting in the new name |
I'm on stackvana corresponding to |
I get only an empty quantum graph error, not the one that Erin posted. I used Erin's environment to run this:
|
What's in the |
|
I've reproduced Erin's error with Arun's command and a config-file with the Debugging now. |
Looks like a middleware bug Jim. This happens only when |
These lines are the suspect I think (Edit: nope, never mind)
|
That's what triggers it here, but the bug is in |
Quantum graph builds, and we can another error here:
I might be able to tackle this one. |
The task runs without any errors now. I have arbitrarily set the |
I didn't know Jim was out of commission, sorry to hear that, whatever the reason is. |
@@ -559,7 +564,8 @@ def runQuantum( | |||
config=self.config.ref_loader, | |||
log=self.log, | |||
) | |||
ref_cat = ref_loader.loadRegion(qc.quantum.dataId.region) | |||
# What should decide the filterName? | |||
ref_cat = ref_loader.loadRegion(qc.quantum.dataId.region, filterName="r") # THIS IS A HACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it says it's a hack but can't you get the band from the dataId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a multi-band task, so qc.quantum.dataId
doesn't have a band.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do know all possible bands in play though because you calculate them in the next line. I have no idea how you guess which of the dataset ref bands is the one that you choose for the refcat loader though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. I don't know either. And it might not matter at all for this purpose if the downstream code (which is metadetect
here) does not rely on a reference band for fluxes (which I don't think it does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, it does not
super().setDefaults() | ||
# This is a DC2/cal_ref_cat_2_2 specific hack. This should be ideally specified in a config file | ||
# To be removed in the cleanup before merging to main | ||
self.ref_loader.filterMap = {band: f"lsst_{band}_smeared" for band in self.required_bands} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not a hack as the comment above says. What I meant is that the mapping here is specific to the reference catalog loaded.
I got the new pipe_base and latest changes to this PR. However I am still seeing the QuantumGraph error. Was there another code updated?
|
OK, the error occurs with |
Yes, I was going to comment just that because I had seen that error when I allowed for more bands than available on the coadd. |
We need to make this configurable for different repos. E.g. for /repo/main we would need gaia_dr3_20230707 Removed temporary break statement added for testing Fixed flake8 complaints
@@ -65,9 +65,14 @@ class MetadetectionShearConnections(PipelineTaskConnections, dimensions={"patch" | |||
dimensions={"patch", "band"}, | |||
) | |||
|
|||
# TODO: make "name" configurable, as it will depend on | |||
# the repo we are using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are already configurable from the pipeline yaml file which are specific to what repo we are running this against. cal_ref_cat_2_2
is just the default value in the absence of a pipeline file.
Should I be using a pipeline file? |
A pipeline file is not needed for testing right now and the current command line invocation should be fine. That was just an FYI that the |
No description provided.