-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-7847: Add mid-level drivers for measurement algorithms #1020
base: main
Are you sure you want to change the base?
Conversation
78474d6
to
dd1ce59
Compare
def setDefaults(self): | ||
super().setDefaults() | ||
if self.deblender == "scarlet": | ||
self.deblend.retarget(scarlet.ScarletDeblendTask) | ||
elif self.deblender == "meas_deblender": | ||
self.deblend.retarget(measDeblender.SourceDeblendTask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to rename this method to setDeblender()
and just call it in the setDefaults()
method? This way our __setattr__
method becomes:
def __setattr__(self, key, value):
super().__setattr__(key, value)
if key == "deblender":
self.setDeblender()
It just makes more sense to me if I add a setDeblender()
method because we might want to put other things in setDefaults()
that don’t need to be set every time the deblender is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my previous comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above explaining why I used both deblend
and deblender
in the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the code I actually think that it would be better to separate this into multiple tasks, a SingleBandMeasurementDriverTask
and a MultiBandMeasurementDriverTask
that both inherit from a MeasurementDriverBaseTask
. I would also break up run
into multiple subtasks that are implemented in the MeasurementDriverBaseTask (ie. _detection
and measurement
) that are the same for both the single-band and multi-band versions.
This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make band
a config option and if band
is None
(the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.
deblender = pexConfig.ChoiceField[str]( | ||
doc="The deblender to use.", | ||
default="meas_deblender", | ||
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this option is necessary. You already set the deblender in deblend
, so having a config option that could accidentally be misaligned with the target deblend
seems like an unnecessary option that could lead to user error.
This will also allow you to remove the other methods implemented in this class below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that I couldn’t directly set the deblender in deblend
because it’s defined as a ConfigurableField
, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3
. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was thinking that you remove the field from the config entirely. What I was thinking is that you could use
if isinstance(self.config.deblender, ScarletDeblendTask):
# use meas_extensions_scarlet deblending
else:
# use meas_deblender
and add a doDeblend
field like other tasks use. While it's still the same number of parameters, it eliminates all of the extra methods in the config and I think that it's a cleaner API, as you could default to multi-band using ScarletDeblendTask
in MultiBandMeasurementDriverConfig.deblender
and single-band using SourceDeblendTask
in SingleBandMeasurementDriverConfig.deblender
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed your suggestion and made _deblendSources
abstract, with each subclass having its own deblend
configurable field and overriding the method in the single-band and multi-band versions.
def setDefaults(self): | ||
super().setDefaults() | ||
if self.deblender == "scarlet": | ||
self.deblend.retarget(scarlet.ScarletDeblendTask) | ||
elif self.deblender == "meas_deblender": | ||
self.deblend.retarget(measDeblender.SourceDeblendTask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my previous comment
a18044d
to
30f2f2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fred3m for your thorough review.
After looking at the code I actually think that it would be better to separate this into multiple tasks, a
SingleBandMeasurementDriverTask
and aMultiBandMeasurementDriverTask
that both inherit from aMeasurementDriverBaseTask
. I would also break uprun
into multiple subtasks that are implemented in theMeasurementDriverBaseTask
(ie._detection
andmeasurement
) that are the same for both the single-band and multi-band versions.
Done! This was definitely an improvement.
This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make
band
a config option and ifband
isNone
(the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.
Good idea. I'll include it in my improvement ticket if you're okay with it.
deblender = pexConfig.ChoiceField[str]( | ||
doc="The deblender to use.", | ||
default="meas_deblender", | ||
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that I couldn’t directly set the deblender in deblend
because it’s defined as a ConfigurableField
, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3
. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.
def setDefaults(self): | ||
super().setDefaults() | ||
if self.deblender == "scarlet": | ||
self.deblend.retarget(scarlet.ScarletDeblendTask) | ||
elif self.deblender == "meas_deblender": | ||
self.deblend.retarget(measDeblender.SourceDeblendTask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment above explaining why I used both deblend
and deblender
in the config.
30f2f2f
to
c5958af
Compare
is particularly suited for simple use cases, such as processing images | ||
without neighbor-noise-replacement or extensive configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This neighbor-noise-replacement part was added based on the ticket description, but I’m not sure why it’s such a big deal for the driver task. It can be turned on or off easily with config.measurement.doReplaceWithNoise
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes @enourbakhsh, I do like this version much better.
Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after MeasureMergedCoaddSourcesTask in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:
- add a
doDetect
,doDeblend
, anddoMeasure
config options so that a user can run any subset of the tasks. - Add an optional
catalog
parameter to therun
method so that ifdoDetect
is off, deblending and measurement can still be done by extending the input catalog - To make the above work you'll need to move the schema code into an
_initializeSchema
method that can be called fromrun
ifdoDetect
isFalse
and a catalog is passed to therun
method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget house to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.
""" | ||
# Make the `deblend` subtask only if it is enabled. | ||
if self.config.deblender is None: | ||
self.subtasks.remove("deblend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right name? It looks like in init you call it subtaskNames
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! However, we no longer keep track of subtask names, so it's automatically fixed.
afwTable.CoordKey.addErrorFields(self.schema) | ||
|
||
# Standard subtasks to run in sequence. | ||
self.subtaskNames = ["detection", "deblend", "measurement"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use my suggestion from the PR review then you won't need to keep track of subtask names, instead you'll just check the config options doDetect
, doDeblend
, doMeasure
in run and make the subtasks and run them as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after
MeasureMergedCoaddSourcesTask
in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:
Added some functionalities from MeasureMergedCoaddSourcesTask
. Let me know if it is not enough.
- add a doDetect, doDeblend, and doMeasure config options so that a user can run any subset of the tasks.
Done!
- Add an optional catalog parameter to the run method so that if
doDetect
is off, deblending and measurement can still be done by extending the input catalog
Done! I'm also curious about handling cases where we input a deblended catalog and only need to run measurement
. What's the safest way to distinguish between a post-detection and post-deblending catalog so we can use that in input validation?
To make the above work you'll need to move the schema code into an
_initializeSchema
method that can be called fromrun
ifdoDetect
isFalse
and a catalog is passed to the run method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget house to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.
Done! The schema part was a bit of a headache, but it should work now.
""" | ||
# Make the `deblend` subtask only if it is enabled. | ||
if self.config.deblender is None: | ||
self.subtasks.remove("deblend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! However, we no longer keep track of subtask names, so it's automatically fixed.
afwTable.CoordKey.addErrorFields(self.schema) | ||
|
||
# Standard subtasks to run in sequence. | ||
self.subtaskNames = ["detection", "deblend", "measurement"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
deblender = pexConfig.ChoiceField[str]( | ||
doc="The deblender to use.", | ||
default="meas_deblender", | ||
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed your suggestion and made _deblendSources
abstract, with each subclass having its own deblend
configurable field and overriding the method in the single-band and multi-band versions.
8f18e67
to
50c9f3e
Compare
50c9f3e
to
aa831ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for integrating all of the changes. I think that there is still a little more needed to clean up the task. Thanks for your patience, but it's really close now.
# Placeholders for subclasses to populate. | ||
self.scaleVariance: measAlgorithms.ScaleVarianceTask | ||
self.detection: measAlgorithms.SourceDetectionTask | ||
self.deblend: measDeblender.SourceDeblendTask | scarlet.ScarletDeblendTask | ||
self.measure: measBase.SingleFrameMeasurementTask | ||
self.applyApCorr: measBase.ApplyApCorrTask | ||
self.catalogCalculation: measBase.CatalogCalculationTask | ||
self.exposure: afwImage.Exposure | ||
self.catalog: afwTable.SourceCatalog | ||
self.idGenerator: measBase.IdGenerator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add these as class attributes instead
self.schema = afwTable.SourceTable.makeMinimalSchema() | ||
|
||
# Add coordinate error fields to avoid missing field issues. | ||
afwTable.CoordKey.addErrorFields(self.schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm wrong, but I think that adding this new column to the schema should be done in either case, since the schema needs to be updated with the new column. So just move this line to the end of the method.
if self.config.doScaleVariance and not hasattr(self, "scaleVariance"): | ||
self.makeSubtask("scaleVariance") | ||
|
||
if self.config.doDetect and not hasattr(self, "detection"): | ||
self.makeSubtask("detection", schema=self.schema) | ||
|
||
if self.config.doDeblend and not hasattr(self, "deblend"): | ||
self.makeSubtask("deblend", schema=self.schema, peakSchema=self.peakSchema) | ||
|
||
if self.config.doMeasure and not hasattr(self, "measurement"): | ||
self.makeSubtask("measurement", schema=self.schema) | ||
|
||
if self.config.doApCorr and not hasattr(self, "applyApCorr"): | ||
self.makeSubtask("applyApCorr", schema=self.schema) | ||
|
||
if self.config.doRunCatalogCalculation and not hasattr(self, "catalogCalculation"): | ||
self.makeSubtask("catalogCalculation", schema=self.schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see why you didn't use class attributes. But do you actually need to check if the attribute exists already? I think it's ok to always make a subtask if the config parameter is set, unless there's something that I don't understand about PipelineTasks (which is possible).
def run( | ||
self, | ||
mExposure: afwImage.MultibandExposure | list[afwImage.Exposure], | ||
band: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this refBand
, since it's the reference band used for detection. You could potentially add a config option useReferenceForMeasurement
(or something like that) to the MultiBandMeasurementDriverConfig
class that tells it to run measurement on only the reference band, otherwise it will default to generating a catalog in each band.
self.catalog = newCatalog | ||
|
||
@abstractmethod | ||
def run(self) -> afwTable.SourceCatalog: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're doing trying to use a single run method to reuse code but I don't think that this will work. I think it would be better to make catalog, etc parameters of eg. _deblendSources
and the other methods. I'll give more notes below on how this could be better to subdivide these tasks.
# Set psfcache. | ||
self.exposure.getPsf().setCacheCapacity(self.config.psfCache) | ||
|
||
# Scale variance plane. | ||
if self.config.doScaleVariance: | ||
varScale = self.scaleVariance.run(self.exposure.maskedImage) | ||
self.exposure.getMetadata().add("VARIANCE_SCALE", varScale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For multi-band this needs to be done per band, so this could be moved to a _calibrateExposure
(exposure) method (or something similar) so that you can call it for each band in multi-band or pass the single band exposure for single band.
if self.config.doDetect: | ||
if self.catalog is None: | ||
# Create an empty source table with the known Schema into which | ||
# detected sources will be placed next. | ||
self.table = afwTable.SourceTable.make(self.schema, self.idGenerator.make_table_id_factory()) | ||
else: | ||
raise RuntimeError( | ||
"An input catalog was given to bypass detection, but detection is still on." | ||
) | ||
else: | ||
if self.catalog is None: | ||
raise RuntimeError("Cannot run without detection if no catalog is provided.") | ||
else: | ||
self.log.info("Using detections from provided catalog; skipping detection") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move all of this code into _detectSources(catalog)
and just always call that method and let it do the checking as to whether or not detection is turned on or if it just needs to return the input catalog. Or also move this to each individual run method.
f"fields and {len(self.catalog)} records" | ||
) | ||
|
||
return self.catalog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could return a Struct
that will be different for single and multi-band, as the single band will have at a minimum the output catalog (and as you mentioned in person maybe backgrounds and other intermediate data products) and multi-band should have a catalog in each band and the scarlet model data (see comment below).
# Strip HeavyFootprints to save space on disk. | ||
if self.config.doStripHeavyFootprints: | ||
sources = self.catalog | ||
for source in sources[sources["parent"] != 0]: | ||
source.setFootprint(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for pipeline tasks. It should never be used as it removes all of the results of deblending.
return super().run() | ||
|
||
def _deblendSources(self): | ||
self.catalog, modelData = self.deblend.run(mExposure=self.mExposure, mergedSources=self.catalog) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modelData should be added to the struct that is returned, as it contains important information about blends that is often useful for debugging.
No description provided.