Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-7847: Add mid-level drivers for measurement algorithms #1020

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

enourbakhsh
Copy link
Contributor

No description provided.

Comment on lines 88 to 93
def setDefaults(self):
super().setDefaults()
if self.deblender == "scarlet":
self.deblend.retarget(scarlet.ScarletDeblendTask)
elif self.deblender == "meas_deblender":
self.deblend.retarget(measDeblender.SourceDeblendTask)
Copy link
Contributor Author

@enourbakhsh enourbakhsh Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to rename this method to setDeblender() and just call it in the setDefaults() method? This way our __setattr__ method becomes:

def __setattr__(self, key, value):
    super().__setattr__(key, value)
    if key == "deblender":
        self.setDeblender()

It just makes more sense to me if I add a setDeblender() method because we might want to put other things in setDefaults() that don’t need to be set every time the deblender is changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment

Copy link
Contributor Author

@enourbakhsh enourbakhsh Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above explaining why I used both deblend and deblender in the config.

Copy link
Contributor

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the code I actually think that it would be better to separate this into multiple tasks, a SingleBandMeasurementDriverTask and a MultiBandMeasurementDriverTask that both inherit from a MeasurementDriverBaseTask. I would also break up run into multiple subtasks that are implemented in the MeasurementDriverBaseTask (ie. _detection and measurement) that are the same for both the single-band and multi-band versions.

This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make band a config option and if band is None (the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.

Comment on lines 50 to 61
deblender = pexConfig.ChoiceField[str](
doc="The deblender to use.",
default="meas_deblender",
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this option is necessary. You already set the deblender in deblend, so having a config option that could accidentally be misaligned with the target deblend seems like an unnecessary option that could lead to user error.

This will also allow you to remove the other methods implemented in this class below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that I couldn’t directly set the deblender in deblend because it’s defined as a ConfigurableField, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was thinking that you remove the field from the config entirely. What I was thinking is that you could use

if isinstance(self.config.deblender, ScarletDeblendTask):
    # use meas_extensions_scarlet deblending
else:
    # use meas_deblender

and add a doDeblend field like other tasks use. While it's still the same number of parameters, it eliminates all of the extra methods in the config and I think that it's a cleaner API, as you could default to multi-band using ScarletDeblendTask in MultiBandMeasurementDriverConfig.deblender and single-band using SourceDeblendTask in SingleBandMeasurementDriverConfig.deblender.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed your suggestion and made _deblendSources abstract, with each subclass having its own deblend configurable field and overriding the method in the single-band and multi-band versions.

Comment on lines 88 to 93
def setDefaults(self):
super().setDefaults()
if self.deblender == "scarlet":
self.deblend.retarget(scarlet.ScarletDeblendTask)
elif self.deblender == "meas_deblender":
self.deblend.retarget(measDeblender.SourceDeblendTask)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment

python/lsst/pipe/tasks/measurementDriver.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/measurementDriver.py Outdated Show resolved Hide resolved
@enourbakhsh enourbakhsh force-pushed the tickets/DM-7847 branch 2 times, most recently from a18044d to 30f2f2f Compare January 22, 2025 04:58
Copy link
Contributor Author

@enourbakhsh enourbakhsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fred3m for your thorough review.

After looking at the code I actually think that it would be better to separate this into multiple tasks, a SingleBandMeasurementDriverTask and a MultiBandMeasurementDriverTask that both inherit from a MeasurementDriverBaseTask. I would also break up run into multiple subtasks that are implemented in the MeasurementDriverBaseTask (ie. _detection and measurement) that are the same for both the single-band and multi-band versions.

Done! This was definitely an improvement.

This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make band a config option and if band is None (the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.

Good idea. I'll include it in my improvement ticket if you're okay with it.

python/lsst/pipe/tasks/measurementDriver.py Outdated Show resolved Hide resolved
python/lsst/pipe/tasks/measurementDriver.py Outdated Show resolved Hide resolved
Comment on lines 50 to 61
deblender = pexConfig.ChoiceField[str](
doc="The deblender to use.",
default="meas_deblender",
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that I couldn’t directly set the deblender in deblend because it’s defined as a ConfigurableField, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.

Comment on lines 88 to 93
def setDefaults(self):
super().setDefaults()
if self.deblender == "scarlet":
self.deblend.retarget(scarlet.ScarletDeblendTask)
elif self.deblender == "meas_deblender":
self.deblend.retarget(measDeblender.SourceDeblendTask)
Copy link
Contributor Author

@enourbakhsh enourbakhsh Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above explaining why I used both deblend and deblender in the config.

Comment on lines 122 to 123
is particularly suited for simple use cases, such as processing images
without neighbor-noise-replacement or extensive configuration.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This neighbor-noise-replacement part was added based on the ticket description, but I’m not sure why it’s such a big deal for the driver task. It can be turned on or off easily with config.measurement.doReplaceWithNoise.

Copy link
Contributor

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes @enourbakhsh, I do like this version much better.

Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after MeasureMergedCoaddSourcesTask in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:

  • add a doDetect, doDeblend, and doMeasure config options so that a user can run any subset of the tasks.
  • Add an optional catalog parameter to the run method so that if doDetect is off, deblending and measurement can still be done by extending the input catalog
  • To make the above work you'll need to move the schema code into an _initializeSchema method that can be called from run if doDetect is False and a catalog is passed to the run method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget house to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.

"""
# Make the `deblend` subtask only if it is enabled.
if self.config.deblender is None:
self.subtasks.remove("deblend")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right name? It looks like in init you call it subtaskNames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! However, we no longer keep track of subtask names, so it's automatically fixed.

afwTable.CoordKey.addErrorFields(self.schema)

# Standard subtasks to run in sequence.
self.subtaskNames = ["detection", "deblend", "measurement"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use my suggestion from the PR review then you won't need to keep track of subtask names, instead you'll just check the config options doDetect, doDeblend, doMeasure in run and make the subtasks and run them as needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

python/lsst/pipe/tasks/measurementDriver.py Show resolved Hide resolved
Copy link
Contributor Author

@enourbakhsh enourbakhsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after MeasureMergedCoaddSourcesTask in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:

Added some functionalities from MeasureMergedCoaddSourcesTask. Let me know if it is not enough.

  • add a doDetect, doDeblend, and doMeasure config options so that a user can run any subset of the tasks.

Done!

  • Add an optional catalog parameter to the run method so that if doDetect is off, deblending and measurement can still be done by extending the input catalog

Done! I'm also curious about handling cases where we input a deblended catalog and only need to run measurement. What's the safest way to distinguish between a post-detection and post-deblending catalog so we can use that in input validation?

To make the above work you'll need to move the schema code into an _initializeSchema method that can be called from run if doDetect is False and a catalog is passed to the run method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget house to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.

Done! The schema part was a bit of a headache, but it should work now.

"""
# Make the `deblend` subtask only if it is enabled.
if self.config.deblender is None:
self.subtasks.remove("deblend")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! However, we no longer keep track of subtask names, so it's automatically fixed.

python/lsst/pipe/tasks/measurementDriver.py Show resolved Hide resolved
afwTable.CoordKey.addErrorFields(self.schema)

# Standard subtasks to run in sequence.
self.subtaskNames = ["detection", "deblend", "measurement"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 50 to 61
deblender = pexConfig.ChoiceField[str](
doc="The deblender to use.",
default="meas_deblender",
allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Followed your suggestion and made _deblendSources abstract, with each subclass having its own deblend configurable field and overriding the method in the single-band and multi-band versions.

Copy link
Contributor

@fred3m fred3m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for integrating all of the changes. I think that there is still a little more needed to clean up the task. Thanks for your patience, but it's really close now.

Comment on lines +143 to +152
# Placeholders for subclasses to populate.
self.scaleVariance: measAlgorithms.ScaleVarianceTask
self.detection: measAlgorithms.SourceDetectionTask
self.deblend: measDeblender.SourceDeblendTask | scarlet.ScarletDeblendTask
self.measure: measBase.SingleFrameMeasurementTask
self.applyApCorr: measBase.ApplyApCorrTask
self.catalogCalculation: measBase.CatalogCalculationTask
self.exposure: afwImage.Exposure
self.catalog: afwTable.SourceCatalog
self.idGenerator: measBase.IdGenerator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add these as class attributes instead

self.schema = afwTable.SourceTable.makeMinimalSchema()

# Add coordinate error fields to avoid missing field issues.
afwTable.CoordKey.addErrorFields(self.schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm wrong, but I think that adding this new column to the schema should be done in either case, since the schema needs to be updated with the new column. So just move this line to the end of the method.

Comment on lines +184 to +200
if self.config.doScaleVariance and not hasattr(self, "scaleVariance"):
self.makeSubtask("scaleVariance")

if self.config.doDetect and not hasattr(self, "detection"):
self.makeSubtask("detection", schema=self.schema)

if self.config.doDeblend and not hasattr(self, "deblend"):
self.makeSubtask("deblend", schema=self.schema, peakSchema=self.peakSchema)

if self.config.doMeasure and not hasattr(self, "measurement"):
self.makeSubtask("measurement", schema=self.schema)

if self.config.doApCorr and not hasattr(self, "applyApCorr"):
self.makeSubtask("applyApCorr", schema=self.schema)

if self.config.doRunCatalogCalculation and not hasattr(self, "catalogCalculation"):
self.makeSubtask("catalogCalculation", schema=self.schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see why you didn't use class attributes. But do you actually need to check if the attribute exists already? I think it's ok to always make a subtask if the config parameter is set, unless there's something that I don't understand about PipelineTasks (which is possible).

def run(
self,
mExposure: afwImage.MultibandExposure | list[afwImage.Exposure],
band: str | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this refBand, since it's the reference band used for detection. You could potentially add a config option useReferenceForMeasurement (or something like that) to the MultiBandMeasurementDriverConfig class that tells it to run measurement on only the reference band, otherwise it will default to generating a catalog in each band.

self.catalog = newCatalog

@abstractmethod
def run(self) -> afwTable.SourceCatalog:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're doing trying to use a single run method to reuse code but I don't think that this will work. I think it would be better to make catalog, etc parameters of eg. _deblendSources and the other methods. I'll give more notes below on how this could be better to subdivide these tasks.

Comment on lines +247 to +253
# Set psfcache.
self.exposure.getPsf().setCacheCapacity(self.config.psfCache)

# Scale variance plane.
if self.config.doScaleVariance:
varScale = self.scaleVariance.run(self.exposure.maskedImage)
self.exposure.getMetadata().add("VARIANCE_SCALE", varScale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi-band this needs to be done per band, so this could be moved to a _calibrateExposure(exposure) method (or something similar) so that you can call it for each band in multi-band or pass the single band exposure for single band.

Comment on lines +255 to +268
if self.config.doDetect:
if self.catalog is None:
# Create an empty source table with the known Schema into which
# detected sources will be placed next.
self.table = afwTable.SourceTable.make(self.schema, self.idGenerator.make_table_id_factory())
else:
raise RuntimeError(
"An input catalog was given to bypass detection, but detection is still on."
)
else:
if self.catalog is None:
raise RuntimeError("Cannot run without detection if no catalog is provided.")
else:
self.log.info("Using detections from provided catalog; skipping detection")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move all of this code into _detectSources(catalog) and just always call that method and let it do the checking as to whether or not detection is turned on or if it just needs to return the input catalog. Or also move this to each individual run method.

f"fields and {len(self.catalog)} records"
)

return self.catalog
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could return a Struct that will be different for single and multi-band, as the single band will have at a minimum the output catalog (and as you mentioned in person maybe backgrounds and other intermediate data products) and multi-band should have a catalog in each band and the scarlet model data (see comment below).

Comment on lines +639 to +643
# Strip HeavyFootprints to save space on disk.
if self.config.doStripHeavyFootprints:
sources = self.catalog
for source in sources[sources["parent"] != 0]:
source.setFootprint(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for pipeline tasks. It should never be used as it removes all of the results of deblending.

return super().run()

def _deblendSources(self):
self.catalog, modelData = self.deblend.run(mExposure=self.mExposure, mergedSources=self.catalog)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelData should be added to the struct that is returned, as it contains important information about blends that is often useful for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants