DM-7847: Add mid-level drivers for measurement algorithms #1020

enourbakhsh · 2025-01-07T20:02:27Z

No description provided.

enourbakhsh · 2025-01-07T20:54:27Z

python/lsst/pipe/tasks/measurementDriver.py

+    def setDefaults(self):
+        super().setDefaults()
+        if self.deblender == "scarlet":
+            self.deblend.retarget(scarlet.ScarletDeblendTask)
+        elif self.deblender == "meas_deblender":
+            self.deblend.retarget(measDeblender.SourceDeblendTask)


Would it be better to rename this method to setDeblender() and just call it in the setDefaults() method? This way our __setattr__ method becomes:

def __setattr__(self, key, value):
    super().__setattr__(key, value)
    if key == "deblender":
        self.setDeblender()

Adding a setDeblender() method just makes more sense to me, because we might want to put other things in setDefaults() that don’t need to be set every time the deblender is changed.
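
A minimal sketch of the proposed split between setDefaults() and setDeblender(). It uses a plain-Python `DriverConfigSketch` and hypothetical stub task classes in place of a real `pex_config.Config`. Recording a `deblendTarget` attribute stands in for `self.deblend.retarget(...)`. The point is only that one-off defaults run once, while the retargeting re-runs every time `deblender` is assigned.

```python
class SourceDeblendTaskStub:
    """Hypothetical stand-in for measDeblender.SourceDeblendTask."""


class ScarletDeblendTaskStub:
    """Hypothetical stand-in for scarlet.ScarletDeblendTask."""


class DriverConfigSketch:
    """Plain-Python model of the proposed config (not a real pex_config.Config)."""

    # Maps the `deblender` choice to the class `deblend` would be retargeted to.
    _targets = {
        "meas_deblender": SourceDeblendTaskStub,
        "scarlet": ScarletDeblendTaskStub,
    }

    def __init__(self):
        self.setDefaults()

    def setDefaults(self):
        # Defaults that should NOT be reapplied whenever the deblender changes.
        self.psfCache = 100
        # Assigning `deblender` goes through __setattr__, which calls setDeblender().
        self.deblender = "meas_deblender"

    def setDeblender(self):
        # Stand-in for self.deblend.retarget(...): record the target class.
        if self.deblender not in self._targets:
            raise ValueError(f"Unknown deblender: {self.deblender!r}")
        self.deblendTarget = self._targets[self.deblender]

    def __setattr__(self, key, value):
        super().__setattr__(key, value)
        if key == "deblender":
            self.setDeblender()
```

With this layout, `cfg.deblender = "scarlet"` retargets the deblender without touching `psfCache` or any other default.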

See my previous comment

See my comment above explaining why I used both deblend and deblender in the config.

fred3m

After looking at the code I actually think that it would be better to separate this into multiple tasks, a SingleBandMeasurementDriverTask and a MultiBandMeasurementDriverTask that both inherit from a MeasurementDriverBaseTask. I would also break up run into multiple subtasks that are implemented in the MeasurementDriverBaseTask (i.e., _detection and measurement) that are the same for both the single-band and multi-band versions.
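
A rough sketch of the suggested hierarchy, with hypothetical `...Sketch` class names and a toy list-of-dicts "catalog". Detection and measurement live once in the base class. `_deblendSources` is abstract, so each subclass supplies its own deblender. The real tasks would wrap LSST subtasks rather than append to a `steps` list.

```python
from abc import ABC, abstractmethod


class MeasurementDriverBaseSketch(ABC):
    """Shared detection/measurement; subclasses supply only deblending."""

    def __init__(self):
        self.steps = []  # order in which the steps ran, for illustration

    def _detectSources(self, data):
        self.steps.append("detect")
        return [{"id": i} for i in range(len(data))]  # toy catalog

    @abstractmethod
    def _deblendSources(self, data, catalog):
        """Implemented differently by the single- and multi-band drivers."""

    def _measureSources(self, data, catalog):
        self.steps.append("measure")
        return catalog

    def run(self, data):
        catalog = self._detectSources(data)
        catalog = self._deblendSources(data, catalog)
        return self._measureSources(data, catalog)


class SingleBandMeasurementDriverSketch(MeasurementDriverBaseSketch):
    def _deblendSources(self, data, catalog):
        self.steps.append("deblend (meas_deblender)")
        return catalog


class MultiBandMeasurementDriverSketch(MeasurementDriverBaseSketch):
    def _deblendSources(self, data, catalog):
        self.steps.append("deblend (scarlet)")
        return catalog
```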

This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make band a config option and if band is None (the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.
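
A toy sketch of the proposed `band` behaviour, using flattened per-band pixel lists. The chi^2 image here uses a common generic definition, the per-pixel sum over bands of image**2 / variance. It is an assumption that assemble_chi2_coadd.py computes exactly this. `chi2Image` and `detectionImageAndBands` are hypothetical helpers.

```python
def chi2Image(images, variances):
    """Per-pixel sum over bands of image**2 / variance.

    `images` and `variances` are lists of per-band, flattened pixel lists.
    """
    nPix = len(images[0])
    return [
        sum(image[i] ** 2 / variance[i] for image, variance in zip(images, variances))
        for i in range(nPix)
    ]


def detectionImageAndBands(images, variances, bands, band=None):
    """Return (detection image, bands to measure in).

    band=None: detect on the chi^2 image and measure in every band.
    Otherwise: detect and measure in the single requested band only.
    """
    if band is None:
        return chi2Image(images, variances), list(bands)
    index = list(bands).index(band)
    return list(images[index]), [band]
```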

fred3m · 2025-01-08T20:54:58Z

python/lsst/pipe/tasks/measurementDriver.py

+    deblender = pexConfig.ChoiceField[str](
+        doc="The deblender to use.",
+        default="meas_deblender",
+        allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
+    )


I don't think that this option is necessary. You already set the deblender in deblend, so having a config option that could accidentally be misaligned with the target deblend seems like an unnecessary option that could lead to user error.

This will also allow you to remove the other methods implemented in this class below.

The problem is that I couldn’t directly set the deblender in deblend because it’s defined as a ConfigurableField, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.

Sorry, I was thinking that you remove the field from the config entirely. What I was thinking is that you could use

if isinstance(self.config.deblender, ScarletDeblendTask):
    ...  # use meas_extensions_scarlet deblending
else:
    ...  # use meas_deblender

and add a doDeblend field like other tasks use. While it's still the same number of parameters, it eliminates all of the extra methods in the config and I think that it's a cleaner API, as you could default to multi-band using ScarletDeblendTask in MultiBandMeasurementDriverConfig.deblender and single-band using SourceDeblendTask in SingleBandMeasurementDriverConfig.deblender.
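
A small sketch of this suggestion, with hypothetical stub task classes and plain dataclasses standing in for the pex_config classes. One assumption is flagged here. In the real config, `self.config.deblender` would hold a configurable sub-config rather than a task instance. For that reason the sketch compares the configured *target class* with `issubclass` instead of calling `isinstance` on the config itself. pex_config configurable fields expose a `target`, but that attribute should be checked against the pex_config docs before relying on it.

```python
from dataclasses import dataclass, field


class SourceDeblendTaskStub:
    """Hypothetical stand-in for measDeblender.SourceDeblendTask."""


class ScarletDeblendTaskStub:
    """Hypothetical stand-in for scarlet.ScarletDeblendTask."""


@dataclass
class ConfigurableSketch:
    """Models only the `target` class of a ConfigurableField."""

    target: type


@dataclass
class SingleBandConfigSketch:
    doDeblend: bool = True
    deblender: ConfigurableSketch = field(
        default_factory=lambda: ConfigurableSketch(SourceDeblendTaskStub)
    )


@dataclass
class MultiBandConfigSketch:
    doDeblend: bool = True
    deblender: ConfigurableSketch = field(
        default_factory=lambda: ConfigurableSketch(ScarletDeblendTaskStub)
    )


def chooseDeblending(config):
    """Branch on doDeblend first, then on the configured deblender's target class."""
    if not config.doDeblend:
        return "skip deblending"
    if issubclass(config.deblender.target, ScarletDeblendTaskStub):
        return "meas_extensions_scarlet"
    return "meas_deblender"
```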

Followed your suggestion and made _deblendSources abstract, with each subclass having its own deblend configurable field and overriding the method in the single-band and multi-band versions.

fred3m · 2025-01-08T20:55:10Z

python/lsst/pipe/tasks/measurementDriver.py

+    def setDefaults(self):
+        super().setDefaults()
+        if self.deblender == "scarlet":
+            self.deblend.retarget(scarlet.ScarletDeblendTask)
+        elif self.deblender == "meas_deblender":
+            self.deblend.retarget(measDeblender.SourceDeblendTask)


See my previous comment

python/lsst/pipe/tasks/measurementDriver.py

enourbakhsh

Thanks @fred3m for your thorough review.

After looking at the code I actually think that it would be better to separate this into multiple tasks, a SingleBandMeasurementDriverTask and a MultiBandMeasurementDriverTask that both inherit from a MeasurementDriverBaseTask. I would also break up run into multiple subtasks that are implemented in the MeasurementDriverBaseTask (i.e., _detection and measurement) that are the same for both the single-band and multi-band versions.

Done! This was definitely an improvement.

This would allow you to give more flexibility to the multi-band version, which could optionally use https://github.com/lsst/drp_tasks/blob/main/python/lsst/drp/tasks/assemble_chi2_coadd.py to build a chi^2 coadd and use that for detection (as opposed to having to choose a reference band). So I would make band a config option and if band is None (the default), it would detect on a chi^2 coadd and perform measurement in all bands, producing a measurement catalog in each band. Otherwise if a user wants measurements in multiple bands then they would have to run the driver multiple times, doing detection and deblending again even though the results will be the same.

Good idea. I'll include it in my improvement ticket if you're okay with it.

enourbakhsh · 2025-01-22T05:15:40Z

python/lsst/pipe/tasks/measurementDriver.py

+    deblender = pexConfig.ChoiceField[str](
+        doc="The deblender to use.",
+        default="meas_deblender",
+        allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
+    )


The problem is that I couldn’t directly set the deblender in deblend because it’s defined as a ConfigurableField, which can only be retargeted. This was done to enable subconfigs like config.deblend.tinyFootprintSize = 3. I left it as is, but if I missed something and you know a workaround, let me know — I’ll give it a try.

enourbakhsh · 2025-01-22T05:17:40Z

python/lsst/pipe/tasks/measurementDriver.py

+    def setDefaults(self):
+        super().setDefaults()
+        if self.deblender == "scarlet":
+            self.deblend.retarget(scarlet.ScarletDeblendTask)
+        elif self.deblender == "meas_deblender":
+            self.deblend.retarget(measDeblender.SourceDeblendTask)


See my comment above explaining why I used both deblend and deblender in the config.

enourbakhsh · 2025-01-22T06:28:16Z

python/lsst/pipe/tasks/measurementDriver.py

+    is particularly suited for simple use cases, such as processing images
+    without neighbor-noise-replacement or extensive configuration.


This neighbor-noise-replacement part was added based on the ticket description, but I’m not sure why it’s such a big deal for the driver task. It can be turned on or off easily with config.measurement.doReplaceWithNoise.

fred3m

Thanks for making the changes @enourbakhsh, I do like this version much better.

Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after MeasureMergedCoaddSourcesTask in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:

- Add doDetect, doDeblend, and doMeasure config options so that a user can run any subset of the tasks.
- Add an optional catalog parameter to the run method so that, if doDetect is off, deblending and measurement can still be done by extending the input catalog.
- To make the above work, you'll need to move the schema code into an _initializeSchema method that can be called from run if doDetect is False and a catalog is passed to the run method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget how to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.
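
A toy sketch of how the three flags, the optional input catalog, and an `_initializeSchema` step could fit together. Catalogs here are lists of dicts and the schema is a list of column names. These stand in for afwTable catalogs and schemas, which would need a proper schema mapping rather than dict copies. `DriverSketch`, `nChild`, and `flux` are hypothetical names.

```python
class DriverSketch:
    """Toy driver: catalogs are lists of dicts, the schema is a list of column names."""

    def __init__(self, doDetect=True, doDeblend=True, doMeasure=True):
        self.doDetect = doDetect
        self.doDeblend = doDeblend
        self.doMeasure = doMeasure
        self.schema = []

    def _initializeSchema(self, catalog=None):
        # Minimal columns first, then any columns already in the input catalog,
        # so an input catalog's columns are carried over when it is extended.
        columns = ["id", "parent"]
        if catalog:
            columns += list(catalog[0])
        self.schema = list(dict.fromkeys(columns))  # de-duplicate, keep order

    def _addColumn(self, name):
        if name not in self.schema:
            self.schema.append(name)

    def run(self, image, catalog=None):
        if self.doDetect:
            if catalog is not None:
                raise RuntimeError("An input catalog was given, but detection is still on.")
            self._initializeSchema()
            catalog = [{"id": i, "parent": 0} for i, value in enumerate(image) if value > 0]
        else:
            if catalog is None:
                raise RuntimeError("Cannot run without detection if no catalog is provided.")
            self._initializeSchema(catalog)
            catalog = [dict(record) for record in catalog]  # extend a copy, not the input
        if self.doDeblend:
            self._addColumn("nChild")
            for record in catalog:
                record.setdefault("nChild", 0)
        if self.doMeasure:
            self._addColumn("flux")
            for record in catalog:
                record["flux"] = float(image[record["id"]])
        return catalog
```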

fred3m · 2025-01-22T16:48:47Z

python/lsst/pipe/tasks/measurementDriver.py

+        """
+        # Make the `deblend` subtask only if it is enabled.
+        if self.config.deblender is None:
+            self.subtasks.remove("deblend")


Is this the right name? It looks like in __init__ you call it subtaskNames.

Correct! However, we no longer keep track of subtask names, so it's automatically fixed.

fred3m · 2025-01-22T18:55:53Z

python/lsst/pipe/tasks/measurementDriver.py

+        afwTable.CoordKey.addErrorFields(self.schema)
+
+        # Standard subtasks to run in sequence.
+        self.subtaskNames = ["detection", "deblend", "measurement"]


If you use my suggestion from the PR review then you won't need to keep track of subtask names, instead you'll just check the config options doDetect, doDeblend, doMeasure in run and make the subtasks and run them as needed.

python/lsst/pipe/tasks/measurementDriver.py

enourbakhsh

Looking back at the ticket I think that there's still some desired functionality that's missing. It might help to model it after MeasureMergedCoaddSourcesTask in https://github.com/lsst/pipe_tasks/blob/main/python/lsst/pipe/tasks/multiBand.py with a few convenience additions:

Added some functionality from MeasureMergedCoaddSourcesTask. Let me know if it is not enough.

Add doDetect, doDeblend, and doMeasure config options so that a user can run any subset of the tasks.

Done!

Add an optional catalog parameter to the run method so that, if doDetect is off, deblending and measurement can still be done by extending the input catalog.

Done! I'm also curious about handling cases where we input a deblended catalog and only need to run measurement. What's the safest way to distinguish between a post-detection and post-deblending catalog so we can use that in input validation?

To make the above work, you'll need to move the schema code into an _initializeSchema method that can be called from run if doDetect is False and a catalog is passed to the run method. To me this is part of the desired functionality to reduce the schema boilerplate, as I (and I think others) often forget how to create a new source catalog with the appropriate schema and have all of the columns copied over correctly.

Done! The schema part was a bit of a headache, but it should work now.

enourbakhsh · 2025-01-23T04:10:39Z

python/lsst/pipe/tasks/measurementDriver.py

+        """
+        # Make the `deblend` subtask only if it is enabled.
+        if self.config.deblender is None:
+            self.subtasks.remove("deblend")


Correct! However, we no longer keep track of subtask names, so it's automatically fixed.

enourbakhsh · 2025-01-29T05:19:12Z

python/lsst/pipe/tasks/measurementDriver.py

+        afwTable.CoordKey.addErrorFields(self.schema)
+
+        # Standard subtasks to run in sequence.
+        self.subtaskNames = ["detection", "deblend", "measurement"]


enourbakhsh · 2025-01-29T05:29:22Z

python/lsst/pipe/tasks/measurementDriver.py

+    deblender = pexConfig.ChoiceField[str](
+        doc="The deblender to use.",
+        default="meas_deblender",
+        allowed={"meas_deblender": "Deblend using meas_deblender", "scarlet": "Deblend using scarlet"},
+    )


Followed your suggestion and made _deblendSources abstract, with each subclass having its own deblend configurable field and overriding the method in the single-band and multi-band versions.

fred3m

Thanks for integrating all of the changes. I think that there is still a little more needed to clean up the task. Thanks for your patience, but it's really close now.

fred3m · 2025-01-29T15:37:03Z

python/lsst/pipe/tasks/measurementDriver.py

+        # Placeholders for subclasses to populate.
+        self.scaleVariance: measAlgorithms.ScaleVarianceTask
+        self.detection: measAlgorithms.SourceDetectionTask
+        self.deblend: measDeblender.SourceDeblendTask | scarlet.ScarletDeblendTask
+        self.measure: measBase.SingleFrameMeasurementTask
+        self.applyApCorr: measBase.ApplyApCorrTask
+        self.catalogCalculation: measBase.CatalogCalculationTask
+        self.exposure: afwImage.Exposure
+        self.catalog: afwTable.SourceCatalog
+        self.idGenerator: measBase.IdGenerator


You can add these as class attributes instead
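
A minimal illustration of what moving the placeholders to class level looks like. A bare `self.detection: T` inside `__init__` creates no attribute, and neither does a bare class-level annotation. The class-level form simply declares the names once, where readers and type checkers see them. The stub classes and `DriverSketch` are hypothetical.

```python
class DetectionTaskStub:
    """Hypothetical stand-in for measAlgorithms.SourceDetectionTask."""


class MeasurementTaskStub:
    """Hypothetical stand-in for measBase.SingleFrameMeasurementTask."""


class DriverSketch:
    # Class-level annotations: declared once for readers and type checkers.
    # No value is assigned, so no attribute exists until __init__ sets it.
    detection: DetectionTaskStub
    measurement: MeasurementTaskStub

    def __init__(self, doDetect=True):
        if doDetect:
            self.detection = DetectionTaskStub()
        self.measurement = MeasurementTaskStub()
```

Because nothing is assigned at class level, `hasattr(driver, "detection")` stays False until the subtask is actually made.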

fred3m · 2025-01-29T15:41:00Z

python/lsst/pipe/tasks/measurementDriver.py

+                self.schema = afwTable.SourceTable.makeMinimalSchema()
+
+                # Add coordinate error fields to avoid missing field issues.
+                afwTable.CoordKey.addErrorFields(self.schema)


Maybe I'm wrong, but I think that adding this new column to the schema should be done in either case, since the schema needs to be updated with the new column. So just move this line to the end of the method.
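
A toy version of the suggested restructuring, with schemas modelled as lists of column names. The coordinate-error step sits after the if/else, so both branches get it. The column names are placeholders, not necessarily what `afwTable.CoordKey.addErrorFields` adds. The "skip if already present" guard is a toy choice; whether the real call tolerates existing fields is not asserted here.

```python
# Placeholder names; the real columns come from afwTable.CoordKey.addErrorFields.
COORD_ERROR_COLUMNS = ("coord_raErr", "coord_decErr")


def initializeSchema(inputColumns=None):
    """Toy model with schemas as lists of column names."""
    if inputColumns is None:
        # Stand-in for afwTable.SourceTable.makeMinimalSchema().
        schema = ["id", "coord_ra", "coord_dec", "parent"]
    else:
        schema = list(inputColumns)
    # Shared final step: runs for both branches, after the if/else.
    for name in COORD_ERROR_COLUMNS:
        if name not in schema:  # toy guard: skip columns the input already has
            schema.append(name)
    return schema
```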

fred3m · 2025-01-29T15:42:55Z

python/lsst/pipe/tasks/measurementDriver.py

+        if self.config.doScaleVariance and not hasattr(self, "scaleVariance"):
+            self.makeSubtask("scaleVariance")
+
+        if self.config.doDetect and not hasattr(self, "detection"):
+            self.makeSubtask("detection", schema=self.schema)
+
+        if self.config.doDeblend and not hasattr(self, "deblend"):
+            self.makeSubtask("deblend", schema=self.schema, peakSchema=self.peakSchema)
+
+        if self.config.doMeasure and not hasattr(self, "measurement"):
+            self.makeSubtask("measurement", schema=self.schema)
+
+        if self.config.doApCorr and not hasattr(self, "applyApCorr"):
+            self.makeSubtask("applyApCorr", schema=self.schema)
+
+        if self.config.doRunCatalogCalculation and not hasattr(self, "catalogCalculation"):
+            self.makeSubtask("catalogCalculation", schema=self.schema)


Ah, I see why you didn't use class attributes. But do you actually need to check if the attribute exists already? I think it's ok to always make a subtask if the config parameter is set, unless there's something that I don't understand about PipelineTasks (which is possible).
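
A sketch of the guard-free alternative. It assumes the subtasks can be built once, when the driver is constructed (that is, the schema is already known at that point). Under that assumption, the repeated `hasattr` checks collapse into one table-driven loop. `TaskSketch`, `SUBTASKS`, and the dict-based config are hypothetical. The `makeSubtask` method here only imitates `Task.makeSubtask` by attaching an attribute.

```python
SUBTASKS = (
    ("doScaleVariance", "scaleVariance"),
    ("doDetect", "detection"),
    ("doDeblend", "deblend"),
    ("doMeasure", "measurement"),
    ("doApCorr", "applyApCorr"),
    ("doRunCatalogCalculation", "catalogCalculation"),
)


class TaskSketch:
    """Toy task that builds its subtasks once, when it is constructed."""

    def __init__(self, config):
        self.config = config
        self.made = []
        # This loop runs exactly once per instance, so no hasattr() guard is needed.
        for flag, name in SUBTASKS:
            if config.get(flag, False):
                self.makeSubtask(name)

    def makeSubtask(self, name):
        # Stand-in for lsst.pipe.base.Task.makeSubtask: attach and record it.
        setattr(self, name, f"<{name}>")
        self.made.append(name)
```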

fred3m · 2025-01-29T17:44:20Z

python/lsst/pipe/tasks/measurementDriver.py

+    def run(
+        self,
+        mExposure: afwImage.MultibandExposure | list[afwImage.Exposure],
+        band: str | None = None,


Maybe call this refBand, since it's the reference band used for detection. You could potentially add a config option useReferenceForMeasurement (or something like that) to the MultiBandMeasurementDriverConfig class that tells it to run measurement on only the reference band, otherwise it will default to generating a catalog in each band.
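
A small sketch of the band-selection logic this would imply. `bandsToMeasure` is a hypothetical helper, and `useReferenceForMeasurement` is the config option proposed above, not an existing field.

```python
def bandsToMeasure(bands, refBand, useReferenceForMeasurement=False):
    """Bands in which the multi-band driver would produce catalogs.

    Detection always uses refBand; useReferenceForMeasurement is the
    hypothetical config option suggested in this comment.
    """
    bands = list(bands)
    if refBand not in bands:
        raise ValueError(f"refBand {refBand!r} is not one of {bands}")
    return [refBand] if useReferenceForMeasurement else bands
```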

fred3m · 2025-01-29T18:41:42Z

python/lsst/pipe/tasks/measurementDriver.py

+        self.catalog = newCatalog
+
+    @abstractmethod
+    def run(self) -> afwTable.SourceCatalog:


I see what you're doing, trying to use a single run method to reuse code, but I don't think that this will work. I think it would be better to make catalog, etc. parameters of, e.g., _deblendSources and the other methods. I'll give more notes below on how these tasks could be better subdivided.
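
A toy sketch of the "pass data as parameters" style, with lists standing in for images and lists of dicts for catalogs. Each step takes its inputs and returns its output instead of reading and writing `self.catalog`. That lets the single- and multi-band `run` methods chain the same steps differently. The multi-band version is simplified: it detects and deblends on the reference band only, which is not how a scarlet multi-band deblend would work. All class names are hypothetical.

```python
class DriverSketch:
    """Toy single-band driver; each step receives its inputs and returns its output."""

    def _detectSources(self, image):
        return [{"id": i, "parent": 0} for i, value in enumerate(image) if value > 0]

    def _deblendSources(self, image, catalog):
        return [dict(record, deblended=True) for record in catalog]

    def _measureSources(self, image, catalog):
        # Builds new records instead of mutating `catalog`, so it can be reused per band.
        return [dict(record, flux=float(image[record["id"]])) for record in catalog]

    def run(self, image):
        catalog = self._detectSources(image)
        catalog = self._deblendSources(image, catalog)
        return self._measureSources(image, catalog)


class MultiBandDriverSketch(DriverSketch):
    def run(self, images, refBand):
        # Detect and deblend once on the reference band, then measure in every band.
        catalog = self._detectSources(images[refBand])
        catalog = self._deblendSources(images[refBand], catalog)
        return {band: self._measureSources(image, catalog) for band, image in images.items()}
```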

fred3m · 2025-01-29T19:54:30Z

python/lsst/pipe/tasks/measurementDriver.py

+        # Set psfcache.
+        self.exposure.getPsf().setCacheCapacity(self.config.psfCache)
+
+        # Scale variance plane.
+        if self.config.doScaleVariance:
+            varScale = self.scaleVariance.run(self.exposure.maskedImage)
+            self.exposure.getMetadata().add("VARIANCE_SCALE", varScale)


For multi-band this needs to be done per band, so this could be moved to a _calibrateExposure(exposure) method (or something similar) so that you can call it for each band in multi-band or pass the single band exposure for single band.
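
A toy version of the proposed `_calibrateExposure(exposure)` split. The following are all hypothetical stand-ins:

- `ExposureSketch`.
- The `scaleFactors` lookup, which replaces what `scaleVariance.run` would measure from the image.
- `psfCacheCapacity`, which replaces `getPsf().setCacheCapacity(...)`.

```python
from dataclasses import dataclass, field


@dataclass
class ExposureSketch:
    """Hypothetical stand-in for afwImage.Exposure."""

    band: str
    metadata: dict = field(default_factory=dict)
    psfCacheCapacity: int = 0


class DriverSketch:
    def __init__(self, psfCache=100, doScaleVariance=True, scaleFactors=None):
        self.psfCache = psfCache
        self.doScaleVariance = doScaleVariance
        # Stand-in for scaleVariance.run(); real factors are measured from the image.
        self.scaleFactors = scaleFactors or {}

    def _calibrateExposure(self, exposure):
        # Per-exposure setup that the multi-band driver repeats for each band.
        exposure.psfCacheCapacity = self.psfCache
        if self.doScaleVariance:
            exposure.metadata["VARIANCE_SCALE"] = self.scaleFactors.get(exposure.band, 1.0)
        return exposure

    def runSingleBand(self, exposure):
        return self._calibrateExposure(exposure)

    def runMultiBand(self, exposures):
        return [self._calibrateExposure(exposure) for exposure in exposures]
```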

fred3m · 2025-01-29T19:55:39Z

python/lsst/pipe/tasks/measurementDriver.py

+        if self.config.doDetect:
+            if self.catalog is None:
+                # Create an empty source table with the known Schema into which
+                # detected sources will be placed next.
+                self.table = afwTable.SourceTable.make(self.schema, self.idGenerator.make_table_id_factory())
+            else:
+                raise RuntimeError(
+                    "An input catalog was given to bypass detection, but detection is still on."
+                )
+        else:
+            if self.catalog is None:
+                raise RuntimeError("Cannot run without detection if no catalog is provided.")
+            else:
+                self.log.info("Using detections from provided catalog; skipping detection")


You can move all of this code into _detectSources(catalog) and just always call that method, letting it check whether detection is turned on or whether it just needs to return the input catalog. Alternatively, move this into each individual run method.
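
A toy sketch of the "always call `_detectSources`" variant. The error messages are taken from the diff above. `DriverSketch` and the list-of-dicts catalog are hypothetical. `run` no longer branches on detection at all.

```python
class DriverSketch:
    """Toy driver; catalogs are lists of dicts."""

    def __init__(self, doDetect=True):
        self.doDetect = doDetect

    def _detectSources(self, image, catalog=None):
        """Always called by run(); decides whether to detect or reuse `catalog`."""
        if self.doDetect:
            if catalog is not None:
                raise RuntimeError(
                    "An input catalog was given to bypass detection, but detection is still on."
                )
            return [{"id": i} for i, value in enumerate(image) if value > 0]
        if catalog is None:
            raise RuntimeError("Cannot run without detection if no catalog is provided.")
        return catalog

    def run(self, image, catalog=None):
        # No detection branching here: _detectSources handles every case.
        return self._detectSources(image, catalog)
```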

fred3m · 2025-01-29T19:59:00Z

python/lsst/pipe/tasks/measurementDriver.py

+            f"fields and {len(self.catalog)} records"
+        )
+
+        return self.catalog


You could return a Struct that will be different for single and multi-band, as the single band will have at a minimum the output catalog (and as you mentioned in person maybe backgrounds and other intermediate data products) and multi-band should have a catalog in each band and the scarlet model data (see comment below).
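
A minimal sketch of the two return shapes, using `types.SimpleNamespace` as a stand-in for `lsst.pipe.base.Struct`. Both are simple keyword-built attribute containers. The field names `catalog`, `backgroundList`, `catalogs`, and `modelData` are illustrative choices, not an agreed API.

```python
from types import SimpleNamespace as Struct  # stand-in for lsst.pipe.base.Struct


def makeSingleBandResult(catalog, backgroundList=None):
    # Minimum: the output catalog; optional intermediate products alongside it.
    return Struct(catalog=catalog, backgroundList=backgroundList)


def makeMultiBandResult(catalogs, modelData):
    # One catalog per band, plus the scarlet model data (useful for debugging blends).
    return Struct(catalogs=dict(catalogs), modelData=modelData)
```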

fred3m · 2025-01-29T23:59:41Z

python/lsst/pipe/tasks/measurementDriver.py

+        # Strip HeavyFootprints to save space on disk.
+        if self.config.doStripHeavyFootprints:
+            sources = self.catalog
+            for source in sources[sources["parent"] != 0]:
+                source.setFootprint(None)


This is only for pipeline tasks. It should never be used here, as it removes all of the results of deblending.

fred3m · 2025-01-30T00:00:14Z

python/lsst/pipe/tasks/measurementDriver.py

+        return super().run()
+
+    def _deblendSources(self):
+        self.catalog, modelData = self.deblend.run(mExposure=self.mExposure, mergedSources=self.catalog)


modelData should be added to the struct that is returned, as it contains important information about blends that is often useful for debugging.

enourbakhsh mentioned this pull request Jan 7, 2025

DM-7847: Add mid-level drivers for measurement algorithms lsst/meas_algorithms#406

Closed

enourbakhsh force-pushed the tickets/DM-7847 branch from 78474d6 to dd1ce59 on January 7, 2025 at 20:25

enourbakhsh commented Jan 7, 2025

fred3m requested changes Jan 9, 2025

enourbakhsh force-pushed the tickets/DM-7847 branch 2 times, most recently from a18044d to 30f2f2f on January 22, 2025 at 04:58

enourbakhsh commented Jan 22, 2025

enourbakhsh force-pushed the tickets/DM-7847 branch from 30f2f2f to c5958af on January 22, 2025 at 05:53

enourbakhsh commented Jan 22, 2025

fred3m requested changes Jan 22, 2025

enourbakhsh commented Jan 29, 2025

enourbakhsh added 2 commits January 29, 2025 00:43

Add mid-level measurement driver task

8bf6291

Address review comments round 1 (to be squashed)

08f34c1

enourbakhsh force-pushed the tickets/DM-7847 branch from 8f18e67 to 50c9f3e on January 29, 2025 at 05:43

Address review comments round 2 (to be squashed)

aa831ce

enourbakhsh force-pushed the tickets/DM-7847 branch from 50c9f3e to aa831ce on January 29, 2025 at 07:06

fred3m requested changes Jan 30, 2025
