
Proposal for a new interface. #334

Merged
merged 29 commits into from
Jan 29, 2024
Conversation

DinoBektesevic
Member

See example usage in this gist.

Note that the functionality within the classes themselves is not necessarily correct. It was a lot of work, so some corners were cut.

  • Some classes are not named well; see DECamCPFits, for example. Rename them.
  • DECamCPFits also uses the astro_metadata_translator package but only works for DECam - no bueno. Either do the header translation yourself or make that class general enough to work for all instruments supported by the metadata translator.
  • Everything about how DECamCPFits works is completely made up and does not represent the data correctly. Fix how variance is calculated and how masks are created. Find datasets to test on, because that's the current difficulty.
  • Nearly all Standardizers except the ButlerStandardizer have made-up PSFs - figure out what to do with them.
  • Standardizers don't take in a config that sets how masks or PSFs are created.
  • Write tests: unit tests, CI tests, etc.
  • Look into moving the C++ code onto ndarray or something similar; the cost of copying the Python arrays is very large, and if we could just pass a pointer via the numpy array API it would basically be a no-op. We could internalize the GPU representation in the C++ code if we really need to flatten the arrays on transfer to the GPU, via the strides or something.
  • I'm sure there's more; I left comments strewn throughout the code where I got stuck on something.

Use it; I'm sure there'll be more ideas about what we can fix, add, or change. I'm still trying to work out how best to integrate it into the repo.
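
The zero-copy idea in the ndarray bullet above can be illustrated in pure numpy (the pybind11/C++ side is not shown, and nothing here is the actual KBMOD code, just a sketch of the view-vs-copy distinction the bullet relies on):

```python
import numpy as np

# On a C-contiguous array, `ravel` returns a view (no copy), while
# `flatten` always copies. A binding that accepts the underlying buffer
# would see the same memory the Python side holds - the "no-op" transfer.
img = np.arange(12, dtype=np.float32).reshape(3, 4)

flat_view = img.ravel()    # shares memory with `img`
flat_copy = img.flatten()  # independent copy

assert np.shares_memory(img, flat_view)
assert not np.shares_memory(img, flat_copy)

# Mutating the view is visible in the original; mutating the copy is not.
flat_view[0] = -1.0
assert img[0, 0] == -1.0
```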

@DinoBektesevic
Member Author

I came up with a couple more want-to's and good-to-do's:

  • Serializing ImageStack
  • Serializing LayeredImage
  • Serializing RawImage
  • Potentially round-tripping ImageCollection to ImageStack and back
  • Look into adopting lsst.resource as our internal handler of paths, URLs and URIs
  • Delete the current C++ load_file stuff
  • Delete the run_search class
  • Delete the Interface class
  • Can we come up with a more Pythonic convention for naming and binding the C++ classes?
  • Clean up some of the method names and attributes in C++ (PPI is n_pixels in image, not pixels per inch, for example)

These would all be good ways to create export functionality for ImageStack, alleviating the need to constantly create and re-create the ImageStack required for processing with KBMOD. It would be a valid solution to the issue of ImageStack creation from various data sources being slow, and it would also make staging the data to process much easier in batch-like processing environments. If we could make it round-trip too, then we could carry all the metadata along and make ImageCollection and ImageStack more equivalent in content, even when not in functionality. These files might be large, but they would still be smaller than transferring 100 DECam Community Pipelines processed FITS files, with all their CCDs in them, just to be able to create the ImageStack on the processing resource for 1 targeted CCD.

The exported FITS file could have a primary header, or a BinTable HDU, containing the internal metadata and indices, so that it carries the metadata table with it. The challenge would be associating mask, variance and science exposure extensions with each other, which would need a mechanism (a lookup table of names or extension indices) in the file. This could be based on the existing table-row-to-serializer-extension indices, but it could also be unrelated to them.
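
The lookup-table mechanism could look something like the following sketch. Everything here is hypothetical (the HDU layout, the function name, the triplet ordering), not the actual KBMOD serialization format:

```python
def build_ext_lookup(n_images):
    """Map a metadata-table row to the extension indices of its planes.

    Assumes (illustratively) that HDU 0 is the primary header, HDU 1 the
    metadata BinTable, and image extensions follow in
    (science, mask, variance) order for each image.
    """
    lookup = {}
    first_img_ext = 2
    for row in range(n_images):
        base = first_img_ext + 3 * row
        lookup[row] = {
            "science": base,
            "mask": base + 1,
            "variance": base + 2,
        }
    return lookup

lookup = build_ext_lookup(3)
# e.g. the second image's variance plane lives in HDU 7
assert lookup[1]["variance"] == 7
```

Storing such a table in the primary header (or alongside the BinTable) would let a reader re-associate the planes without guessing from extension names.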

@DinoBektesevic
Member Author

  • Time the masking code and make sure we haven't lost too much by moving it out of C++. Try to optimize, or just move it back to C++ if it's that bad.

# solution and a flat lookup table.
# If they weren't added when standardizers were unravelled it's
# basically not possible to reconstruct them. Guess attempt is no good?
no_std_map = False
Contributor


What surveys have this type of nested data? I remember discussing this in terms of the type of input file, but can't remember where we'd expect to see those input files.

Edit: I see you mention DECam below. Were the previous files KBMOD used flattened as part of preprocessing?

Member Author


Almost all of them are like this. DECam processed via the Science Pipelines will have 1 FITS file per CCD. DECam processed via Community pipelines will have 1 FITS file per 62-72 CCDs depending on the data product in question. Raws will have 72 (focus and tracking CCDs) HDUs, calibrated exposures will have 62 HDUs (just the science CCDs).

This doesn't mean that Rubin Sci. Pipes. files have only 1 HDU. They have ~16 HDUs per FITS file processed to a calexp and ~30 for coadds; it's just that the other HDUs hold data like the PSF, mask, variance, etc., whereas for Community Pipelines products these are separate data products (different FITS files, for example).

More details are given in the following comment and the design document as well.
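
A standardizer has to tell these layouts apart, roughly as in this sketch (the function, HDU names, and auxiliary-name set are illustrative, not the real KBMOD logic):

```python
def science_hdus(hdu_names):
    """Return indices of HDUs holding standalone science images.

    Illustrative rule: for Community-Pipelines-style products every
    image extension is a CCD; for Rubin-Science-Pipelines-style calexps
    only the image plane is science, with mask/variance/PSF etc. stored
    as auxiliary extensions of the same file.
    """
    aux = {"MASK", "VARIANCE", "PSF", "WCS"}  # assumed auxiliary names
    return [i for i, name in enumerate(hdu_names) if name not in aux]

# Rubin-style calexp: one science plane plus auxiliary planes
assert science_hdus(["IMAGE", "MASK", "VARIANCE"]) == [0]
# Community-Pipelines-style product: every extension is a CCD
assert science_hdus(["S1", "S2", "S3"]) == [0, 1, 2]
```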

raise NotImplementedError()

# no idea really what to do about this one? AFAIK almost no images come
# with PSFs in them
Contributor


What is Rubin's plan for providing PSFs? Is it a separate query?

Member Author


It's in the FITS file (that is, it's part of the object you can retrieve via the Butler), but it's their own internal representation of a PSF, so this isn't something we can rely on as a generic thing. We can make the ButlerStandardizer use it to load up the object in Python, but then we need to do something with that object, like evaluate a realization of the PSF at some wavelength and position on the CCD as a numpy array. Even then we need to line up that realization with the pixel size, so that the realization covers the same pixels as our images, i.e. so their physical sizes are comparable, because that's how the current PSF class works: the kernel dimension is directly related to the pixel size. For other instruments I've no idea what to do, tbh.

Part of the problem is that the PSF class we have is not a good representation of a PSF. Not because Gaussians are a bad functional representation of a PSF, but because the class only stores the PSF as an array, not as some functional expression.
And even then, as far as I have seen, there is no way the Rubin PSF class can be "exported" to any of the functional PSF forms from Astropy (e.g. Moffat) or photutils. That wouldn't help us much anyway, because we would still need a C++ representation of the same thing if we want to evaluate it at points (in a way generic enough to support things like PSFs that vary across the CCD) and so on. I guess we can evaluate it and then fit it, but the costs involved here are kinda crazy.

# Rubin Sci. Pipes. return their own internal SkyWcs object. We mock a
# Header that'll work with ButlerStd instead. It works because in the
# STD we cast SkyWcs to dict-like thing, from which we make a WCS. What
# happens if SkyWcs changes though?
Contributor


We could add a test that creates a SkyWcs and makes sure it can convert to a dict-like thing? Or, more realistically, we can fix it if it changes later.

Member Author

@DinoBektesevic DinoBektesevic Jan 23, 2024


SkyWcs comes from lsst.afw; if I mock it, I can mock it in a way that it dictifies, but that still doesn't tell me anything about the real SkyWcs from the AFW C++ code used by Rubin. Thankfully this is not likely to change, but generally doing these tests is a pain. We need a canary CI that builds from Rubin weeklies, but the setup is hard, so it's a big job, and the CI run itself would probably go on for a few hours: it would need to build the weekly, run some data through it, and then run the KBMOD tests.
This has been a long-time goal of mine, though.

I'm not sure how to create a SkyWcs here, could you elaborate?

Contributor


I'm not sure how to create a SkyWcs either. I was just suggesting that, if we could create one, we could use it to watch for breakages.
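
The cast-to-dict pattern under discussion can be sketched with a mock in place of the real SkyWcs (every name below is hypothetical; the real lsst.afw API is not shown, which is exactly why a mock cannot catch upstream changes):

```python
class MockSkyWcs:
    """Stand-in for a Rubin SkyWcs - illustrative, not the real API."""

    def __init__(self, cards):
        self._cards = cards

    def to_dict(self):
        # the "dictify" step the standardizer relies on
        return dict(self._cards)

def header_from_dictlike(wcs_obj):
    """Build a plain FITS-header-style dict from any dict-convertible WCS.

    In the real code an astropy.wcs.WCS would be constructed from this
    mapping; here we stop at the dict to keep the sketch self-contained.
    """
    return wcs_obj.to_dict()

mock = MockSkyWcs({"CTYPE1": "RA---TAN", "CTYPE2": "DEC--TAN", "CRVAL1": 10.0})
hdr = header_from_dictlike(mock)
assert hdr["CTYPE1"] == "RA---TAN"
```

The mock pins down the interface the standardizer expects; only a canary build against real Rubin weeklies can verify the real SkyWcs still satisfies it.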

@DinoBektesevic
Member Author

Blocked by #440
Do not merge without rebasing after #440 merges.

performance penalty.

_isImageLike loads every image from the disk, and then promptly
forgets them. This is super costly.
Rejigger the whole canStandardize pipeline to make it faster.

Clean up some docs; remove the _isMultiExt method - needless overhead.
This required changing how Standardizer.get works, which also fixed how
`forceStandardizer` works and how "unravelling" of metadata from
Standardizers works.

It provides even more impetus to record some metadata into the
.meta attribute; so that the full butler could be reconstructed
from it. This requires some thinking - but it could be handled
by silently ignoring **kwargs from meta that are not explicitly
defined in a particular Standardizer's __init__.
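
The "silently ignore **kwargs from meta that are not explicitly defined in a particular Standardizer's __init__" idea can be sketched with inspect (the filter function and the example Standardizer are illustrative, not the real KBMOD code):

```python
import inspect

def accepted_kwargs(cls, meta):
    """Filter a metadata dict down to the parameters cls.__init__ accepts."""
    params = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in meta.items() if k in params}

class FitsStandardizer:  # hypothetical Standardizer
    def __init__(self, location, hdulist=None):
        self.location = location
        self.hdulist = hdulist

# .meta carries extra keys (detector, visit) this class doesn't declare;
# they are dropped instead of raising a TypeError on construction.
meta = {"location": "img.fits", "detector": 35, "visit": 12345}
kwargs = accepted_kwargs(FitsStandardizer, meta)
assert kwargs == {"location": "img.fits"}
std = FitsStandardizer(**kwargs)
```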

This was significantly harder than anticipated and very
unintuitive.

Reworked the way bitmasks work - now via the Astropy bitmask module.
Implement the methods required to run KBMOD on ImageCollection.
Cleanup ImageCollection behaviour:
*) return image collection when indexed by lists, arrays and slices
*) return Row when indexed by integer
*) return Table when sliced by columns
*) Rename the exts to processable. Alias it to a property so that
   each Standardizer can implement its own internal structure the
   way it wants (but also because I was too lazy to rename everything)
*) Fix documentation
*) Move WCS and BBOX as properties to a Standardizer - if that's
   where we need to explain why they are special that's where they
   need to live. Make them an abstractproperty and demand that the
   Standardizers return a list of None's if need be.
*) Fix forceStandardizer keyword (again).
*) Add toLayeredImage as an abstract method to the Standardizers
   Implement them for the three example Standardizers we have.
*) Add toImageStack as an abstract method to the standardizers.
   Implement them in ImageCollection
*) Add run method prototype to ImageCollection to showcase how
   we can neatly integrate with the ImageCollection to execute
   KBMOD runs.

Write an example python script showcasing most of this functionality.

TODO: tests, unit tests, integration tests - all the tests.
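
The ImageCollection indexing conventions listed above (integer gives a Row, list/array/slice gives a collection) can be sketched with a minimal stand-in class; nothing here is the real ImageCollection:

```python
class MiniCollection:
    """Toy stand-in illustrating the indexing conventions only."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.rows[key]                           # a single Row
        if isinstance(key, slice):
            return MiniCollection(self.rows[key])           # a collection
        return MiniCollection([self.rows[i] for i in key])  # list/array

ic = MiniCollection([{"mjd": 1.0}, {"mjd": 2.0}, {"mjd": 3.0}])
assert ic[0] == {"mjd": 1.0}                      # int -> Row
assert isinstance(ic[1:], MiniCollection)         # slice -> collection
assert [r["mjd"] for r in ic[[0, 2]].rows] == [1.0, 3.0]  # list -> collection
```

Column slicing returning a Table is omitted since the toy rows are plain dicts.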
This is perhaps not the best solution, but the old solution
led to a proliferation of factory functions that basically
interpreted the different input data types and then constructed
standardizers from them. This floated up to ImageCollection, which
then had about a million methods: fromHDUL, fromHDULs, fromPath, fromPaths, etc.
Refactor the mock_fits util a bit.
Cleanup affected code.
Fix the wrong resolveTarget for ButlerStandardizer.
Update the init methods for fits files to specialize for hdulist
    and location instead of a target. This is required to make roundtripping
    of ImageCollection work, but should be updated in the future.
@DinoBektesevic DinoBektesevic merged commit c4dfc13 into main Jan 29, 2024
2 checks passed