Proposal for a new interface. #334
Conversation
I came up with a couple more want-tos and good to-dos:
The exported FITS file could have a primary header, or a BinTable HDU, containing the internal metadata and indices, so the file carries the metadata table with it. The challenge would be associating mask, variance and science exposure extensions with each other, which would need a mechanism (a lookup table of names or extension indices) in it. This could be based on the existing table-row-to-serializer-extension indices, but it could also be unrelated to them.
src/kbmod/image_collection.py (Outdated)

    # solution and a flat lookup table.
    # If they weren't added when standardizers were unravelled it's
    # basically not possible to reconstruct them. Guess attempt is no good?
    no_std_map = False
What surveys have this type of nested data? I remember discussing this in terms of the type of input file, but I can't remember where we'd expect to see those input files.
Edit: I see you mention DECam below. Were the previous files KBMOD used flattened as part of preprocessing?
Almost all of them are like this. DECam processed via the Science Pipelines will have 1 FITS file per CCD. DECam processed via the Community Pipelines will have 1 FITS file per 62-72 CCDs, depending on the data product in question. Raws will have 72 HDUs (including focus and tracking CCDs), calibrated exposures will have 62 HDUs (just the science CCDs).
This doesn't mean that Rubin Sci. Pipes. products will have only 1 HDU. They have ~16 HDUs per FITS file processed to a calexp and ~30 for coadds. It's just that the other HDUs hold data like PSF, mask, variance etc., whereas for Community Pipeline products these are separate data products (different FITS files, for example).
More details are given in the following comment and the design document as well.
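As a toy illustration of the layouts described above, assuming astropy: a helper (hypothetical, not KBMOD code) that counts image-bearing HDUs, which is one cheap way to tell a one-CCD-per-file product from a 62-CCD multi-extension Community Pipeline product.

```python
import numpy as np
from astropy.io import fits

def image_hdu_count(hdul):
    """Count HDUs that carry 2D image data (science CCDs, masks, etc.)."""
    return sum(
        1 for hdu in hdul
        if isinstance(hdu, (fits.ImageHDU, fits.CompImageHDU))
        and hdu.data is not None and hdu.data.ndim == 2
    )

# Build a toy multi-extension file standing in for a Community Pipeline
# calibrated exposure: one image HDU per science CCD.
hdul = fits.HDUList([fits.PrimaryHDU()])
for i in range(62):
    hdul.append(fits.ImageHDU(np.zeros((2, 2), dtype=np.float32), name=f"S{i}"))

n = image_hdu_count(hdul)
```

For a Science Pipelines calexp this count would instead include the mask and variance planes, so a real check would also have to look at EXTNAMEs.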
    raise NotImplementedError()

    # no idea really what to do about this one? AFAIK almost no images come
    # with PSFs in them
What is Rubin's plan for providing PSFs? Is it a separate query?
It's in the FITS file (that is, it's part of the object you can retrieve via the Butler), but it's their own internal representation of a PSF - so this isn't something we can just rely on as a generic thing. We can make the ButlerStandardizer use it to load up the object in Python - but then we need to do something with that object, like evaluate a realization of the PSF at some wavelength and position on the CCD as a numpy array, for example. But even then we need to line up that realization with the pixel size, so that the realization covers the same pixels as our images, i.e. so their physical sizes are comparable, because that's how the current PSF class works - the kernel dimension is directly related to the pixel size. For other instruments I've no idea what to do, tbh.
Part of the problem is that the PSF class we have is not a good representation of a PSF. Not because Gaussians are a bad functional representation of a PSF, but because it only stores a representation of a PSF as an array and not as some functional expression.
But even then, as far as I have seen, there is no way the Rubin PSF class can be "exported" as any of the functional PSF forms from Astropy (e.g. Moffat) or photutils - which again wouldn't even help us that much, because we would still need a C++ representation of the same thing if we want to evaluate it at points (in a way generic enough to handle things like PSFs that vary across the CCD) and so on. I guess we can evaluate it and then fit it - but the costs involved here are kinda crazy.
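A sketch of the "evaluate a realization as a numpy array" step mentioned above, under the simplest possible assumption of a circular Gaussian: the width is given in pixels, so the kernel is sampled on the same grid as the image, which is what lining up with the pixel size amounts to. The function name and API are illustrative, not the current PSF class.

```python
import numpy as np

def psf_kernel(stddev_pix, half_width=None):
    """Evaluate a circular Gaussian PSF realization on a pixel grid.

    `stddev_pix` is the PSF standard deviation *in pixels*; the kernel
    dimension is derived from it, mirroring how the discussion ties the
    kernel size to the image pixel scale.
    """
    if half_width is None:
        half_width = int(np.ceil(3 * stddev_pix))  # ~99.7% of the flux
    y, x = np.mgrid[-half_width:half_width + 1, -half_width:half_width + 1]
    kernel = np.exp(-(x**2 + y**2) / (2 * stddev_pix**2))
    return kernel / kernel.sum()  # normalize to unit total flux

k = psf_kernel(1.5)
```

A spatially varying PSF would make `stddev_pix` (or the whole functional form) a function of CCD position, which is exactly the generality the array-only PSF class can't express.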
    # Rubin Sci. Pipes. return their own internal SkyWcs object. We mock a
    # Header that'll work with ButlerStd instead. It works because in the
    # STD we cast SkyWcs to dict-like thing, from which we make a WCS. What
    # happens if SkyWcs changes though?
We could add a test that creates a SkyWcs and makes sure it can convert to a dict-like thing? Or, more realistically, we can fix it if it changes later.
SkyWcs comes from lsst.afw. If I mock it, I can mock it in a way that it dictifies, but that still doesn't tell me anything about the real SkyWcs from the AFW C++ code used by Rubin. Thankfully this is not likely to change, but generally doing these tests is a pain. We need a canary CI that will build from Rubin weeklies, but the setup is hard, so it's a big job, and the CI run itself will probably go on for a few hours as it would need to build the weekly, run some data through it and then run the KBMOD tests.
This has been a long-time goal of mine though.
I'm not sure how to create a SkyWcs here, could you elaborate?
I'm also not sure how to create a SkyWcs either. I was just suggesting that, if we could create one, we could use it to watch for breakages.
performance penalty. _isImageLike loads every image from the disk and then promptly forgets them. This is super costly. Rejigger the whole canStandardize pipeline to get that to go faster. Clean up some docs; remove the _isMultiExt method - needless overhead.
Required changing how Standardizer.get works, which also fixed how `forceStandardizer` works and how "unravelling" of metadata from Standardizers works. It provides even more impetus to record some metadata into the .meta attribute, so that the full Butler could be reconstructed from it. This requires some thinking - but it could be handled by silently ignoring **kwargs from meta that are not explicitly defined in a particular Standardizer's __init__. This was significantly harder than anticipated and very unintuitive. Reworked the way bitmasks work - now via the astropy bitmask module.
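The "silently ignoring **kwargs not defined in a particular Standardizer's __init__" idea can be sketched with `inspect.signature`; the class names and metadata keys here are hypothetical, not KBMOD's real ones.

```python
import inspect

class Standardizer:
    @classmethod
    def from_meta(cls, meta):
        """Build a standardizer from a metadata row, silently dropping
        keys that this subclass's __init__ does not declare."""
        params = inspect.signature(cls.__init__).parameters
        accepted = {k: v for k, v in meta.items() if k in params}
        return cls(**accepted)

class FitsStandardizer(Standardizer):
    def __init__(self, location, hdu_index=0):
        self.location = location
        self.hdu_index = hdu_index

# Extra keys ("mjd", "obs_code") are ignored rather than raising TypeError.
std = FitsStandardizer.from_meta(
    {"location": "img.fits", "hdu_index": 2, "mjd": 60101.1, "obs_code": "X05"}
)
```

The trade-off is that typos in metadata keys are also silently ignored, which is part of why this behaviour is unintuitive.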
Implement the methods required to run KBMOD on ImageCollection. Clean up ImageCollection behaviour:
*) return an ImageCollection when indexed by lists, arrays and slices
*) return a Row when indexed by an integer
*) return a Table when sliced by columns
*) rename exts to processable; alias it to a property so that each Standardizer can implement its own internal structure the way it wants (but also because I was too lazy to rename everything)
*) fix documentation
*) move WCS and BBOX as properties to the Standardizer - if that's where we need to explain why they are special, that's where they need to live. Make them an abstractproperty and demand that Standardizers return a list of Nones if need be
*) fix the forceStandardizer keyword (again)
*) add toLayeredImage as an abstract method to the Standardizers and implement it for the three example Standardizers we have
*) add toImageStack as an abstract method to the Standardizers and implement it in ImageCollection
*) add a run method prototype to ImageCollection to showcase how we can neatly integrate with ImageCollection to execute KBMOD runs
Write an example Python script showcasing most of this functionality. TODO: tests - unit tests, integration tests, all the tests.
This is perhaps not the best solution, but the old solution led to a proliferation of factory functions that basically interpreted the different input data types and then constructed standardizers from them. This floated up to ImageCollection, which then had about a million fromHDUL, fromHDULs, fromPath, fromPaths, etc.
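The single-entry-point alternative to that proliferation can be sketched as one factory that dispatches on the input type; the function name and returned tags are illustrative stand-ins for constructing the matching Standardizer, not KBMOD's real API.

```python
from astropy.io import fits

def standardizer_factory(target):
    """One entry point interpreting the input type, replacing a separate
    fromHDUList/fromPath/fromPaths factory per type."""
    if isinstance(target, fits.HDUList):
        return ("hdulist", target)
    if isinstance(target, str):
        return ("path", target)
    if isinstance(target, (list, tuple)):
        return [standardizer_factory(t) for t in target]
    raise TypeError(f"Unsupported target type: {type(target)}")

kind, payload = standardizer_factory("img.fits")
many = standardizer_factory(["a.fits", fits.HDUList([fits.PrimaryHDU()])])
```

The list branch recursing into the scalar branches is what lets the plural from* variants collapse into one function.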
Refactor the mock_fits util a bit. Clean up affected code. Fix the wrong resolveTarget for ButlerStandardizer. Update the init methods for FITS files to specialize for an hdulist and a location instead of a target. This is required to make round-tripping of ImageCollection work, but should be updated in the future.
See example usage in this gist.
Note that the functionality within the classes themselves is not necessarily correct. It was a lot of work, so some corners were cut - in DECamCPFits, for example. Known issues:
*) Rename DECamCPFits. It also uses the astro_metadata_translator package but only works for DECam - no bueno; either do the header translation yourself or make the class general enough to work for all instruments supported by the metadata translator.
*) The way DECamCPFits works is completely made up and does not represent the data correctly. Fix how variance is calculated and how masks are created. Find datasets to test on, because that's the current difficulty.
*) All Standardizers except the ButlerStandardizer have made-up PSFs - figure out what to do with them.
*) Standardizers don't take in a config that sets how masks or PSFs are created.
*) strides or something.
Use it - I'm sure there'll be more ideas on what we can fix, add or change. I'm still trying to work out how best to integrate it in the repo.