Commit 20d1ece — Image model (c3-time-domain#34)

guynir42 authored Jun 15, 2023
1 parent 238e696 commit 20d1ece
Showing 22 changed files with 134,425 additions and 213 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# some specific files we don't want in the repo
data/DECam_examples/c4d_221104_074232_ori.fits.fz

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
81 changes: 80 additions & 1 deletion README.md
@@ -211,7 +211,7 @@ data into numpy arrays.
For example:

```python
-from matplotlib.pyplot import plt
+import matplotlib.pyplot as plt
from models.exposure import Exposure
exp = Exposure(filename='Camera_Project_2021-01-01T00:00:00.000.fits')
plt.show(exp.data[0])
@@ -379,3 +379,82 @@ and in the `test_instrument.py` file in the `tests/models` folder.

To be added...



### Image data and headers

A key requirement of this pipeline is that the data be quickly accessible.
So where is the data stored, and how do we get to it quickly?

Each Exposure object is associated with a single FITS file (or sometimes multiple files, for different sections).
To get the imaging data for an Exposure, simply call the `data` property. This dictionary-like object will
provide a numpy array for each section of the instrument:

```python
exp = Exposure('path/to/file.fits')
print(type(exp.data)) # this is a SectionData object, defined in models/exposure.py
print(type(exp.data[0])) # numpy array for section zero

for section_id in exp.instrument_object.get_section_ids():
    print(exp.data[section_id].shape)  # print the shape of each section
```

The `data` property is a SectionData object, which acts like a dictionary
that lazy loads the data array from the FITS file when needed
(in most cases these will be FITS files, but other formats can be added just as well).
For single-section instruments, `data[0]` will usually be good enough.
When there are several sections, use `exp.instrument_object.get_section_ids()` to get the list of section IDs.
Note that these could be integers or strings; SectionData accepts either type.
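The lazy-loading behavior can be illustrated with a minimal, hypothetical sketch (not the actual SectionData implementation, which lives in `models/exposure.py`; here the file-reading step is replaced by a generic loader callable):

```python
class LazySectionData:
    """A minimal, hypothetical stand-in for SectionData: it calls a
    loader function for a section only on first access, then caches
    the resulting array for subsequent accesses."""

    def __init__(self, loader, section_ids):
        self._loader = loader                     # callable: section_id -> array
        self._ids = [str(s) for s in section_ids]
        self._cache = {}

    def __getitem__(self, section_id):
        key = str(section_id)                     # int and str keys are equivalent
        if key not in self._ids:
            raise KeyError(section_id)
        if key not in self._cache:                # load from disk only once
            self._cache[key] = self._loader(key)
        return self._cache[key]

calls = []
def fake_loader(section_id):
    calls.append(section_id)                      # record each simulated disk read
    return [[1.0, 2.0]]                           # stand-in for a numpy array

data = LazySectionData(fake_loader, [0, 1])
first = data[0]
second = data[0]                                  # served from the cache, no reload
```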

Header information is also loaded from the FITS file, but this information can be kept in three different places.
The first is the `header` property of the Exposure object. This is a dictionary-like object that contains
a small subset of the full FITS header. Generally only the properties we intend to query on will be saved here.
Since some of this information is stored in dedicated database columns (like `exp_time`), the `header` column
does not necessarily hold much beyond those. Note that this header is filled from the global header,
not the headers of individual sections.
The keywords in this header are all lower-case, and are translated to standardized names using the
`_get_header_keyword_translations()` method of the Instrument class. This makes it easy to tell them apart
from the raw header information (in upper case) which also uses instrument-specific keywords.
The values of the header cards are also converted to standard units using the `_get_header_values_converters()` method.
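As a sketch of how such a translation might work (the dictionary contents below are invented for illustration; the real mappings come from each Instrument subclass):

```python
def standardize_header(raw_header, translations, converters):
    """Map instrument-specific, upper-case FITS keywords to standardized
    lower-case names, converting values to standard units.

    translations: dict of standardized name -> list of candidate raw keywords
    converters:   dict of standardized name -> callable applied to the value
    """
    out = {}
    for std_name, raw_keys in translations.items():
        for key in raw_keys:
            if key in raw_header:
                convert = converters.get(std_name, lambda v: v)
                out[std_name] = convert(raw_header[key])
                break                  # first matching raw keyword wins
    return out

# hypothetical mappings and converters for a made-up instrument
translations = {'exp_time': ['EXPTIME', 'EXPOSURE'], 'filter': ['FILTER']}
converters = {'exp_time': float}

header = standardize_header({'EXPTIME': '30', 'FILTER': 'r'},
                            translations, converters)
```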

In addition to the `header` column, which is saved to the database, the Exposure also has `raw_header` and
`section_headers` properties. The `raw_header` is a dictionary-like object that contains the full FITS header
of the file, with its original upper-case keywords. It is not saved to the database;
it lives with the file on disk and is lazy loaded from there when needed.
The `section_headers` property is a SectionHeaders object (also defined in `models/exposure.py`)
which acts like a dictionary that lazy loads the FITS header from file for a specific section when needed.
Note that the "global" header could be the same as the header of the first section.
This usually happens if the raw data includes a separate FITS file for each section.
Each one would have a different raw header, and the "exposure global header" would arbitrarily be the header of the
first file. If the instrument only has one section, this is trivially true as well.
In cases where multiple section data is saved in one FITS file, there would usually be a primary HDU that contains
the global exposure header information, and additional extension HDUs with their own image data and headers.
In this case the `section_headers` are all different from the `raw_header`.
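With `astropy.io.fits`, the distinction between the global header and the per-section headers can be seen directly. The snippet below builds a small demo file (layout and keywords invented) and then reads it back the way the Exposure properties conceptually do:

```python
import numpy as np
from astropy.io import fits

# build a small demo exposure: a primary HDU carrying the global header,
# plus one image extension per section (layout and keywords are invented)
primary = fits.PrimaryHDU()
primary.header['EXPTIME'] = 30.0
sections = [fits.ImageHDU(data=np.zeros((4, 4), dtype='float32'), name=f'SEC{i}')
            for i in range(2)]
fits.HDUList([primary] + sections).writeto('demo_exposure.fits', overwrite=True)

with fits.open('demo_exposure.fits') as hdul:
    global_header = hdul[0].header                       # what raw_header would wrap
    section_headers = [hdu.header for hdu in hdul[1:]]   # per-section headers
```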

After running basic pre-processing, we split each Exposure object into one or more Image objects.
These are already section-specific, so there are fewer properties to track when looking for the data or headers.
The Image object's `data` property contains the pixel values (usually after some pre-processing).
In addition to the pixel values, we also keep some more data arrays relevant to the image.
These include the `flags` array, which is an integer bit-flag array marking things like bad pixels,
the `weight` array, giving the inverse variance of each pixel (noise model),
and additional, optional arrays like the `score` array, which is a matched-filter image
normalized to units of signal-to-noise.
If the point spread function (PSF) of the image is calculated, it can be stored in the `psf` property.
These arrays are all numpy arrays, and are saved to disk using the format defined in the config file.
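For instance, a quality cut using the `flags` and `weight` arrays might look like the following sketch (the bit assignments are invented for illustration, not the pipeline's actual flag values):

```python
import numpy as np

# hypothetical bit assignments for the integer bit-flag array
BAD_PIXEL, SATURATED, COSMIC_RAY = 2 ** 0, 2 ** 1, 2 ** 2

data = np.array([[10.0, 5000.0], [12.0, 11.0]])
flags = np.array([[0, SATURATED], [BAD_PIXEL, 0]], dtype=np.uint16)
weight = np.full(data.shape, 0.04)   # inverse variance per pixel (noise model)

# zero out the weight of flagged pixels, so downstream fits and coadds
# (which weight by inverse variance) ignore them entirely
bad = (flags & (BAD_PIXEL | SATURATED | COSMIC_RAY)) != 0
weight = np.where(bad, 0.0, weight)
```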

The Image object's `raw_header` property contains the section-specific header, copied directly from
the Exposure's `section_headers` property. Some header keywords may be added or modified in the pre-processing step.
This header is saved to the file, and not the database.
The Image object's `header` property contains a subset of the section-specific header.
Again this header uses standardized names, in lower case, that are searchable on the database.
This header is produced during the pre-processing step and contains only the important (searchable) keywords.

The Image data is saved to disk using the format defined in the config file (using `storage.images.format`).
Unlike the Exposure object, which is linked to files that were created by an instrument we are not in control of,
the files associated with an Image object are created by the pipeline.
We can also choose to save all the different arrays (data, weight, flags, etc.) in different files,
or in the same file (using multiple extensions). This is defined in the config file using `storage.images.single_file`.
In either case, the additional arrays are saved with their own headers, which are all identical to the Image object's
`raw_header` dictionary.
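The two layouts selected by `storage.images.single_file` could be sketched like this (the function name, extension names, and file layout are illustrative, not the pipeline's actual saving code):

```python
import numpy as np
from astropy.io import fits

data = np.zeros((4, 4), dtype='float32')
weight = np.ones_like(data)
flags = np.zeros(data.shape, dtype='int16')
header = fits.Header()
header['SECTION'] = '01'             # stand-in for the Image raw_header

def save_image(filepath, single_file=False):
    """Sketch of the two layouts controlled by storage.images.single_file."""
    arrays = [('IMAGE', data), ('WEIGHT', weight), ('FLAGS', flags)]
    if single_file:
        # one file with multiple extensions, each carrying a copy of the header
        hdus = [fits.PrimaryHDU(data=data, header=header)]
        hdus += [fits.ImageHDU(data=arr, header=header, name=name)
                 for name, arr in arrays[1:]]
        fits.HDUList(hdus).writeto(filepath + '.fits', overwrite=True)
    else:
        # separate files, one per array, all sharing the same header
        for name, arr in arrays:
            fits.PrimaryHDU(data=arr, header=header).writeto(
                filepath + '.' + name.lower() + '.fits', overwrite=True)
```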


135 changes: 135 additions & 0 deletions alembic/versions/2023_05_31_1639-4114e36a2555_image_model.py
@@ -0,0 +1,135 @@
"""image model
Revision ID: 4114e36a2555
Revises: f940bef6bf71
Create Date: 2023-05-31 16:39:35.909083
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = '4114e36a2555'
down_revision = 'f940bef6bf71'
branch_labels = None
depends_on = None


def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
image_type = postgresql.ENUM('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type')
image_type.create(op.get_bind())
image_format = postgresql.ENUM('fits', 'hdf5', name='image_format')
image_format.create(op.get_bind())
image_combine_method = postgresql.ENUM('coadd', 'subtraction', name='image_combine_method')
image_combine_method.create(op.get_bind())

op.create_table('image_sources',
sa.Column('source_id', sa.Integer(), nullable=False),
sa.Column('combined_id', sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(['combined_id'], ['images.id'], ondelete='CASCADE'),
sa.ForeignKeyConstraint(['source_id'], ['images.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('source_id', 'combined_id')
)
op.add_column('exposures', sa.Column('type', sa.Enum('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type'), nullable=False))
op.add_column('exposures', sa.Column('format', sa.Enum('fits', 'hdf5', name='image_format'), nullable=False))
op.drop_index('ix_exposures_section_id', table_name='exposures')
op.create_index(op.f('ix_exposures_type'), 'exposures', ['type'], unique=False)
op.drop_column('exposures', 'section_id')
op.add_column('images', sa.Column('exposure_id', sa.BigInteger(), nullable=True))
op.add_column('images', sa.Column('combine_method', sa.Enum('coadd', 'subtraction', name='image_combine_method'), nullable=True))
op.add_column('images', sa.Column('type', sa.Enum('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type'), nullable=False))
op.add_column('images', sa.Column('format', sa.Enum('fits', 'hdf5', name='image_format'), nullable=False))
op.add_column('images', sa.Column('provenance_id', sa.BigInteger(), nullable=False))
op.add_column('images', sa.Column('header', postgresql.JSONB(astext_type=sa.Text()), nullable=False))
op.add_column('images', sa.Column('mjd', sa.Double(), nullable=False))
op.add_column('images', sa.Column('end_mjd', sa.Double(), nullable=False))
op.add_column('images', sa.Column('exp_time', sa.Float(), nullable=False))
op.add_column('images', sa.Column('instrument', sa.Text(), nullable=False))
op.add_column('images', sa.Column('telescope', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filter', sa.Text(), nullable=False))
op.add_column('images', sa.Column('section_id', sa.Text(), nullable=False))
op.add_column('images', sa.Column('project', sa.Text(), nullable=False))
op.add_column('images', sa.Column('target', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filepath', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filepath_extensions', sa.ARRAY(sa.Text()), nullable=True))
op.add_column('images', sa.Column('ra', sa.Double(), nullable=False))
op.add_column('images', sa.Column('dec', sa.Double(), nullable=False))
op.add_column('images', sa.Column('gallat', sa.Double(), nullable=True))
op.add_column('images', sa.Column('gallon', sa.Double(), nullable=True))
op.add_column('images', sa.Column('ecllat', sa.Double(), nullable=True))
op.add_column('images', sa.Column('ecllon', sa.Double(), nullable=True))
op.create_index('images_q3c_ang2ipix_idx', 'images', [sa.text('q3c_ang2ipix(ra, dec)')], unique=False)
op.create_index(op.f('ix_images_combine_method'), 'images', ['combine_method'], unique=False)
op.create_index(op.f('ix_images_ecllat'), 'images', ['ecllat'], unique=False)
op.create_index(op.f('ix_images_end_mjd'), 'images', ['end_mjd'], unique=False)
op.create_index(op.f('ix_images_exp_time'), 'images', ['exp_time'], unique=False)
op.create_index(op.f('ix_images_exposure_id'), 'images', ['exposure_id'], unique=False)
op.create_index(op.f('ix_images_filepath'), 'images', ['filepath'], unique=True)
op.create_index(op.f('ix_images_filter'), 'images', ['filter'], unique=False)
op.create_index(op.f('ix_images_gallat'), 'images', ['gallat'], unique=False)
op.create_index(op.f('ix_images_instrument'), 'images', ['instrument'], unique=False)
op.create_index(op.f('ix_images_mjd'), 'images', ['mjd'], unique=False)
op.create_index(op.f('ix_images_provenance_id'), 'images', ['provenance_id'], unique=False)
op.create_index(op.f('ix_images_section_id'), 'images', ['section_id'], unique=False)
op.create_index(op.f('ix_images_project'), 'images', ['project'], unique=False)
op.create_index(op.f('ix_images_target'), 'images', ['target'], unique=False)
op.create_index(op.f('ix_images_telescope'), 'images', ['telescope'], unique=False)
op.create_index(op.f('ix_images_type'), 'images', ['type'], unique=False)
op.create_foreign_key(None, 'images', 'provenances', ['provenance_id'], ['id'], ondelete='CASCADE')
op.create_foreign_key(None, 'images', 'exposures', ['exposure_id'], ['id'])
# ### end Alembic commands ###


def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_constraint(None, 'images', type_='foreignkey')
op.drop_index(op.f('ix_images_type'), table_name='images')
op.drop_index(op.f('ix_images_telescope'), table_name='images')
op.drop_index(op.f('ix_images_project'), table_name='images')
op.drop_index(op.f('ix_images_target'), table_name='images')
op.drop_index(op.f('ix_images_section_id'), table_name='images')
op.drop_index(op.f('ix_images_provenance_id'), table_name='images')
op.drop_index(op.f('ix_images_mjd'), table_name='images')
op.drop_index(op.f('ix_images_instrument'), table_name='images')
op.drop_index(op.f('ix_images_gallat'), table_name='images')
op.drop_index(op.f('ix_images_filter'), table_name='images')
op.drop_index(op.f('ix_images_filepath'), table_name='images')
op.drop_index(op.f('ix_images_exposure_id'), table_name='images')
op.drop_index(op.f('ix_images_exp_time'), table_name='images')
op.drop_index(op.f('ix_images_end_mjd'), table_name='images')
op.drop_index(op.f('ix_images_ecllat'), table_name='images')
op.drop_index(op.f('ix_images_combine_method'), table_name='images')
op.drop_index('images_q3c_ang2ipix_idx', table_name='images')
op.drop_column('images', 'ecllon')
op.drop_column('images', 'ecllat')
op.drop_column('images', 'gallon')
op.drop_column('images', 'gallat')
op.drop_column('images', 'dec')
op.drop_column('images', 'ra')
op.drop_column('images', 'filepath_extensions')
op.drop_column('images', 'filepath')
op.drop_column('images', 'target')
op.drop_column('images', 'project')
op.drop_column('images', 'section_id')
op.drop_column('images', 'filter')
op.drop_column('images', 'telescope')
op.drop_column('images', 'instrument')
op.drop_column('images', 'exp_time')
op.drop_column('images', 'end_mjd')
op.drop_column('images', 'mjd')
op.drop_column('images', 'header')
op.drop_column('images', 'provenance_id')
op.drop_column('images', 'type')
op.drop_column('images', 'format')
op.drop_column('images', 'combine_method')
op.drop_column('images', 'exposure_id')
op.add_column('exposures', sa.Column('section_id', sa.TEXT(), autoincrement=False, nullable=False))
op.drop_index(op.f('ix_exposures_type'), table_name='exposures')
op.create_index('ix_exposures_section_id', 'exposures', ['section_id'], unique=False)
op.create_index('exposure_q3c_ang2ipix_idx', 'exposures', [sa.text('q3c_ang2ipix(ra, "dec")')], unique=False)
op.drop_column('exposures', 'type')
op.drop_column('exposures', 'format')
op.drop_table('image_sources')
# ### end Alembic commands ###
65,390 changes: 65,390 additions & 0 deletions data/DECam_examples/c4d_20221002_040239_r_v1.24.fits

Large diffs are not rendered by default.

66,775 changes: 66,775 additions & 0 deletions data/DECam_examples/c4d_20221002_040434_i_v1.24.fits

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions default_config.yaml
@@ -10,3 +10,22 @@ db:
host: localhost
port: 5432
database: seechange

storage:
images:
# can choose hdf5 as well, but this is not yet implemented
format: fits
# should Image object save the weights/flags/etc in a single file with the image data?
single_file: false
# The convention for building filenames for images
# Use any of the following: short_name, date, time, section_id, filter, ra, dec, prov_id
# Can also use ra_int and ra_frac to get the integer number before/after the decimal point
# (the same can be done for dec). Also use ra_int_h to get the number in hours.
# to get the declination with "p" or "m" replacing the sign, use dec_int_pm.
# The string given here is fed into the python format() function
# so you can use e.g., {ra_int:03d} to get a 3 digit zero padded right ascension.
# The name convention can also include subfolders (e.g., using {ra_int}/...).
# The minimal set of fields to make the filenames unique include:
# short_name (instrument name), date, time, section_id, prov_id (the unique provenance ID)
name_convention: "{ra_int:03d}/{short_name}_{date}_{time}_{section_id:02d}_{filter}_{prov_id:03d}"
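As a sketch of how this convention string is expanded through Python's `format()` (the field values below are invented for illustration):

```python
convention = "{ra_int:03d}/{short_name}_{date}_{time}_{section_id:02d}_{filter}_{prov_id:03d}"

# hypothetical field values for a single image
fields = dict(ra_int=57, short_name='DECam', date='20221002', time='040239',
              section_id=3, filter='r', prov_id=12)

# :03d zero-pads ra_int and prov_id to 3 digits, :02d pads section_id to 2;
# the slash in the convention creates a subfolder in the resulting path
filepath = convention.format(**fields)
```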

1 change: 1 addition & 0 deletions docker/application/Dockerfile
@@ -179,6 +179,7 @@ RUN pip install \
sqlalchemy==2.0.7 \
sqlalchemy-utils==0.40.0 \
urllib3==1.26.15 \
wget==3.2 \
&& rm -rf /home/seechange/.cache/pip

# Other pip packages I removed from above that were
