Commit 20d1ece — Image model (c3-time-domain#34)

guynir42 authored Jun 15, 2023
1 parent 238e696 commit 20d1ece
Showing 22 changed files with 134,425 additions and 213 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# some specific files we don't want in the repo
data/DECam_examples/c4d_221104_074232_ori.fits.fz

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
81 changes: 80 additions & 1 deletion README.md
@@ -211,7 +211,7 @@ data into numpy arrays.
For example:

```python
-from matplotlib.pyplot import plt
+import matplotlib.pyplot as plt
from models.exposure import Exposure
exp = Exposure(filename='Camera_Project_2021-01-01T00:00:00.000.fits')
plt.show(exp.data[0])
@@ -379,3 +379,82 @@ and in the `test_instrument.py` file in the `tests/models` folder.

To be added...



### Image data and headers

A key requirement of this pipeline is that the data be quickly accessible.
So where is the data stored, and how do we get to it quickly?

Each Exposure object is associated with a single FITS file (or sometimes multiple files, for different sections).
To get the imaging data for an Exposure, simply call the `data` property. This dictionary-like object will
provide a numpy array for each section of the instrument:

```python
exp = Exposure('path/to/file.fits')
print(type(exp.data)) # this is a SectionData object, defined in models/exposure.py
print(type(exp.data[0])) # numpy array for section zero

for section_id in exp.instrument_object.get_section_ids():
    print(exp.data[section_id].shape)  # print the shape of each section
```

The `data` property is a SectionData object, which acts like a dictionary
that lazy loads the data array from the FITS file when needed
(in most cases these will be FITS files, but other formats can be added just as well).
For single-section instruments, `data[0]` will usually be good enough.
When there are several sections, use `exp.instrument_object.get_section_ids()` to get the list of section IDs.
Note that these could be integers or strings; SectionData accepts either type.
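The lazy-loading behavior can be illustrated with a minimal, hypothetical sketch (not the actual SectionData implementation, which lives in `models/exposure.py`; here the file-reading step is replaced by a generic loader callable):

```python
class LazySectionData:
    """A minimal, hypothetical stand-in for SectionData: it calls a
    loader function for a section only on first access, then caches
    the resulting array for subsequent accesses."""

    def __init__(self, loader, section_ids):
        self._loader = loader                     # callable: section_id -> array
        self._ids = [str(s) for s in section_ids]
        self._cache = {}

    def __getitem__(self, section_id):
        key = str(section_id)                     # int and str keys are equivalent
        if key not in self._ids:
            raise KeyError(section_id)
        if key not in self._cache:                # load from disk only once
            self._cache[key] = self._loader(key)
        return self._cache[key]

calls = []
def fake_loader(section_id):
    calls.append(section_id)                      # record each simulated disk read
    return [[1.0, 2.0]]                           # stand-in for a numpy array

data = LazySectionData(fake_loader, [0, 1])
first = data[0]
second = data[0]                                  # served from the cache, no reload
```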

Header information is also loaded from the FITS file, but this information can be kept in three different places.
The first is the `header` property of the Exposure object. This is a dictionary-like object that contains
a small subset of the full FITS header. Generally only the properties we intend to query on will be saved here.
Since some of this information is stored in dedicated database columns (like `exp_time`), the `header` column
does not necessarily hold much beyond those. Note that this header is filled from the global header,
not the headers of individual sections.
The keywords in this header are all lower-case, and are translated to standardized names using the
`_get_header_keyword_translations()` method of the Instrument class. This makes it easy to tell them apart
from the raw header information (in upper case) which also uses instrument-specific keywords.
The values of the header cards are also converted to standard units using the `_get_header_values_converters()` method.
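As a sketch of how such a translation might work (the dictionary contents below are invented for illustration; the real mappings come from each Instrument subclass):

```python
def standardize_header(raw_header, translations, converters):
    """Map instrument-specific, upper-case FITS keywords to standardized
    lower-case names, converting values to standard units.

    translations: dict of standardized name -> list of candidate raw keywords
    converters:   dict of standardized name -> callable applied to the value
    """
    out = {}
    for std_name, raw_keys in translations.items():
        for key in raw_keys:
            if key in raw_header:
                convert = converters.get(std_name, lambda v: v)
                out[std_name] = convert(raw_header[key])
                break                  # first matching raw keyword wins
    return out

# hypothetical mappings and converters for a made-up instrument
translations = {'exp_time': ['EXPTIME', 'EXPOSURE'], 'filter': ['FILTER']}
converters = {'exp_time': float}

header = standardize_header({'EXPTIME': '30', 'FILTER': 'r'},
                            translations, converters)
```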

In addition to the `header` column, which is saved to the database, the Exposure also has `raw_header` and
`section_headers` properties. The `raw_header` is a dictionary-like object that contains the full FITS header
of the file, with its original upper-case keywords. It is not saved to the database;
it lives with the file on disk and is lazy loaded from there when needed.
The `section_headers` property is a SectionHeaders object (also defined in `models/exposure.py`)
which acts like a dictionary that lazy loads the FITS header from file for a specific section when needed.
Note that the "global" header could be the same as the header of the first section.
This usually happens if the raw data includes a separate FITS file for each section.
Each one would have a different raw header, and the "exposure global header" would arbitrarily be the header of the
first file. If the instrument only has one section, this is trivially true as well.
In cases where multiple section data is saved in one FITS file, there would usually be a primary HDU that contains
the global exposure header information, and additional extension HDUs with their own image data and headers.
In this case the `section_headers` are all different from the `raw_header`.
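With `astropy.io.fits`, the distinction between the global header and the per-section headers can be seen directly. The snippet below builds a small demo file (layout and keywords invented) and then reads it back the way the Exposure properties conceptually do:

```python
import numpy as np
from astropy.io import fits

# build a small demo exposure: a primary HDU carrying the global header,
# plus one image extension per section (layout and keywords are invented)
primary = fits.PrimaryHDU()
primary.header['EXPTIME'] = 30.0
sections = [fits.ImageHDU(data=np.zeros((4, 4), dtype='float32'), name=f'SEC{i}')
            for i in range(2)]
fits.HDUList([primary] + sections).writeto('demo_exposure.fits', overwrite=True)

with fits.open('demo_exposure.fits') as hdul:
    global_header = hdul[0].header                       # what raw_header would wrap
    section_headers = [hdu.header for hdu in hdul[1:]]   # per-section headers
```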

After running basic pre-processing, we split each Exposure object into one or more Image objects.
These are already section-specific, so there are fewer properties to track when looking for the data or headers.
The Image object's `data` property contains the pixel values (usually after some pre-processing).
In addition to the pixel values, we also keep some more data arrays relevant to the image.
These include the `flags` array, which is an integer bit-flag array marking things like bad pixels,
the `weight` array, giving the inverse variance of each pixel (noise model),
and additional, optional arrays like the `score` array, which is a matched-filter image
normalized to units of signal-to-noise.
If the point spread function (PSF) of the image is calculated, it can be stored in the `psf` property.
These arrays are all numpy arrays, and are saved to disk using the format defined in the config file.
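For instance, a quality cut using the `flags` and `weight` arrays might look like the following sketch (the bit assignments are invented for illustration, not the pipeline's actual flag values):

```python
import numpy as np

# hypothetical bit assignments for the integer bit-flag array
BAD_PIXEL, SATURATED, COSMIC_RAY = 2 ** 0, 2 ** 1, 2 ** 2

data = np.array([[10.0, 5000.0], [12.0, 11.0]])
flags = np.array([[0, SATURATED], [BAD_PIXEL, 0]], dtype=np.uint16)
weight = np.full(data.shape, 0.04)   # inverse variance per pixel (noise model)

# zero out the weight of flagged pixels, so downstream fits and coadds
# (which weight by inverse variance) ignore them entirely
bad = (flags & (BAD_PIXEL | SATURATED | COSMIC_RAY)) != 0
weight = np.where(bad, 0.0, weight)
```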

The Image object's `raw_header` property contains the section-specific header, copied directly from
the Exposure's `section_headers` property. Some header keywords may be added or modified in the pre-processing step.
This header is saved to the file, and not the database.
The Image object's `header` property contains a subset of the section-specific header.
Again this header uses standardized names, in lower case, that are searchable on the database.
This header is produced during the pre-processing step and contains only the important (searchable) keywords.

The Image data is saved to disk using the format defined in the config file (using `storage.images.format`).
Unlike the Exposure object, which is linked to files that were created by an instrument we are not in control of,
the files associated with an Image object are created by the pipeline.
We can also choose to save all the different arrays (data, weight, flags, etc.) in different files,
or in the same file (using multiple extensions). This is defined in the config file using `storage.images.single_file`.
In either case, the additional arrays are saved with their own headers, which are all identical to the Image object's
`raw_header` dictionary.
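The two layouts selected by `storage.images.single_file` could be sketched like this (the function name, extension names, and file layout are illustrative, not the pipeline's actual saving code):

```python
import numpy as np
from astropy.io import fits

data = np.zeros((4, 4), dtype='float32')
weight = np.ones_like(data)
flags = np.zeros(data.shape, dtype='int16')
header = fits.Header()
header['SECTION'] = '01'             # stand-in for the Image raw_header

def save_image(filepath, single_file=False):
    """Sketch of the two layouts controlled by storage.images.single_file."""
    arrays = [('IMAGE', data), ('WEIGHT', weight), ('FLAGS', flags)]
    if single_file:
        # one file with multiple extensions, each carrying a copy of the header
        hdus = [fits.PrimaryHDU(data=data, header=header)]
        hdus += [fits.ImageHDU(data=arr, header=header, name=name)
                 for name, arr in arrays[1:]]
        fits.HDUList(hdus).writeto(filepath + '.fits', overwrite=True)
    else:
        # separate files, one per array, all sharing the same header
        for name, arr in arrays:
            fits.PrimaryHDU(data=arr, header=header).writeto(
                filepath + '.' + name.lower() + '.fits', overwrite=True)
```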


135 changes: 135 additions & 0 deletions alembic/versions/2023_05_31_1639-4114e36a2555_image_model.py
@@ -0,0 +1,135 @@
"""image model
Revision ID: 4114e36a2555
Revises: f940bef6bf71
Create Date: 2023-05-31 16:39:35.909083
"""
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

# revision identifiers, used by Alembic.
revision = '4114e36a2555'
down_revision = 'f940bef6bf71'
branch_labels = None
depends_on = None


def upgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
image_type = postgresql.ENUM('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type')
image_type.create(op.get_bind())
image_format = postgresql.ENUM('fits', 'hdf5', name='image_format')
image_format.create(op.get_bind())
image_combine_method = postgresql.ENUM('coadd', 'subtraction', name='image_combine_method')
image_combine_method.create(op.get_bind())

op.create_table('image_sources',
sa.Column('source_id', sa.Integer(), nullable=False),
sa.Column('combined_id', sa.Integer(), nullable=False),
sa.ForeignKeyConstraint(['combined_id'], ['images.id'], ondelete='CASCADE'),
sa.ForeignKeyConstraint(['source_id'], ['images.id'], ondelete='CASCADE'),
sa.PrimaryKeyConstraint('source_id', 'combined_id')
)
op.add_column('exposures', sa.Column('type', sa.Enum('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type'), nullable=False))
op.add_column('exposures', sa.Column('format', sa.Enum('fits', 'hdf5', name='image_format'), nullable=False))
op.drop_index('ix_exposures_section_id', table_name='exposures')
op.create_index(op.f('ix_exposures_type'), 'exposures', ['type'], unique=False)
op.drop_column('exposures', 'section_id')
op.add_column('images', sa.Column('exposure_id', sa.BigInteger(), nullable=True))
op.add_column('images', sa.Column('combine_method', sa.Enum('coadd', 'subtraction', name='image_combine_method'), nullable=True))
op.add_column('images', sa.Column('type', sa.Enum('science', 'reference', 'difference', 'bias', 'dark', 'flat', name='image_type'), nullable=False))
op.add_column('images', sa.Column('format', sa.Enum('fits', 'hdf5', name='image_format'), nullable=False))
op.add_column('images', sa.Column('provenance_id', sa.BigInteger(), nullable=False))
op.add_column('images', sa.Column('header', postgresql.JSONB(astext_type=sa.Text()), nullable=False))
op.add_column('images', sa.Column('mjd', sa.Double(), nullable=False))
op.add_column('images', sa.Column('end_mjd', sa.Double(), nullable=False))
op.add_column('images', sa.Column('exp_time', sa.Float(), nullable=False))
op.add_column('images', sa.Column('instrument', sa.Text(), nullable=False))
op.add_column('images', sa.Column('telescope', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filter', sa.Text(), nullable=False))
op.add_column('images', sa.Column('section_id', sa.Text(), nullable=False))
op.add_column('images', sa.Column('project', sa.Text(), nullable=False))
op.add_column('images', sa.Column('target', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filepath', sa.Text(), nullable=False))
op.add_column('images', sa.Column('filepath_extensions', sa.ARRAY(sa.Text()), nullable=True))
op.add_column('images', sa.Column('ra', sa.Double(), nullable=False))
op.add_column('images', sa.Column('dec', sa.Double(), nullable=False))
op.add_column('images', sa.Column('gallat', sa.Double(), nullable=True))
op.add_column('images', sa.Column('gallon', sa.Double(), nullable=True))
op.add_column('images', sa.Column('ecllat', sa.Double(), nullable=True))
op.add_column('images', sa.Column('ecllon', sa.Double(), nullable=True))
op.create_index('images_q3c_ang2ipix_idx', 'images', [sa.text('q3c_ang2ipix(ra, dec)')], unique=False)
op.create_index(op.f('ix_images_combine_method'), 'images', ['combine_method'], unique=False)
op.create_index(op.f('ix_images_ecllat'), 'images', ['ecllat'], unique=False)
op.create_index(op.f('ix_images_end_mjd'), 'images', ['end_mjd'], unique=False)
op.create_index(op.f('ix_images_exp_time'), 'images', ['exp_time'], unique=False)
op.create_index(op.f('ix_images_exposure_id'), 'images', ['exposure_id'], unique=False)
op.create_index(op.f('ix_images_filepath'), 'images', ['filepath'], unique=True)
op.create_index(op.f('ix_images_filter'), 'images', ['filter'], unique=False)
op.create_index(op.f('ix_images_gallat'), 'images', ['gallat'], unique=False)
op.create_index(op.f('ix_images_instrument'), 'images', ['instrument'], unique=False)
op.create_index(op.f('ix_images_mjd'), 'images', ['mjd'], unique=False)
op.create_index(op.f('ix_images_provenance_id'), 'images', ['provenance_id'], unique=False)
op.create_index(op.f('ix_images_section_id'), 'images', ['section_id'], unique=False)
op.create_index(op.f('ix_images_project'), 'images', ['project'], unique=False)
op.create_index(op.f('ix_images_target'), 'images', ['target'], unique=False)
op.create_index(op.f('ix_images_telescope'), 'images', ['telescope'], unique=False)
op.create_index(op.f('ix_images_type'), 'images', ['type'], unique=False)
op.create_foreign_key(None, 'images', 'provenances', ['provenance_id'], ['id'], ondelete='CASCADE')
op.create_foreign_key(None, 'images', 'exposures', ['exposure_id'], ['id'])
# ### end Alembic commands ###


def downgrade() -> None:
# ### commands auto generated by Alembic - please adjust! ###
op.drop_constraint(None, 'images', type_='foreignkey')
op.drop_index(op.f('ix_images_type'), table_name='images')
op.drop_index(op.f('ix_images_telescope'), table_name='images')
op.drop_index(op.f('ix_images_project'), table_name='images')
op.drop_index(op.f('ix_images_target'), table_name='images')
op.drop_index(op.f('ix_images_section_id'), table_name='images')
op.drop_index(op.f('ix_images_provenance_id'), table_name='images')
op.drop_index(op.f('ix_images_mjd'), table_name='images')
op.drop_index(op.f('ix_images_instrument'), table_name='images')
op.drop_index(op.f('ix_images_gallat'), table_name='images')
op.drop_index(op.f('ix_images_filter'), table_name='images')
op.drop_index(op.f('ix_images_filepath'), table_name='images')
op.drop_index(op.f('ix_images_exposure_id'), table_name='images')
op.drop_index(op.f('ix_images_exp_time'), table_name='images')
op.drop_index(op.f('ix_images_end_mjd'), table_name='images')
op.drop_index(op.f('ix_images_ecllat'), table_name='images')
op.drop_index(op.f('ix_images_combine_method'), table_name='images')
op.drop_index('images_q3c_ang2ipix_idx', table_name='images')
op.drop_column('images', 'ecllon')
op.drop_column('images', 'ecllat')
op.drop_column('images', 'gallon')
op.drop_column('images', 'gallat')
op.drop_column('images', 'dec')
op.drop_column('images', 'ra')
op.drop_column('images', 'filepath_extensions')
op.drop_column('images', 'filepath')
op.drop_column('images', 'target')
op.drop_column('images', 'project')
op.drop_column('images', 'section_id')
op.drop_column('images', 'filter')
op.drop_column('images', 'telescope')
op.drop_column('images', 'instrument')
op.drop_column('images', 'exp_time')
op.drop_column('images', 'end_mjd')
op.drop_column('images', 'mjd')
op.drop_column('images', 'header')
op.drop_column('images', 'provenance_id')
op.drop_column('images', 'type')
op.drop_column('images', 'format')
op.drop_column('images', 'combine_method')
op.drop_column('images', 'exposure_id')
op.add_column('exposures', sa.Column('section_id', sa.TEXT(), autoincrement=False, nullable=False))
op.drop_index(op.f('ix_exposures_type'), table_name='exposures')
op.create_index('ix_exposures_section_id', 'exposures', ['section_id'], unique=False)
op.create_index('exposure_q3c_ang2ipix_idx', 'exposures', [sa.text('q3c_ang2ipix(ra, "dec")')], unique=False)
op.drop_column('exposures', 'type')
op.drop_column('exposures', 'format')
op.drop_table('image_sources')
# ### end Alembic commands ###
65,390 changes: 65,390 additions & 0 deletions data/DECam_examples/c4d_20221002_040239_r_v1.24.fits

Large diffs are not rendered by default.

66,775 changes: 66,775 additions & 0 deletions data/DECam_examples/c4d_20221002_040434_i_v1.24.fits

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions default_config.yaml
@@ -10,3 +10,22 @@ db:
host: localhost
port: 5432
database: seechange

storage:
images:
# can choose hdf5 as well, but this is not yet implemented
format: fits
# should Image object save the weights/flags/etc in a single file with the image data?
single_file: false
# The convention for building filenames for images
# Use any of the following: short_name, date, time, section_id, filter, ra, dec, prov_id
# Can also use ra_int and ra_frac to get the integer number before/after the decimal point
# (the same can be done for dec). Also use ra_int_h to get the number in hours.
# to get the declination with "p" or "m" replacing the sign, use dec_int_pm.
# The string given here is fed into the python format() function
# so you can use e.g., {ra_int:03d} to get a 3 digit zero padded right ascension.
# The name convention can also include subfolders (e.g., using {ra_int}/...).
# The minimal set of fields to make the filenames unique include:
# short_name (instrument name), date, time, section_id, prov_id (the unique provenance ID)
name_convention: "{ra_int:03d}/{short_name}_{date}_{time}_{section_id:02d}_{filter}_{prov_id:03d}"
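As a sketch of how this convention string is expanded through Python's `format()` (the field values below are invented for illustration):

```python
convention = "{ra_int:03d}/{short_name}_{date}_{time}_{section_id:02d}_{filter}_{prov_id:03d}"

# hypothetical field values for a single image
fields = dict(ra_int=57, short_name='DECam', date='20221002', time='040239',
              section_id=3, filter='r', prov_id=12)

# :03d zero-pads ra_int and prov_id to 3 digits, :02d pads section_id to 2;
# the slash in the convention creates a subfolder in the resulting path
filepath = convention.format(**fields)
```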

1 change: 1 addition & 0 deletions docker/application/Dockerfile
@@ -179,6 +179,7 @@ RUN pip install \
sqlalchemy==2.0.7 \
sqlalchemy-utils==0.40.0 \
urllib3==1.26.15 \
wget==3.2 \
&& rm -rf /home/seechange/.cache/pip

# Other pip packages I removed from above that were
