-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathdatamodel.txt
528 lines (364 loc) · 21.5 KB
/
datamodel.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
A Datamodel for the AuxTel spectrograph (LATISS)
===================
(The cognoscenti will recognise certain similarities to the PFS data model,
https://github.com/Subaru-PFS/datamodel)
This document outlines the data model for LATISS (The LSST Atmospheric Transmission Imager
and Slitless Spectrograph) which is a slitless spectrograph on the 1.2m auxilliary telescope
on Cerro Pachon, approximately 100m from LSST.
There will be implementations of python classes that represent many of descriptions in this document,
supporting reading and writing FITS files that conform to this model.
To use them, add the python directory in this package to your $PYTHONPATH
and import e.g. atmospec.datamodel.latissConfig
The following symbolic names will be used in the discussion:
xxx
NCOLUMN The number of columns in the detector
NROW The number of rows that spectra nominally extend over in the dispersion direction.
This will change depending on the dispersive element used
NSIMROW Number of rows for simulated data. May be different in different files, depending
on the desired oversampling.
NCOARSE Number of wavelengths used in computing the full covariance matrix of the spectra
xxx
The values of NCOLUMN and NROW will differ for raw and reduced data.
Possible values are given in the section, "Parameters"
In various places SHA-1 is refered to, which is a 160-bit hash, as used by e.g. git
(https://en.wikipedia.org/wiki/SHA-1). We truncate these hashes to 64bits (so as to fit
in standard 64-bit integers), and abbreviate these SHA-1 values to 8 hexadecimal digits,
2^32 ~ 4e9 possible values, in filenames. In filenames, hexadecimal characters shall be
written in lower case (0x12345abcdef not 0X12345ABCDEF or 0x12345ABCDEF).
The following variables are used in filename formats (which are written in python notation):
site:
T: Tuscon
B: BNL
N: NCSA (xxx remove?)
P: Princeton (xxx remove?)
S: Summit
I: SLAC I&T
X: AuxTel offline
F: simulation (fake)
category:
A: Science
B: Lab
visit:
An incrementing exposure number, unique at any site
disperser:
0 to n, where the value of n is number of different dispersive elements
and 0 represents no disperser, i.e. direct imaging
We consider the different dispersive elements as fundamental as they have different:
* transmission functions
* wavelength solutions
* resolutions
* flatfielding properties/datasets
tract:
(xxx will we have any need for this given one source density == 1/pointing)
An integer in the range (0, 99999) specifying an area of the sky
patch:
(xxx will we have any need for this given one source density == 1/pointing)
A string of the form "m,n" specifying a region within a tract
objId:
A unique 64-bit object ID for an object. For example, the LSST object ID from the database.
(xxx we won't have LSST object IDs for quite a while, need a stop-gap. Can we use Gaia obj IDs?)
objIds are written out using %08x formats (rather than e.g. %08d) as a 64 bit integer is a large number.
catId:
A small integer specifying the source of the objId. Currently only
0: Simulated
1: Gaia
are defined.
latissConfigId:
An integer uniquely specifying the configuration of LATISS; specifically a SHA-1 of the
(disperser, ra, dec) tuples (with position rounded to the nearest arcsecond) truncated to 64 bits.
See calculate_latissConfigId() in
python/atmospec/datas model/utils.py
We include the standard prefix "0x" in filenames to visually separate SHA-1s from the 64-bit objId; this is
especially important in latissObject filenames which have both an objId and a hash.
An alternative to a SHA-1 would be the visit number of the first data taken with this configuration but that
would have the disadvantage that it would not be available until the data was taken.
Lab data may choose to set the latissConfigId to 0x0. As a convenience to simulators, if the disperserIds
in a latissConfig file are consecutive integers starting at 1 and all ra/dec values are 0.0 the latissConfigId
is taken to be 0x0
latissVisitHash:
An integer uniquely defining the set of visits contributing to a reduced spectrum;
this will be calculated as a SHA-1 truncated to 64 bits
See calculate_latissVisitHash() in
python/latiss/datamodel/utils.py
As a special case, if the only visit is 0, latissVisitHash shall be taken to be 0x0 for the convenience
of simulation code.
MaskedImage:
A data type which consists of three images:
A floating point image
An integer mask image
A floating point variance image
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Raw Data (from the spectrograph camera, both direct and dispersed imaging)
"PF%1s%1s%06d%1d%1d.fits" % (site, category, visit, spectrograph, armNum)
Format: as written by the DAQ system. I believe that the "A" (science) frames are currently a single image
extension of size > NCOLUMN*NROW due to extended registers and overclocks, even for the red and blue arms
which are physically two devices.
N.b. the restriction to filenames of the form 4 letters followed by 8 decimal digits comes from Subaru.
Note that the order of the last two numbers is "spectrograph, armNum" whereas in all other filenames
we use the order "arm, spectrograph" but now arm is a string (e.g. r2, n4).
---------------------------
PFS:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Calibration products (i.e. processed flats/biases/darks ready to be used in processing)
"pfsFlat-%06d-%1s%1d.fits" % (visit0, arm, spectrograph)
"pfsBias-%06d-%1s%1d.fits" % (visit0, arm, spectrograph)
"pfsDark-%06d-%1s%1d.fits" % (visit0, arm, spectrograph)
(While theoretically we don't need to distinguish between the r and m chips, it's simpler to be
consistent, so we will have 16, not 12, biases/darks for the complete spectrograph)
The visit is replaced by visit0, the first visit for which these flats/biases/darks are valid; in general
there will be many files for each (arm, spectrograph) but with different values of visit0. Note that we
don't know the upper range of validity when these files are created, so it cannot be encoded in the filename.
Visit0 is usually the last input visit used in preparing the calibration files. The calibration file that is
used by the pipeline is selected based on a list of valid dates provided to the calibration database; visit0
is only used to provide a unique filename and a sensible sort order.
Single extension fits files of size NCOLUMN*NROW
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The positions and widths of fiber traces in the 2-D flat-field images.
"pfsFiberTrace-%06d-%1s%1d.fits" % (visit0, arm, spectrograph)
Note that visit0 is defined under "Calibration products"
Each fiber trace is represented as a MaskedImage, with the "image" consisting of the amplitude of the trace,
and a bit FIBERTRACE set in the "mask" to identify which pixels belong to the trace.
In the pfsFiberTrace file, these fibre profiles are stored as a subimage of a single large MaskedImage, packed
from left to right and all with the same number of rows (n.b. because of trace curvature and overlap this
MaskedImage may have more then the number of columns in a flat-fielded data image). The image/mask/variance
are stored in 3 HDUs, followed by an HDU describing where fibre traces appear in the flat-fielded images (this
information is also used to pack/unpack the traces into the image/mask/variance HDUs). FiberTraces which
could not be traced in the Quartz image will have zero widths.
HDU #0 PDU
HDU #1 IMAGE Image representing the FiberTrace profiles [FLOAT] NROW*sum_i(NCOL_i)
HDU #2 MASK Mask representing the FiberTrace profiles [INT32] NROW*sum_i(NCOL_i)
HDU #3 VARIANCE Variance representing the FiberTrace profiles [FLOAT] NROW*sum_i(NCOL_i)
HDU #4 ID_BOX Fiber ID and bounding boxes [BINARY FITS TABLE] NFIBER*5
The PDU has keys to define which data were used to construct the FiberTrace:
SPECTROGRAPH Which spectrograph
ARM The arm of the spectrograph
VISIT%03d Visits which were used
I.e. if only visit 5830 were used, the r1 file would have ARM='r', SPECTROGRAPH=1, VISIT000=5394
The individual parameters per fiber are stored in HDU #4:
FIBERID NFIBER*32-bit int
MINX NFIBER*32-bit int
MAXX NFIBER*32-bit int
MINY NFIBER*32-bit int
MAXY NFIBER*32-bit int
where the bounding box of the trace is defined by the lower left and upper right corners (MINX, MINY) and
(MAXX, MAXY) respectively.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The state of the PFI and corresponding targetting information
"pfsConfig-0x%08x.fits" % (pfsConfigId)
The choice of a hex format is because the the pfsConfigId is a SHA-1
Format:
HDU #0 PDU
HDU #1 Fits binary table
lists fiberId, catId, objId, ra, dec, fiber flux, MCS centroid, ... for each object. The types should be:
fiberId 32-bit int
catId 32-bit int
tract 32-bit int
patch 3-byte string
objId 64-bit int
ra 32-bit float
dec 32-bit float
fiberMag 5*32-bit float
MCS centroid pair of 32-bit floats
N.b. fiberIds start at 1.
N.b. "MCS centroid, ..." is a placeholder for state of the PFI
The HDU1 header should contain a set of 5 keywords FILTERN (FILTER0...FILTER4) giving the
names of the filters in the fiberMag entry.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Reduced but not combined single spectra from a single exposure (flux and wavelength calibrated)
"pfsArm-%06d-%1s%1d.fits" % (visit, arm, spectrograph)
N.b. visit numbers are only unique at a site, but we don't preserve this in derived product names. There will
keywords in the header to disambiguate this.
The file will have several HDUs:
HDU #0 PDU
HDU #1 FLUX Flux in units of nJy [FLOAT] NROW*NFIBER
HDU #2 COVAR Near-diagonal part of HDU 1's covariance [FLOAT] NROW*3*NFIBER
HDU #3 MASK Pixel mask [32-bit INT] NROW*NFIBER
HDU #4 WAVELENGTH Wavelength solution in nm (vacuum) [FLOAT] NROW*NFIBER
HDU #5 SKY Sky flux in same units as HDU1 [FLOAT] NROW*NFIBER
HDU #6 CONFIG Information about the PFI, targetting, etc. [BINARY FITS TABLE]
Note that the data need not be resampled onto a uniform grid, as a wavelength is provided for each pixel.
The COVAR data contains the diagonal COVAR[fiberId][0][0:]
+-1 COVAR[fiberId][1][0:-1]
and
+-2 COVAR[fiberId][2][0:-2]
terms in the covariance matrix.
At a minimum the CONFIG table contains columns for pfsConfigId and visit (and only one row! But it's
a table not header keywords for consistency with the pfsObject file).
It would be possible to denormalise additional information about the observation into this HDU. I do not
know if it should also contain things we've learned about the observation from the analysis that created
this file; I'd be inclined to put it in a separate temporary file, and then build a single file describing
the entire PFS from the 12 temporaries. Alternatively, this could be done at the database level.
Note that the pfsFiberTrace file contains enough information to recover the raw counts and the widths
of the fiber traces.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The sky model for a single exposure
"pfsSky-%06d-%1s%1d.fits" % (visit, arm, spectrograph)
Format: TBD
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The PSF for a single exposure
"pfsPSF-%06d-%1s%1d.fits" % (visit, arm, spectrograph)
Format: TBD
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Combined spectra.
In SDSS we used "spPlate" files, named by the plugplate and MJD of observation but this is not suitable for
PFS where we:
1. Will split observations of the same object over multiple nights
2. Will potentially reconfigure the PFI between observations.
I don't think it makes sense to put multiple spectra together based on sky coordinates as we may go back and
add more observations later, so I think we're forced to separate files for every object. That's a lot of
files, but maybe not too bad? We could use a directory structure based on HSC's (tract, patch) -- note that
these are well defined even if we are not using HSC data to target. An alternative would be to use a
healpix or HTM id.
Because we may later obtain more data on a given object, or decide that some data we have already taken is
bad, or process a number of subsets of the available data, there may be more than one set of visits used
to produce a pfsObject file for a given object. We therefore include both the number of visits (nVisit)
and a SHA-1 hash of the visits, pfsVisitHash. We use both as nVisits may be ambiguous, while pfsVisitHash
isn't human-friendly; in particular it doesn't sort in a helpful way. It seems improbable that we will
ever have more than 100 visits, but as the pfsVisitHash is unambiguous it seemed safer to only allow for
larger values of nVisit, but record them only modulo 100.
"pfsObject-%05d-%s-%3d-%08x-%02d-0x%08x.fits" % (tract, patch, catId, objId, nVisit % 100, pfsVisitHash)
The path would be
tract/patch/pfsObject-*.fits
The file will have several HDUs:
HDU #0 PDU
HDU #1 FLUX Flux in units of nJy [FLOAT] NROW
HDU #2 FLUXTBL Binary table [FITS BINARY TABLE] NROW
Columns for:
wavelength in units of nm (vacuum) [FLOAT]
intensity in units of nJy [FLOAT]
intensity error same units as intensity [FLOAT]
mask [32-bit INT]
HDU #3 COVAR Near-diagonal part of HDU 2's covariance [FLOAT] NROW*3
HDU #4 COVAR2 Low-resolution non-sparse estimate covariance [FLOAT] NCOARSE*NCOARSE
HDU #5 MASK Pixel mask [32-bit INT] NROW
HDU #6 SKY Sky flux in same units as HDU1 [FLOAT] NROW
HDU #7 CONFIG Information about the PFI, targetting, etc. [BINARY FITS TABLE]
The PDU must contain at least the keywords
tract tract INT
patch patch STRING
catId catId INT
objId objId INT
pfsVHash pfsVisitHash INT
(N.b. the keywords are case-insensitive). Other HDUs should specify INHERIT=T.
The wavelengths are specified via the WCS cards in the header (e.g. CRPIX1, CRVAL1) for HDU #1, and
explicitly in the table for HDU #2. We chose these two representations for the data due to the
difficulty in resampling marginally sampled data onto a regular grid, while recognising the convenience
of such a grid when rebinning, performing PCAs, or stacking spectra. For highest precision the data
in HDU #2 is likely to be used.
See pfsObject for definition of the COVAR data
What resolution should we use for HDU #1? The instrument has a dispersion per pixel which is roughly constant
(in the blue arm Jim-sensei calculates that it varies from 0.70 to 0.65 (going red) A/pix; in the red, 0.88 to
0.82, and in the IR, 0.84 to 0.77). We propose that we sample at 0.8 A/pixel.
The second covariance table (COVAR2) is the full covariance at low spectral resolution, maybe 10x10. It's
really only 0.5*NCOARSE*(NCOARSE + 1) numbers, but it doesn't seem worth the trouble to save a few bytes.
This covariance is needed to model the spectrophotometric errors.
The CONFIG table has nVisit rows, and at a minimum contains columns for visit and pfsConfigId:
visit: 32-bit int
pfsConfigId: 64-bit int
The targetting information that SDSS put in the PLUGMAP table is available from the pfsConfigId column, which
links to the relevant pfsConfig files. This could be denormalised into the CONFIG table if desired.
Note that we don't keep the SDSS "AND" and "OR" masks -- if needs be we could set two mask bits to capture
the same information, but in practice SDSS's OR masks were not very useful.
For data taken with the medium resolution spectrograph, HDU #1 is expected to be at the resolution of
the medium arm, and to omit the data from the blue and IR arms.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Simulations
We need to specify a format for the spectra going into the simulator. At first glance we could adopt the
pfsObject file, along with a pfsConfig file to map them to the sky. This would work, but there is a problem
with resolution, in particular emission lines. Also, the resolution of the pfsObject files for
medium-resolution spectra may be different from that for low-resolution (extra-galactic) spectra (and the
simulators are probably not interested in covariances and masks)
The PSF of an extracted emission line is not the same as the marginal shape of the fibre spot, and I don't
think that the simulators want to know the details of the spectrograph anyway! This leads us to going to
higher resolution, and this would be possible (e.g. R ~ 1e4?). An alternative would be to provide the
continuum spectrum as an array, possibly even at PFS's resolution and an additional table of (lambda,
amplitude, width) for emission lines. Will a continuum + lines model like this work for the Galactic
evolution stellar spectra?
For now, I propose:
"pfsSimObject-%05d-%s-%3d-%08x.fits" % (tract, patch, catId, objId)
The tract, patch, catId, objId values will also be available as header keywords.
The file will have two HDUs:
HDU #0 PDU
HDU #1 Flux in units of nJy [FLOAT] NSIMROW
HDU #2 WAVELENGTH Wavelength solution in nm (vacuum) [FLOAT] NSIMROW
The simulation team will also need to provide a matching pfsConfig file, and may well choose to set
tract, patch = 0, "0,0"
The catId/objId is only needed to allow the set of pfsSimObject files and a pfsConfig file to specify a
set of simulated exposures.
Note that the wavelength need not be uniformly sampled.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Example File Names
------------------
PFSA00066611.fits
Raw science exposure, visit 666, taken on the summit with spectrograph 1 using the blue arm
PFLB00123423.fits
Raw up-the-ramp exposure, visit 1234, taken at LAM with spectrograph 2 using the IR arm
pfsConfig-0xad349fe2.fits
The fiber and targetting information for the PFI configuration with hash 0xad349fe2
pfsFlat-000333-m2.fits
Flat for spectrograph 2, medium resolution arm, valid for 333 <= visit < ??
pfsBias-000333-r1.fits
Bias for spectrograph 1, red arm
pfsBias-000333-m1.fits
Bias for spectrograph 1, medium resolution arm (identical to pfsBias-000333-1r.fits)
pfsDark-000333-b3.fits
Dark for spectrograph 3, blue arm
pfsFiberTrace-000333-n2.fits
Fiber traces for spectrograph 2, IR arm
pfsArm-000666-b1.fits
Extracted spectra for visit 666, from spectrograph 1's blue arm
pfsPSF-000666-b1.fits
The 2-D PSF for spectrograph 1's blue arm in visit 666
pfsObject-07621-2,2-001-02468ace-03-0xdeadbeef.fits
Combined spectrum for HSC (the "001") object 0x02468ace, located in tract 7621, patch 2,2.
The pfsVisitHash SHA-1 (calculated from the 3 visits included in this reduction) is 0xdeadbeef.
pfsSimObject-00000-0,0-000-13579bdf.fits
A simulated spectrum (the "000") for object 0x13579bdf, located in tract 0, patch 0,0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Parameters
----------
The raw CCD images will have:
NAMP=8
AMPCOLS=520
CCDROWS=4224
OVERROWS = 76 # as of 2016-02-01
OVERCOLS = 32 # as of 2016-02-01
LEADINROWS = 48 # necked rows
LEADINCOLS = 8 # real leadin pixels
So the raw CCD data will have
NROW = 4300 # CCDROWS + OVERROWS
NCOL = 4416 # NAMP*(AMPCOLS + OVERCOLS)
And the ISR-extracted CCD images will probably be close to:
NROW = 4176 # CCDROWS - LEADINROWS
NCOL = 4096 # NAMP*(AMPCOLS - LEADINCOLS)
This value of NROW is applicable to the pfsArm files.
The raw H4RG images will be in the range
NROW = 4096 to 8192
NCOL = 4096
(n.b. there's a 4-pixel reference pixel border)
The ISR-extracted H4RG images will be no bigger than
NROW = 4088
NCOL = 4088
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Open Questions and TBD
----------------------
Pin down the HDUs in pfsArm files
Specify more details about expected header keywords
Provide examples of all these files
Decide where to put information about the analysis resulting in pfsArm files
Define the 1-D outputs
Define pfsObject files for medium resolution data
Consider using healpix or HTM ID not tract/patch
Do we need to save the covariance of the wavelength solution
Should we save the per-pfsConfigId combined spectra? Note that this is doable within
my naming scheme now we have added a pfsVisitHash and an nVisit field
Reruns in directory tree, and also headers of course
DB of bad visits
Define format of pfsPSF files. Choosing a better name is probably hopeless
Need a DB (as well as the butler) to map visit to visit0 for calibration products
Do we want to model the sky over the entire exposure? I put in a place holder that is
per spectrograph/arm but it isn't obvious that this is correct. E.g. there's spatial
structure over the whole focal plane, and the lines in the red and IR are closely related.
Think of a better name than NCOARSE
Do we really want HDU #1 (rebinned spectra) in medium-resolution pfsObject files to
omit the blue and IR data?