DM-48178: Add documentation for DRP schemas. #299

Open
wants to merge 1 commit into
base: main

TallJimbo
Member

Checklist

When making changes to YAML files in the schemas directory:

  • If applicable, incremented the schema version number, following the guidelines in the contribution guide
  • Referred to the documentation on specific schemas for additional versioning information, change constraints, or tasks that may need to be performed, based on which schema is being updated

@JeremyMcCormick JeremyMcCormick self-requested a review January 10, 2025 21:35
Collaborator

@JeremyMcCormick JeremyMcCormick left a comment

Thank you for this contribution, @TallJimbo!

This should be useful information, especially for repository contributors.

I have made a few relatively minor inlined suggestions:

  • Added links for referenced GitHub packages and files so that they are more easily accessible.
  • Changed some repository and file links to host-relative paths (for example, `/lsst/drp_pipe`), since the `https://github.com` prefix should be unnecessary.
  • Made a few minor changes to the wording and grammar.

I have left this in "Request changes" state until I am able to discuss with @gpdf, in case he would like to see more major changes in the form of additional information, etc.

My only suggestion for additional text might be including some information on who should be considered responsible for reviewing PRs on the DRP schemas. You had mentioned to me that the reviewers of the pipeline updates should also be reviewing the corresponding sdm_schemas updates. Would this information be worth stating explicitly here, especially for the benefit of new Rubin staff members working on the pipelines who may not be completely familiar with the procedure?

Data Release Production Schemas
===============================

The Data Release Production (DRP) table schemas describe the `Object`, `Source`, `CcdVisit`, and `Visit` tables produced by either a regularly-tested "live" pipeline or a historical pipeline used in an important production.

Suggested change
The Data Release Production (DRP) table schemas describe the `Object`, `Source`, `CcdVisit`, and `Visit` tables produced by either a regularly-tested "live" pipeline or a historical pipeline used in an important production.
The Data Release Production (DRP) schemas describe the `Object`, `Source`, `CcdVisit`, and `Visit` tables produced by either a regularly-tested "live" pipeline or a historical pipeline used in an important production.

@JeremyMcCormick JeremyMcCormick Jan 10, 2025

We have typically just used "schema" instead of "table schema" within the project docs. (This is just a minor stylistic suggestion. Feel free to ignore if you prefer the original wording.)

In the future all data release tables (`ForcedSource`, `DIASource`, `DIAObject`, etc.) will be included as well.

Suggested change
In the future all data release tables (`ForcedSource`, `DIASource`, `DIAObject`, etc.) will be included as well.
In the future, all data release tables (`ForcedSource`, `DIASource`, `DIAObject`, etc.) will be included as well.

When new major data release productions occur (e.g. a new Data Preview or Data Release), one of the live schemas is typically copied into a new file and adjusted to account for any differences specific to that production.

In particular:

Suggested change
In particular:
The DRP schemas include the following:

I think this should be a complete sentence, but the exact verbiage I will leave to you. :)

- `hsc.yaml` maps to the live pipelines as configured for the Subaru Hyper Suprime-Cam instrument and its Strategic Survey Program, one of the primary precursor datasets used for LSST development.

Suggested change
- `hsc.yaml` maps to the live pipelines as configured for the Subaru Hyper Suprime-Cam instrument and its Strategic Survey Program, one of the primary precursor datasets used for LSST development.
- The [HSC schema](../python/lsst/sdm_schemas/schemas/hsc.yaml) maps to the live pipelines as configured for the Subaru Hyper Suprime-Cam instrument and its Strategic Survey Program, one of the primary precursor datasets used for LSST development.

The `ci_hsc_gen3` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/HSC/DRP-ci_hsc.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/HSC/DRP-ci_hsc.yaml).
@JeremyMcCormick JeremyMcCormick Jan 10, 2025

Suggested change
The `ci_hsc_gen3` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/HSC/DRP-ci_hsc.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/HSC/DRP-ci_hsc.yaml).
The [ci_hsc_gen3](/lsst/ci_hsc_gen3) package is run in nightly Jenkins tests, as well as optionally prior to other pipeline code merges, and checks that the HSC schema matches the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/HSC/DRP-ci_hsc.yaml`](/lsst/drp_pipe/blob/main/pipelines/HSC/DRP-ci_hsc.yaml).

The other HSC pipelines in `drp_pipe` should produce files with the same schemas as well, because they share almost all configuration with the `ci_hsc` pipeline.

Suggested change
The other HSC pipelines in `drp_pipe` should produce files with the same schemas as well, because they share almost all configuration with the `ci_hsc` pipeline.
The other HSC pipelines in [drp_pipe](/lsst/drp_pipe) should produce files with the same schemas as well, because they share almost all configuration with the [ci_hsc](/lsst/ci_hsc) pipeline.

- `imsim.yaml` similarly maps to the live pipelines as configured for the LSST ImSim simulator, in particular as run for the LSST Dark Energy Science Collaboration's "Data Challenge 2" project (DESC DC2).

Suggested change
- `imsim.yaml` similarly maps to the live pipelines as configured for the LSST ImSim simulator, in particular as run for the LSST Dark Energy Science Collaboration's "Data Challenge 2" project (DESC DC2).
- The [ImSim schema](../python/lsst/sdm_schemas/schemas/imsim.yaml) similarly maps to the live pipelines as configured for the LSST ImSim simulator, in particular as run for the [LSST Dark Energy Science Collaboration](https://lsstdesc.org/)'s "Data Challenge 2" project ([DESC DC2](https://dp0-2.lsst.io/)).

This is the same simulated dataset used for LSST's Data Preview 0.1 and 0.2, but the pipelines have evolved considerably since those productions.
The `ci_imsim` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml).

Suggested change
The `ci_imsim` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml).
The [ci_imsim](/lsst/ci_imsim) package is run nightly in Jenkins, as well as optionally prior to other pipeline code merges, and checks that the ImSim schema matches the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml).

The other `LSSTCam-imSim` pipelines in `drp_pipe` should produce files with the same schemas as well, because they share almost all configuration with the `ci_imsim` pipeline.
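
To illustrate the kind of schema-versus-dataset check described in the list above, here is a minimal Python sketch. It is not the actual `ci_hsc_gen3` or `ci_imsim` test code, which is not shown in this PR. The sketch assumes a schema file has already been loaded into a dictionary with a top-level `tables` list whose entries carry `name` and `columns` keys, and it compares column names only. The helper names `schema_columns` and `compare_columns`, the trimmed example schema, and the example column names are all hypothetical.

```python
from typing import Any


def schema_columns(schema: dict[str, Any], table_name: str) -> list[str]:
    """Return the column names declared for one table in a loaded schema."""
    for table in schema.get("tables", []):
        if table.get("name") == table_name:
            return [column["name"] for column in table.get("columns", [])]
    raise KeyError(f"table {table_name!r} not found in schema")


def compare_columns(schema_cols: list[str], dataset_cols: list[str]) -> dict[str, list[str]]:
    """Report columns that appear on only one side of the comparison."""
    schema_set, dataset_set = set(schema_cols), set(dataset_cols)
    return {
        "missing_from_dataset": sorted(schema_set - dataset_set),
        "missing_from_schema": sorted(dataset_set - schema_set),
    }


# Hypothetical, heavily trimmed stand-in for a loaded schema file.
example_schema = {
    "name": "example",
    "tables": [
        {
            "name": "Object",
            "columns": [
                {"name": "objectId", "datatype": "long"},
                {"name": "coord_ra", "datatype": "double"},
                {"name": "coord_dec", "datatype": "double"},
            ],
        }
    ],
}

# Hypothetical column names, standing in for those read from a Parquet file.
example_dataset_columns = ["objectId", "coord_ra", "refFwhm"]

report = compare_columns(
    schema_columns(example_schema, "Object"), example_dataset_columns
)
print(report)
```

A non-empty list on either side of the report would indicate that the schema file and the pipeline output have drifted apart, which is the situation that requires a schema update. A fuller check would presumably also compare datatypes, which this sketch leaves out.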

These files must be updated whenever the final pipeline output tables change, but it is expected that these changes will usually be minor, since they are not formally change-controlled.

Suggested change
These files must be updated whenever the final pipeline output tables change, but it is expected that these changes will usually be minor, since they are not formally change-controlled.
The DRP schemas must be updated whenever the final pipeline output tables change, but it is expected that these changes will usually be minor, since they are not formally change-controlled.

@JeremyMcCormick JeremyMcCormick Jan 10, 2025

> it is expected that these changes will usually be minor, since they are not formally change-controlled.

Can you clarify what this sentence means?

I'm not sure I understand how minor changes relate to "not formally change-controlled."
