👍 First of all: Thank you for taking the time to contribute!
The following is a set of guidelines for contributing to nmdc-schema. This guide is aimed primarily at the core NMDC schema developers, although anyone is welcome to contribute.
The nmdc-schema team strives to create a welcoming environment for editors, users and other contributors.
Please carefully read our Code of Conduct.
Please use the Issue Tracker for reporting problems with the schema.
Please review GitHub's overview article, ["Tracking Your Work with Issues"][about-issues].
See Pull Requests for all pull requests.
Please review GitHub's article, "About Pull Requests", and make your changes on a new branch.
We also recommend reading "GitHub Pull Requests: 10 Tips to Know".
While anyone is welcome to make issues or pull requests in this repository, it is expected that the core schema team become familiar with the schema, the basics of the LinkML framework, and NMDC Best Practices.
(Note: these best practices apply to most development in NMDC; these guidelines may later be moved somewhere central)
- Read ["About Issues"][about-issues] and ["About Pull Requests"][about-pulls]
- Issues should be focused and actionable
- Complex issues should be broken down into simpler issues where possible
- Pull Requests (PRs) should be atomic and aim to close a single issue
- Long running PRs should be avoided where possible
- PRs should reference issues following standard conventions (e.g. “fixes #123”)
- Schema developers should work on only one issue at a time
- Never work on the main branch, always work on an issue/feature branch
- Core developers can work on branches off origin rather than forks
- Always create a PR on a branch to maximize transparency of what you are doing
- When a PR includes a breaking change, include a migration
- PRs should be reviewed and merged in a timely fashion by the nmdc-schema technical leads
- PRs that do not pass GitHub actions should never be merged
- In the case of git conflicts, the contributor should try to resolve the conflict
- If a PR fails a GitHub action check, the contributor should try to resolve the issue in a timely fashion
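
To illustrate the "fixes #123" convention above, here is a minimal Python sketch that pulls issue numbers out of a PR description. The helper name `referenced_issues` is hypothetical and not part of any nmdc-schema tooling. The pattern covers only the simple `keyword #number` form of GitHub's closing keywords, so treat it as an approximation of what GitHub itself recognizes.

```python
import re

# GitHub's closing keywords (matched case-insensitively).
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Simplified pattern: a closing keyword, whitespace, then "#<number>".
_CLOSING_REF = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)",
    re.IGNORECASE,
)


def referenced_issues(pr_description: str) -> list:
    """Return the issue numbers that the description says it closes.

    Hypothetical helper for illustration only.
    """
    return [int(number) for number in _CLOSING_REF.findall(pr_description)]


if __name__ == "__main__":
    issues = referenced_issues("Adds a new enum. Fixes #123")
    # An atomic PR, in the sense used above, would list exactly one issue.
    print(issues, "atomic" if len(issues) == 1 else "not atomic")
```

A description that only mentions an issue (for example "see #123") yields an empty list, because no closing keyword precedes the number.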
Core developers should read the material on the LinkML site, in particular:
- Follow Naming conventions
- Standard LinkML naming conventions should be followed (UpperCamelCase for classes and enums, snake_case for slots)
- Know how to use the LinkML linter to check style and conventions
- The names for classes should be nouns or noun-phrases: Person, SequenceAlignment, Annotation, SequenceTrimming
- Spell out abbreviations and short forms, except where this goes against convention (e.g. do not spell out DNA)
- Elements that are imported from outside (e.g. MIxS) need not follow the same naming conventions
- Multivalued slots should be named as plurals
- Older elements may be "grandfathered in" - modifying them to match naming conventions may be too expensive
- Document model elements
- All model elements should have documentation (descriptions) and other textual annotations (e.g. comments, notes)
- Textual annotations on classes, slots and enumerations should be written with minimal jargon, clear grammar and no misspellings
- Include examples and counter-examples (intentionally invalid examples)
- Rationale: these serve as documentation and unit tests
- All elements of the nmdc-schema must be illustrated with valid and invalid data examples in src/data. New schema elements will not be merged into the main branch until examples are provided
- Invalid example data files should be invalid for one single reason, which should be reflected in the filename. It should be possible to render the invalid example files valid by addressing that single fault.
- Use enums for categorical values
- Rationale: Open-ended string ranges encourage multiple values to represent the same entity, like “water”, “H2O” and “HOH”
- Any slot whose values could be constrained to a finite set should use an Enum
- Non-categorical values, e.g. descriptive fields like `name` or `description`, fall outside of this.
- Reuse
- Existing schema elements should be reused where appropriate, rather than making duplicative elements
- More specific classes can be created by refining classes using inheritance (`is_a`)
- Place new classes under existing upper level classes
- Note: this is partially aspirational until we have a stable upper level structure in place
- Most new classes should be refinements of existing classes
- Follow the naming conventions of the parent class
- Descriptions of child classes may reference parent classes in a genus-differentia definition structure (e.g. "A workflow execution activity that...")
- Inheritance should be monotonic: `slot_usage` should refine rather than override
- ID patterning and checks
- ID patterns for new classes should follow conventions found here
- In the rare case that NMDC records must support legacy typecodes, new classes can be declared with multiple typecodes (e.g. `syntax: "{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$"`). In this case, the first typecode is the one the NMDC Runtime's minter will use when generating new ids for the class.
- Class-linking slots (e.g. `has_input`) should have a `slot_usage` declared that limits the slot's values to ids of instances of only the specific classes you want to allow the slot to link to (e.g. using `syntax: "{id_nmdc_prefix}:chrcon-{id_shoulder}-{id_blade}$"` on the `structured_pattern` will make it so only ids having the typecode `chrcon` can fill that slot)
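
As a rough illustration of how the `syntax` strings above constrain ids, the following Python sketch expands the `{...}` placeholders and tests a few ids. The values in `SETTINGS` (the `nmdc` prefix and the shoulder and blade expressions) and the sample ids are assumptions made up for this example; the real definitions live in the schema and may differ. The `expand` helper is likewise hypothetical and is not how LinkML itself evaluates a `structured_pattern`.

```python
import re

# ASSUMED placeholder values, for illustration only.
# Check the schema for the real definitions.
SETTINGS = {
    "id_nmdc_prefix": "nmdc",
    "id_shoulder": r"[0-9][a-z]{0,6}[0-9]",
    "id_blade": r"[A-Za-z0-9]+",
}


def expand(syntax: str) -> re.Pattern:
    """Replace each {name} placeholder with its assumed value and compile."""
    pattern = syntax
    for name, value in SETTINGS.items():
        pattern = pattern.replace("{" + name + "}", value)
    return re.compile(pattern)


# A class that accepts a legacy typecode alongside its primary one.
multi_typecode = expand("{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$")

# A slot_usage pattern that only admits ids with the chrcon typecode.
chrcon_only = expand("{id_nmdc_prefix}:chrcon-{id_shoulder}-{id_blade}$")

if __name__ == "__main__":
    for candidate in ("nmdc:dgns-11-abc123", "nmdc:chrcon-11-abc123"):
        print(
            candidate,
            "multi:", bool(multi_typecode.fullmatch(candidate)),
            "chrcon:", bool(chrcon_only.fullmatch(candidate)),
        )
```

Under these assumed settings, an id with the `dgns` or `omprc` typecode matches the first pattern but is rejected by the `chrcon`-only pattern, which is the restriction a `slot_usage` on a class-linking slot is meant to enforce.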
While all PRs will be automatically checked using GitHub actions, core developers should understand how to run tests locally, as well as how to deploy a test version of the schema documentation. This requires a basic Python installation and tooling (e.g. Poetry).
- Contributors should be comfortable running makefile targets, like `squeaky-clean`, `all`, `test`
- Anyone who is involved in writing migrations or otherwise checking data from MongoDB against the schema should be comfortable running `make make-rdf`.
- The main Makefile should in general not be edited. Instead, edits should be made to project.Makefile (advanced contributors only)
- Use the NMDC ADR Log
TODO: Add to this section later