Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Helm): Add JSON schema for values.yaml validation #3588

Draft
wants to merge 45 commits into
base: main
Choose a base branch
from

Conversation

fhennig
Copy link
Contributor

@fhennig fhennig commented Jan 29, 2025

resolves #2796, resolves #2798

preview URL for the new docs: https://docs-values-yaml-schema.loculus.org/reference/helm-chart-config/

Summary

The main change in this PR is the addition of the values.schema.json file to the Helm Chart, which Helm will by default use to validate values.yaml (You can do this with helm lint .).

The Helm Chart reference is now completely generated from the schema (it was written by hand before).

The new schema has extra keys in the properties (groups, placeholder, docsIncludePrefix) to facilitate generating the documentation in individual chunks.

A new github action was added to lint the existing values.yaml and make sure that we add all the new things we add to the values.yaml also to the schema.

TODO

  • The values.schema.json isn't available in the Dockerimage of the docs - needs to be added. To consider: hwo to have it at a sensible path in the Image but also use the 'live' version from the repo at dev time.
  • setting additionalProperties to false revealed some more settings that aren't in the schema yet, I need to add those.

Screenshot

n/A

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

@corneliusroemer

This comment was marked as outdated.

@fhennig
Copy link
Contributor Author

fhennig commented Jan 29, 2025

Ah damn, looks like this sort of composing is actually not supported: helm/helm#10481

That's a shame.

@fhennig
Copy link
Contributor Author

fhennig commented Jan 29, 2025

I like how currently the docs are split into sections: https://loculus.org/reference/helm-chart-config/ and I thought I could replicate that with using multiple files as sources - I guess I can't.

@theosanderson
Copy link
Member

Does the schema JSON schema support arbitrary additional properties in some way? If so we could add "docs_category" or something ?

@fhennig
Copy link
Contributor Author

fhennig commented Jan 29, 2025

A quick search showed that it should be possible! That's a great idea!

@fhennig
Copy link
Contributor Author

fhennig commented Jan 31, 2025

The schema should include that the pipeline version starts at v1: #3534 (comment) (thanks Cornelius!)

@fhennig
Copy link
Contributor Author

fhennig commented Feb 5, 2025

Maybe it would also be nice to have a few example yamls of stuff that users might want to configure, and then we can also lint/validate those?

@fhennig
Copy link
Contributor Author

fhennig commented Feb 6, 2025

In an effort to maybe extract examples and look further into which properties are required, I'm going to look into untangling our huge values.yaml into a minimal set of sensible defaults, and a more "example"-specific part that can be applied on top of the actual defaults.

@theosanderson
Copy link
Member

In an effort to maybe extract examples and look further into which properties are required, I'm going to look into untangling our huge values.yaml into a minimal set of sensible defaults

I think one can make a case for making the defaultOrganisms much more stripped back, which would make it easier for maintainers to read the master values file, while complicating our dev set up a bit.
But with that exception, I think we broadly do things according to Helm standard practices, which are to have tons of defaults: https://github.com/bitnami/charts/blob/main/bitnami/postgresql/values.yaml

@fhennig
Copy link
Contributor Author

fhennig commented Feb 6, 2025

True! I think there should be as many defaults as possible, and all switches etc should have a sensible default state. But exactly the actual "content" (i.e. organisms) and other instance specific defaults I think should be in there. Example:

auth:
  smtp:
    host: "in-v3.mailjet.com"
    port: 587
    user: "fafd505de339dd2e9c3e85ad9981af8a"
    replyTo: "[email protected]"
    from: "[email protected]"
    envelopeFrom: "[email protected]"
  identityProviders:
    orcid:
      clientId: "APP-P1P7N7T9YVBHQ4EH"
seqSets:
  crossRef:
    DOIPrefix: "placeholder"
    endpoint: "https://test.crossref.org"
lineageSystemDefinitions:
  pangoLineage:
    1: https://raw.githubusercontent.com/loculus-project/loculus/c400348ea0ba0b8178aa43475d5c7539fc097997/preprocessing/dummy/lineage.yaml

These default values are also currently undocumented - I think rather than documenting them, it'd make sense to not have them, since they are instance specific and do not make sense in the user context.

I also think this isn't great (from a tpl file):

...
{{- range $key, $instance := (.Values.organisms | default .Values.defaultOrganisms) }}
...

By default I would expect the Chart to not contain data (going back to your PSQL example, the defaults don't create any schema or data by default).

IMO we're using these defaults because it's convenient to us, but from an outside perspective it isn't super obvious what I have to un-set to get a "clean" install.

Of course one could argue that looking into this isn't in the scope of the issue being adressed here, and I can also drop it, but since I've spend so much time looking at the Chart now, I felt like it'd be sensible to tackle this.

I.e.:

helm install .

gives me a clean install, and if I want the example organisms or the ebola organism etc I can do:

helm install . -f values.default-organisms.yaml

(which will just layer the other yaml on top of the values.yaml)

@fhennig fhennig force-pushed the values-yaml-schema branch from a4efcb7 to 6070442 Compare February 6, 2025 14:17
@theosanderson
Copy link
Member

smtp config

totally agree

seqSets

I think these are broadly reasonable defaults (placeholders) - one could make it more sophisticated but it seems ok given where we are atm

lineage systems

This is probably a reason to put these within the organism as per #3603

By default I would expect the Chart to not contain data (going back to your PSQL example, the defaults don't create any schema or data by default).

So we have considered this, and decided that we wanted to allow people to have a working copy of loculus where they could actually see working organisms with a single command without needing to provide any additional configuration, just by deploying a bare helm chart. We can of course rethink that decision, but it is one we considered: it was considered around external users not just around us, and our tutorials are structured around it.

@fhennig
Copy link
Contributor Author

fhennig commented Feb 6, 2025

Ok! If that was intentional (it looked to me like it was just out of convenience for us) then I'll just not do anything to change the values.yaml for now!

@theosanderson
Copy link
Member

I guess the thought process is that there are lots of times when you would want to deploy an empty postgres DB instance and then populate it (without touching helm again), whereas there are no circumstances where you'd want to want to run loculus without any organisms.

@fhennig
Copy link
Contributor Author

fhennig commented Feb 6, 2025

I'll add something like this to the schema:

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "definitions": {
        "organism": {
            "type": "object",
            "properties": {
                "property1": { "type": "string" },
                "property2": { "type": "number" }
            },
            "required": ["property1", "property2"]
        }
    },
    "type": "object",
    "properties": {
        "organisms": { "$ref": "#/definitions/organism" },
        "defaultOrganisms": { "$ref": "#/definitions/organism" }
    }
}

for the default organisms, so they are also covered by the schema, even though we won't surface them in the docs


EDIT: Now there are more schema errors, will need to refine that!

@fhennig fhennig self-assigned this Feb 6, 2025
@fhennig fhennig added the preview Triggers a deployment to argocd label Feb 6, 2025
@fhennig fhennig force-pushed the values-yaml-schema branch from 481f43d to f04f25a Compare February 7, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Config file validation Config file schema
3 participants