Make pipeline version per Organism #3534

fhennig · 2025-01-16T10:34:22Z

resolves #1728

preview URL: https://pipeline-v-per-organism.loculus.org

Summary

The current_processing_pipeline table now has a new column: organism. We now store the current pipeline version separately for each organism.

TODO: How does this affect users? Cornelius said there was previously a constraint that the same version had to be used in the values.yaml for all organisms. Does that constraint still need to be removed?

Implementation notes

Before this change, we stored "V1" in this table (current_processing_pipeline). That was simple and could be done without knowing the user configuration. Some of the additional complexity in this PR arises from having to configure a CurrentPipelineVersion for each organism (and also, for example, having to reset it in the tests).

Future work could include either making entries in the table optional and filling it only once the first sequence for an organism is submitted, or introducing a mechanism to register pipeline versions via API. I opted against making any of these changes to keep the PR small and not change too much of how things currently function.

Testing

Existing tests were updated.
A backend test was added to check that if the pipeline version updates for one organism, another organism's pipeline version is not changed.
A backend test was added to check that if the initialization runs again, we don't insert v1 rows for organisms that are already in the table.
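
As a rough illustration of the two behaviors these tests check, here is a minimal stand-alone sketch. It is not the backend code: it uses Python with an in-memory SQLite database, and the two-column table layout, the helper names, and the organism names are assumptions made up for the example.

```python
import sqlite3

def init_pipeline_versions(conn, organisms):
    """Insert a version-1 row for every organism that has no row yet.

    Existing rows are left untouched, so running this again is harmless.
    """
    conn.executemany(
        "INSERT OR IGNORE INTO current_processing_pipeline (organism, version) "
        "VALUES (?, 1)",
        [(organism,) for organism in organisms],
    )

def set_version(conn, organism, version):
    """Update the current pipeline version of exactly one organism."""
    conn.execute(
        "UPDATE current_processing_pipeline SET version = ? WHERE organism = ?",
        (version, organism),
    )

def versions(conn):
    """Return the table contents as {organism: version}."""
    rows = conn.execute("SELECT organism, version FROM current_processing_pipeline")
    return dict(rows)

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: one row per organism.
conn.execute(
    "CREATE TABLE current_processing_pipeline ("
    "organism TEXT PRIMARY KEY, version INTEGER NOT NULL)"
)

init_pipeline_versions(conn, ["organism-a", "organism-b"])
set_version(conn, "organism-a", 2)  # only organism-a moves to version 2
# Simulate a restart where a third organism has been configured.
init_pipeline_versions(conn, ["organism-a", "organism-b", "organism-c"])
print(versions(conn))
```

In this sketch the primary key on organism is what makes the repeated initialization safe; the actual migration and startup code may solve this differently.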

Docs

Document pipeline version concept better. Relevant link: https://github.com/loculus-project/loculus/blob/main/preprocessing/specification.md#reprocessing

Screenshot

backend change - n/a

PR Checklist

All necessary documentation has been adapted.
The implemented feature is covered by an appropriate test.

fengelniederhammer · 2025-01-16T10:39:59Z

Wow, the pipeline version in the backend is not per organism? But in the values.yaml it's per organism, isn't it?

fhennig · 2025-01-16T11:54:58Z

What do you mean? The pipeline version is internal, so it doesn't appear in the values.yaml if I'm not mistaken.

chaoran-chen · 2025-01-16T11:57:30Z

The pipeline version also occurs in the values.yaml because it's also used to deploy the pipelines.

fengelniederhammer · 2025-01-16T11:58:24Z

loculus/kubernetes/loculus/values.yaml

Lines 1148 to 1150 in 9aa28ab

  preprocessing:
    - &preprocessing
      version: 1

This is the pipeline version, isn't it?

theosanderson · 2025-01-16T12:07:11Z

Currently, for the backend (i.e. in the database), there is a single "latest pipeline version". So if you make a new pipeline version for one organism in the values.yaml, you have to do so for all organisms. This issue is (intended to be) about addressing that.

fhennig · 2025-01-23T11:53:54Z

Note to self: The table now starts out empty, somehow it needs to be filled. (I don't think there is a way around the table starting empty now? Because we don't know which organisms there are)

fhennig · 2025-01-23T15:35:31Z

Update: I found a way to initialize the table on backend startup.

I've now marked some places in the schema that still need to be adjusted; they currently lead to: "ERROR: more than one row returned by a subquery used as an expression"
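
For readers following along, this error is what PostgreSQL reports when a scalar subquery such as `(SELECT version FROM current_processing_pipeline)` yields several rows, which is now the case because the table holds one row per organism. The sketch below shows the general shape of the fix, a subquery correlated on organism. It is an illustration only: the `processed_entries` table, its columns, and the organism names are hypothetical, and it runs on SQLite, which would silently take the first row of an uncorrelated subquery instead of raising the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified tables, not the real backend schema.
    CREATE TABLE current_processing_pipeline (
        organism TEXT PRIMARY KEY,
        version INTEGER NOT NULL
    );
    CREATE TABLE processed_entries (
        accession TEXT,
        organism TEXT,
        pipeline_version INTEGER
    );
    INSERT INTO current_processing_pipeline VALUES
        ('organism-a', 2),
        ('organism-b', 1);
    INSERT INTO processed_entries VALUES
        ('A1', 'organism-a', 1),
        ('A1', 'organism-a', 2),
        ('B1', 'organism-b', 1);
""")

# The subquery is restricted to the organism of the outer row,
# so it returns exactly one version per entry.
CURRENT_ENTRIES = """
    SELECT e.accession, e.pipeline_version
    FROM processed_entries AS e
    WHERE e.pipeline_version = (
        SELECT cpp.version
        FROM current_processing_pipeline AS cpp
        WHERE cpp.organism = e.organism
    )
    ORDER BY e.accession
"""

current = conn.execute(CURRENT_ENTRIES).fetchall()
print(current)
```

Each entry is compared only against the current version of its own organism, so the version-1 result for organism-a is filtered out while the version-1 result for organism-b is kept.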

fhennig · 2025-01-23T16:30:22Z

Judging from all the places I had to touch, I think the general handling of this table unfortunately isn't the best. It differs from all the other tables because it starts out with data, and all the others don't.

fhennig · 2025-01-28T10:50:58Z

Ok, at least some of the test failures are because the Flyway property I added cannot be injected. Maybe I can create a dummy bean for it in the test, or I can somehow make the dependency optional.

Ideally I'd solve the problem of "run x after Flyway has finished running" in a different way. Maybe the PostConstruct shouldn't be in the SubmissionDatabaseService, but somewhere else.

Also for anyone who is following along, I wanted to do it this way as a quick way to get it to run, and then iterate on it to make it prettier/less invasive, just so y'all know.

...end/src/main/kotlin/org/loculus/backend/service/submission/CurrentProcessingPipelineTable.kt

fhennig · 2025-01-29T09:29:24Z

I've moved the PR out of Draft, because I'm done with the code+tests. I'll look at the docs today.

fhennig · 2025-01-29T10:18:59Z

I have added some docs stuff:

Not sure if the new subsection is a good idea or not. But I added a page to explain a bit more what a preprocessing pipeline even is, taking some of the text from the specification.

I imagine that in the near future it'd be good to have a couple of sections like "setup", "preprocessing pipeline", "user management", "" sort of like chapters in a manual.

backend/src/main/resources/db/migration/V1.10__pipeline_version_per_organism.sql

...end/src/main/kotlin/org/loculus/backend/service/submission/CurrentProcessingPipelineTable.kt

backend/src/main/kotlin/org/loculus/backend/service/submission/SequenceEntriesTable.kt

...t/kotlin/org/loculus/backend/service/submission/UseNewerProcessingPipelineVersionTaskTest.kt

chaoran-chen

Thank you very much, Felix! The code looks good to me and the tests are reassuring!

As it is a rather complicated change and it is not easy to test manually, I think that it would be great to have another review by someone. @corneliusroemer, @theosanderson, would you like to take a look?

corneliusroemer

I'll review this by Thursday EOD

corneliusroemer · 2025-01-29T13:39:08Z

A simple/cheap way to test is to use different values for different pipelines in our default values.yaml

There are lines like this per organism (right now version is shared, we can now undo this constraint)

  preprocessing:
    - &preprocessing
      version: 1
      image: ghcr.io/loculus-project/preprocessing-nextclade

Update: I've now done this in dfdc7d3

backend/docs/db/schema.sql

…that line

fhennig · 2025-01-31T13:51:14Z

That's not quite true, it changes what's allowed in the values.yaml: processing version no longer has to be identical across all organisms.

Ah yes, good point! I meant that it isn't breaking (but I know that we use 'breaking' a little differently). From what I gathered, this constraint isn't enforced anywhere though, or is it? AFAICT the Helm chart just passes the version through.

kubernetes/loculus/values.yaml

backend/docs/db/schema.sql

chaoran-chen

Thanks for fixing!

fhennig · 2025-02-03T11:00:33Z

I want to test this on staging before merging.

fhennig self-assigned this Jan 16, 2025

fhennig added the update_db_schema label Jan 23, 2025

fhennig force-pushed the pipeline-v-per-organism branch from 170d08b to b6997b9 January 23, 2025 11:54

fhennig added the preview label (Triggers a deployment to argocd) Jan 23, 2025

fhennig force-pushed the pipeline-v-per-organism branch from 1164bc1 to 8009b7f January 28, 2025 10:08

chaoran-chen reviewed Jan 28, 2025

fhennig force-pushed the pipeline-v-per-organism branch from fdc37b2 to 3be6556 January 28, 2025 14:11

fhennig marked this pull request as ready for review January 29, 2025 09:28

fhennig requested a review from chaoran-chen January 29, 2025 10:24