
DPL-471 - Fix historical data for malformed root sample ids to support important samples algorithm (C=?, V=4) #502

Closed
stevieing opened this issue Jan 27, 2022 · 2 comments
Labels
Enhancement New feature or request GSU Delivers work for the GSU unit Heron Heron priority samples RVI RVI Project - Sequence old Heron samples for respiratory diseases

Comments

@stevieing
Contributor

stevieing commented Jan 27, 2022

User story
There are malformed root_sample_ids found in 6 different database tables across various systems.
These generally take the format ABC123456789_YZ01. The correct root_sample_id is the code before the underscore.
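As a rough illustration of the correction rule, here is a minimal Python sketch. The regex and the function name `correct_root_sample_id` are hypothetical and not part of any existing system; it assumes every malformed ID has the shape of the example above (an ID, an underscore, then a suffix) and keeps everything before the first underscore.

```python
import re
from typing import Optional

# Assumed shape of a malformed ID, inferred only from the example
# ABC123456789_YZ01: a non-empty prefix, an underscore, then a non-empty suffix.
MALFORMED_ID = re.compile(r"^([^_]+)_(.+)$")


def correct_root_sample_id(root_sample_id: str) -> Optional[str]:
    """Return the corrected ID (the part before the first underscore),
    or None if the ID does not match the assumed malformed shape."""
    match = MALFORMED_ID.match(root_sample_id)
    if match is None:
        return None
    return match.group(1)
```

For example, `correct_root_sample_id("ABC123456789_YZ01")` gives `"ABC123456789"`, while an already well-formed ID such as `"ABC123456789"` gives `None`, so the same helper can be used both to find affected rows and to compute their replacement value.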

Who are the primary contacts for this story
Jonnie B
Alan K

Acceptance criteria

  • root_sample_id is updated in all 6 DBs or a reason given for not updating them if that decision is made
  • changes are tested locally and in UAT before deployment to Prod
  • code reviewed before deployment to Prod
  • changes documented in Confluence
  • before changes to Prod are made, email the email_mlwh email group to notify of data changes

Progress
Some of the affected root_sample_id data has already been picked and/or sequenced. Separately, some of the IDs have already been corrected by duplication, so the correct version of the record sits in the database alongside the malformed one. For 94 IDs, both are true. Separate tickets have been raised for each group of data based on these two characteristics.

Here is a table which shows how the data is divided up:

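|                      | Duplicated in lighthouse_sample | Not duplicated in lighthouse_sample | Totals |
|----------------------|--------------------------------:|------------------------------------:|-------:|
| Picked/Sequenced     | 94                              | 1,128                               | 1,222  |
| Not picked/sequenced | 18,138                          | 4,699                               | 22,837 |
| Totals               | 18,232                          | 5,827                               | 24,059 |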

The 4,699 rows of data which are not duplicated and were not picked/sequenced can simply be corrected. This is covered in DPL-048-2.

The 18,138 rows of data which were duplicated but never picked/sequenced can potentially be deleted or just marked as withdrawn. This is covered in DPL-048-3.

The 1,128 rows of data which are not duplicated but were picked/sequenced need to be corrected in several systems (MLWH, Sequencescape, Event Warehouse, Mongo). This is covered in DPL-048-4.

The 94 rows of data which were duplicated and also picked/sequenced can't be deleted, since the data has propagated widely into downstream systems. Nor can they simply be updated, because the correct data already exists and updating would create duplicates. This is a complicated issue and is covered in DPL-048-5.
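The four groups boil down to a simple rule over two flags. The sketch below is hypothetical (the `MalformedRow` type, its field names and `ticket_for` are illustrative, not part of any existing system); it only encodes the split described above and checks that the per-ticket counts from the table add up to the overall total.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MalformedRow:
    """Hypothetical view of one malformed record."""
    root_sample_id: str
    duplicated: bool           # a corrected copy already exists in lighthouse_sample
    picked_or_sequenced: bool  # the sample has been picked and/or sequenced


def ticket_for(row: MalformedRow) -> str:
    """Map a row to the ticket that covers its group."""
    if row.picked_or_sequenced and row.duplicated:
        return "DPL-048-5"  # cannot be deleted or simply updated
    if row.picked_or_sequenced:
        return "DPL-048-4"  # correct in MLWH, Sequencescape, Event Warehouse, Mongo
    if row.duplicated:
        return "DPL-048-3"  # candidate for deletion or marking as withdrawn
    return "DPL-048-2"      # can simply be corrected


def count_by_ticket(rows: Iterable[MalformedRow]) -> Counter:
    """Tally how many rows fall under each ticket."""
    return Counter(ticket_for(row) for row in rows)


# Counts taken from the table; together they should cover every malformed row.
TABLE_COUNTS = {"DPL-048-5": 94, "DPL-048-4": 1_128, "DPL-048-3": 18_138, "DPL-048-2": 4_699}
assert sum(TABLE_COUNTS.values()) == 24_059
```

Checking the picked/sequenced flag first keeps the two cases that touch downstream systems (DPL-048-4 and DPL-048-5) from ever being routed to the delete-or-withdraw path.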

@stevieing stevieing added Enhancement New feature or request priority samples Heron Heron labels Jan 27, 2022
@stevieing
Contributor Author

linked to #375

@TWJW-SANGER TWJW-SANGER reopened this Feb 9, 2022
@Jonnie-Bevan Jonnie-Bevan self-assigned this Feb 14, 2022
@sdjmchattie sdjmchattie changed the title DPL-048-1 - fix historical data for malformed root sample ids to support important samples algorithm (C=?, V=4) DPL-471 - Fix historical data for malformed root sample ids to support important samples algorithm (C=?, V=4) Aug 24, 2022
@andrewsparkes andrewsparkes added the RVI RVI Project - Sequence old Heron samples for respiratory diseases label Sep 7, 2022
@TWJW-SANGER TWJW-SANGER added the GSU Delivers work for the GSU unit label Sep 28, 2022
@TWJW-SANGER

Deprecated by the Product Owner.
