DPL-471 - Fix historical data for malformed root sample ids to support important samples algorithm (C=?, V=4) #502

stevieing · 2022-01-27T10:18:20Z

User story
There are malformed root_sample_ids found in 6 different database tables across various systems.
These generally take the format ABC123456789_YZ01. The correct root_sample_id is the code before the underscore.
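As a minimal sketch of the correction rule described above: the function name is hypothetical, and it assumes the split happens at the first underscore and that an ID without an underscore is already correct. Neither assumption is confirmed in this issue.

```python
def correct_root_sample_id(root_sample_id: str) -> str:
    """Return the part of the ID before the first underscore.

    Assumption: IDs without an underscore are already well formed,
    so they are returned unchanged.
    """
    prefix, _separator, _suffix = root_sample_id.partition("_")
    return prefix


def is_malformed(root_sample_id: str) -> bool:
    """Assumption: any underscore in the ID marks it as malformed."""
    return "_" in root_sample_id
```

For the example format given above, `correct_root_sample_id("ABC123456789_YZ01")` yields `"ABC123456789"`.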

Who are the primary contacts for this story
Jonnie B
Alan K

Acceptance criteria

root_sample_id is updated in all 6 DBs or a reason given for not updating them if that decision is made
changes are tested locally and in UAT before deployment to Prod
code reviewed before deployment to Prod
changes documented in Confluence
before changes to Prod are made, email the email_mlwh email group to notify of data changes

Progress
Some of the root_sample_id data has been picked/sequenced. Some of the IDs have been corrected and duplicated, so the correct version of the data is in the database alongside the malformed version. For 94 IDs, both of these things are true. Separate tickets have been made for different groups of data based on these characteristics.

Here is a table which shows how the data is divided up:

	Duplicated in lighthouse_sample	Not duplicated in lighthouse_sample	Totals
Picked/Sequenced	94	1,128	1,222
Not picked/sequenced	18,138	4,699	22,837
Totals	18,232	5,827	24,059

The 4,699 rows of data which are not duplicated and were not picked/sequenced can simply be corrected. This is covered in DPL-048-2.

The 18,138 rows of data which were duplicated but never picked/sequenced can potentially be deleted or just marked as withdrawn. This is covered in DPL-048-3.

The 1,128 rows of data which are not duplicated but were picked/sequenced need to be corrected in many places (MLWH, Sequencescape, Event Warehouse, Mongo). This is covered in DPL-048-4.

The 94 rows of data which were duplicated but also picked/sequenced can't be deleted since the data has propagated a long way. It also can't just be updated since the correct data is also there and there would be duplicates. This is a complicated issue and is covered in DPL-048-5.
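The four groups above can be sketched as a small decision function. This is only an illustration: the enum, the function names, and the two boolean flags are hypothetical, and how "duplicated" and "picked/sequenced" are actually determined from the databases is not covered here. The row counts are copied from the table in this issue.

```python
from enum import Enum


class Ticket(Enum):
    """Follow-up tickets named in this issue, one per group of rows."""
    CORRECT_IN_PLACE = "DPL-048-2"
    DELETE_OR_WITHDRAW = "DPL-048-3"
    CORRECT_EVERYWHERE = "DPL-048-4"
    RESOLVE_CONFLICT = "DPL-048-5"


def ticket_for(duplicated: bool, picked_or_sequenced: bool) -> Ticket:
    """Map a row's two characteristics to the ticket that covers it."""
    if picked_or_sequenced:
        return Ticket.RESOLVE_CONFLICT if duplicated else Ticket.CORRECT_EVERYWHERE
    return Ticket.DELETE_OR_WITHDRAW if duplicated else Ticket.CORRECT_IN_PLACE


# Keys are (duplicated, picked_or_sequenced); values are the table's row counts.
ROW_COUNTS = {
    (True, True): 94,
    (False, True): 1_128,
    (True, False): 18_138,
    (False, False): 4_699,
}


def rows_per_ticket() -> dict[Ticket, int]:
    """Total rows handled by each ticket, derived from ROW_COUNTS."""
    return {ticket_for(d, p): n for (d, p), n in ROW_COUNTS.items()}
```

Summing `ROW_COUNTS.values()` gives 24,059, which matches the grand total in the table.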


stevieing · 2022-01-31T09:46:42Z

linked to #375

TWJW-SANGER · 2022-10-05T16:52:43Z

Deprecated by the Product Owner.

stevieing added the Enhancement (New feature or request), priority samples, and Heron labels Jan 27, 2022

TWJW-SANGER closed this as completed Feb 9, 2022

TWJW-SANGER reopened this Feb 9, 2022

Jonnie-Bevan self-assigned this Feb 14, 2022

Jonnie-Bevan mentioned this issue Mar 8, 2022

DPL-048 fix root sample ids sanger/lighthouse#528

Closed

sdjmchattie unassigned Jonnie-Bevan Aug 17, 2022

sdjmchattie changed the title ~~DPL-048-1 - fix historical data for malfomed root sample ids to support important samples algorithim (C=?, V=4)~~ DPL-471 - Fix historical data for malfomed root sample ids to support important samples algorithim (C=?, V=4) Aug 24, 2022

andrewsparkes added the RVI (RVI Project - Sequence old Heron samples for respiratory diseases) label Sep 7, 2022

TWJW-SANGER added the GSU (Delivers work for the GSU unit) label Sep 28, 2022

TWJW-SANGER closed this as completed Oct 5, 2022
