
DPL-471-1: malformed root_sample_ids that have duplicates in MLWH, and WERE picked #539

Open
5 tasks
Jonnie-Bevan opened this issue Mar 17, 2022 · 0 comments
Labels
Data integrity data fix Enhancement New feature or request GSU Delivers work for the GSU unit Heron RVI RVI Project

Comments


Jonnie-Bevan commented Mar 17, 2022

User Story
Part of the wider DPL-471 issue, which in turn spawned from DPL-048. Related issues are DPL-048-2, DPL-048-3 and DPL-048-4.

This story concerns the 94 malformed root_sample_ids in the MLWH lighthouse_sample table (and MongoDB) that have duplicate samples with the correct root_sample_id. These came from the MK lighthouse lab in August 2021 and have an extra substring (something like '_RNA123456789') appended to the end of the correct ID.
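To find the affected rows, the corrected ID can be recovered by stripping the trailing substring. The sketch below assumes the suffix is always `_RNA` followed by digits, based only on the single example above; the real pattern should be confirmed against the data before use. The function name and the sample IDs are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the spurious suffix is "_RNA" followed by one or more digits,
# anchored at the end of the ID (e.g. '_RNA123456789'). Confirm against MLWH data.
MALFORMED_SUFFIX = re.compile(r"_RNA\d+$")


def split_malformed_id(root_sample_id: str) -> Tuple[str, Optional[str]]:
    """Return (corrected_id, suffix); suffix is None if the ID is not malformed."""
    match = MALFORMED_SUFFIX.search(root_sample_id)
    if match is None:
        return root_sample_id, None
    return root_sample_id[: match.start()], match.group(0)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    for rid in ["SAMPLE001_RNA123456789", "SAMPLE002"]:
        print(split_malformed_id(rid))
```

Anchoring the pattern at the end of the string avoids touching IDs that merely contain `_RNA` somewhere in the middle.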

The samples were picked at some point and are therefore found in SequenceScape and the Events Warehouse as well as in the MLWH sample table. As such, they need to be addressed in all 5 places (the MLWH lighthouse_sample and sample tables, MongoDB, SequenceScape and the Events Warehouse). They also appear in the iseq_flowcell table, meaning they have been picked and sequenced, so we need to investigate how far downstream they have gone. In fact, all 94 samples appear twice in the iseq_flowcell table, so they have been picked and sequenced twice.

Fix
The main issue is that, because these samples are duplicated, we cannot simply fix the root_sample_id in the databases. The root_sample_id/plate_barcode/coordinate combination must be unique, and the corrected IDs would break this uniqueness. Nor can we delete the rows, since these samples were used and have a paper trail (picking -> sequencing etc.) that should not be broken.
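The uniqueness problem can be checked before any fix is attempted: for each malformed row, strip the suffix and see whether the resulting root_sample_id/plate_barcode/coordinate key already exists. This is a minimal in-memory sketch. It assumes the `_RNA` plus digits suffix pattern and that the correct duplicate shares the same plate barcode and coordinate (which is what the uniqueness clash implies). `SampleRow`, `find_collisions` and the example values are hypothetical, not part of the lighthouse codebase.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Assumption: malformed IDs end in "_RNA" followed by digits.
MALFORMED_SUFFIX = re.compile(r"_RNA\d+$")


class SampleRow(NamedTuple):
    root_sample_id: str
    plate_barcode: str
    coordinate: str


def find_collisions(rows: Iterable[SampleRow]) -> List[Tuple[SampleRow, SampleRow]]:
    """Pair each malformed row with the existing row its corrected key would duplicate."""
    rows = list(rows)
    # Index every row by the unique key (root_sample_id, plate_barcode, coordinate).
    by_key = {(r.root_sample_id, r.plate_barcode, r.coordinate): r for r in rows}
    collisions = []
    for row in rows:
        corrected = MALFORMED_SUFFIX.sub("", row.root_sample_id)
        if corrected == row.root_sample_id:
            continue  # ID is not malformed, nothing to correct
        key = (corrected, row.plate_barcode, row.coordinate)
        if key in by_key:
            # Correcting this ID in place would violate the unique key.
            collisions.append((row, by_key[key]))
    return collisions
```

With hypothetical rows `SampleRow("SAMPLE001_RNA123456789", "PLATE01", "A01")` and `SampleRow("SAMPLE001", "PLATE01", "A01")`, `find_collisions` returns one pair, showing why an in-place fix is not possible for that sample.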

For SequenceScape and the MLWH sample table, we can add a flag in the description or comments showing that the sample is a duplicate. For the MLWH lighthouse_sample table and MongoDB we may not be able to do this, so the data might have to be left as-is (not ideal), or some other way of handling these samples will need to be found. This could be complicated, but the good news is that only 94 samples are affected.

Who are the primary contacts for this story?
Jonnie B
Alan K

Acceptance criteria

  • data are assessed to work out whether they have been sequenced, and NPG alerted if need be
  • data are assessed and dealt with in the MLWH lighthouse_sample table
  • data are assessed and dealt with in the MongoDB sample table
  • data are assessed and dealt with in SequenceScape (propagates to the MLWH sample table)
  • data are assessed and dealt with in the Events Warehouse
@Jonnie-Bevan Jonnie-Bevan added Enhancement New feature or request Data integrity data fix labels Mar 17, 2022
@Jonnie-Bevan Jonnie-Bevan self-assigned this Mar 17, 2022
@sdjmchattie sdjmchattie changed the title DPL-048-5: malformed root_sample_ids that have duplicates in MLWH, and WERE picked DPL-471-1: malformed root_sample_ids that have duplicates in MLWH, and WERE picked Aug 24, 2022
@andrewsparkes andrewsparkes added the RVI RVI Project label Sep 7, 2022
@TWJW-SANGER TWJW-SANGER added the GSU Delivers work for the GSU unit label Sep 28, 2022
Projects
None yet
Development

No branches or pull requests

4 participants