DPL-048 fix root sample ids #528
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #528 +/- ##
===========================================
+ Coverage 92.33% 92.55% +0.21%
===========================================
Files 98 106 +8
Lines 3248 3397 +149
Branches 330 343 +13
===========================================
+ Hits 2999 3144 +145
- Misses 203 206 +3
- Partials 46 47 +1
Looks good to me. Obviously this isn't actually going to be merged, and if it works correctly that's good enough.
update_query = { "Root Sample ID": original_root_sample_id }
new_value = { "$set": { "Root Sample ID": root_sample_id } }
table.update_one(update_query, new_value)
I think this should be update_many, because we want to update all entries that share the same root_sample_id.
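A minimal sketch of the suggested change, assuming a pymongo collection named table and the field names from the snippet above. The build_update helper and the example ID values are hypothetical, used only to show the filter/update documents in isolation:

```python
def build_update(original_root_sample_id, root_sample_id):
    """Build the filter and update documents for fixing one root sample ID."""
    update_query = {"Root Sample ID": original_root_sample_id}
    new_value = {"$set": {"Root Sample ID": root_sample_id}}
    return update_query, new_value

# update_many applies the $set to every matching document, whereas
# update_one would fix only the first match:
#   query, update = build_update("R00T-1", "ROOT-1")
#   table.update_many(query, update)
```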
@@ -0,0 +1,15 @@
# get the root_sample_ids, fix them, write these to a CSV to be used with the 'write_data' script (which inserts the fixed IDs into the DBs)
I would group all files related to the same data fix together, so we can add other data fixes in future while keeping things separate. What do you think about creating a subfolder dpl-048 and moving all the files in there?
Also, for future reference, it would be very useful to have a README.md file in that subfolder describing this data fix and how to run it.
Hopefully, yes, so that these scripts can be reused in future.
Just a brief question. I didn't check that all the logic makes sense, but it looks like you know what you're trying to do; I don't know the background of what needs to happen.
if __name__ == "__main__":
    save_data()
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_file", required=False)
required=False? I think you later rely on there being a value, but I might be wrong.
If you don't give an input_file, it will return None when you ask for it on line 30. That then gets passed into the method, which checks whether it exists (i.e. whether or not it is None): if it isn't None, it reads the file; otherwise it goes to the DB to get the data. This line was really only for testing, so that I could check the script would fix the data correctly and save to CSV when given some dummy data. In reality it should go to the DB to get the data, because that's the data we're trying to fix.
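The behaviour described above can be sketched as follows. The parse_args wrapper and get_data are hypothetical stand-ins for the script's actual functions, showing only that argparse stores None for an omitted optional flag and that the code branches on it:

```python
import argparse


def parse_args(argv=None):
    # --input_file is optional; argparse stores None when the flag is omitted
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_file", required=False)
    return parser.parse_args(argv)


def get_data(input_file):
    # Hypothetical sketch of the described branch: read the CSV if a file
    # was given, otherwise fall back to querying the database.
    if input_file is not None:
        return f"rows read from {input_file}"
    return "rows fetched from the DB"
```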
Ahhh I see. That makes sense! Good stuff.
Closed as not needed
Fixes sanger/crawler#502 for MLWH lighthouse_sample table and MongoDB.
Includes test helper files.
Not included: test data (it contains real IDs) in a folder called 'test-data', and connection variables in a file called 'constants.py'.
The workflow is:
Get data from MLWH -> fix it -> save the original IDs and their corresponding fixed versions in a CSV file -> loop through the CSV to insert the corrected data into the DB of choice.
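The CSV mapping step of that workflow can be sketched with the standard-library csv module. The column headers and the save_mapping/load_mapping names are assumptions for illustration; the actual fix logic and DB access live elsewhere in the scripts:

```python
import csv
import io


def save_mapping(pairs, fh):
    # Write (original_id, fixed_id) pairs to a CSV with a header row;
    # the pairs are assumed to come from the fix step.
    writer = csv.writer(fh)
    writer.writerow(["Root Sample ID", "Fixed Root Sample ID"])
    writer.writerows(pairs)


def load_mapping(fh):
    # Read the mapping back, ready to loop over and apply to the DB.
    reader = csv.DictReader(fh)
    return [(row["Root Sample ID"], row["Fixed Root Sample ID"]) for row in reader]


# Round-trip through an in-memory buffer instead of a real file:
buf = io.StringIO()
save_mapping([("R00T-1", "ROOT-1")], buf)
buf.seek(0)
mapping = load_mapping(buf)
```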