DPL-048 fix root sample ids #528

Closed
wants to merge 15 commits

Conversation

Jonnie-Bevan

Fixes sanger/crawler#502 for the MLWH lighthouse_sample table and MongoDB.

Includes test helper files.
Not included: test data (contains IDs) in a folder called 'test-data', and connection variables in a file called 'constants.py'.

The workflow is:

Get data from MLWH -> fix it -> save the original IDs and their corresponding fixed versions in a CSV file -> loop through the CSV to insert the correct data into the DB of choice.
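For readers coming to this later, a minimal sketch of that workflow, assuming hypothetical helper names (fix_root_sample_id, save_data, load_fixed_ids) and CSV column headings that are not taken from this PR:

import csv

CSV_HEADERS = ["original_root_sample_id", "root_sample_id"]  # assumed column names

def fix_root_sample_id(original_id: str) -> str:
    # Stand-in for whatever correction the real fix applies; here we just
    # strip stray whitespace as an illustration.
    return original_id.strip()

def save_data(original_ids, output_file="fixed_root_sample_ids.csv"):
    # Steps 1-3: take the IDs pulled from MLWH, fix them, and record both the
    # original and the corrected value side by side in a CSV file.
    with open(output_file, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(CSV_HEADERS)
        for original_id in original_ids:
            writer.writerow([original_id, fix_root_sample_id(original_id)])

def load_fixed_ids(input_file="fixed_root_sample_ids.csv"):
    # Step 4: the 'write_data' side loops through the CSV so each target DB
    # (MLWH lighthouse_sample or MongoDB) can be updated from the same file.
    with open(input_file, newline="") as csv_file:
        yield from csv.DictReader(csv_file)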

@codecov

codecov bot commented Mar 8, 2022

Codecov Report

Merging #528 (52b920f) into develop (6c0f559) will increase coverage by 0.21%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #528      +/-   ##
===========================================
+ Coverage    92.33%   92.55%   +0.21%     
===========================================
  Files           98      106       +8     
  Lines         3248     3397     +149     
  Branches       330      343      +13     
===========================================
+ Hits          2999     3144     +145     
- Misses         203      206       +3     
- Partials        46       47       +1     
Impacted Files                                            Coverage Δ
lighthouse/messages/message.py                            91.66% <0.00%> (-8.34%) ⬇️
lighthouse/config/test.py                                 100.00% <0.00%> (ø)
lighthouse/config/defaults.py                             100.00% <0.00%> (ø)
lighthouse/config/development.py                          0.00% <0.00%> (ø)
lighthouse/routes/v1/beckman_routes.py                    100.00% <0.00%> (ø)
lighthouse/helpers/plate_events.py
lighthouse/routes/common/plate_events.py
lighthouse/helpers/plate_event_callbacks.py
...ouse/classes/events/beckman/source_unrecognised.py     100.00% <0.00%> (ø)
lighthouse/hooks/beckman_events.py                        95.77% <0.00%> (ø)
... and 13 more


@sdjmchattie
Contributor

Looks good to me. Obviously this isn't actually going to be merged and if it works correctly that's good enough.

update_query = { "Root Sample ID": original_root_sample_id }
new_value = { "$set": { "Root Sample ID": root_sample_id } }

table.update_one(update_query, new_value)
Contributor

I think this should be update_many, because we want to update all entries with the same root_sample_id.
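A minimal sketch of that suggestion with pymongo, using placeholder connection details, collection names, and example IDs rather than anything from this PR:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
samples = client["lighthouse"]["samples"]          # placeholder database/collection

original_root_sample_id = "R00T-1 "  # example malformed ID
root_sample_id = "R00T-1"            # example corrected ID

update_query = {"Root Sample ID": original_root_sample_id}
new_value = {"$set": {"Root Sample ID": root_sample_id}}

# update_many changes every document whose Root Sample ID matches,
# whereas update_one would only change the first match it finds.
result = samples.update_many(update_query, new_value)
print(f"matched {result.matched_count}, modified {result.modified_count}")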

@@ -0,0 +1,15 @@
# get the root_sample_ids, fix them, write these to a CSV to be used with the 'write_data' script (which inserts the fixed IDs into the DBs)
Contributor

I would group all the files related to the same data fix together, so we can add other data fixes in the future if needed while keeping things separate. What do you think about creating a subfolder dpl-048 and moving all the files inside it?

Contributor

Also, for future reference, it would be very useful to have a README.md file in that subfolder with a description of this data fix and how to run it.

Contributor

@sdjmchattie left a comment

Just a brief question. I didn't check that all the logic makes sense, but it looks like you know what you're trying to do. I don't know the background of what needs to happen.


if __name__ == "__main__":
    save_data()
parser = argparse.ArgumentParser()
parser.add_argument("--input_file", required=False)
Contributor

required=False? I think you later rely on there being a value. Might be wrong.

Author

If you don't give an input_file, it will return None when you ask for it on line 30. That value then gets passed into the method, which checks whether it exists (i.e. whether or not it is None): if it isn't None it reads the file, otherwise it goes to the DB to get the data. This line was really only there for testing, so that I could check the script would fix the data correctly and save it to CSV when given some dummy data. In reality it should go to the DB to get the data, because that's the data we're trying to fix.
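A minimal sketch of the behaviour being described, with get_data_from_mlwh as a hypothetical stand-in for the real database query and save_data reduced to just the branch in question:

import argparse
import csv

def get_data_from_mlwh():
    # Hypothetical stand-in for the real MLWH query used in a production run.
    return ["R00T-1 ", "R00T-2\n"]

def save_data(input_file=None):
    if input_file is not None:
        # Testing path: read dummy IDs from the supplied CSV file.
        with open(input_file, newline="") as csv_file:
            original_ids = [row[0] for row in csv.reader(csv_file)]
    else:
        # Normal path: nothing supplied, so fetch the data to fix from the DB.
        original_ids = get_data_from_mlwh()
    return original_ids  # the real script would fix these and write the CSV

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_file", required=False)  # optional by design
    args = parser.parse_args()
    save_data(args.input_file)  # None when --input_file is omitted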

Contributor

Ahhh I see. That makes sense! Good stuff.

@stevieing
Contributor

Closed as not needed

@stevieing closed this Jul 9, 2024
Successfully merging this pull request may close these issues.

DPL-471 - Fix historical data for malformed root sample ids to support important samples algorithm (C=?, V=4)