Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update sample_public_name for RVI Program - Bait Capture study #4586

Closed
wants to merge 39 commits into from

Conversation

sabrine33
Copy link
Contributor

Populate the sample_public_name field with sanger_sample_id values for samples associated with the 'RVI Program - Bait Capture' study, and where the sanger_sample_id starts with "RVI".

Copy link

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.

Project coverage is 89.21%. Comparing base (1e3a310) to head (190186e).

Files with missing lines Patch % Lines
...tasks/update_sample_public_name_for_rvi_study.rake 37.50% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4586      +/-   ##
===========================================
- Coverage    89.30%   89.21%   -0.09%     
===========================================
  Files         1406     1407       +1     
  Lines        30010    30018       +8     
===========================================
- Hits         26799    26780      -19     
- Misses        3211     3238      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@KatyTaylor KatyTaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL looks fine, but I've left a few questions about the approach.

namespace :samples do
desc 'Update sample_public_name for RVI Program - Bait Capture study'
task update_public_name: :environment do
ActiveRecord::Base.connection.execute('SET autocommit = 0')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the benefit of having autocommit turned off, please?

I was reading the docs - https://dev.mysql.com/doc/refman/8.4/en/innodb-autocommit-commit-rollback.html - and it looks like the benefit would come if there were multiple SQL statements - but we only have one here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right; it is only one SQL statement, but it runs on all the samples of the RVI studies (quite a lot). Turning off auto-commit ensures the atomicity of the operation.

JOIN studies ON study_samples.study_id = studies.id
SET sample_metadata.sample_public_name = samples.sanger_sample_id
WHERE studies.name = 'RVI Program - Bait Capture' AND sanger_sample_id LIKE 'RVI%';
SQLQUERY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider whether to check if sample_public_name already has a value, to avoid overwriting it if it has been intentionally set already?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using SQL directly means not using the ORM... which I think might mean you're bypassing some useful functionality. When a sample (and hopefully sample metadata) is updated, Sequencescape automatically updates the corresponding value in the MLWH (via the RabbitMQ queue). If you directly use SQL, I don't think this will happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@KatyTaylor
Copy link
Contributor

Oh and finally -

What's your plan for testing? You could add some unit tests, or just test it in training (should definitely do this anyway). If just manual tests in training, you could put a couple of SQL queries here that you intend to run to check the data?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants