[Bakta] update to latest version full and light db #740

Open
wants to merge 22 commits into
base: main

@fraser-combe (Contributor) commented Jan 24, 2025

This PR closes #445

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

This PR updates Bakta to the latest version, 1.10.3, and adds support for both the full and light databases.

⚡ Impacted Workflows/Tasks

Tasks: task_bakta.wdl
Workflows: The TheiaProk workflows have been updated to download the full or light database before the Bakta call, and their outputs have been updated.

This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

  • Updated Bakta version to 1.10.3.
  • Databases are hosted in requester-pays (RP) GCS buckets.
  • Added support for both full and light database configurations.
  • Enhanced database handling for custom inputs.

⚙️ Algorithm

➡️ Inputs

Added bakta_db input to specify the database type (light, full, or custom).
Added bakta_custom_db input for user-defined database paths.

⬅️ Outputs

Outputs now include a summary plot (.png)

🧪 Testing

Verified Bakta annotation with the full and light databases.
Confirmed database extraction and annotation processes.

Ran all TheiaProk workflows to confirm successful Bakta annotation and output file generation:

TheiaProk_FASTA - light db

TheiaProk_Illumina_SE - light db

TheiaProk_Illumina_PE - light db

TheiaProk_Illumina_ONT - light db

TheiaProk_Illumina_ONT - full db

Suggested Scenarios for Reviewer to Test

Test annotation with both the light and full databases.
Confirm output files are generated correctly, including summary plots.

Opinions on the runtime parameters will be useful. I had to increase disk size to deal with the full database.

🔬 Final Developer Checklist

  • The workflow/task has been tested and results, including file contents, are as anticipated
  • The CI/CD has been adjusted and tests are passing (Theiagen developers)
  • Code changes follow the style guide
  • Documentation and/or workflow diagrams have been updated if applicable
    • You have updated the "Last Known Changes" field for any affected workflows in the respective workflow documentation page and for every entry in the three workflows_overview tables to be the tag for the next upcoming release. If you do not know the tag, please put "vX.X.X"

🎯 Reviewer Checklist

  • All changed results have been confirmed
  • You have tested the PR appropriately (see the testing guide for more information)
  • All code adheres to the style guide
  • MD5 sums have been updated
  • The PR author has addressed all comments
  • The documentation has been updated

@fraser-combe marked this pull request as ready for review January 27, 2025 14:07
@fraser-combe requested a review from a team as a code owner January 27, 2025 14:07
@fraser-combe (Contributor, Author)

Good suggestion. I've updated the workflow and documentation so that the user inputs the string "light" or "full", or the gs:// URL of a custom database bucket.

Re-ran the TheiaProk workflows to confirm that the custom, light, and full strings work as expected:

TheiaProk ONT
light
custom
full (default)

Confirmed that Bakta runs successfully in the other TheiaProk workflows:

TheiaProk Illumina PE and SE (default settings)

TheiaProk_FASTA (default settings)
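
For reviewers, here is a minimal sketch of the single-string selection described in this comment: "light" or "full" picks a hosted database, and anything starting with gs:// is treated as a custom database location. It is written in Python rather than WDL and is not the code in task_bakta.wdl. The function name `resolve_bakta_db`, the `PRESET_DBS` table, and both bucket URIs are hypothetical placeholders, since the real bucket paths are not stated in this PR. The "full" default mirrors the "full (default)" test run listed above.

```python
# Illustrative only: the names and URIs below are assumptions, not the
# values used by the actual workflow.
PRESET_DBS = {
    "light": "gs://example-rp-bucket/bakta/db-light.tar.gz",  # hypothetical path
    "full": "gs://example-rp-bucket/bakta/db-full.tar.gz",    # hypothetical path
}


def resolve_bakta_db(bakta_db="full"):
    """Return the database URI for a 'light', 'full', or custom gs:// input."""
    value = bakta_db.strip()

    # Preset names are matched case-insensitively.
    preset = PRESET_DBS.get(value.lower())
    if preset is not None:
        return preset

    # Any other accepted value must be a custom gs:// URI, passed through unchanged.
    if value.startswith("gs://"):
        return value

    raise ValueError(
        "bakta_db must be 'light', 'full', or a gs:// URI; got %r" % bakta_db
    )


if __name__ == "__main__":
    print(resolve_bakta_db("light"))
    print(resolve_bakta_db("gs://my-bucket/my-bakta-db.tar.gz"))
```

Rejecting unrecognized strings up front, instead of silently falling back to a default, would surface a typo such as "ligth" before any database download begins.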

Development

Successfully merging this pull request may close these issues.

update bakta to latest version; host the "full" db as well as "light" db on requester pays bucket