Prepare assign_clades CLI to use functionality in CladeTime and Tree classes #42

bsweger · 2024-10-24T14:06:02Z

Closes #19

Background

This PR lays the groundwork for upcoming "needed for eval" cladetime issues.

Even though it will be possible to write a "get target data script" using cladetime's new CladeTime and Tree classes, @elray1 and I decided to retain the small assign_clades CLI for usability.

Specific changes

Remove NCBI-related code
Clean up the Config class
For assign_clades steps that already have a CladeTime counterpart, switch to using a CladeTime object
Create stubs for assign_clades steps that require not-yet-implemented-or-merged CladeTime/Tree features

There are also a few one-off tidying steps, such as removing the get_clade_list script (which has moved to variant-nowcast-hub) and marking some of the functions as "private".

The functionality of get_clade_list.py is specific to the variant-nowcast-hub, so it's been moved there: reichlab/variant-nowcast-hub#123

The assign_clades CLI will no longer get its sequence data from the NCBI API. Instead, it will use methods from the newer CladeTime class to retrieve sequence data and metadata from Nextstrain. This changeset removes the NCBI components from the library and mocks out a new process for the assign_clades CLI. Once the changes are merged, the assign_clades CLI won't actually do anything, but the scaffolding will be in place to build out a newer version based on the building blocks that we're adding to the CladeTime and Tree classes.

bsweger · 2024-10-24T14:13:59Z

src/cladetime/assign_clades.py

        ]
    )

-    logger.info("Assigned sequences to clades via Nextclade CLI", output_file=config.assignment_no_metadata_file)
+    logger.info("Assigned sequences to clades via Nextclade CLI", output_file="some path stuff")


A temporary placeholder so the logger doesn't error when referencing a config attribute that no longer exists.

bsweger · 2024-10-24T14:15:38Z

src/cladetime/assign_clades.py

-    merged_data = merge_metadata(config)
-    merged_data.write_csv(config.assignment_file)
+
+    with tempfile.TemporaryDirectory() as tmpdir:


Unlike prior versions of assign_clades, the new approach assumes that we can save the intermediate files required for clade assignment in a temporary directory and only worry about saving the final output(s) to disk.
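
A minimal sketch of that pattern, for illustration only. The function name `run_assignment`, the `output_path` parameter, and the file names and contents are hypothetical placeholders, not the actual assign_clades code: intermediate files exist only inside the temporary directory, and just the final result is copied out before the directory is cleaned up.

```python
import shutil
import tempfile
from pathlib import Path


def run_assignment(output_path: Path) -> Path:
    """Write intermediates to a temp dir and keep only the final output.

    Hypothetical stand-in for the assign_clades steps; the file names and
    contents below are placeholders, not real cladetime outputs.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        tmp = Path(tmpdir)

        # Intermediate file: needed during the run, never saved permanently.
        intermediate = tmp / "assignments_no_metadata.csv"
        intermediate.write_text("seq,clade\nseq1,clade_placeholder\n")

        # The final file is first built inside the temp dir...
        final = tmp / "assignments.csv"
        final.write_text(intermediate.read_text())

        # ...then copied to its permanent location before cleanup.
        output_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(final, output_path)

    # At this point tmpdir and everything in it has been deleted.
    return output_path
```

One consequence of this design is that nothing needs to track or clean up intermediate paths, which is presumably why the related attributes could be dropped from the config.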

bsweger · 2024-10-24T14:16:45Z

src/cladetime/util/config.py

        data_path_root: str | None,
    ):
        if data_path_root:
-            self.data_path = AnyPath(data_path_root)
+            self.data_path = Path(data_path_root)


AnyPath wasn't really helping us and was responsible for a lot of annoying type errors, so I switched back to good old Path.

bsweger · 2024-10-24T14:19:22Z

src/cladetime/util/sequence.py

-    session: Session, bucket: str, key: str, data_path: Path, as_of: str | None = None, use_existing: bool = False
-) -> Path:
-    """Download the latest GenBank genome metadata data from Nextstrain."""
+def _download_from_url(session: Session, url: str, data_path: Path) -> Path:


Now that CladeTime provides the S3 URLs that point to the correct versions of sequence data and metadata, we can replace download_covid_genome_metadata with a more generic "download data using a given URL" function.

bsweger · 2024-10-24T14:20:39Z

tests/unit/util/test_sequence.py

@@ -67,31 +65,6 @@ def test_get_covid_genome_metadata_url(s3_setup, test_file_path, metadata_file):
    assert isinstance(metadata, pl.LazyFrame)


-@pytest.mark.parametrize(


Removed these tests because the function is gone and has been replaced by something that doesn't have any logic (it just downloads data using the requests library).

bsweger · 2024-10-24T14:25:19Z

tests/unit/test_get_clade_list.py

@@ -2,7 +2,11 @@
 from unittest.mock import MagicMock, patch

 import pytest
-from cladetime.get_clade_list import main
+
+pytest.importorskip(


I really like this parameterized test for checking our "get_clade_list" logic. However, that script now (rightfully) lives in the variant nowcast hub.

Need to do some more thinking about how to incorporate these tests in the other repo, where we don't necessarily have a Python/pytest environment set up.

In the meantime, added this code to skip the test file.

elray1

approved

bsweger added 2 commits October 23, 2024 16:25

Remove get_clade_list.py

84bef2f

The functionality of this file is specific to the variant-nowcast-hub, so it's been moved there: reichlab/variant-nowcast-hub#123

bsweger force-pushed the bsweger/new-clade-assign-scaffolding/6 branch from 8bd9623 to 3661b93 Compare October 24, 2024 14:12

bsweger requested a review from elray1 October 24, 2024 14:51

elray1 approved these changes Oct 24, 2024

bsweger merged commit 31b982f into main Oct 24, 2024
2 checks passed

bsweger deleted the bsweger/new-clade-assign-scaffolding/6 branch October 24, 2024 21:03
