Simple history visualization #32

Merged
merged 5 commits into from
Aug 20, 2024

afmagee42
Collaborator

This PR adds a simple tool (currently filed in `exploration`) to visualize lineage frequencies over time directly from the Nextstrain data, without any modeling.

It can make plots like the one below, which focus on a particular time range (here, 2022) and filter out lineages that are never seen above some frequency threshold (here, 10%).

[Figure: whole_history_0 5, lineage frequencies over 2022, showing only lineages seen above 10%]

I've avoided stacked charts so that it's easier to follow what happens to any individual lineage; the point of this tool is to help us choose which parts of a lineage's life cycle to model.
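The time-window restriction and the "never seen above some percent" filter can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's code: it bins sequences by calendar month (an assumption; the PR doesn't say how time is binned), represents the data as `(date, lineage)` pairs rather than a polars DataFrame, and the helper names `lineage_frequencies` and `lineages_above` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical helpers; monthly binning is an assumption for illustration.

def lineage_frequencies(records, start, end):
    """Monthly lineage frequencies for sequences collected within [start, end].

    `records` is an iterable of (collection_date, lineage) pairs.
    Returns {(year, month): {lineage: frequency}}.
    """
    counts = defaultdict(Counter)
    for day, lineage in records:
        if start <= day <= end:  # restrict to the time window of interest
            counts[(day.year, day.month)][lineage] += 1
    freqs = {}
    for month, counter in counts.items():
        total = sum(counter.values())
        freqs[month] = {lin: n / total for lin, n in counter.items()}
    return freqs

def lineages_above(freqs, threshold):
    """Lineages whose frequency exceeds `threshold` in at least one month."""
    return {
        lin
        for by_lineage in freqs.values()
        for lin, f in by_lineage.items()
        if f > threshold
    }

# Toy example: 21J never exceeds 10% within 2022, so it is filtered out.
records = (
    [(date(2022, 1, d), "21K") for d in range(1, 10)]
    + [(date(2022, 1, 15), "21J")]
    + [(date(2022, 2, d), "21L") for d in range(1, 11)]
    + [(date(2021, 12, 30), "21J")]  # outside the window, ignored
)
freqs = lineage_frequencies(records, date(2022, 1, 1), date(2022, 12, 31))
print(sorted(lineages_above(freqs, 0.10)))  # ['21K', '21L']
```

Each lineage in the kept set would then be drawn as its own unstacked line, so its rise and fall can be read directly.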

Out-of-scope additions

In making this, I found that some sequences are assigned impossible clades, such as a sequence from 2020 being assigned to 24A. Among all data with valid dates and valid clades, this appears to happen for <1% of sequences.

I have thus added `linmod.data.with_bad_ns_assign()`, a function that adds a column `impossible` to a polars DataFrame indicating whether each lineage assignment is impossible.
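The PR doesn't spell out the rule `with_bad_ns_assign()` uses, so the following is only one plausible heuristic, written in plain Python over dict rows instead of a polars DataFrame. It assumes Nextstrain clade names begin with a two-digit year (e.g. `20A`, `21K (Omicron)`, `24A`) and flags an assignment when that year is more than `tolerance_years` after the collection year. The tolerance is there because a clade can circulate before the year in its name. All names here (`clade_year`, `is_impossible`, `with_impossible_column`, `tolerance_years`) are hypothetical.

```python
import re
from datetime import date

# Assumption: Nextstrain clade names start with a two-digit year,
# e.g. "20A", "21K (Omicron)", "24A". Names like "recombinant" don't match.
_CLADE_YEAR = re.compile(r"^(\d{2})[A-Z]")

def clade_year(clade):
    """Four-digit year encoded in a clade name, or None if there isn't one."""
    match = _CLADE_YEAR.match(clade)
    return 2000 + int(match.group(1)) if match else None

def is_impossible(collection_year, clade, tolerance_years=1):
    """Hypothetical rule: the clade's year is more than `tolerance_years`
    later than the year the sequence was collected."""
    year = clade_year(clade)
    if year is None:
        return False  # leave unparseable clades to other validity checks
    return year > collection_year + tolerance_years

def with_impossible_column(rows):
    """Copy each row (a dict with 'date' and 'clade') and add a boolean 'impossible' key."""
    return [
        {**row, "impossible": is_impossible(row["date"].year, row["clade"])}
        for row in rows
    ]

example = with_impossible_column([
    {"date": date(2020, 6, 1), "clade": "24A"},           # the case described above
    {"date": date(2020, 11, 1), "clade": "21A (Delta)"},  # within tolerance
])
print([row["impossible"] for row in example])  # [True, False]
```

Setting `tolerance_years=0` gives the strictest version of the check; a larger tolerance trades recall for fewer false flags on clades that emerged late in the preceding year.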

I have also hooked this into our filtering step in `linmod.data.main`.
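A sketch of how an `impossible` flag might slot into a filtering step. Assumptions: rows are plain dicts that already carry the flag, "valid" means a non-missing date and clade, and `filter_sequences` is a hypothetical name; the actual code in `linmod.data.main` operates on polars DataFrames. Returning the share of impossible assignments makes it easy to recheck the "<1%" figure on new data.

```python
from datetime import date

def filter_sequences(rows):
    """Keep rows with a valid date and clade whose assignment isn't impossible.

    Each row is a dict with 'date' (datetime.date or None), 'clade' (str or None)
    and a precomputed boolean 'impossible' flag. Returns (kept_rows, fraction),
    where `fraction` is the share of impossible assignments among rows with a
    valid date and clade.
    """
    valid = [r for r in rows if r["date"] is not None and r["clade"]]
    n_impossible = sum(r["impossible"] for r in valid)
    fraction = n_impossible / len(valid) if valid else 0.0
    kept = [r for r in valid if not r["impossible"]]
    return kept, fraction

rows = [
    {"date": date(2022, 1, 1), "clade": "21K", "impossible": False},
    {"date": date(2020, 5, 1), "clade": "24A", "impossible": True},
    {"date": None, "clade": "22B", "impossible": False},             # invalid date
    {"date": date(2023, 2, 1), "clade": None, "impossible": False},  # invalid clade
    {"date": date(2023, 3, 1), "clade": "23A", "impossible": False},
    {"date": date(2023, 4, 1), "clade": "23B", "impossible": False},
]
kept, fraction = filter_sequences(rows)
print([r["clade"] for r in kept], fraction)  # ['21K', '23A', '23B'] 0.25
```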

@afmagee42 afmagee42 requested a review from thanasibakis August 19, 2024 18:17
@afmagee42 afmagee42 linked an issue Aug 19, 2024 that may be closed by this pull request
@thanasibakis thanasibakis merged commit 54099d4 into main Aug 20, 2024
1 check passed
@thanasibakis thanasibakis deleted the afm-27 branch August 20, 2024 17:18
Development

Successfully merging this pull request may close these issues.

Make a simple viz script for all data
2 participants