Filter which lineages are modeled #29

Closed
thanasibakis opened this issue Aug 14, 2024 · 2 comments

@thanasibakis
Collaborator

One assumption that we've been hard-coding and need to make configurable is that we want to model all lineages. (... for the July data I'm playing with, the hierarchical model shows pretty clearly that most of the lineages have negligible proportions.)

Originally posted by @afmagee42 in #25 (comment)
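To make the "model all lineages" assumption configurable, lineages below some share of the data can be collapsed into a single "other" category. The sketch below is a minimal illustration, not the project's code. It assumes one lineage label per sequence, a hypothetical `min_proportion` threshold computed over the whole input, and a hypothetical helper name `lump_rare_lineages`. The real filter may define "negligible" differently, for example per week or over a recent window.

```python
from collections import Counter


def lump_rare_lineages(lineages, min_proportion=0.01, other_label="other"):
    """Relabel lineages whose overall share is below min_proportion.

    `lineages` holds one lineage label per sequence. The default threshold
    and the "other" label are illustrative assumptions, not project config.
    """
    counts = Counter(lineages)
    total = sum(counts.values())
    # Lineages at or above the threshold keep their own label.
    keep = {lin for lin, n in counts.items() if n / total >= min_proportion}
    return [lin if lin in keep else other_label for lin in lineages]


seqs = ["22B"] * 60 + ["22E"] * 38 + ["21L"] + ["20A"]
lumped = lump_rare_lineages(seqs, min_proportion=0.05)
print(dict(Counter(lumped)))
```

Here `21L` and `20A` each make up 1% of sequences, so both fall below the 5% threshold and are counted together as "other". The number of modeled categories then depends on the threshold rather than on every label that happens to appear in the data.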

@afmagee42
Collaborator

afmagee42 commented Aug 19, 2024

While we're at it, we should probably be filtering on whether the assigned lineage is valid. As far as I can tell, we aren't doing any filtering right now, but in the whole-US, all-time data, I'm seeing:

['23B', '20F', '21K', '20I', 'recombinant', '20E', '21E', '22C', '23G', '23C', '21I', '20H', '21H', '22D', '20G', '22E', '20B', '23H', '21G', '20A', '22F', '23D', None, '20D', '21J', '23A', '22A', '24A', '21B', '21M', '24B', '20J', '23E', '23F', '21L', '21C', '22B', '21D', '21F', '23I', '20C', '19B', '19A', '24C', '21A']

24C hasn't been put into the tree of clades yet, sadly.

`None` (no lineage assigned) should be removed.

I'm still not entirely sure what we want to do with "recombinant," but it's probably best dealt with on a weekly basis.

(NB: added `None` removal in #32.)
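A validity filter like the one described above might look like the following sketch. It is not the code added in #32. It assumes each sequence is a dict with a `"lineage"` key, and both the helper name `drop_invalid_lineages` and its optional `valid_lineages` allow-list are hypothetical. By default it drops only missing (`None`) assignments and leaves "recombinant" alone, since the thread defers that decision.

```python
def drop_invalid_lineages(records, valid_lineages=None):
    """Drop sequences whose assigned lineage is missing or not allowed.

    Each record is assumed (hypothetically) to be a dict with a "lineage"
    key. If valid_lineages is None, only missing (None) calls are removed.
    """
    kept = []
    for rec in records:
        lineage = rec.get("lineage")
        if lineage is None:
            continue  # no lineage was assigned to this sequence
        if valid_lineages is not None and lineage not in valid_lineages:
            continue  # label outside the caller-supplied allow-list
        kept.append(rec)
    return kept


records = [
    {"lineage": "23B"},
    {"lineage": None},
    {"lineage": "24C"},
    {"lineage": "recombinant"},
]
print(drop_invalid_lineages(records))
print(drop_invalid_lineages(records, valid_lineages={"23B", "recombinant"}))
```

Keeping the allow-list optional means that a newly designated clade such as 24C is not silently dropped just because it is missing from a reference list.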

thanasibakis added a commit that referenced this issue Aug 26, 2024
This PR identifies two sources of difficulty in fitting the model in
early 2022 (end of Delta, start of Omicron).
1. Data filtering now allows removing trivial lineages and grouping them
into "other," resolving #29 and greatly reducing the computational
burden when many negligible lineages are floating around.
2. The hierarchical model appears to have been a bit too flexible, even
with the changes in #41. Here we remove one layer of the hierarchy,
fixing `sigma_beta_1` instead of inferring it.

The combined result is that the hierarchical model now works (MCMC is
believable) and produces reasonable-seeming results for a 2022-01-01
forecast date.

Late addition, mostly out of scope but worth including: This PR also
removes the filtering based on comparing sequence date to clade name.
The intent was to avoid clearly incorrect calls like 23A in April 2020.
However, it was causing problems for lineages like 24A, which takes off
in late 2023 and starts 2024 at high prevalence. As the percentage of
sequences whose clade year exceeds their sample year is small, and as
many of those cases are clearly valid, leaving the remainder in the
dataset is the lesser evil. This is especially true with (1) in place,
which should sweep those into "other," minimizing issues.

---------

Co-authored-by: Thanasi Bakis
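The "small percentage" argument for dropping the clade-year check can be verified with a quick diagnostic. This sketch is not the removed filter. It assumes that records are hypothetical `(clade, sample_date)` tuples and that the year is read from the two leading digits of Nextstrain-style clade names (e.g. `23A` → 2023). Labels without a year, such as "recombinant" or missing calls, are skipped.

```python
import re
from datetime import date

# Nextstrain-style clade names: two-digit year followed by one letter.
CLADE_YEAR = re.compile(r"^(\d{2})[A-Z]$")


def clade_year(clade):
    """Return the 4-digit year encoded in a clade name, or None."""
    if clade is None:
        return None
    m = CLADE_YEAR.match(clade)
    return 2000 + int(m.group(1)) if m else None


def fraction_clade_after_sample(records):
    """Share of year-coded records whose clade year exceeds the sample year."""
    n_coded = 0
    n_late = 0
    for clade, sample_date in records:
        year = clade_year(clade)
        if year is None:
            continue  # e.g. "recombinant" or a missing call
        n_coded += 1
        if year > sample_date.year:
            n_late += 1
    return n_late / n_coded if n_coded else 0.0


records = [
    ("24A", date(2023, 12, 15)),  # plausible: 24A took off in late 2023
    ("23A", date(2020, 4, 1)),    # clearly incorrect call
    ("22B", date(2022, 7, 1)),
    ("recombinant", date(2023, 3, 1)),
]
print(fraction_clade_after_sample(records))
```

The check counts both the valid 24A case and the bogus 23A case as "late", so a low fraction on real data supports keeping these rows and letting the minor-lineage grouping absorb the bad calls.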
@afmagee42
Collaborator

Fixed in #45
