Filter which lineages are modeled #29
While we're at it, we should probably be filtering on the lineage assigned being valid. Right now I don't think we're doing any filtering? But in the whole-US, all-time data, I'm seeing:
24C hasn't been put into the tree of clades yet, sadly.
"Recombinant" I'm still not entirely sure what we want to do with, but it's probably best dealt with on a weekly basis. (NB: added none-removal in #32)
This PR identifies two sources of difficulty in fitting the model in early 2022 (end of Delta, start of Omicron).

1. Data filtering now allows removing trivial lineages and grouping them into "other," resolving #29 and greatly reducing the computational burden when many negligible lineages are floating around.
2. The hierarchical model appears to have been a bit too flexible, even with the changes in #41. Here we remove one layer of the hierarchy, fixing `sigma_beta_1` instead of inferring it.

The combined result is that the hierarchical model now works (MCMC is believable) and produces reasonable-seeming results for a 2022-01-01 forecast date.

Late addition, mostly out of scope but worth including: This PR also removes the filtering based on comparing sequence date to clade name. The intent was to avoid clearly incorrect calls like 23A in April 2020. However, it was causing problems for lineages like 24A, which takes off in late 2023 and starts 2024 at high prevalence. As the percent of all instances of (clade year) > (sample year) is small, and as many of those instances are clearly valid, leaving the remainder in the dataset is the lesser evil, especially with the institution of (1), which should sweep those into "other," minimizing issues.

---------

Co-authored-by: Thanasi Bakis <[email protected]>
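For readers skimming the thread, the "group trivial lineages into other" idea from point 1 can be sketched roughly as below. This is only an illustration, not the code from the PR: the function name `lump_rare_lineages`, the `min_share` threshold, and the choice to filter on overall share of sequences are all assumptions made for the example.

```python
from collections import Counter


def lump_rare_lineages(counts, min_share=0.01, other="other"):
    """Fold lineages below `min_share` of all sequences into one bucket.

    `counts` maps lineage name -> number of sequences.
    `min_share` and `other` are hypothetical knobs, not the project's config.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    lumped = Counter()
    for lineage, n in counts.items():
        # Keep the lineage's own name only if it clears the threshold.
        key = lineage if n / total >= min_share else other
        lumped[key] += n
    return dict(lumped)


# Made-up counts, purely for illustration.
example = {"21K": 900, "21J": 95, "20I": 3, "21A": 2}
result = lump_rare_lineages(example, min_share=0.01)
```

With the made-up counts above, the two lineages under 1% end up in a single "other" entry, so the model has three categories to fit instead of four. Making `min_share` a parameter is one way to address the "need to make configurable" point from the original post.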
Fixed in #45
One assumption that we've been hard-coding and need to make configurable is that we want to model all lineages. (... for the July data I'm playing with, the hierarchical model shows pretty clearly that most of the lineages have negligible proportions.)
Originally posted by @afmagee42 in #25 (comment)