Restructure incidence object as data frame to allow for facetting #104
Comments
This is related to #76
This would be very useful. I wanted to make some epicurves yesterday and used ggplot because I couldn't facet with incidence.
Note that this was brought up in Ben Bolker's review of the incidence paper:
My initial thoughts on this:
Thinking about it, this new object can simply inherit from a data frame so that it can easily be plugged into existing data manipulation architecture. We would need to store the aggregation information as attributes (date column, interval, date range, and grouping), but that shouldn't be too difficult. The user should be able to use the accessors to get the dates, range, counts, etc. One of the challenges will be how to represent faceted counts when someone wants to use
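To make the proposal above concrete, here is a minimal base R sketch of a count table that inherits from `data.frame` and keeps the aggregation details as attributes. The names `new_incidence_df()`, `get_interval()`, `get_date_range()` and the class `incidence_df` are hypothetical and chosen only for illustration; they are not the incidence package API, and the fixed-width binning from the first date is an assumption.

```r
# Hypothetical constructor (not part of the incidence package):
# bins dates into fixed-width intervals and counts cases per bin and group.
new_incidence_df <- function(dates, groups, interval = 7L) {
  stopifnot(inherits(dates, "Date"), length(dates) == length(groups))
  first <- min(dates)
  # Floor each date to the start of its bin, counting bins from the first date.
  bin <- first + (as.integer(dates - first) %/% interval) * interval
  counts <- as.data.frame(
    table(date = bin, group = groups),
    responseName = "count",
    stringsAsFactors = FALSE
  )
  counts$date <- as.Date(counts$date)
  # Store the aggregation information as attributes on a data.frame subclass.
  structure(
    counts,
    class = c("incidence_df", "data.frame"),
    interval = interval,
    date_range = range(dates),
    date_col = "date",
    group_col = "group"
  )
}

# Hypothetical accessors that read the stored attributes.
get_interval <- function(x) attr(x, "interval")
get_date_range <- function(x) attr(x, "date_range")

onset <- as.Date("2020-01-01") + c(0, 1, 1, 8, 9, 15)
sex <- factor(c("f", "m", "f", "m", "m", "f"))
inc <- new_incidence_df(onset, sex, interval = 7L)
print(inc)
get_interval(inc)
```

Because the object is still a data frame, it can be passed to functions that expect one. One caveat with this design is that operations which build a new data frame (for example `merge()`) may not keep custom attributes, so the accessors would need to guard against that.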
I just came up with an idea for implementing the facetting functionality for incidence plots in an easy, efficient and elegant way. Why not make the best use of the existing facetting functionality provided by the ggplot2 package? If incidence plots can be regarded as an extension of ggplot2 Looking at the very first example from the
Test the new
Once the
Therefore, instead of adjusting the storage structure, we can re-implement the incidence plots as an extension of ggplot2, for instance, a bar geom called
I like the idea, but it looks like this would not be using the
I think the
I think @caijun has the right idea here. If we are going to move to a framework where incidence objects are created on the fly, there's no reason why we can't also create ggplot2 geom functions, since these would inherently rely on the same underlying architecture to create the incidence object in the first place. This way people can do something like:

incidence(x, dates = date_of_onset, interval = "1 ISO week", group = gender) %>%
  ggplot() +
  geom_incidence(show_cases = TRUE) + # no extra arguments since the internal data is already an incidence object
  scale_fill_incidence(pal = 1) +
  scale_x_incidence() +
  facet_grid(aaa ~ bbb + ccc)

or

ggplot(x, aes(date_of_onset, fill = gender)) +
  geom_incidence(interval = "1 ISO week", show_cases = TRUE) +
  scale_fill_incidence(pal = 1) +
  scale_x_incidence(interval = "1 ISO week") +
  facet_grid(aaa ~ bbb + ccc)

Theoretically, even adding the fit objects should work (though it will be wonky since the users would have to use the Of course,

incidence(x, dates = date_of_onset, interval = "1 ISO week", group = gender) %>%
  plot(show_cases = TRUE) +
  facet_grid(aaa ~ bbb + ccc)

We've already seen that several users want to do things with the epicurve that can't really be done with the current framework due to limitations of the data structure itself, because it represents an immutable summary, so it makes sense to show people that they can use the
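For comparison, the kind of plot being discussed can already be drawn with plain ggplot2 once the counts live in an ordinary data frame. The sketch below uses only existing ggplot2 functions (`geom_col()`, `facet_grid()`); the proposed `geom_incidence()` and `scale_*_incidence()` functions do not exist and are not used here. The data, the column names (`week`, `gender`, `region`, `n`) and the weekly bar width are made up for illustration.

```r
library(ggplot2)

# Toy weekly counts; values and column names are invented for this example.
weeks <- as.Date("2020-01-06") + 7 * 0:3
counts <- data.frame(
  week   = rep(weeks, times = 4),
  gender = rep(rep(c("f", "m"), each = 4), times = 2),
  region = rep(c("north", "south"), each = 8),
  n      = rep(c(1L, 3L, 4L, 2L), times = 4)
)

# Fill by one grouping variable and facet by another.
p <- ggplot(counts, aes(x = week, y = n, fill = gender)) +
  geom_col(width = 7) +     # one bar per 7-day bin
  facet_grid(region ~ .)    # one panel per region

print(p)
```

The point of the design is that fill and facet variables are just columns, so any number of grouping criteria can be combined without changing the stored object.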
The more I think about it, the more I'm thinking that we should port the internal functionality to the {tsibble} package. It has everything that we have except for the plotting. All of the steps below can be abstracted away for our users and they can return an incidence object if they want or they can return a tsibble. Our plotting can take care of tsibble objects as well. The only problem is that it will make the incidence package heavier.

library(tsibble)
library(dplyr)
library(aweek)
set_week_start("Saturday")
ll <- outbreaks::ebola_sim_clean$linelist
ll %>%
as_tsibble(key = case_id, index = date_of_onset) %>%
index_by(week = ~as.Date(aweek::as.aweek(.))) %>%
group_by(gender) %>%
summarize(n = n())
#> # A tsibble: 698 x 3 [1D]
#> # Key: gender [2]
#> gender week n
#> <fct> <date> <int>
#> 1 f 2014-04-07 1
#> 2 f 2014-04-21 1
#> 3 f 2014-04-25 1
#> 4 f 2014-04-26 1
#> 5 f 2014-04-27 1
#> 6 f 2014-05-01 2
#> 7 f 2014-05-03 1
#> 8 f 2014-05-04 1
#> 9 f 2014-05-06 2
#> 10 f 2014-05-07 2
#> # … with 688 more rows

Created on 2019-12-16 by the reprex package (v0.3.0)
Sounds great! I really like the idea of relying on As far as I can tell, re-implementing old features should be easy:
Importantly, that also means people will be able to use standard
This is a tricky one, since we aggregate data into the $counts matrix, but it would be nice to be able to combine plotting incidence objects with a group for filling and facet_grid() using another criterion. It would give something similar to:
where aaa, bbb and ccc are factors.
This may need some rethinking of our internal data representation, so I appreciate it may be a mid-to-long term change.