Handling large collections of datasets #125

diazrenata · 2019-05-11T13:08:10Z

If we split by route, the BBS data will really be many separate communities.
If we continue to handle each community separately (i.e., name the object and label everything with that name in the drake plans), that means a lot of code and a lot of names.

I've already encountered this in BBS and made a temporary workaround:

When subsetting by route in the by-route branch, we need some way to keep track of which route and region each resulting abundance time series came from.
I stuck those on as columns in the covariates table.
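A minimal sketch of that workaround, written in Python only for illustration (the project itself works in R). It assumes raw records with hypothetical `region`, `route`, `year`, `species`, and `count` fields, and a dataset shaped loosely like the usual abundance/covariates pair. Each per-route subset carries its route and region as covariate columns, so its origin stays traceable.

```python
from collections import defaultdict


def split_by_route(records):
    """Split raw survey records into one dataset per (region, route).

    Hypothetical record fields: region, route, year, species, count.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["region"], rec["route"])].append(rec)

    datasets = {}
    for (region, route), recs in sorted(grouped.items()):
        years = sorted({r["year"] for r in recs})
        species = sorted({r["species"] for r in recs})
        # Sum counts in case the same (year, species) appears more than once.
        counts = defaultdict(int)
        for r in recs:
            counts[(r["year"], r["species"])] += r["count"]
        # Abundance: one row per year, one column per species (0 if unobserved).
        abundance = [[counts[(y, s)] for s in species] for y in years]
        # Covariates: every row is tagged with its route and region.
        covariates = [{"year": y, "route": route, "region": region} for y in years]
        datasets[(region, route)] = {
            "abundance": abundance,
            "species": species,
            "covariates": covariates,
        }
    return datasets
```

Keying the result by `(region, route)` also gives each subset a stable identifier without having to name every object by hand.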

For a more systematic solution, my initial thought is that we could allow a dataset to be either in the currently specified format or a list of datasets in that format?
That would let us keep referring to the whole lot of them as "bbs", but we would need to build in some way of checking the format and handling each case appropriately.
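One way the format check could look, sketched in Python under the assumption (hypothetical) that a single dataset is recognizable by having an `abundance` entry. Everything is normalized to a list before analysis, so downstream code only has to handle one shape.

```python
def is_single_dataset(obj):
    """Hypothetical format check: a single dataset is a dict with an 'abundance' entry."""
    return isinstance(obj, dict) and "abundance" in obj


def as_dataset_list(obj):
    """Return a list of datasets whether `obj` is one dataset or a list of them."""
    if is_single_dataset(obj):
        return [obj]
    if isinstance(obj, list) and all(is_single_dataset(d) for d in obj):
        return list(obj)
    raise TypeError("expected a dataset or a list of datasets")


def analyze(obj, method):
    """Apply `method` to every dataset; a single dataset yields a one-element list."""
    return [method(d) for d in as_dataset_list(obj)]
```

The design choice here is to normalize once at the boundary rather than branch on the format inside every analysis method.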

Alternatively, we could change how we handle the individual communities so that they accommodate wildcards or iterating over a list?


ha0ye · 2019-05-29T16:02:28Z

For the sake of parallelization, I think it makes sense to allow for big/wide drake plans with separate targets for each "unit" of data.

I know there are ways of structuring drake plans that help organize the visualization, and we could always add custom columns to the drake plans with hierarchical info (e.g., "all these datasets are part of BBS" or "these datasets have no higher-level grouping"). I think those columns should propagate properly through to the analysis targets.
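The idea of a wide plan with a hierarchical grouping column can be sketched as plain data, here in Python. This is not drake's API: the plan is just a list of rows with hypothetical `target`, `depends_on`, and `group` fields, and each analysis target copies the group label of the dataset it depends on, so the grouping propagates downstream.

```python
def build_plan(datasets_by_group, methods):
    """Build a wide plan: one data target per dataset, one analysis target per (dataset, method).

    `datasets_by_group` maps a group label (e.g. "bbs", or None for datasets
    with no higher-level grouping) to a list of dataset names.
    """
    plan = []
    for group, names in datasets_by_group.items():
        for name in names:
            plan.append({"target": name, "depends_on": [], "group": group})
            for method in methods:
                # The analysis target inherits the group of its input dataset.
                plan.append({
                    "target": f"{method}_{name}",
                    "depends_on": [name],
                    "group": group,
                })
    return plan


def targets_in_group(plan, group):
    """List target names carrying a given group label, in plan order."""
    return [row["target"] for row in plan if row["group"] == group]
```

Because every "unit" of data is its own target, independent targets can still be built in parallel, while the `group` column lets a visualization or summary step collapse them back into "bbs" and the rest.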

If you figure out a good approach, that would be great! My dependency graph for MATSS-forecasting is getting hairy...

This was referenced May 12, 2019:

By route and cleaning #124 (closed)
Import BBS data by route #127 (merged)

ha0ye added the enhancement (New feature or request) label Jun 18, 2019.
