Handling large collections of datasets #125

Open
diazrenata opened this issue May 11, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@diazrenata
Member

If we split the BBS data by route, it will really be many communities.
If we continue to handle each community separately (i.e. name the object and label everything with that name in the drake plans), that will mean a lot of code and a lot of names.

I've already encountered this in BBS and made a temporary workaround:

  • when subsetting by route in the by-route branch, there needs to be some way of keeping track of which route & region the resulting abundance timeseries came from
  • I stuck those on as columns in the covariates table.
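As a rough illustration of that workaround, the sketch below splits flat survey records into one dataset per (region, route) and copies the route and region into each row of the covariates table. The record fields (`region`, `route`, `year`, `species`, `abundance`) and the `abundance`/`covariates` layout are assumptions made for the example, not the project's actual format, and the code is plain Python rather than the R used in the drake plans.

```python
from collections import defaultdict


def split_by_route(records):
    """Split flat survey records into one dataset per (region, route).

    Each resulting dataset carries its region and route as covariate
    columns, so its origin stays attached after the split.
    """
    # counts[(region, route)][year][species] -> abundance
    counts = defaultdict(lambda: defaultdict(dict))
    for rec in records:
        key = (rec["region"], rec["route"])
        counts[key][rec["year"]][rec["species"]] = rec["abundance"]

    datasets = {}
    for (region, route), by_year in counts.items():
        years = sorted(by_year)
        datasets[(region, route)] = {
            # one row of species counts per year
            "abundance": [by_year[y] for y in years],
            # matching covariate rows, tagged with where the data came from
            "covariates": [
                {"year": y, "region": region, "route": route} for y in years
            ],
        }
    return datasets


# Hypothetical records, for illustration only.
records = [
    {"region": 4, "route": 1, "year": 2000, "species": "sp_a", "abundance": 3},
    {"region": 4, "route": 1, "year": 2000, "species": "sp_b", "abundance": 1},
    {"region": 4, "route": 1, "year": 2001, "species": "sp_a", "abundance": 5},
    {"region": 4, "route": 2, "year": 2000, "species": "sp_a", "abundance": 2},
]
by_route = split_by_route(records)
```

Keeping the identifiers inside the covariates table means any downstream step that receives a single dataset can still report which route and region it belongs to.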

For a more systematic solution, my initial thought is that we could allow a dataset to be either in the currently specified format or a list of datasets in that format?
That would allow us to continue to refer to the whole lot of them as "bbs", but it would mean we'd need to build in some way of checking the format and handling them appropriately.
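A minimal sketch of what such a format check could look like, assuming a single dataset is a mapping with `abundance`, `covariates`, and `metadata` entries (the required keys are an assumption for the example; the thread does not spell out the format). All function names here are hypothetical, and the code is plain Python rather than R.

```python
# Assumed keys of a single dataset; adjust to the real format.
REQUIRED_KEYS = {"abundance", "covariates", "metadata"}


def is_single_dataset(obj):
    """Return True if obj looks like one dataset in the assumed format."""
    return isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys()


def as_dataset_list(obj):
    """Normalize a single dataset or a list of datasets to a list."""
    if is_single_dataset(obj):
        return [obj]
    if isinstance(obj, (list, tuple)) and all(is_single_dataset(d) for d in obj):
        return list(obj)
    raise ValueError("expected a dataset or a list of datasets")


def analyze_each(obj, analysis):
    """Apply an analysis function to every dataset, whatever the input shape."""
    return [analysis(d) for d in as_dataset_list(obj)]


# Hypothetical usage: "bbs" is a list of datasets, "portal" a single one.
one = {"abundance": [[1, 2]], "covariates": [{"year": 2000}], "metadata": {}}
bbs = [one, one]
portal = one
n_rows = lambda d: len(d["abundance"])
bbs_result = analyze_each(bbs, n_rows)
portal_result = analyze_each(portal, n_rows)
```

Normalizing at the boundary keeps the analysis functions themselves unaware of whether they were handed one community or many.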

Alternatively, we could change the way we handle the individual communities, to accommodate wildcards or iterating over a list?

This was referenced May 12, 2019
@ha0ye
Member

ha0ye commented May 29, 2019

For the sake of parallelization, I think it makes sense to allow for big/wide drake plans with separate targets for each "unit" of data.

I know there are ways of structuring drake plans that would help to organize the visualization, and we could always add custom columns to the drake plans with hierarchical info, such as "all these datasets are part of BBS" or "these datasets have no higher-level grouping", which I think should propagate properly through to analysis targets.
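To make the idea concrete, here is a sketch of a wide plan represented as a plain table: one data target per unit, one analysis target per unit and analysis, and a `group` column copied from each data target to the analysis targets built on it. This is plain Python and does not use drake's API; the target names, command strings, and `build_plan` helper are all hypothetical.

```python
def build_plan(units, analyses):
    """Build a wide plan with one row per target.

    units: mapping of unit name -> group label (None if ungrouped)
    analyses: list of analysis function names
    """
    plan = []
    # One data target per unit of data.
    for name, group in units.items():
        plan.append(
            {"target": name, "command": f"get_data('{name}')", "group": group}
        )
    # One analysis target per (analysis, unit); the group label carries over.
    for analysis in analyses:
        for name, group in units.items():
            plan.append(
                {
                    "target": f"{analysis}_{name}",
                    "command": f"{analysis}({name})",
                    "group": group,
                }
            )
    return plan


# Hypothetical units: two BBS routes plus one ungrouped dataset.
units = {"bbs_route_1": "bbs", "bbs_route_2": "bbs", "portal": None}
plan = build_plan(units, ["ts_summary"])
bbs_targets = [row["target"] for row in plan if row["group"] == "bbs"]
```

Because every row is an independent target, the units can still be built in parallel, while the `group` column gives a handle for filtering or collapsing the graph by collection.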

If you figure out a good approach, that would be great! My dependency graph for MATSS-forecasting is getting hairy...
[Image: Screen Shot 2019-05-29 at 11 58 48 AM]

@ha0ye ha0ye added the enhancement New feature or request label Jun 18, 2019