Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import BBS data by route #127

Merged
merged 37 commits into from
May 28, 2019
Merged

Import BBS data by route #127

merged 37 commits into from
May 28, 2019

Conversation

diazrenata
Copy link
Member

@diazrenata diazrenata commented May 13, 2019

Functions to

  • subset the BBS data to sufficiently long timeseries
  • extract each route and clean species labels, etc
  • add the community data for every route to datasets plans

The initial processing step (prepare_bbs_ts_data) does the cleaning and subsetting necessary to get the list of routes and regions for the plans, and saves the processed abundance table and the tables for making the plans in the directory with the data path. The remaining functions load those data tables as needed (trying to minimize the number of times we interact with the main dataframe - it's slow to load).

get_bbs_route_region_data pulls a specified route + region and does the final cleaning on it. I've broken the cleaning into two steps because this last step takes forever and crashed R for me a few times when I tried to do it on the whole dataframe at once.

build_bbs_datasets_plan makes a drake plan to get data for all the routes. (2500 of them)

build_datasets_plan now includes the option to include BBS, in which case it will use build_bbs_datasets_plan.

handles #125

@diazrenata diazrenata mentioned this pull request May 13, 2019
Copy link
Member

@ha0ye ha0ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general notes:

  • build_datasets_plan(), build_bbs_datasets_plan(), prepare_bbs_ts_data(), and get_bbs_route_region_data() all take a path argument, but the path argument isn't consistently passed between them, and so the default option is relied on.
  • some of the file paths are created using paste0; I think the recommend approach is to use file.path() so that we don't run into issues with forward-slash, back-slash confusion (across different operating systems)
  • one line 130, the call to get_bbs_route_region_data() doesn't use the path argument, and it probably needs !!, as on lines 93 and 94.
  • should the default end_yr in prepare_bbs_ts_data() be 2017? Maybe we can add the option to have the argument be NA, and have it be ignored?

@ha0ye
Copy link
Member

ha0ye commented May 20, 2019

More generally, it seems like get_bbs_route_region_data() is pretty slow, and I think this is because it's trying to load the entire bbs dataset. Since it seems like the new workflow is for prepare_bbs_ts_data() to be called first, maybe it should split the data into separate files for each route x region, which then speeds up get_bbs_route_region_data()?

@codecov-io
Copy link

Codecov Report

Merging #127 into master will decrease coverage by 19.09%.
The diff coverage is 0.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #127      +/-   ##
==========================================
- Coverage   89.18%   70.08%   -19.1%     
==========================================
  Files          13       14       +1     
  Lines         490      575      +85     
==========================================
- Hits          437      403      -34     
- Misses         53      172     +119
Impacted Files Coverage Δ
R/get_retriever_data.R 88.88% <0%> (-11.12%) ⬇️
R/bbs_cleaning_functions.R 0% <0%> (ø)
R/build_plans.R 65.95% <5.88%> (-34.05%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab5cc74...e02c118. Read the comment docs.

@codecov-io
Copy link

codecov-io commented May 27, 2019

Codecov Report

Merging #127 into master will decrease coverage by 19.46%.
The diff coverage is 0.81%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #127       +/-   ##
===========================================
- Coverage   89.18%   69.72%   -19.47%     
===========================================
  Files          13       14        +1     
  Lines         490      578       +88     
===========================================
- Hits          437      403       -34     
- Misses         53      175      +122
Impacted Files Coverage Δ
R/get_retriever_data.R 88.88% <0%> (-11.12%) ⬇️
R/bbs_cleaning_functions.R 0% <0%> (ø)
R/build_plans.R 63.26% <5.26%> (-36.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab5cc74...3c1be4a. Read the comment docs.

@diazrenata
Copy link
Member Author

@ha0ye, could you maybe take a look at this version of things? Hopefully it's a little more streamlined!

@ha0ye
Copy link
Member

ha0ye commented May 28, 2019

@diazrenata I just noticed that you have files in vignettes/.drake as part of your commit - did you mean to include those?

@diazrenata
Copy link
Member Author

diazrenata commented May 28, 2019 via email

@ha0ye ha0ye merged commit 9c89304 into master May 28, 2019
@ha0ye ha0ye mentioned this pull request Jun 7, 2019
@ha0ye ha0ye deleted the by-route branch July 11, 2019 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants