ability to rename columns? #68

Open
cboettig opened this issue May 24, 2018 · 3 comments

Comments

@cboettig

This package is very exciting, great work! Our dataspice team was thrilled to see how nicely it already handles common requests on a dataspice.json file, e.g.:

library(roomba)
json <- jsonlite::read_json("https://raw.githubusercontent.com/ropenscilabs/dataspice/master/inst/metadata-tables/dataspice.json")

## Works nicely when all columns come from the same level of nesting:
json %>% roomba(c("givenName", "familyName"))
json %>% roomba(c("value", "unitText", "description"))

It would be great if the cols argument could take a named vector so that it also renames the output columns on the fly:

json %>% roomba(c("value", units = "unitText", "description"))

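A minimal sketch of the requested behaviour, assuming the renaming happens on the data frame after extraction. `rename_by_cols()` is a hypothetical helper, not part of roomba. It reads the names of the `cols` vector and renames only the elements that were given a name. The data frame is a made-up stand-in for roomba() output.

```r
# Hypothetical helper (not part of roomba): rename output columns using the
# names of a partially named `cols` vector, e.g.
# c("value", units = "unitText", "description").
rename_by_cols <- function(df, cols) {
  new_names <- names(cols)
  if (is.null(new_names)) return(df)  # no names at all: nothing to rename
  named <- new_names != ""            # unnamed elements keep their column name
  idx <- match(cols[named], names(df))
  found <- !is.na(idx)                # skip names that are not in the output
  names(df)[idx[found]] <- new_names[named][found]
  df
}

# Made-up stand-in for roomba() output, for illustration only
out <- data.frame(value = c(1, 2), unitText = c("m", "kg"),
                  description = c("length", "mass"))
cols <- c("value", units = "unitText", "description")
names(rename_by_cols(out, cols))  # "value" "units" "description"
```

Since renaming only touches the final column names, a helper like this could run after extraction without affecting the traversal itself.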
@aedobbyn
Collaborator

aedobbyn commented May 25, 2018

@cboettig nice idea!

The interesting thing about dfs_idx, the workhorse function that roomba is wrapped around, is that we can even include a call to filter based on some value, e.g. "givenName" == "Bob". Right now we just check whether there is any value at all with has_good_stuff() inside roomba.

For this v1 we decided not to let the user filter in the roomba() call because we weren't sure how the syntax would work. Say you wanted "value" > 42 & "value" <= 100. (One option would be to let the user pass a list of all the conditions that must hold and then string them together, separated by &s if keep == all and by |s if keep == any, before passing the combined condition into a revamped has_good_stuff().)
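The "string them together" step could look roughly like the sketch below, which folds a list of quoted conditions into a single expression with `Reduce()`. `combine_conditions()` is hypothetical, and taking `keep` as the strings "all"/"any" is an assumption made for simplicity; nothing here reflects how has_good_stuff() is actually written.

```r
# Hypothetical helper (not part of roomba): fold a list of quoted conditions
# into one expression, joined by & when keep = "all" and by | when keep = "any".
combine_conditions <- function(conditions, keep = c("all", "any")) {
  keep <- match.arg(keep)
  op <- if (keep == "all") "&" else "|"
  Reduce(function(lhs, rhs) call(op, lhs, rhs), conditions)
}

conds <- list(quote(value > 42), quote(value <= 100))
combined <- combine_conditions(conds, keep = "all")
combined                                      # value > 42 & value <= 100
eval(combined, list(value = c(10, 50, 150)))  # FALSE TRUE FALSE
```

Building one expression up front means the combined condition can be evaluated once per node instead of looping over the conditions separately.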

Renaming feels like a nice-to-have, but if we were to make roomba()'s inputs more complex, it feels like a filter would be more useful than a rename.

Would love to hear what you think about allowing for filtering and concatenating the conditionals!

@cboettig
Author

Very cool. Yeah, it makes total sense to keep the syntax simple even at the cost of features -- we already have a bunch of feature-rich queries like jq that most of us find too complex for regular use.

So I agree that it doesn't make sense to start adding too many optional arguments to the roomba() function. I do wonder, though, whether a dplyr-esque pipeline syntax might be possible, e.g. a roomba() %>% filter() %>% rename() kind of deal. (It might need a lazy-eval strategy where the piped commands are stored and combined into a single operation around a dfs_idx call?)
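One way the lazy-eval idea might work, sketched under loud assumptions: every function below is hypothetical. The verbs are named `plan_filter()` and `plan_rename()` rather than filter() and rename() to avoid masking dplyr or stats, and for brevity the plan runs against an already-flat data frame instead of the nested list roomba would walk. Each verb only records a step; nothing executes until `execute_plan()`.

```r
# Hypothetical lazy-evaluation sketch; none of these functions exist in roomba.
# Each verb only records a step, and nothing runs until execute_plan().
new_plan <- function(cols) list(cols = cols, steps = list())

plan_filter <- function(plan, condition) {
  step <- list(type = "filter", expr = substitute(condition))  # capture unevaluated
  plan$steps <- c(plan$steps, list(step))
  plan
}

plan_rename <- function(plan, ...) {
  step <- list(type = "rename", map = c(...))  # new_name = "old_name"
  plan$steps <- c(plan$steps, list(step))
  plan
}

execute_plan <- function(data, plan) {
  data <- data[plan$cols]
  for (step in plan$steps) {
    if (step$type == "filter") {
      keep <- eval(step$expr, data, parent.frame())
      data <- data[!is.na(keep) & keep, , drop = FALSE]
    } else if (step$type == "rename") {
      idx <- match(unname(step$map), names(data))
      found <- !is.na(idx)
      names(data)[idx[found]] <- names(step$map)[found]
    }
  }
  data
}

# Usage with a small, already-flat data frame (made-up values)
records <- data.frame(x = c(2, 5, 9), y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))
plan <- new_plan(c("x", "y", "z"))
plan <- plan_filter(plan, x > 4)
plan <- plan_rename(plan, a = "x")
res <- execute_plan(records, plan)  # rows with x = 5 and 9; column x renamed to a
```

Because each verb takes the plan as its first argument, the steps can also be chained with R's native `|>` pipe (R >= 4.1) or magrittr's `%>%`.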

@aedobbyn
Collaborator

Ah, that's a really cool idea! We could even store all the operations the user wants to do and then carry them out with something like

roomba_plan <-
  roomba(cols = c(x, y, z)) %>%
  filter(x > 4) %>%
  rename(a = x)

df %>% roomba_execute(roomba_plan)

I suppose it doesn't matter performance-wise whether the rename happens in roomba_plan or after roomba_execute(), but I can see the filter being useful to pass to dfs_idx rather than applying it afterward.
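To illustrate why pushing the filter into the traversal might help, here is a toy depth-first walker. It is a stand-in written for this sketch, not dfs_idx, and the nested input is made up. The filter is tested at each matching node, so records that fail it are never collected. The rename happens once, on the final data frame.

```r
# Toy depth-first walk over a nested list; a stand-in for illustration, not
# roomba's dfs_idx. Every sub-list that has all of `cols` is tested against
# `keep_if` on the spot, so records that fail the filter are never collected.
collect_rows <- function(x, cols, keep_if = NULL) {
  rows <- list()
  walk <- function(node) {
    if (!is.list(node)) return(invisible(NULL))
    if (all(cols %in% names(node))) {
      rec <- node[cols]
      if (is.null(keep_if) || isTRUE(eval(keep_if, rec))) {
        rows[[length(rows) + 1]] <<- rec  # append to collect_rows()'s `rows`
      }
    }
    for (child in node) walk(child)      # keep descending
  }
  walk(x)
  do.call(rbind, lapply(rows, as.data.frame, stringsAsFactors = FALSE))
}

# Made-up nested input, loosely shaped like parsed JSON
json <- list(
  list(value = 10, unitText = "m"),
  list(nested = list(value = 50, unitText = "kg")),
  list(value = 150, unitText = "s")
)
out <- collect_rows(json, c("value", "unitText"),
                    keep_if = quote(value > 42 & value <= 100))
names(out)[names(out) == "unitText"] <- "units"  # rename after extraction
```

If nothing matches, `collect_rows()` returns `NULL`, so a real version would want to return an empty data frame with the requested columns instead.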
