Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbind_results() helper function #42

Merged
merged 12 commits into from
Mar 25, 2024
Merged

rbind_results() helper function #42

merged 12 commits into from
Mar 25, 2024

Conversation

JosiahParry
Copy link
Collaborator

@JosiahParry JosiahParry commented Mar 24, 2024

This PR adds a new function that takes a list of data.frames or sf objects and combines them as efficiently as possible.

Closes #38
Related R-ArcGIS/arcgislayers#167

This helps us with the pattern described in #38. There are two points I want to make. One of the annoying bits of the Esri REST API endpoints is that errors are not returned as a status code, but rather, the error is captured in the json body.

Avoiding httr2::resps_data()

httr2::resps_data() is a useful function, however, it uses vctrs::list_unchop() which is only marginally faster than do.call(rbind.data.frame, list()) whereas collapse::rowbind() is orders of magnitude faster. The approach introduced in #42 will use collapse, data.table, vctrs, then base R to ensure the fastest approach is used. These are incompatible with httr2::resps_data().

Errors are not captured 400 codes

One challenging part of the ArcGIS location service endpoints is that errors are not returned as a 400 code. Instead, they are a json response with the error as plain text json.

The approach that I think might be most streamlined is this:

  1. for each response process the results, if an error is encountered return NULL
  2. Combine the results (a la rbind_results() helper function #42)
  3. Use the NULL responses to add error information to the response as an attribute. How do we do this?

This approach can be wrapped in another utility function similar to resps_data() something like arc_resps_data() which would have a signature like so:

arc_resps_data <- function(resps, .f)  {
  rbind_results(lapply(resps, .f))
}

The attribute null_elements returns the indices of the errors. Then each function can either return the requests / response or...something?

@elipousson, would you mind reviewing if you have time?

@elipousson
Copy link

Taking a quick look now, looks good to me. Moving this into {arcgisutils} should definitely reduce the complexity of arc_select()! A couple of notes:

  • You may want to pass call with this check: check_data_frame(item, allow_null = TRUE, call = call)
  • If you wanted to stick with vctrs::vec_rbind(), just pass to vctrs::list_drop_empty() to drop the NULL elements
  • The description for rbind_results has an outdated reference to dplyr::bind_rows()
  • I like the idea of using a ptype to return columns with appropriate types when all elements are missing but, by this point in the flow, you don't have the layer metadata handy to access the fields reference. If you want that behavior, you may need to pass it along with the responses to rbind_results() (or maybe attach them as part of the attributes?). It adds some complexity to the object structure but could be worth doing for other reasons also, e.g. handling domain constraints.

I think the inherits_or_null call should handle this but it did take some figuring out how to make sure this worked with tables, feature layers, and layers when returnGeometry was set to FALSE. There are also endpoints that return KML, XML, or other data types which this function would presumably not be able to handle.

@JosiahParry
Copy link
Collaborator Author

Thank you very much for your feedback! I've modified the check_data_frame() call so that the call is passed along with it as well as the arg value. There is a new test for this as well.

I've fixed the documentation to reference vctrs::list_unchop() and I've added an unused argument .ptype to rbind_results() that is reserved for a future time.

@JosiahParry JosiahParry merged commit 25a7073 into main Mar 25, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

squish_df helper
2 participants