On Demand API #53

Open
knapply opened this issue Oct 24, 2020 · 15 comments
Labels
enhancement, help wanted, on demand

Comments

@knapply
Collaborator

knapply commented Oct 24, 2020

As discussed in #52 (review)

How best to leverage simdjson's On Demand API?

@knapply added the enhancement and on demand labels Oct 24, 2020
@knapply
Collaborator Author

knapply commented Oct 28, 2020

@jkeiser, @lemire Following up on your suggestion from here after I played with it a bit.

Are there any examples of iterating through arrays of numbers, strings, bools, etc.?

I can only seem to find examples accessing the scalar values in objects using their keys.

As an example (I was trying this out on some GeoJSON), how would we access the individual points in the following (after accessing the coordinates with something like auto coords = doc["coordinates"])?

{
  "type": "LineString", 
  "coordinates": [
    [1, 2], [3, 4], [5, 6]
  ]
}

For some R context, we would need to turn "coordinates" into a 3x2 matrix (no offense intended if you're R aficionados - I don't want to assume)...

geojson <- '{
  "type": "LineString", 
  "coordinates": [
    [1, 2], [3, 4], [5, 6]
  ]
}'

parsed <- RcppSimdJson::fparse(geojson)
parsed
#> $type
#> [1] "LineString"
#> 
#> $coordinates
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
#> [3,]    5    6

... and since R is column-major, we're really building the following (with attached dimension attributes):

as.vector(parsed$coordinates)
#> [1] 1 3 5 2 4 6
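
To make the column-major point concrete, here is a minimal C++ sketch of the index arithmetic, assuming the values were first collected in row-major order (the order they appear in the JSON). The name `to_column_major` and its signature are hypothetical, not part of RcppSimdJson or simdjson.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a flat row-major buffer (nrow x ncol) into column-major order,
// the layout R uses for matrix storage.
// Hypothetical helper for illustration only.
std::vector<double> to_column_major(const std::vector<double>& row_major,
                                    std::size_t nrow, std::size_t ncol) {
  assert(row_major.size() == nrow * ncol);  // dimensions must match the data
  std::vector<double> col_major(row_major.size());
  for (std::size_t i = 0; i < nrow; ++i) {
    for (std::size_t j = 0; j < ncol; ++j) {
      // Element (i, j) sits at i * ncol + j in row-major order
      // and at j * nrow + i in column-major order.
      col_major[j * nrow + i] = row_major[i * ncol + j];
    }
  }
  return col_major;
}
```

For the 3x2 example, the row-major input 1 2 3 4 5 6 comes out as 1 3 5 2 4 6, matching as.vector(parsed$coordinates).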

@jkeiser

jkeiser commented Oct 28, 2020

I'd expect it to look like:

int i=0;
for (auto point : doc["coordinates"]) {
  int j=0;
  for (double val : point) {
    matrix[i][j] = val;
    j++;
  }
  i++;
}
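
A self-contained sketch of the same loop shape, written against any range of ranges so it compiles without simdjson. Whether an On Demand array and its elements can be dropped in unchanged is an assumption this sketch does not verify; `FlatRows` and `collect_rows` are hypothetical names. It reads each element once, in order, and never asks for a size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: values in row-major order plus the row count.
struct FlatRows {
  std::vector<double> values;
  std::size_t nrow = 0;

  // Column count, meaningful only if every row had the same length.
  // The assert catches an empty result and some (not all) ragged inputs.
  std::size_t ncol() const {
    assert(nrow > 0 && values.size() % nrow == 0);
    return values.size() / nrow;
  }
};

// Walk an array of arrays front to back, growing the output as we go.
template <typename Outer>
FlatRows collect_rows(Outer&& outer) {
  FlatRows out;
  for (auto&& row : outer) {
    for (double val : row) {
      out.values.push_back(val);  // size is unknown up front, so append
    }
    ++out.nrow;
  }
  return out;
}
```

The dimensions are known only after the loop finishes, so the fixed-size matrix[i][j] write above would become an append followed by a reshape at the end.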

@knapply
Collaborator Author

knapply commented Oct 28, 2020

Thank you. Is there a way to get an array's size (so we can tell how big matrix needs to be)?

@lemire
Collaborator

lemire commented Oct 28, 2020

@knapply rbind?

@eddelbuettel
Owner

No, for two reasons. Those things tend to be performance killers, as the underlying R data structures don't grow easily. So a general Rcpp pattern is to collect everything in C++ (hello, old friend STL) and then convert at the end. Plus, rbind lives at the R level, and we are not going 'up and down' all the time. (Unless I misread your suggestion ...)

@lemire
Collaborator

lemire commented Oct 28, 2020

I was half kidding.

@eddelbuettel
Owner

(And @lemire, for example, one of my all-time fave little tools in R is data.table::rbindlist(someList), which scans your list, allocates properly and then inserts. Runs circles around the standard do.call(rbind, someList).)
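
The scan, allocate, insert idea can be sketched in C++ as follows. This is only an analogy for the pattern, not how data.table implements rbindlist; `bind_all` is a hypothetical name, and the sketch assumes the pieces are already in memory so they can be walked twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Concatenate several vectors with a single allocation.
// Hypothetical helper for illustration only.
std::vector<double> bind_all(const std::vector<std::vector<double>>& pieces) {
  // Pass 1: scan the pieces to learn the total size.
  std::size_t total = 0;
  for (const auto& piece : pieces) {
    total += piece.size();
  }

  // Allocate once, up front.
  std::vector<double> out;
  out.reserve(total);

  // Pass 2: insert each piece; no reallocation happens here.
  for (const auto& piece : pieces) {
    out.insert(out.end(), piece.begin(), piece.end());
  }
  assert(out.size() == total);
  return out;
}
```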

@eddelbuettel
Owner

Hah! Point taken with a grin!

@lemire
Collaborator

lemire commented Oct 28, 2020

:-)

@knapply
Collaborator Author

knapply commented Oct 28, 2020

Edit... welp, I typed too slow 😬

If only. I think I overcomplicated the question with a matrix.

Let's say we have an array...

[1, 2, 3, 4, 5, 6]

... and we want to insert the data into a vector (R/Rcpp or STL) equivalent to this...

std::vector<double>{1, 2, 3, 4, 5, 6};

We would need to know the size of the array...

std::vector<double> out(std::size(array));     // how do we do this?

int i = 0;
for (double element : array) {
  out[i++] = element;
}

@lemire
Collaborator

lemire commented Oct 28, 2020

std::vector<double> out;
for (double element : array) { out.push_back( element); }

@knapply
Collaborator Author

knapply commented Oct 28, 2020

... still half kidding? :D

(I can't tell if knowing the size would even be possible with the On Demand API, but not knowing it sorta makes this a non-starter.)

@lemire
Collaborator

lemire commented Oct 28, 2020

The above code is standard C++, although there is room for optimizations....
https://lemire.me/blog/2012/06/20/do-not-waste-time-with-stl-vectors/

@jkeiser

jkeiser commented Oct 29, 2020

Yeah, using push_back is necessary, though we could probably expose an upper bound on the array size if you are willing to risk overallocation... still, I'd test the push_back method first; the cost of vector growth may well be lower than the cost of materializing a DOM.

@lemire
Collaborator

lemire commented Oct 29, 2020

For performance, you almost always want to overallocate, if only temporarily.
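
A minimal sketch of temporary overallocation with a std::vector, assuming the caller can supply some upper bound on the element count. Where that bound would come from is left open (the thread only says one could probably be exposed), and `collect_with_bound` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a vector from a single-pass source whose exact length is unknown.
// `upper_bound` is a caller-supplied guess, not a value from any real API.
template <typename InputIt>
std::vector<double> collect_with_bound(InputIt first, InputIt last,
                                       std::size_t upper_bound) {
  std::vector<double> out;
  out.reserve(upper_bound);                // one allocation, possibly too large
  assert(out.capacity() >= upper_bound);   // reserve guarantees at least this
  for (; first != last; ++first) {
    // No reallocation while size stays within the bound; if the guess was
    // too small, push_back still works and simply grows the vector.
    out.push_back(*first);
  }
  out.shrink_to_fit();                     // non-binding request to free the slack
  return out;
}
```

Whether this beats plain push_back would need measuring for the data at hand, as suggested above.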

@eddelbuettel added the help wanted label Jan 27, 2023