-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid round trip to disk in geojson_rw? #77
Comments
Good thinking. I'll have look too. Would be nice to avoid that extra time. |
Yeah, it's a killer on really big spatial objects. |
tried a bit, still not found a way to do it. asking around |
@hrbrmstr any thoughts on this one? how to collect the output of |
ooh. i think so. i'm pretty flat-out tomorrow (it's one of my 2x/wk commute to boston from Maine day) but lemme see how awake I am (just got done teaching my first college 2.5hr #rstats class and am fading fast :-) But I have an idea (and I've also started a side-write of |
thanks hugh, interesting, curious how https://github.com/thk686/rgdal2 fits in the mix |
oh wow. i had no idea he was doing that! I shall Use the Fork and see what On Thu, Feb 18, 2016 at 2:43 AM, Scott Chamberlain <[email protected]
|
um…what I'll post in a cpl hrs is, um…er…ugly. I might need assistance in making it cross platform and for getting other ideas. Perhaps even bringing Dirk into the convo. |
|
So, save the following block to
and, then run:
and then run:
and you should get:
I have many caveats in those comments and if @eddelbuettel cld take a peek, let me know how daft I am and offer some suggestions it'd be a Really Good Thing. As I say (I think) in the comments, traditional C++ |
|
I'm really not convinced
and
so it's substantially less. I suspect that'll be the case for many, if not all, |
works great! thanks so much. indeed, cross-platform is important. In your |
ugh. yeah (i'll fix) On Thu, Feb 18, 2016 at 10:22 AM, Scott Chamberlain <
|
Thanks so much for your help on this @hrbrmstr! I am on Windows here at work, I've given it a try. When I try to
It clearly doesn't recognize the |
As for the the |
thx, @ateucher. I've got a Windows VM back at the ranch (Tue & Thu are my commute days to Boston) I can debug it later (though I do import (oh, wait…i think i need to deal with using win32 and making a define to map I tried it with |
Thanks @hrbrmstr! I'm afraid my C++ knowledge is zero, but I'll see if I can track this down on my side too. |
@jeroenooms thoughts on the windows problem above #77 (comment) |
started incorporating on this branch https://github.com/ropensci/geojsonio/tree/geojsonrw-fix just to see how it all works - haven't incorporated into the there is a problem on check though:
Some discussion on it here http://stackoverflow.com/questions/28004717/r-check-doesnt-like-stdcout-c |
those are perfect Windows augmentations, @ateucher. I'm not worried abt that error msg. I can give verbiage for a CRAN submission with that there. I've not exhausted all other possibilities yet, tho. Since we're assuming I say "just" a bit too lightly there :-) TIL: |
thanks @ateucher using devtools::install_github("ropensci/geojsonio@geojsonrw-fix")
library("geojsonio")
library("rgdal")
cities <- readOGR(system.file("vectors", package = "rgdal")[1], "cities")
cwgr(cities[1:10,], "cities") |
@ateucher @hrbrmstr added a using it now in |
oh. wow. i need to get back to this. :-) On Thu, Apr 7, 2016 at 7:05 PM, Scott Chamberlain [email protected]
|
@hrbrmstr any chance you can take a look at this again? i can alternatively pull in Jeroen |
aye. back to 100% health and virtually caught up at work. On Tue, May 3, 2016 at 12:48 PM, Scott Chamberlain <[email protected]
|
With the main bottleneck of Although I do appreciate every effort in speeding up code, I do not really see the big bottleneck with the round-trip to disk on my system. In following benchmarks I tried to avoid the influence of disk-caches by flushing and also write/read large amounts of random data in between steps, but still it can not be out-ruled that disk-cache has positive impact (which in real life is good). input <- raster::getData('GADM', country='FRA', level=1)
system.time(rgdal::writeOGR(input, "/tmp/todisk", "", driver="GeoJSON"))
# user system elapsed
# 1.612 0.099 1.715
system.time(rgdal::writeOGR(input, "/tmp/ramdisk/toram", "", driver="GeoJSON"))
# user system elapsed
# 1.349 0.067 1.422
system.time(rgdal::writeOGR(input, "/vsistdout/", "", driver="GeoJSON"))
# user system elapsed
# 1.357 0.072 1.430
input <- raster::getData('GADM', country='CAN', level=3)
system.time(rgdal::writeOGR(input, "/tmp/todisk", "", driver="GeoJSON"))
# user system elapsed
# 53.917 1.628 56.806
system.time(rgdal::writeOGR(input, "/tmp/ramdisk/toram", "", driver="GeoJSON"))
# user system elapsed
# 54.672 0.892 55.656
system.time(rgdal::writeOGR(input, "/vsistdout/", "", driver="GeoJSON"))
# user system elapsed
# 54.266 0.923 55.282
For reading it does not make any difference either whether the data is already in RAM or if it is read from disk. fdsk <- "/tmp/todisk"
fram <- "/tmp/ramdisk/toram"
system.time(readChar(fdsk, file.info(fdsk)$size))
# user system elapsed
# 13.900 0.663 14.568
system.time(readChar(fram, file.info(fram)$size))
# user system elapsed
# 13.973 0.645 14.631
So the main speed-up which potentially can be saved is the overhead of needing to read-in the data after converting. I doubt it will be the full 14 seconds above though as it includes the converting of the captured output. |
Thanks for the detailed notes here @javrucebo - We'll definitely try avoiding writing to disk with |
Still a pressing issue? We can easily leverage sf to do this, I might need help with the structure-composing on the other side, but the sf-decomposing is no problem, and having a formalized pathway from sf to geojson-structures is something that would be really useful. |
Definitely want to avoid round tripping if possible. We've definitely sped this up a lot - if we can make it faster I'm all for it. Seems like @ateucher has done sf to geojson already, yes? So I think we need sp classes to geojson - We have some of the sp classes covered in the |
I haven't looked closely enough at the sp -> geojson in the geojson package yet, but I'm not convinced it deals properly with polygon holes in SpatialPolygons[DataFrames]s. But sf does Spatial -> sf without sp in |
@ateucher sounds good to use |
@ateucher ah, so by
you just meant borrow some code, not use |
Yes, it'd be good to use the sf approach, and possibly you could remove the dep on sp completely. but sf bundles with GDAL, GEOS and PROJ.4 so it's not exactly small. But fwiw it's very efficient to convert to sf, and it would allow you to remove those converters from your package if you relied on sf. I don't have enough understanding of geojsonio design to know what's best (I see import(sp) and rgdal is done holus!), but I do think you might be able away with copying the sf code to decompose sp and avoid the imports completely. (though the coercion stuff will take a bit of work, I would start by modifying the sp and rgdal imports to be granular, and then work on chopping them out. I'm not sure how much of the Rings stuff is handled by sf, but the to-DataFrame parts could be simpler, I think - and would be better in a separate package. That's sort of what spbabel was for, so you could give it anything Spatial* and then spbabel::sp() the resulting table/s and it would could hide a lot of stuff, but it's not clear that's going to help now. |
I made a start on the imports here, but to finish it might be more complicated than I'd like :) |
what does this mean? I think we'd still want to keep |
In my opinion, the goal would be to move I think we should be able to go from
Yup. |
Agree with you @ateucher - ideally all those pkgs in Suggests |
@sckott I just mean the entire package is imported "holus bolus", rather than just the functions needed. I'd like to make those imports granular so it's clear what's used and what isn't, which is a stepping stone to Suggests. |
ah, makes sense @mdsumner |
@ateucher i've replaced rgdal with sf - now that we're using |
I don't think so @sckott, pretty similar behaviour to rgdal in that sense... |
geojsonsf? |
I had a similar thought @mdsumner. I think I'll probably use it in rmapshaper too |
good idea @mdsumner |
There may be a way to avoid the round trip to disk in
geojson_rw
for convertingSpatial
objects togeojson
, but I haven't quite been able to make it work.The key: you can write to
stdout
withrgdal::writeOGR
:But I can't figure out a way to capture the output. I've tried various combinations of
sink()
,textConnection()
, andcapture.output()
but it's eluded me so far...The text was updated successfully, but these errors were encountered: