Feature Request: Support for remote datasets over http/ftp #101

Open
shashi4u opened this issue Oct 24, 2024 · 3 comments

@shashi4u

shashi4u commented Oct 24, 2024

First of all, thank you for your work.

Is there any plan to support remotely hosted datasets that are served over http / ftp?
I noticed in the code that it scans the whole dataset at startup to build a dictionary of tiles for each dataset.
This may not be possible with a remotely hosted dataset, so we could provide a pre-built dictionary of tiles as a json / xml file that is read at startup. This would also speed up startup for large locally hosted datasets.

Example config:

datasets:
- name: remote_dataset
  path: http://xyz.com/dataset
  metadata: http://xyz.com/dataset/meta.json
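
A minimal sketch of how such a pre-built tile dictionary could be generated, assuming the tiles are `.tif` files in a local copy of the dataset. The function names and the JSON layout (tile name mapped to relative path) are hypothetical, not a format opentopodata reads today:

```python
import json
from pathlib import Path


def build_tile_index(root, pattern="*.tif"):
    """Map each tile's file stem to its path relative to `root`."""
    root = Path(root)
    return {
        path.stem: path.relative_to(root).as_posix()
        for path in sorted(root.rglob(pattern))
    }


def write_tile_index(root, out_file, pattern="*.tif"):
    """Scan `root` once and save the index as JSON for later startups."""
    index = build_tile_index(root, pattern)
    Path(out_file).write_text(json.dumps(index, indent=2))
    return index
```

The scan would run once where the files are cheap to list, and the resulting JSON would be hosted next to the dataset (the `metadata` key in the config above).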
@ajnisbet
Owner

ajnisbet commented Nov 2, 2024

Hey, good request! And you're correct that opentopodata does scan the entire directory on startup.

If you only have a few remote files, you can mount them as a local drive with rclone.

  • If your remote supports returning the entire directory tree in a single request, and you tinker with rclone's caching settings, you can achieve decent performance for some dataset-remote combinations.
  • But performance can be very slow for large datasets.
  • And also the combination of FUSE and docker can be a bit flaky: I briefly tried this in production at gpxz.io, but no longer use this setup.

A better way might be to build a VRT of your tiles.

  • A VRT is an xml file that lists the filepath and bounds of tiles.
  • But because it's only a single file, only this file will be read by opentopodata (not all the files linked to inside).
  • You can pre-build one using gdal's gdalbuildvrt command.
  • VRTs can link to files over http, ftp, and s3 as well as locally (or a combination of all of them).

I have some more notes here about remote mounts and VRTs: notes/cloud-storage, and here about the cons of VRTs: #91


If you do try VRTs and it doesn't work for you, I'd love to hear why not and to know more about your dataset! Remote/large datasets are something I would like to support in opentopodata!

@shashi4u
Author

shashi4u commented Nov 4, 2024

Thanks, I will try the VRT setup and update you if I have any issues.

@shashi4u
Author

shashi4u commented Nov 5, 2024

I tried VRTs with SRTM naming for the mapzen dataset served over ftp, one VRT per tile:
-mapzen-vrt/
|----N00/
|--------N00E000.vrt
|--------N00E001.vrt
...
It worked flawlessly.

I used the following command to generate the VRTs:

gdalbuildvrt data/mapzen-vrt/N00/N00E000.vrt ftp://remote-server/data/mapzen/N00/N00E000.tif
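
For many tiles, the same command can be generated in a loop. The following is a rough sketch, assuming SRTM-style names built from integer latitude/longitude and one folder per latitude band as in the tree above; the helper names are hypothetical and the commands are only built here, not executed:

```python
def srtm_tile_name(lat, lon):
    """Return an SRTM-style tile name such as N00E000 or S01W001."""
    ns = "N" if lat >= 0 else "S"
    ew = "E" if lon >= 0 else "W"
    return f"{ns}{abs(lat):02d}{ew}{abs(lon):03d}"


def vrt_command(lat, lon, remote_root, local_root):
    """Build the gdalbuildvrt argument list for a single remote tile."""
    name = srtm_tile_name(lat, lon)
    folder = name[:3]  # latitude band, e.g. "N00"
    return [
        "gdalbuildvrt",
        f"{local_root}/{folder}/{name}.vrt",
        f"{remote_root}/{folder}/{name}.tif",
    ]


if __name__ == "__main__":
    # Print the commands for a small block of tiles.
    for lat in range(0, 2):
        for lon in range(0, 2):
            cmd = vrt_command(
                lat, lon, "ftp://remote-server/data/mapzen", "data/mapzen-vrt"
            )
            print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the output folder exists.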

I haven't tested a single VRT for the whole dataset yet.
However, after looking at #91, I may have to do some hacking to parse the VRT and extract the paths to individual tiles.
I will update you on the progress.
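
As a starting point for that parsing, here is a minimal sketch using only the standard library. It assumes the tile paths sit in `SourceFilename` elements, which is where gdalbuildvrt writes them; the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def vrt_source_paths(vrt_xml):
    """Return the unique SourceFilename values in a VRT, in document order."""
    root = ET.fromstring(vrt_xml)
    paths = []
    for node in root.iter("SourceFilename"):
        path = (node.text or "").strip()
        if path and path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Hand-written, heavily trimmed VRT used only for illustration.
    sample = """
    <VRTDataset rasterXSize="7200" rasterYSize="3600">
      <VRTRasterBand dataType="Int16" band="1">
        <SimpleSource>
          <SourceFilename relativeToVRT="0">ftp://remote-server/data/mapzen/N00/N00E000.tif</SourceFilename>
        </SimpleSource>
        <SimpleSource>
          <SourceFilename relativeToVRT="0">ftp://remote-server/data/mapzen/N00/N00E001.tif</SourceFilename>
        </SimpleSource>
      </VRTRasterBand>
    </VRTDataset>
    """
    for path in vrt_source_paths(sample):
        print(path)
```

Entries with `relativeToVRT="1"` hold paths relative to the VRT file itself, so they would need to be resolved against the VRT's location before use.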
