OSM data sources
Is it possible to get all the OSM data we need without maintaining our own servers?
OSM data is public, but in a format that requires additional processing/loading to be useful to an application. Hosting the full planet data is prohibitive in terms of cost and maintenance.
Soundscape fetches OSM data from dedicated servers as map tiles in GeoJSON format. Soundscape uses a fixed granularity for the tiles, specifically zoom level 16 (approximately 20 tiles per square mile).
Soundscape tries to maintain data coverage within 500m of the user's current position, which corresponds to 9-16 tiles. When a user opens Soundscape in a new location, either at their real position or via Street Preview, the app immediately makes a flurry of tile requests to the server to populate its device-local cache.
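As a rough illustration of the tile arithmetic above, the sketch below uses the standard slippy-map (Web Mercator) tile formula to find the zoom-16 tile containing a point and the tiles touched by a 500m radius. It is not taken from the Soundscape source; the function names and the bounding-box approximation of the 500m circle are assumptions made for illustration.

```python
import math

ZOOM = 16  # the fixed tile zoom level described above


def latlon_to_tile(lat_deg, lon_deg, zoom=ZOOM):
    """Return the (x, y) slippy-map tile indices containing a point."""
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_near(lat_deg, lon_deg, radius_m=500.0, zoom=ZOOM):
    """Tiles intersecting the bounding box of a circle around the point.

    Assumption: the box slightly over-approximates "within radius_m",
    and the metres-to-degrees conversion is only valid away from the poles.
    """
    dlat = radius_m / 111_320.0
    dlon = radius_m / (111_320.0 * math.cos(math.radians(lat_deg)))
    x_min, y_min = latlon_to_tile(lat_deg + dlat, lon_deg - dlon, zoom)  # NW corner
    x_max, y_max = latlon_to_tile(lat_deg - dlat, lon_deg + dlon, zoom)  # SE corner
    return [(x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


if __name__ == "__main__":
    tiles = tiles_near(47.6, -122.3)  # illustrative mid-latitude point
    print(len(tiles), "tiles to request")
```

At a mid-latitude point such as 47.6°N, the box spans 3 or 4 tiles in each direction, which is consistent with the 9-16 figure quoted above. Closer to the equator the tiles cover more ground, so fewer are needed.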
See Soundscape documentation:
- Little geographic locality (not expecting many simultaneous users in the same area)
- Some degree of burstiness -- need good performance for 10+ simultaneous queries when a user is in a new area
- But can usually tolerate above-average lag/latency, since app interaction is not blocked by missing data
- Any one user needs only a tiny fraction of the data, which could fit on their device
- Could each user's device be its own tile server?
- Use public API servers
- Overpass servers -- hosts OSM data with OSM-specific query language
- Proof of concept: Overscape builds Overpass queries and translates the results into Soundscape-flavored GeoJSON
- May be rate-limited -- unacceptable performance for bursty requests (timeouts)
- Bunting Labs has a free tier
- Can only filter features by one tag at a time?
- The free tier is limited to 10M OSM features/month. Back of the envelope: A single tile in a relatively dense area could have 100s of features, and a single location pulls in ~10 tiles. So, ~1,000 features for a person standing still, maybe 10,000 for taking a walk, meaning 1,000 user-walks per month. (NB: Given the heavy caching in the app, multiple walks in the same area wouldn't count toward the request limit)
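The back-of-the-envelope estimate for the free tier can be written out as a small calculation. The default values below are the rough figures from the estimate itself (the low end of "100s of features", ~10 tiles per location, ten locations' worth of new tiles per walk); the function name and parameters are hypothetical, and the result ignores the effect of on-device caching.

```python
FREE_TIER_FEATURES_PER_MONTH = 10_000_000  # free-tier limit quoted above


def walks_per_month(features_per_tile=100,
                    tiles_per_location=10,
                    locations_per_walk=10,
                    monthly_limit=FREE_TIER_FEATURES_PER_MONTH):
    """Estimate how many uncached user-walks fit in the monthly feature quota."""
    features_per_location = features_per_tile * tiles_per_location  # ~1,000
    features_per_walk = features_per_location * locations_per_walk  # ~10,000
    return monthly_limit // features_per_walk


if __name__ == "__main__":
    print(walks_per_month())                       # baseline estimate
    print(walks_per_month(features_per_tile=300))  # denser tiles shrink the budget
```

Varying `features_per_tile` shows how sensitive the budget is to tile density: tripling the feature count per tile cuts the number of walks to a third.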
- On a developer machine or a build server, create all possible GeoJSON files, and serve them as static content.
- Not really practical for all 4^16 ≈ 4.3 billion tiles in the world (https://wiki.openstreetmap.org/wiki/Zoom_levels), but could be used to pre-populate tiles for popular metro areas.
- Proof of concept using GitHub Actions, with discussion: https://github.com/openscape-community/openscape/issues/8
- Primary drawbacks: space-inefficient relative to DB or PBF; handling regions individually leads to incomplete tiles along boundaries
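To get a feel for the scale difference between the whole planet and a single metro area, the sketch below counts zoom-16 tiles for the world and for a bounding box, using the standard slippy-map tile formula. The bounding box is an illustrative, hand-picked rectangle rather than an official city boundary, and the function names are hypothetical.

```python
import math

ZOOM = 16


def tile_index(lat_deg, lon_deg, zoom=ZOOM):
    """Standard slippy-map tile indices for a point, clamped to the grid."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(south, west, north, east, zoom=ZOOM):
    """Number of tiles touched by a latitude/longitude bounding box."""
    x_min, y_min = tile_index(north, west, zoom)  # NW corner has the smallest x and y
    x_max, y_max = tile_index(south, east, zoom)  # SE corner has the largest x and y
    return (x_max - x_min + 1) * (y_max - y_min + 1)


world_tiles = 4 ** ZOOM  # each zoom level quadruples the tile count

# Assumption: a rough rectangle around one metro area, for illustration only
metro_bbox = (47.49, -122.44, 47.74, -122.23)  # south, west, north, east
metro_tiles = tiles_in_bbox(*metro_bbox)

if __name__ == "__main__":
    print(f"world: {world_tiles:,} tiles")
    print(f"example metro bbox: {metro_tiles:,} tiles")
```

A metro-sized box comes out several orders of magnitude smaller than the planet, which is why pre-computing only popular areas is plausible where the full set is not.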
- Use P2P/decentralized storage for static tile data.
- Some work has been done to get OSM on IPFS, but probably not reliable/performant enough to serve as a backend service.
- User is responsible for creating/downloading data they need and loading it onto their device.
- Various sources for .pbf files
- whole country/state from geofabrik, manual cropping/filtering through e.g. osmium/osmosis
- custom area from bbbike.org or hotosm
- Would require additional processing, e.g. to compute intersections
- Can pre-populate a Realm database to be queried by the app? https://stackoverflow.com/questions/48673370/best-way-to-build-a-pre-filled-realm-database
- A Realm DB can also be constructed outside of the app through e.g. realm-js
- App fetches published .pbf files for larger regions
- On-device .pbf file parsing?
- Some similarity to what is asked here, without the rendering: https://www.reddit.com/r/openstreetmap/comments/yvdj9w/can_i_render_tiles_directly_from_osmpbf_data/
- Use DuckDB to directly query bulk data hosted as static files.
- Is the query performance good enough for an interactive application?
- Some (seemingly active) work has been done to use DuckDB against OSM PBF files: https://github.com/duckdblabs/duckdb_spatial
- GitHub Actions -- unlimited minutes for public repos
- Cloudflare Workers -- could translate tile server to serverless functions that issue DB queries or decompress pre-computed tiles
When a request is received, return the first of these to match:
- Cache of flat GeoJSON files (the results of previous requests, and/or pre-computed tiles for entire regions)
- Our PostGIS server populated with as much of the world loaded as we have space for
- PostGIS servers belonging to partners maintaining data for specific countries/regions
- As a last resort, public Overpass servers via Overscape
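The "first of these to match" lookup could be sketched as a simple chain of tile sources tried in priority order. Everything here is hypothetical: the `TileSource` signature, the stub sources, and the choice to treat errors such as timeouts or rate limits as a miss are assumptions for illustration, not a description of an existing implementation.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Assumption: a source takes (zoom, x, y) and returns GeoJSON text, or None on a miss
TileSource = Callable[[int, int, int], Optional[str]]


def fetch_tile(zoom: int, x: int, y: int,
               sources: Sequence[TileSource]) -> Optional[str]:
    """Return the tile from the first source that has it, in priority order."""
    for source in sources:
        try:
            tile = source(zoom, x, y)
        except Exception:
            # Treat failures (timeouts, rate limits) as a miss and fall through
            continue
        if tile is not None:
            return tile
    return None


# Stub sources standing in for the tiers listed above
flat_file_cache: Dict[Tuple[int, int, int], str] = {
    (16, 1, 1): '{"type": "FeatureCollection", "features": []}',
}


def from_cache(zoom, x, y):
    return flat_file_cache.get((zoom, x, y))


def from_own_postgis(zoom, x, y):
    return None  # pretend this region is not loaded


def from_partner_postgis(zoom, x, y):
    raise TimeoutError("partner server unavailable")  # simulated outage


def from_overpass(zoom, x, y):
    return '{"type": "FeatureCollection", "features": [], "source": "overpass"}'


SOURCES = [from_cache, from_own_postgis, from_partner_postgis, from_overpass]

if __name__ == "__main__":
    print(fetch_tile(16, 1, 1, SOURCES))  # served by the cache stub
    print(fetch_tile(16, 2, 2, SOURCES))  # falls through to the Overpass stub
```

Since the cache tier is described as holding the results of previous requests, a fuller version would presumably also write tiles obtained from the lower tiers back into the cache.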
Overture Maps data: https://github.com/OvertureMaps/data