Pelias is a set of tools for importing OpenStreetMap data into Elasticsearch, and a simple server to handle queries and autocomplete suggestions.
- PostgreSQL: You'll need a PostGIS-enabled database with OpenStreetMap data, imported with osm2pgsql. NOTE: The import process expects certain fields, so you'll need to use the style file at config/osm2pgsql.style
- Elasticsearch: Used for search. Download the latest version of Elasticsearch
- Redis: Used for Geonames lookups when cross-referencing quattroshapes
- Sidekiq: Used for background processing (also uses Redis)
- Ruby >= 2
To get set up, run the following.
$ git clone git@github.com:mapzen/pelias.git && cd pelias
$ bundle
$ bundle exec rake synonyms:build
$ bundle exec rake index:create
Geonames provides alternative names and population figures for locations. We cross-reference this data with quattroshapes in the next step to provide a better search experience.
$ bundle exec rake geonames:prepare
These are shapes for various administrative areas. They are provided by the http://quattroshapes.com/ project.
NOTE: These tasks are enqueued via Sidekiq and must be run in isolated steps.
You can run them inline by setting the environment variable ES_INLINE=1.
$ bundle exec rake quattroshapes:prepare_all
$ bundle exec rake quattroshapes:populate_admin0 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_admin1 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_admin2 ES_INLINE=1
$ bundle exec rake quattroshapes:populate_local_admin ES_INLINE=1
$ bundle exec rake quattroshapes:populate_locality ES_INLINE=1
$ bundle exec rake quattroshapes:populate_neighborhood ES_INLINE=1
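The sequence above can also be scripted. The following is a minimal sketch, not part of Pelias: the method names `quattroshapes_steps` and `run_quattroshapes` are hypothetical, and it assumes that passing `ES_INLINE` through the process environment behaves the same as appending it to the rake command line. Each task runs only after the previous one has succeeded.

```ruby
# Task names are copied from the commands above; everything else is illustrative.
PREPARE_TASK = "quattroshapes:prepare_all"
POPULATE_TASKS = %w[
  quattroshapes:populate_admin0
  quattroshapes:populate_admin1
  quattroshapes:populate_admin2
  quattroshapes:populate_local_admin
  quattroshapes:populate_locality
  quattroshapes:populate_neighborhood
].freeze

# Build the ordered list of [env, argv] pairs without running anything.
# Only the populate tasks get ES_INLINE=1, mirroring the commands above.
def quattroshapes_steps
  steps = [[{}, ["bundle", "exec", "rake", PREPARE_TASK]]]
  POPULATE_TASKS.each do |task|
    steps << [{ "ES_INLINE" => "1" }, ["bundle", "exec", "rake", task]]
  end
  steps
end

# Run each step in order and stop at the first failure.
# `runner` is injectable so the sequencing can be checked without rake.
def run_quattroshapes(runner = ->(env, argv) { system(env, *argv) })
  quattroshapes_steps.each do |env, argv|
    ok = runner.call(env, argv)
    raise "step failed: #{argv.join(' ')}" unless ok
  end
  true
end

# Usage, from the pelias checkout:
#   run_quattroshapes
```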
Assuming you've set up a PostGIS-enabled database with OSM data, the following will add all streets, addresses, and POIs to the index, reverse geocoding them into the above shapes.
$ bundle exec rake osm:populate_street
$ bundle exec rake osm:populate_address
$ bundle exec rake osm:populate_poi
$ unicorn
You should now be able to access the server at http://localhost:8080/suggest?query=party
The following is a brief synopsis of setting up this environment, including approximate times to complete each step, amount of data, and number of documents.
- PostgreSQL/PostGIS: 1 c3.8xlarge
  - this is only to facilitate the fastest possible initial load into Pelias
- Elasticsearch: 20 m3.2xlarge
  - optimization work remains to be done to reduce heap storage requirements
  - assumes 80 shards, 1 replica per shard, half of physical memory allocated to ES for heap
- Sidekiq: 8 c1.medium
  - only required for the initial import to complete in a timely manner
  - can be removed once the import is complete, or scaled back as required for ongoing updates
Using this hardware allocation, we also recommend the following during the initial data load:
- disable replication in Elasticsearch
- set the index refresh interval to more than an hour (or disable it altogether for the duration of the indexing process)
- in PostgreSQL, add the following index (this will take some time if you're working with a full planet installation):
CREATE INDEX limit_street_line ON planet_osm_line (name, highway);
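The two Elasticsearch recommendations above map onto the index settings `number_of_replicas` and `refresh_interval`, which can be changed through the index `_settings` endpoint. The following is a sketch under assumptions: the helper names, the host, and the index name `pelias` are hypothetical, and `"-1"` is used to disable refresh entirely rather than setting a long interval.

```ruby
require "json"
require "net/http"
require "uri"

# Settings to apply before the bulk load: no replicas, no periodic refresh.
def bulk_load_settings
  { "index" => { "number_of_replicas" => 0, "refresh_interval" => "-1" } }
end

# Settings to restore afterwards. One replica matches the sizing above;
# "1s" is Elasticsearch's default refresh interval.
def post_load_settings(replicas = 1, refresh = "1s")
  { "index" => { "number_of_replicas" => replicas, "refresh_interval" => refresh } }
end

# PUT the settings to <base_url>/<index>/_settings and return the response.
def apply_settings(base_url, index, settings)
  uri = URI.join(base_url, "/#{index}/_settings")
  request = Net::HTTP::Put.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(settings)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Usage (hypothetical host and index name):
#   apply_settings("http://localhost:9200", "pelias", bulk_load_settings)
#   ... run the import ...
#   apply_settings("http://localhost:9200", "pelias", post_load_settings)
```

Restoring the replica count after the load lets Elasticsearch copy finished shards once, rather than replicating every document as it is indexed.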
Using the above architecture, we've observed the following load times:
- geonames + quattroshapes: roughly an hour
- osm: ~3 days
Documents in Elasticsearch upon completion of load:
- ~66 million
Unique data size on disk:
- ~300GB
- ~600GB with one replica
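For capacity planning, the figures above can be broken down per node and per shard. This is a back-of-the-envelope sketch that assumes shards and documents are spread evenly across the cluster, which is not guaranteed in practice; the method name `cluster_estimate` is hypothetical.

```ruby
# Inputs are the approximate numbers quoted in this section.
def cluster_estimate(shards:, replicas:, nodes:, unique_gb:, docs:)
  copies = shards * (1 + replicas)        # primaries plus replica copies
  total_gb = unique_gb * (1 + replicas)   # disk used across the cluster
  {
    shard_copies: copies,
    copies_per_node: copies.to_f / nodes,
    total_gb: total_gb,
    gb_per_node: total_gb / nodes,
    gb_per_primary_shard: unique_gb / shards,
    docs_per_primary_shard: docs / shards
  }
end

estimate = cluster_estimate(
  shards: 80, replicas: 1, nodes: 20, unique_gb: 300.0, docs: 66_000_000
)
estimate.each { |name, value| puts "#{name}: #{value}" }
# shard_copies: 160
# copies_per_node: 8.0
# total_gb: 600.0
# gb_per_node: 30.0
# gb_per_primary_shard: 3.75
# docs_per_primary_shard: 825000
```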
This is our search endpoint, used to search the index for addresses, POIs, etc.
/search?query=brooklyn
/search?query=brooklyn&center=-74.08,40.77
/search?query=brooklyn&viewbox=-74.08,40.77,-73.9,40.67
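A client can assemble these query strings programmatically. The sketch below is illustrative only: `search_path` is a hypothetical helper, and it passes `center` and `viewbox` values through in the same order as the examples above (which appear to list longitude before latitude) without interpreting them. Note that `URI.encode_www_form` percent-encodes the commas; this assumes the server decodes query parameters in the usual way.

```ruby
require "uri"

# Build a /search path. `center` is a 2-element array and `viewbox` a
# 4-element array, in the same order as the README examples.
def search_path(query, center: nil, viewbox: nil)
  params = { "query" => query }
  if center
    raise ArgumentError, "center needs 2 numbers" unless center.length == 2
    params["center"] = center.join(",")
  end
  if viewbox
    raise ArgumentError, "viewbox needs 4 numbers" unless viewbox.length == 4
    params["viewbox"] = viewbox.join(",")
  end
  "/search?" + URI.encode_www_form(params)
end

# Usage:
#   search_path("brooklyn")
#   search_path("brooklyn", center: [-74.08, 40.77])
#   search_path("brooklyn", viewbox: [-74.08, 40.77, -73.9, 40.67])
```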
This is an autocomplete suggestion endpoint. It provides search suggestions given text to look up.
/suggest?query=bro
/suggest?query=bro&size=5
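To call the endpoint from Ruby, something like the following could work. It is a sketch, not a documented client: `suggest_uri` and `fetch_suggestions` are hypothetical names, the base URL is whatever host the server runs on, and the code assumes only that the response body is JSON, without assuming anything about its structure.

```ruby
require "json"
require "net/http"
require "uri"

# Build the full /suggest URI; `size` is optional.
def suggest_uri(base_url, text, size: nil)
  params = { "query" => text }
  params["size"] = size.to_s if size
  uri = URI.join(base_url, "/suggest")
  uri.query = URI.encode_www_form(params)
  uri
end

# Fetch and parse suggestions. `http_get` is injectable so the function
# can be exercised without a running server.
def fetch_suggestions(base_url, text, size: nil,
                      http_get: ->(uri) { Net::HTTP.get(uri) })
  JSON.parse(http_get.call(suggest_uri(base_url, text, size: size)))
end

# Usage (assumes a server on localhost:8080, as in the setup steps):
#   fetch_suggestions("http://localhost:8080", "bro", size: 5)
```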
This is the reverse geocoding endpoint. It takes lng and lat params and returns GeoJSON corresponding to the given location.
/reverse?lng=1&lat=2
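Because longitude and latitude are easy to swap, a client may want to validate them before sending the request. The helper below is a hypothetical sketch; the range checks are general geographic bounds, not something the server is known to enforce.

```ruby
require "uri"

# Build a /reverse path. Keyword arguments make the lng/lat order explicit.
def reverse_path(lng:, lat:)
  raise ArgumentError, "lng must be within -180..180" unless (-180.0..180.0).cover?(lng)
  raise ArgumentError, "lat must be within -90..90" unless (-90.0..90.0).cover?(lat)
  "/reverse?" + URI.encode_www_form("lng" => lng, "lat" => lat)
end

# Usage:
#   reverse_path(lng: 1, lat: 2)           # => "/reverse?lng=1&lat=2"
#   reverse_path(lng: -74.08, lat: 40.77)
```

A swapped pair such as `lng: 40.77, lat: -74.08` still passes here, since both values are in range; only latitudes beyond ±90 are caught.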
Check out our demo here: http://mapzen.com/pelias
MIT License. See the included LICENSE file.