This repository has been archived by the owner on Apr 16, 2024. It is now read-only.
forked from openeventdata/es-geonames
Commit 70c2da6 (1 parent: 915318b)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 7 changed files with 1,149 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
allCountries.zip | ||
allCountries.txt | ||
.idea | ||
.pyc | ||
*.DS_Store | ||
environment_variables.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
# ES-GEONAMES transform and load with Logstash | ||
This does the same job as the Python script, but lets you shape the index through the pipeline that the developer defines in `logstash-pipeline.conf`. Besides sending the data to Elasticsearch, this approach can also convert the Geonames CSV data to any [output](https://www.elastic.co/guide/en/logstash/current/output-plugins.html) we need. A JSON output sample is at the end of this README. This method requires [Logstash to be installed](https://www.elastic.co/guide/en/logstash/current/installing-logstash.html). | ||
|
||
Some field names were changed; see the grok filter: | ||
|
||
` | ||
%{INT:GeonamesId} %{DATA:Name} %{DATA:ASCIIName} %{DATA:AlternateNames} %{DATA:Latitude} %{DATA:Longitude} %{DATA:FeatureClass} %{DATA:FeatureCode} %{DATA:CountryCode} %{DATA:CountryCode2} %{DATA:Admin1Code} %{DATA:Admin2Code} %{DATA:Admin3Code} %{DATA:Admin4Code} %{DATA:Population} %{DATA:Elevation} %{DATA:DEM} %{DATA:Timezone} %{GREEDYDATA:ModificationDate} | ||
` | ||
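As a rough illustration of what the grok filter does, the following Python sketch maps one line of `allCountries.txt` to the renamed fields. It assumes the dump is tab-separated with 19 columns in the order listed in the pattern, and that the raw modification date has the form `yyyy-MM-dd`. `FIELD_NAMES` and `parse_line` are names made up for this example and are not part of the pipeline.

```python
# Field names in the same order as the grok pattern above.
FIELD_NAMES = [
    "GeonamesId", "Name", "ASCIIName", "AlternateNames", "Latitude",
    "Longitude", "FeatureClass", "FeatureCode", "CountryCode",
    "CountryCode2", "Admin1Code", "Admin2Code", "Admin3Code", "Admin4Code",
    "Population", "Elevation", "DEM", "Timezone", "ModificationDate",
]


def parse_line(line):
    """Split one tab-separated line into a dict keyed by the renamed fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d columns, got %d" % (len(FIELD_NAMES), len(values))
        )
    return dict(zip(FIELD_NAMES, values))


# Example row assembled from the JSON sample at the end of this README.
# Empty strings stand for columns that are blank in that record (assumed).
row = "\t".join([
    "1424592", "Pushtah-ye Amīr Kushtah’ī", "Pushtah-ye Amir Kushtah'i", "",
    "35.24667", "64.67254", "T", "MT", "AF", "", "07", "", "", "",
    "0", "", "1847", "Asia/Kabul", "2012-01-19",
])
record = parse_line(row)
print(record["Name"], record["FeatureCode"], record["CountryCode"])
```

At this stage every value is still a string; the type conversions are a separate step.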
|
||
In the pipeline, the types of some fields were improved: | ||
|
||
- `AlternateNames`: converted to a list from the original "alternatenames: comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)" | ||
- `CountryCode2`: converted from the original "cc2: alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters" | ||
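The conversion of a comma-separated column amounts to splitting a string into a list. A minimal Python sketch of that step follows; `split_list` is a hypothetical helper name, and the actual pipeline does this with Logstash filters whose configuration is not shown here.

```python
def split_list(value):
    """Turn a comma-separated string into a list, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_list("Poshteh-ye Amirkoshteh'i,Pusta-i-Amirkusta'i"))
print(split_list(""))  # in this sketch an empty column becomes an empty list
```

Returning an empty list for a blank column is a choice made for this sketch; the pipeline may omit the field instead.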
|
||
|
||
Some new fields were also added, based on the [codes list](http://www.geonames.org/export/codes.html): | ||
|
||
- `FeatureClassName`: derived from `FeatureClass` | ||
- `FeatureCodeName`: derived from `FeatureCode` | ||
- `CountryCode3`: based on the dict from the script `geonames_elasticsearch_loader.py` | ||
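Conceptually, these derived fields are dictionary lookups keyed by a code. The sketch below is an assumption about the approach, not the pipeline's actual filter. The tables hold only the entries that appear in the JSON sample at the end of this README, and `enrich` is a hypothetical name.

```python
# Tiny lookup tables holding only the values visible in the JSON sample;
# the real code lists are far longer.
FEATURE_CLASS_NAMES = {"T": "mountain, hill, rock, etc"}
FEATURE_CODE_NAMES = {"MT": "mountain"}
COUNTRY_CODE3 = {"AF": "AFG"}


def enrich(record):
    """Return a copy of record with derived fields added for known codes."""
    enriched = dict(record)
    lookups = [
        ("FeatureClass", "FeatureClassName", FEATURE_CLASS_NAMES),
        ("FeatureCode", "FeatureCodeName", FEATURE_CODE_NAMES),
        ("CountryCode", "CountryCode3", COUNTRY_CODE3),
    ]
    for source, target, table in lookups:
        code = record.get(source)
        if code in table:
            enriched[target] = table[code]
    return enriched


doc = enrich({"FeatureClass": "T", "FeatureCode": "MT", "CountryCode": "AF"})
print(doc["FeatureClassName"], doc["FeatureCodeName"], doc["CountryCode3"])
```

Unknown codes are left alone rather than raising, so a record with an unlisted code simply lacks the derived field.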
|
||
### Usage | ||
|
||
In the logstash folder: | ||
- Copy `environment_variables.sample.sh` to `environment_variables.sh` and set your environment variables. | ||
- Run the index creator: `sh create_index.sh` | ||
- Start Logstash: `logstash -f logstash-pipeline.conf`. It takes about 5 to 10 minutes to start because the pipeline is large _(it should take less; Elastic should be improving this)_, but once it starts, the ingest finishes in just a few minutes (2K per second on my i7 8G machine). The output log will look like `2017-09-01T12:09:14.354Z %{host} %{message}`, which lets you know when it finishes. | ||
- Done, enjoy. | ||
|
||
|
||
### Todo | ||
- Fine tune the mappings | ||
- Load [premium data](http://www.geonames.org/products/premium-data-polygons.html); for example, places as polygons would be very interesting to load as [geo-shapes](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/geo-shape.html). | ||
- Dockerize | ||
- Publish some Kibana stats | ||
|
||
|
||
### JSON output sample | ||
If you want JSON output, uncomment the `# codec => json` line under output > stdout. Sample: | ||
```javascript | ||
{ | ||
"Timezone":"Asia/Kabul", | ||
"ASCIIName":"Pushtah-ye Amir Kushtah'i", | ||
"Latitude":35.24667, | ||
"FeatureCode":"MT", | ||
"type":"place", | ||
"AlternateNames":[ | ||
"Poshteh-ye Amirkoshteh'i", | ||
"Poshteh-ye Amīrkoshteh’ī", | ||
"Pushtah-ye Amir Kushtah'i", | ||
"Pushtah-ye Amīr Kushtah’ī", | ||
"Pusta-i-Amirkusta'i", | ||
"Pusta-i-Amīṟkusta’i", | ||
"پشتۀ امیر کشته ئی" | ||
], | ||
"Longitude":64.67254, | ||
"FeatureClass":"T", | ||
"DEM":"1847", | ||
"Name":"Pushtah-ye Amīr Kushtah’ī", | ||
"GeonamesId":1424592, | ||
"@timestamp":"2017-09-01T09:17:31.824Z", | ||
"FeatureClassName":"mountain, hill, rock, etc", | ||
"ModificationDate":"2012-01-19T00:00:00.000Z", | ||
"FeatureCodeName":"mountain", | ||
"@version":"1", | ||
"Admin1Code":"07", | ||
"Population":0, | ||
"location":{ | ||
"lon":"64.67254", | ||
"lat":"35.24667" | ||
}, | ||
"CountryCode":"AF", | ||
"CountryCode3":"AFG" | ||
} | ||
``` |
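To show how the typed fields in this sample relate to the raw text columns, here is a hedged Python sketch. It assumes the raw modification date has the form `yyyy-MM-dd` and that `location` is built from the latitude and longitude strings. `shape_document` is a hypothetical name; the real conversions happen in the Logstash pipeline.

```python
from datetime import datetime


def shape_document(raw):
    """Cast raw string columns to the types seen in the JSON sample."""
    modified = datetime.strptime(raw["ModificationDate"], "%Y-%m-%d")
    return {
        "GeonamesId": int(raw["GeonamesId"]),
        "Latitude": float(raw["Latitude"]),
        "Longitude": float(raw["Longitude"]),
        "Population": int(raw["Population"]),
        # The sample keeps lon/lat as strings inside "location".
        "location": {"lon": raw["Longitude"], "lat": raw["Latitude"]},
        # Midnight UTC is assumed, matching the sample's "T00:00:00.000Z".
        "ModificationDate": modified.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    }


doc = shape_document({
    "GeonamesId": "1424592",
    "Latitude": "35.24667",
    "Longitude": "64.67254",
    "Population": "0",
    "ModificationDate": "2012-01-19",
})
print(doc["ModificationDate"])  # 2012-01-19T00:00:00.000Z
```

The resulting date string fits the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` format declared for `ModificationDate` in `geonames_mapping.json`.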
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
echo "Setting env vars..." | ||
# Load the settings copied from environment_variables.sample.sh (see the README). | ||
. ./environment_variables.sh | ||
|
||
echo "Downloading Geonames gazetteer..." | ||
#wget http://download.geonames.org/export/dump/allCountries.zip | ||
echo "Unpacking Geonames gazetteer..." | ||
#unzip allCountries.zip | ||
|
||
#echo "Starting Docker container and data volume..." | ||
#sudo docker run -d -p 127.0.0.1:9200:9200 -v $PWD/geonames_index/:/usr/share/elasticsearch/data elasticsearch:5.1.2 | ||
#sleep 10s | ||
|
||
echo "Creating mappings for the fields in the Geonames index:" | ||
echo ${ES_GEONAMES_HOST}${ES_GEONAMES_INDEX} | ||
curl -XPUT ${ES_GEONAMES_HOST}${ES_GEONAMES_INDEX} -H 'Content-Type: application/json' -d @geonames_mapping.json | ||
|
||
echo "Done" |
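The `curl` call in this script can also be expressed in Python. The sketch below only builds the request, using the standard library and the values from `environment_variables.sample.sh` as fallbacks. `build_create_index_request` is a hypothetical helper, and actually sending the request (for example with `urllib.request.urlopen`) requires a reachable Elasticsearch instance.

```python
import json
import os
import urllib.request


def build_create_index_request(mapping, env=None):
    """Build the PUT request that creates the index with the given mapping."""
    env = os.environ if env is None else env
    host = env.get("ES_GEONAMES_HOST", "http://localhost:9200/")
    index = env.get("ES_GEONAMES_INDEX", "geonames2")
    return urllib.request.Request(
        url=host + index,  # the host value already ends with "/"
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(
    {"mappings": {"place": {"properties": {"location": {"type": "geo_point"}}}}},
    env={
        "ES_GEONAMES_HOST": "http://localhost:9200/",
        "ES_GEONAMES_INDEX": "geonames2",
    },
)
print(request.get_method(), request.full_url)
```

Passing `env` explicitly keeps the function easy to try without exporting variables first; with `env=None` it reads the same variables the shell script uses.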
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
#!/usr/bin/env bash | ||
|
||
export ES_GEONAMES_HOST=http://localhost:9200/ | ||
export ES_GEONAMES_USER=elastic | ||
export ES_GEONAMES_PASSWORD=changeme | ||
export ES_GEONAMES_INDEX=geonames2 | ||
|
||
# Don't modify this if you downloaded the files to the default location: | ||
export ES_GEONAMES_FILE="$(pwd)/allCountries.txt" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
{ | ||
"mappings": { | ||
"place": { | ||
"properties": { | ||
"ModificationDate": { | ||
"type": "date", | ||
"format": "yyyy-MM-dd'T'HH:mm:ss.SSSZ" | ||
}, | ||
"Latitude": { | ||
"type": "double" | ||
}, | ||
"Longitude": { | ||
"type": "double" | ||
}, | ||
"location": { | ||
"type": "geo_point" | ||
}, | ||
"GeonamesId": { | ||
"type": "integer" | ||
} | ||
} | ||
} | ||
} | ||
} |