Merge pull request #10 from mdredze/jack
Carmen 2.0
Showing 41 changed files with 521,859 additions and 132 deletions.
@@ -0,0 +1,60 @@
# Carmen

A Python version of [Carmen](https://github.com/mdredze/carmen),
a library for geolocating tweets.

Given a tweet, Carmen returns `Location` objects that represent
physical locations.
Carmen uses both coordinates and other information in a tweet to make
geolocation decisions.
It's not perfect, but it greatly increases the number of geolocated
tweets over what Twitter provides.

To install, simply run:

    $ python setup.py install

To run the Carmen frontend, see:

    $ python -m carmen.cli --help

### Geonames Mapping

Alternatively, `locations.json` can be swapped out to use Geonames IDs
instead of the arbitrary IDs used in the original version of Carmen. The
Geonames-based JSON file can be found at `carmen/data/new.json`.

The instructions below describe how the mappings can be generated.

First, we need to get the data. This can be found at
http://download.geonames.org/export/dump/. The required files are
`countryInfo.txt`, `admin1CodesASCII.txt`, `admin2Codes.txt`, and
`cities1000.txt`. Download these files and move them into
`carmen/data/dump/`.

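The download step above could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the helper names `dump_url` and `download_dump` are hypothetical, and it assumes that `cities1000.txt` is published only inside `cities1000.zip` while the other three files are served as plain text.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "http://download.geonames.org/export/dump/"

# The four files named in the README.
REQUIRED_FILES = [
    "countryInfo.txt",
    "admin1CodesASCII.txt",
    "admin2Codes.txt",
    "cities1000.txt",
]

# Assumption: cities1000.txt is distributed as a zip archive on the dump page.
ZIPPED = {"cities1000.txt": "cities1000.zip"}


def dump_url(filename):
    """Return the download URL for one required file (hypothetical helper)."""
    return BASE_URL + ZIPPED.get(filename, filename)


def download_dump(dest_dir="carmen/data/dump"):
    """Fetch every required file into dest_dir, unzipping where needed."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for filename in REQUIRED_FILES:
        with urllib.request.urlopen(dump_url(filename)) as response:
            payload = response.read()
        if filename in ZIPPED:
            # Extract the single text file we need from the archive.
            with zipfile.ZipFile(io.BytesIO(payload)) as archive:
                payload = archive.read(filename)
        (dest / filename).write_bytes(payload)
```

Calling `download_dump()` from the repository root would then populate `carmen/data/dump/`; nothing is downloaded until that function is called.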
Next, we need to format the data. First, delete the comments in
`countryInfo.txt`. Afterwards, run the following:

    $ python3 format_admin1_codes.py
    $ python3 format_admin2_codes.py

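The comment-removal step mentioned above, which comes before the two formatting scripts, can also be done programmatically. The sketch below assumes that comments in `countryInfo.txt` are the lines beginning with `#`; the function names are hypothetical. Note that the column-header row in that file also begins with `#`, so it is removed as well.

```python
from pathlib import Path


def strip_comments(text):
    """Return text without the lines whose first character is '#'."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not line.startswith("#")
    ]
    return "".join(kept)


def clean_country_info(path="carmen/data/dump/countryInfo.txt"):
    """Rewrite the file in place with its comment lines removed."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    target.write_text(strip_comments(original), encoding="utf-8")
```

Keeping the pure string function separate from the file rewrite makes the filtering rule easy to check on a small sample before touching the real dump.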
Then, we need to set up a PostgreSQL database, as this makes finding
relations between the original Carmen IDs and Geonames IDs significantly
easier. To set up the database, create a PostgreSQL database named `carmen`
and run the following SQL script:

    $ psql -f carmen/sql/populate_db.sql carmen

Now we can begin constructing the mappings from Carmen IDs to
Geonames IDs. Run the following scripts:

    $ python3 map_cities.py > ../mappings/cities.txt
    $ python3 map_regions.py > ../mappings/regions.txt

With the mappings constructed, we can finally attempt to convert the
`locations.json` file into one that uses Geonames IDs. To do this, run
the following:

    $ python3 rewrite_json.py

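To illustrate the idea behind the final conversion step, here is a minimal sketch of remapping IDs in location records. It is not the actual `rewrite_json.py`: it assumes one JSON object per line, assumes each record carries `id` and `parentid` fields, and uses a plain dict as the mapping. The function names and the sample IDs are hypothetical.

```python
import json


def remap_ids(records, id_map):
    """Yield copies of records with `id` and `parentid` looked up in id_map.

    IDs missing from id_map are left unchanged (an assumption of this sketch).
    """
    for record in records:
        updated = dict(record)
        for field in ("id", "parentid"):
            old = updated.get(field)
            if old in id_map:
                updated[field] = id_map[old]
        yield updated


def rewrite_lines(lines, id_map):
    """Parse JSON lines, remap their IDs, and serialize them again."""
    records = (json.loads(line) for line in lines if line.strip())
    return [json.dumps(r, sort_keys=True) for r in remap_ids(records, id_map)]
```

For example, with placeholder IDs:

```python
lines = ['{"id": 1, "parentid": -1}', '{"id": 2, "parentid": 1}']
print(rewrite_lines(lines, {1: 1001, 2: 1002}))
```

Remapping `parentid` together with `id` keeps parent references consistent after the swap.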
@@ -1,5 +1,5 @@
 """Carmen, a library for geolocating tweets."""

-__version__ = '0.0.3'
+__version__ = '0.0.4'

 from .resolver import get_resolver