RollerEtAl_EMNLP2012
This page explains the process of replicating the results of:
Stephen Roller, Michael Speriosu, Sarat Rallapalli, Benjamin Wing and Jason Baldridge. Supervised Text-based Geolocation Using Language Models on an Adaptive Grid. EMNLP 2012. Jeju, Korea.
The first step is to get the code. Check out or download the code from
https://github.com/utcompling/textgrounder/commits/emnlp-release-candidate-same-results
You'll need to set up your environment as described in steps 2-3 of README.txt. Specifically, you must set the $TEXTGROUNDER_DIR variable to the root of the TextGrounder source code and add $TEXTGROUNDER_DIR/bin to your $PATH variable.
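You may want to confirm the environment before going further. The Python sketch below checks the two settings described above: that TEXTGROUNDER_DIR is set, and that $TEXTGROUNDER_DIR/bin is on PATH. It is a hypothetical helper (check_textgrounder_env is not part of TextGrounder). It only inspects the variables and does not verify that the directory actually exists.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_textgrounder_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: list problems with the environment (empty list means OK)."""
    root = env.get("TEXTGROUNDER_DIR")
    if not root:
        return ["TEXTGROUNDER_DIR is not set"]
    problems = []
    bin_dir = Path(root) / "bin"
    # Compare as Path objects so trailing slashes on PATH entries don't matter.
    path_entries = [Path(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in path_entries:
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_textgrounder_env():
        print("warning:", problem)
```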
Next, you'll need the data. For Geotext and Wikipedia, follow step 4 in README.txt.
For the UtGeo data set, follow the README.txt in
http://www.cs.utexas.edu/~roller/research/kd/corpus/
As that README suggests, you are strongly encouraged to contact the first author ([email protected]) when you begin this process, since obtaining the full data set may be difficult.
Run textgrounder build-all from the $TEXTGROUNDER_DIR directory.
To run the program, use:
$ textgrounder -memory 30g geolocate-document --corpus $PATH_TO_CORPUS/$CORPUS_NAME --kd --kdbs $BUCKET_SIZE --kdsm (median|halfway) --cm (center|centroid) --eval-set (dev|test)
where median and halfway correspond to the Friedman and midpoint methods of splitting, respectively.
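To make the two splitting rules concrete, here is a minimal Python sketch of KD-tree cell construction over document coordinates. It rests on several assumptions made for illustration:

- A cell is split until it holds at most the bucket size (--kdbs) of documents.
- Splits alternate between latitude and longitude.
- "center" and "centroid" (--cm) mean, respectively, the middle of a cell's bounding box and the mean location of its training documents.

This is not TextGrounder's implementation. Cell, build_cells, and max_depth are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]                 # (latitude, longitude) of one training document
Bounds = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


@dataclass
class Cell:
    points: List[Point]
    bounds: Bounds

    def center(self) -> Point:
        """Assumed meaning of 'center': middle of the cell's bounding box."""
        min_lat, min_lon, max_lat, max_lon = self.bounds
        return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)

    def centroid(self) -> Point:
        """Assumed meaning of 'centroid': mean location of the cell's documents."""
        return (mean(p[0] for p in self.points), mean(p[1] for p in self.points))


def build_cells(points: List[Point], bounds: Bounds, bucket_size: int,
                split_method: str, depth: int = 0, max_depth: int = 64) -> List[Cell]:
    """Recursively split until each cell holds at most bucket_size points."""
    if not points:
        return []  # empty regions produce no cell
    if len(points) <= bucket_size or depth >= max_depth:
        return [Cell(points, bounds)]
    axis = depth % 2  # alternate: 0 = latitude, 1 = longitude
    if split_method == "median":
        # Friedman-style: split at the median point so both halves get equal counts.
        ordered = sorted(points, key=lambda p: p[axis])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        split = ordered[mid][axis]
    elif split_method == "halfway":
        # Midpoint: cut the bounding box in half, wherever the points happen to be.
        split = (bounds[axis] + bounds[axis + 2]) / 2
        left = [p for p in points if p[axis] < split]
        right = [p for p in points if p[axis] >= split]
    else:
        raise ValueError(f"unknown split method: {split_method}")
    left_bounds = list(bounds)
    left_bounds[axis + 2] = split   # left half: lower the max edge
    right_bounds = list(bounds)
    right_bounds[axis] = split      # right half: raise the min edge
    return (build_cells(left, tuple(left_bounds), bucket_size, split_method, depth + 1, max_depth)
            + build_cells(right, tuple(right_bounds), bucket_size, split_method, depth + 1, max_depth))


if __name__ == "__main__":
    # Toy document locations, for illustration only.
    docs = [(30.27, -97.74), (30.30, -97.70), (40.71, -74.01), (40.73, -73.99), (51.51, -0.13)]
    world = (-90.0, -180.0, 90.0, 180.0)
    for method in ("median", "halfway"):
        cells = build_cells(docs, world, bucket_size=2, split_method=method)
        print(method, [(len(c.points), c.center(), c.centroid()) for c in cells])
```

In this toy run, the first median cell spans the whole southern part of the globe up to latitude 40.71. Its bounding-box center therefore lies far from its two documents, while its centroid sits right next to them. That gap is why the center/centroid choice can matter.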
For example, to run on UtGeo large and evaluate on the dev set using only a KD tree (bucket size 500), Friedman splitting, and centroid cell prediction, I personally use:
$ textgrounder -memory 30g geolocate-document --corpus $SCRATCH/corpora/utgeo-large --kd --kdbs 500 --kdsm median --cm centroid --eval-set dev
Your settings will vary depending on your exact setup and on which method you wish to test.
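Because the right settings depend on the method you are testing, it can help to generate the command lines programmatically. The Python sketch below only assembles the flags shown on this page, and it always passes --kd together with --kdbs, as in the example invocation. geolocate_command is a hypothetical helper. The corpus path and the bucket sizes in the sweep are placeholders, not values taken from the paper.

```python
import shlex
from itertools import product
from typing import List


def geolocate_command(corpus: str, bucket_size: int, split: str, cell_method: str,
                      eval_set: str, memory: str = "30g") -> List[str]:
    """Hypothetical helper: build one geolocate-document invocation as an argv list."""
    if split not in ("median", "halfway"):
        raise ValueError(f"--kdsm must be median or halfway, got {split!r}")
    if cell_method not in ("center", "centroid"):
        raise ValueError(f"--cm must be center or centroid, got {cell_method!r}")
    if eval_set not in ("dev", "test"):
        raise ValueError(f"--eval-set must be dev or test, got {eval_set!r}")
    return ["textgrounder", "-memory", memory, "geolocate-document",
            "--corpus", corpus, "--kd", "--kdbs", str(bucket_size),
            "--kdsm", split, "--cm", cell_method, "--eval-set", eval_set]


if __name__ == "__main__":
    # Placeholder sweep: the corpus path and bucket sizes are illustrative only.
    for bs, split, cm in product([100, 500, 1000], ("median", "halfway"), ("center", "centroid")):
        print(shlex.join(geolocate_command("/path/to/corpora/utgeo-large", bs, split, cm, "dev")))
```

Once the environment and data are in place, each returned list can be passed directly to subprocess.run.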
Please contact Stephen Roller ([email protected]) with any questions about replicating the results. Getting this program up and running can take some effort, so please feel free to ask for help.