py_heideltime is a Python wrapper for the multilingual temporal tagger HeidelTime.
For more information about this temporal tagger, please visit the HeidelTime Java standalone version: https://github.com/HeidelTime/heideltime
This wrapper has been developed by Jorge Mendes under the supervision of Professor Ricardo Campos in the scope of the Final Project of the Computer Science degree at the Polytechnic Institute of Tomar, Portugal.
Although some Python modules for HeidelTime already exist (in particular https://github.com/amineabdaoui/python-heideltime), all of them require considerable manual intervention from the user. This project aims to overcome some of these limitations. Our aim was four-fold:
- To provide multi-platform support (Windows, Linux, macOS);
- To make it user-friendly, not only in terms of installation but also in its usage;
- To make it lightweight without compromising its behavior;
- To make it possible to choose the granularity of the extracted dates.
To use py_heideltime, you must have the Java JDK and Perl installed on your machine, since HeidelTime depends on both.
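Before installing, it can help to confirm that both prerequisites are reachable from your shell. The snippet below is only a minimal sketch: the helper name missing_prerequisites is hypothetical (it is not part of py_heideltime), and it assumes the executables are called java and perl on your PATH, which may not hold on every system.

```python
import shutil


def missing_prerequisites(commands=("java", "perl")):
    """Return the names of the given executables that are not found on PATH.

    Hypothetical helper for illustration; py_heideltime does not ship it.
    """
    return [name for name in commands if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("java and perl were both found on PATH.")
```

Finding java on PATH does not guarantee that a full JDK is installed, so treat this as a quick sanity check only.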
pip install git+https://github.com/JMendes1995/py_heideltime.git
If your user does not have execute permission on the Python lib folder, run the following command:
sudo chmod 111 /usr/local/lib/<YOUR PYTHON VERSION>/dist-packages/py_heideltime/HeidelTime/TreeTaggerLinux/bin/*
from py_heideltime import heideltime
text = '''
Thurs August 31st - News today that they are beginning to evacuate the London children tomorrow. Percy is a billeting officer. I can't see that they will be much safer here.
'''
heideltime(text, language='English')
[('XXXX-08-31', 'August 31st'), ('PRESENT_REF', 'today'), ('XXXX-XX-XX', 'tomorrow')]
heideltime(text, language='English', document_type='news', document_creation_time='1939-08-31')
[('1939-08-31', 'August 31st'), ('1939-08-31', 'today'), ('1939-09-01', 'tomorrow')]
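Since the result is a list of (normalized value, text expression) tuples, it is easy to post-process with plain Python. As a small sketch, the hypothetical helper group_by_normalized_value below (not part of py_heideltime) collects every expression that was resolved to the same date. It works on the output shown above copied in as a literal, so it runs without HeidelTime installed.

```python
from collections import defaultdict


def group_by_normalized_value(results):
    """Map each normalized value to the list of text expressions resolved to it.

    `results` is a list of (normalized_value, expression) tuples, the shape
    shown in the example output above. Hypothetical helper for illustration.
    """
    grouped = defaultdict(list)
    for normalized, expression in results:
        grouped[normalized].append(expression)
    return dict(grouped)


# Output of the second call above, copied in as a literal.
results = [
    ("1939-08-31", "August 31st"),
    ("1939-08-31", "today"),
    ("1939-09-01", "tomorrow"),
]

print(group_by_normalized_value(results))
# {'1939-08-31': ['August 31st', 'today'], '1939-09-01': ['tomorrow']}
```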
py_heideltime --help
Usage_examples: py_heideltime -t "August 31st" -l "English" or
py_heideltime -t "August 31st" -l "English" -dt "News" -dct "1939-08-31"
Options:
-t, --text TEXT insert text, text should be surrounded by
quotes “” (e.g., “Thurs August 31st”)
-l, --language TEXT [required] Language text is required and
should be surrounded by quotes “”. Options:
English, Portuguese, Spanish, German,
Dutch, Italian, French (e.g., “English”).
[required]
-dg, --date_granularity TEXT Value of granularity should be surrounded by
quotes “”. Options: Year, Month, day (e.g.,
“Year”).
-dt, --document_type TEXT Type of the document text should be
surrounded by quotes “”. Options: “News” :
news-style documents; “Narrative” :
narrative-style documents (e.g., Wikipedia
articles); “Colloquial” : English colloquial
(e.g., Tweets and SMS); “Scientific” :
scientific articles (e.g., clinical trials)
-dct, --document_creation_time TEXT
Document creation date in the format YYYY-
MM-DD should be surrounded by quotes (e.g.,
“2019-05-30”). Note that this date will only
be taken into account when News or
Colloquial texts are specified.
-i, --input_file TEXT text path should be surrounded by quotes
(e.g., “text.txt”)
--help Show this message and exit.
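If you want to drive the command-line tool from a script, one option is to assemble the argument list programmatically. The sketch below only builds the list and does not execute anything. The flag names are taken from the help text above, but the function build_cli_args and its parameter names are hypothetical, and it assumes that omitted options can simply be left out of the command.

```python
import shlex


def build_cli_args(language, text=None, input_file=None, date_granularity=None,
                   document_type=None, document_creation_time=None):
    """Build an argument list for the py_heideltime command-line tool.

    Hypothetical helper for illustration. Only options with a value are added.
    """
    args = ["py_heideltime"]
    if text is not None:
        args += ["-t", text]
    args += ["-l", language]
    if date_granularity is not None:
        args += ["-dg", date_granularity]
    if document_type is not None:
        args += ["-dt", document_type]
    if document_creation_time is not None:
        args += ["-dct", document_creation_time]
    if input_file is not None:
        args += ["-i", input_file]
    return args


args = build_cli_args("English", text="August 31st",
                      document_type="News",
                      document_creation_time="1939-08-31")
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The resulting list can be passed to subprocess.run(args, capture_output=True, text=True), which avoids shell quoting problems because each value is handed over as a separate argument.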
This module is prepared to work with the following languages: English, Portuguese, Spanish, German, Dutch, Italian, French.
To use py_heideltime with other languages, proceed as follows:
- Download the parameter files for the language from TreeTagger
- gunzip <downloaded file>
- Copy the extracted file to the module folder /py_heideltime/HeidelTime/TreeTagger<your system>/lib/
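The last two steps above (decompress, then copy) can also be done from Python, which is handy on systems without gunzip. This is a minimal sketch using only the standard library. The function name install_parameter_file is hypothetical, and both paths are yours to supply: the downloaded .gz file and the TreeTagger lib folder inside your own py_heideltime installation.

```python
import gzip
import shutil
from pathlib import Path


def install_parameter_file(archive_path, lib_dir):
    """Decompress a downloaded .gz parameter file into the given lib folder.

    Hypothetical helper for illustration. Returns the path of the extracted
    file. The lib folder must already exist.
    """
    archive_path = Path(archive_path)
    lib_dir = Path(lib_dir)
    if archive_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got: {archive_path.name}")
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"lib folder not found: {lib_dir}")
    # Dropping the final suffix mirrors gunzip: "name.par.gz" becomes "name.par".
    target = lib_dir / archive_path.stem
    with gzip.open(archive_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Example call (both paths are placeholders, adjust them to your system):
# install_parameter_file("downloaded-file.par.gz", "/path/to/TreeTagger/lib")
```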
If you use HeidelTime (either through this package or another one) please cite the appropriate paper. In general, this would be:
Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013.
Other related papers may be found here:
https://github.com/HeidelTime/heideltime#Publications
Please check Time-Matters if you are interested in detecting the relevance (score) of dates in a text.