README
______
TALC-sef is a POS-TAgged Literary Corpus in Serbian, English and French, developed at Université d'Artois and Université Lille 3.
In this corpus, more than 830,000 Serbian tokens were tagged with BTagger (Gesmundo & Samardzic, 2012), based on a reference corpus of more than 100,000 manually revised tokens. With our ad hoc tagset (43 tags) and without lemmatization, tagging accuracy is over 94% on average.
The corpus is described in our paper: http://www.lrec-conf.org/proceedings/lrec2014/summaries/755.html
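For readers who want to check a tagging model against manually revised data, the sketch below shows one common way to compute tagging accuracy: the share of tokens whose predicted tag matches the reference tag. It assumes the accuracy figure quoted above is a per-token measure, which this README does not state explicitly. The function name and the example tags are hypothetical; they are not taken from the TALC-sef tagset or file format.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tags")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    # Count positions where the two tag sequences agree.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical tags for a five-token sentence (not the TALC-sef tagset).
gold_tags = ["PRO", "V", "DET", "N", "PUNCT"]
predicted_tags = ["PRO", "V", "DET", "ADJ", "PUNCT"]

accuracy = tagging_accuracy(gold_tags, predicted_tags)
print(f"{accuracy:.2%}")  # 4 of 5 tags match
```

Reading the tags out of the corpus files is left out on purpose, since the file layout is not described here.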
Should you use the tagging models provided, or any other file from the TALC-sef project, please cite:
@InProceedings{BALVET14.755,
author = {Antonio Balvet and Dejan Stosic and Aleksandra Miletic},
title = {TALC-Sef a Manually-revised POS-Tagged Literary Corpus in Serbian, English and French},
booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
year = {2014},
month = {may},
date = {26-31},
address = {Reykjavik, Iceland},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
isbn = {978-2-9517408-8-4},
language = {english}
}
____________________________________
A. Balvet, D. Stosic and A. Miletic.