additional paper in Readme and several earlier changes #1

Open: wants to merge 9 commits into base: master
26 changes: 23 additions & 3 deletions README.md
@@ -4,17 +4,37 @@ Sense-annotated corpora from the Semantic Processing Across Domains project. Thi

This repository contains three main folders:

-1. `supersenses` constains the all-words supersense-annotated corpus.
+1. `supersenses` contains the all-words supersense-annotated corpus.
2. It contains a folder `official_distribution` with the files used for training and testing in the noted articles, and a folder `all_annotations` with all the annotations generated by each annotator, previous to adjucation.
3. It is made up of six domains from the ClarinDK corpus plus the test section of the Danish Dependency Treebank (`DDT`).
-1. `lexicalsample` constains the lexical-sample annotations for a regular, dictionary based sense inventory, and for a supersense-clustered inventory.
-1. `active_learning` constains the resulting annotation of "Active Learning for Sense Annotation".
+1. `lexicalsample` contains the lexical-sample annotations for a regular, dictionary based sense inventory, and for a supersense-clustered inventory.
+1. `active_learning` contains the resulting annotation of "Active Learning for Sense Annotation".



The following publications make use or document the construction of this resource.

```
+@inproceedings{pedersen-etal-2016-semdax,
+title = "The {S}em{D}a{X} Corpus ― Sense Annotations with Scalable Sense Inventories",
+author = "Pedersen, Bolette and
+Braasch, Anna and
+Johannsen, Anders and
+Alonso, H{\'e}ctor Mart{\'\i}nez and
+Nimb, Sanni and
+Olsen, Sussi and
+S{\o}gaard, Anders and
+S{\o}rensen, Nicolai Hartvig",
+booktitle = "Proceedings of the Tenth International Conference on Language Resources and Evaluation ({LREC}'16)",
+month = may,
+year = "2016",
+address = "Portoro{\v{z}}, Slovenia",
+publisher = "European Language Resources Association (ELRA)",
+url = "https://aclanthology.org/L16-1136",
+pages = "842--847",
+abstract = "We launch the SemDaX corpus which is a recently completed Danish human-annotated corpus available through a CLARIN academic license. The corpus includes approx. 90,000 words, comprises six textual domains, and is annotated with sense inventories of different granularity. The aim of the developed corpus is twofold: i) to assess the reliability of the different sense annotation schemes for Danish measured by qualitative analyses and annotation agreement scores, and ii) to serve as training and test data for machine learning algorithms with the practical purpose of developing sense taggers for Danish. To these aims, we take a new approach to human-annotated corpus resources by double annotating a much larger part of the corpus than what is normally seen: for the all-words task we double annotated 60{\%} of the material and for the lexical sample task 100{\%}. We include in the corpus not only the adjucated files, but also the diverging annotations. In other words, we consider not all disagreement to be noise, but rather to contain valuable linguistic information that can help us improve our annotation schemes and our learning algorithms.",
+}
+
@inproceedings{olsenetal2015,
title={Coarse-Grained Sense Annotation of Danish across Textual Domains},
author={Olsen, Sussi and Pedersen, Bolette Sandford Mart{\i}nez Alonso, H{\'e}ctor and Johannsen, Anders},
2 changes: 2 additions & 0 deletions changelog.txt
@@ -0,0 +1,2 @@
+2019-01-07
+Added SemDaX supersenses converted for ELEXIS.