Description of the dataset
RConijn edited this page Apr 4, 2018
The dataset consists of three corpora (OASTM, PMC, BAWE), annotated by three different tools (AntMover, AWA, AWA3). These tools label each sentence with zero or more rhetorical moves. The OASTM documents are annotated by all three tools; the PMC and BAWE documents are annotated by AntMover and AWA only.
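Because tool coverage differs per corpus, it can help to encode it explicitly before comparing annotations. The sketch below is a minimal illustration only: the corpus and tool names come from this page, but the `ANNOTATIONS` dict and the helper functions are hypothetical and not part of the dataset or its database.

```python
# Corpus -> set of annotation tools, as described above.
# This layout (and the helpers below) is an illustrative assumption,
# not the structure used in the dataset's database.
ANNOTATIONS = {
    "OASTM": {"AntMover", "AWA", "AWA3"},
    "PMC": {"AntMover", "AWA"},
    "BAWE": {"AntMover", "AWA"},
}


def corpora_annotated_by(tool: str) -> list[str]:
    """Return the corpora (sorted by name) whose documents were labelled by `tool`."""
    return sorted(corpus for corpus, tools in ANNOTATIONS.items() if tool in tools)


def shared_tools() -> set[str]:
    """Return the tools that annotated every corpus (usable for cross-corpus comparison)."""
    return set.intersection(*ANNOTATIONS.values())


print(corpora_annotated_by("AWA3"))  # ['OASTM']
print(sorted(shared_tools()))        # ['AWA', 'AntMover']
```

Only AntMover and AWA cover all three corpora, so any comparison across OASTM, PMC and BAWE is restricted to those two tools; AWA3 output exists for OASTM alone.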
In total, there are 5,186 documents, spread over the three corpora:
- 110 OASTM documents
- 2315 PMC documents
- 2761 BAWE documents
In total, the documents contain 820,305 sentences. On average, there are 158 sentences per document (S.D. = 156).
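As a quick consistency check, the per-corpus counts and the sentence total can be combined to reproduce the document total and the mean. This is a minimal sketch using only the figures on this page; the standard deviation cannot be recomputed here because it needs per-document sentence counts, which this page does not list.

```python
# Figures copied from this page; variable names are illustrative.
DOCS_PER_CORPUS = {"OASTM": 110, "PMC": 2315, "BAWE": 2761}
TOTAL_SENTENCES = 820_305

total_docs = sum(DOCS_PER_CORPUS.values())
mean_sentences = TOTAL_SENTENCES / total_docs

print(total_docs)               # 5186
print(f"{mean_sentences:.1f}")  # 158.2
```

The per-corpus counts add up to the stated 5,186 documents, and 820,305 / 5,186 ≈ 158.2, matching the reported mean of 158 sentences per document.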
The database consists of six tables; see the EER diagram below.