Description of the dataset

Introduction

The dataset consists of three corpora (OASTM, PMC, BAWE), annotated by three different tools (AntMover, AWA, AWA3). Each tool labels every sentence with zero or more rhetorical moves. The OASTM documents are annotated by all three tools; the PMC and BAWE documents are annotated by AntMover and AWA only.
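The annotation coverage described above can be pictured as a small in-memory model. This is only an illustrative sketch: the `Sentence` class, its field names, and the move labels used in the usage example are hypothetical and are not taken from the actual database schema.

```python
from dataclasses import dataclass, field

# Which tools annotated which corpus, as stated in the introduction.
TOOLS_BY_CORPUS = {
    "OASTM": ("AntMover", "AWA", "AWA3"),
    "PMC": ("AntMover", "AWA"),
    "BAWE": ("AntMover", "AWA"),
}


@dataclass
class Sentence:
    """Hypothetical record for one annotated sentence (not the real schema)."""
    corpus: str
    text: str
    # Maps a tool name to the list of rhetorical moves it assigned.
    # A tool may assign zero moves, so an empty list is valid.
    moves: dict = field(default_factory=dict)

    def add_moves(self, tool, labels):
        """Store the labels from one tool, rejecting tools not run on this corpus."""
        if tool not in TOOLS_BY_CORPUS[self.corpus]:
            raise ValueError(f"{tool} did not annotate the {self.corpus} corpus")
        self.moves[tool] = list(labels)
```

For example, a PMC sentence could carry labels from AntMover and AWA, while trying to attach AWA3 labels to it would raise a `ValueError`:

```python
s = Sentence(corpus="PMC", text="An example sentence.")
s.add_moves("AntMover", ["placeholder_move"])  # placeholder label, not a real move name
s.add_moves("AWA", [])                         # zero moves is allowed
```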

Size of the database

In total there are 5186 documents, spread over the three corpora:

110 OASTM documents
2315 PMC documents
2761 BAWE documents

In total, the documents contain 820,305 sentences. On average there are 158 (S.D. = 156) sentences per document.
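As a quick consistency check, the totals above can be recomputed from the per-corpus counts. The sketch below uses only the figures reported on this page; the variable names are illustrative. The standard deviation cannot be verified this way, since it depends on the per-document sentence counts.

```python
# Per-corpus document counts as reported above.
doc_counts = {"OASTM": 110, "PMC": 2315, "BAWE": 2761}

total_documents = sum(doc_counts.values())
total_sentences = 820_305

# Mean number of sentences per document, rounded to the nearest integer.
mean_sentences = total_sentences / total_documents

print(total_documents)        # 5186
print(round(mean_sentences))  # 158
```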

EER Diagram

The database consists of six tables; see the EER diagram below.
