
Description of the dataset

RConijn edited this page Apr 4, 2018 · 5 revisions

Introduction

The dataset consists of three corpora (OASTM, PMC, BAWE), annotated by three different tools (AntMover, AWA, AWA3). Each tool labels every sentence with zero or more rhetorical moves. The OASTM documents are annotated by all three tools; the PMC and BAWE documents are annotated by AntMover and AWA only.
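
The corpus-to-tool coverage described above can be written down as a small lookup table. This is only an illustrative sketch: the names `TOOLS_BY_CORPUS` and `corpora_annotated_by` are hypothetical and are not part of the dataset or its database schema.

```python
# Which tools annotated which corpus, as stated in the introduction.
# The variable and function names are hypothetical, for illustration only.
TOOLS_BY_CORPUS = {
    "OASTM": {"AntMover", "AWA", "AWA3"},
    "PMC": {"AntMover", "AWA"},
    "BAWE": {"AntMover", "AWA"},
}


def corpora_annotated_by(tool):
    """Return the corpora annotated by the given tool, in alphabetical order."""
    return sorted(
        corpus for corpus, tools in TOOLS_BY_CORPUS.items() if tool in tools
    )


print(corpora_annotated_by("AWA3"))
print(corpora_annotated_by("AntMover"))
```

A lookup like this makes it explicit that comparisons involving AWA3 are only possible on the OASTM documents.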

Size of the database

In total, there are 5,186 documents, spread over the three corpora:

  • 110 OASTM documents
  • 2,315 PMC documents
  • 2,761 BAWE documents

The documents contain 820,305 sentences in total, an average of 158 sentences per document (S.D. = 156).
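
As a quick consistency check, the figures above can be recomputed from the per-corpus counts. The sketch below uses only the numbers reported on this page; the variable names are illustrative. The standard deviation cannot be reproduced this way, because it depends on the per-document sentence counts stored in the database.

```python
# Document counts per corpus, as reported on this page.
documents_per_corpus = {"OASTM": 110, "PMC": 2315, "BAWE": 2761}

# Total number of sentences, as reported on this page.
total_sentences = 820_305

total_documents = sum(documents_per_corpus.values())
mean_sentences = total_sentences / total_documents

print(f"Documents: {total_documents}")
print(f"Mean sentences per document: {mean_sentences:.1f}")
```

The per-corpus counts add up to the reported total, and the mean rounds to the reported 158 sentences per document.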

EER Diagram

The database consists of six tables, shown in the EER diagram below.

[Figure: EER diagram]
