data format for labels #1

gabriben · 2021-01-14T13:17:51Z

Hi,

I'd like to use your dataset to reproduce some results in the ML-NET paper, but I am having trouble understanding how the label text files should be read.

Thank you

sb895 · 2021-03-04T11:20:13Z

Hi Gabriben,

Apologies, only just saw your message.

There are two folders, labels and text.

The "text" folder contains files holding PubMed abstracts, split one sentence per line (already tokenized). The file names are the PubMed IDs.

The "labels" folder contains the corresponding labels for each text file (both files are named with the same PubMed ID).
The file format is as follows: each sentence can have multiple labels.

The labels for different sentences are separated by "<", and the multiple labels for a single sentence are separated by "AND".
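For anyone reading the files programmatically, here is a minimal Python sketch of the format as described above. It is not code from the dataset itself. The function names `parse_labels` and `load_pair`, the `root` directory argument, and the label names in the usage example are all hypothetical. It also assumes that whitespace around the separators can be stripped, that no label name itself contains the string "AND", and that the file name you pass in is exactly the name used in both folders.

```python
from __future__ import annotations

from pathlib import Path


def parse_labels(raw: str) -> list[list[str]]:
    """Turn the raw content of one label file into a list of labels per sentence.

    Assumption: sentences are separated by "<" and the labels of one
    sentence by "AND", as described in this thread. Whitespace around
    both separators is stripped.
    """
    per_sentence = []
    for chunk in raw.strip().split("<"):
        # Assumption: no label name contains the literal string "AND".
        labels = [part.strip() for part in chunk.split("AND")]
        # An empty chunk becomes an empty list, so sentence positions are kept.
        per_sentence.append([label for label in labels if label])
    return per_sentence


def load_pair(root: Path, file_name: str) -> tuple[list[str], list[list[str]]]:
    """Read the sentences and labels stored under the same file name.

    Assumption: `root` contains the "text" and "labels" folders, and
    `file_name` is the PubMed ID exactly as it appears in both.
    """
    sentences = (root / "text" / file_name).read_text(encoding="utf-8").splitlines()
    labels = parse_labels((root / "labels" / file_name).read_text(encoding="utf-8"))
    return sentences, labels


# Hypothetical label names, only to show the shape of the result.
example = parse_labels("label_a AND label_b < label_c")
print(example)
```

If the number of parsed label lists does not match the number of sentences for a file, that probably means one of the assumptions above does not hold for the real data (for example a leading or trailing "<"), so it is worth checking the lengths before training.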

Hope that helps. Let me know otherwise.
