data format for labels #1

Open
gabriben opened this issue Jan 14, 2021 · 1 comment
@gabriben
Hi,

I'd like to use your dataset to reproduce some results in the ML-NET paper, but I am having trouble understanding how the label text files should be read.

Thank you

@sb895
Owner

sb895 commented Mar 4, 2021

Hi Gabriben,

Apologies, only just saw your message.

There are two folders, labels and text.

The "text" folder contains files with PubMed abstracts, split one sentence per line (already tokenized). The file names are the PubMed IDs.

The "labels" folder contains the corresponding labels for each text file (both files are named with the same PubMed ID).
Each label file can hold multiple labels per sentence, in the following format.

The labels for different sentences are separated by "<", and the multiple labels for a single sentence are separated by "AND".
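
For anyone reading the files programmatically, here is a minimal sketch of a parser for the format described above. It is not taken from the repository. It assumes each label file can be read as a single string, that "AND" appears as a standalone upper-case word between labels, and that whitespace around the separators can be ignored. The function names, the `.txt` extension, and the label names in the usage example are hypothetical.

```python
import re
from pathlib import Path


def parse_labels(raw: str) -> list[list[str]]:
    """Return one list of labels per sentence.

    Assumptions (not confirmed by the repository): "<" separates sentences,
    and "AND" as a standalone upper-case word separates the labels of one sentence.
    """
    per_sentence = []
    for chunk in raw.strip().split("<"):
        # \b keeps words that merely contain "AND" (e.g. "OPERAND") intact.
        labels = [part.strip() for part in re.split(r"\bAND\b", chunk)]
        # An empty chunk is kept as an empty list so positions still line up with sentences.
        per_sentence.append([label for label in labels if label])
    return per_sentence


def load_document(root: Path, pmid: str, ext: str = ".txt") -> list[tuple[str, list[str]]]:
    """Pair each sentence with its labels. The extension is a guess; adjust as needed."""
    sentences = (root / "text" / f"{pmid}{ext}").read_text(encoding="utf-8").splitlines()
    labels = parse_labels((root / "labels" / f"{pmid}{ext}").read_text(encoding="utf-8"))
    return list(zip(sentences, labels))
```

With made-up label names, `parse_labels("label one AND label two<label three")` returns `[['label one', 'label two'], ['label three']]`. If the number of label groups does not match the number of sentences in a file, the splitting assumptions above probably need adjusting.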

Hope that helps. Let me know otherwise.
