
About valid and test set result #75

Open
paulthemagno opened this issue Nov 7, 2019 · 1 comment
@paulthemagno

I have fine-tuned a BERT-NER model, and in eval_result.txt I got these values:
P=0.608764
R=0.588080
F=0.594982

As I understand it, these results come from the dev (validation) set, while on the test set I got:

processed 40982 tokens with 4577 phrases; found: 4645 phrases; correct: 4158.
accuracy:  98.22%; precision:  89.52%; recall:  90.85%; FB1:  90.18
              LOC: precision:  92.54%; recall:  92.54%; FB1:  92.54  1394
             MISC: precision:  81.21%; recall:  82.31%; FB1:  81.76  676
              ORG: precision:  84.54%; recall:  88.56%; FB1:  86.51  1255
              PER: precision:  95.30%; recall:  95.45%; FB1:  95.38  1320

I'd like to understand the mismatch with respect to the standard CoNLL evaluation script.
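
For context, the conlleval script scores whole entities rather than individual tokens: a predicted span only counts as correct if both its boundaries and its type match the gold span. Below is a minimal sketch of reproducing that style of scoring in Python, assuming BIO-tagged label sequences and using the seqeval library; neither seqeval nor the toy sequences appear in this issue, both are illustrative.

```python
# Illustrative only: seqeval is a common Python reimplementation of
# CoNLL-style entity-level scoring; it is not part of this repository.
from seqeval.metrics import classification_report, f1_score, precision_score, recall_score

# Toy BIO-tagged sequences, one inner list per sentence (made up for this sketch).
y_true = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O"]]
y_pred = [["B-PER", "I-PER", "O", "B-LOC"], ["B-LOC", "O"]]

# Entity-level scoring: the sentence-2 prediction counts as wrong because
# its type differs (LOC vs. ORG), even though the token boundary is right.
print(precision_score(y_true, y_pred))  # 2 correct / 3 predicted spans ≈ 0.67
print(recall_score(y_true, y_pred))     # 2 correct / 3 gold spans      ≈ 0.67
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```

If eval_result.txt were instead computed per token, or over a different split of the data, its numbers could diverge substantially from conlleval's entity-level figures.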

@huanghonggit

@paulthemagno Has the accuracy problem been solved?
