Reproducing Parser results #2

Open
dhaivat1729 opened this issue Feb 8, 2022 · 2 comments

@dhaivat1729 commented Feb 8, 2022

Hi team,

Thank you for this amazing work and for making the code available online. I was able to train the tagger and the parser and gather results for both of them. For the parser, I could train and evaluate the model, but I only obtained the LAS and UAS metrics. Would it be possible to share a way to get precision, recall and F1 score for the parser? I built my own evaluation script; however, I couldn't get the numbers to match the reported ones, so I may be doing something wrong there.
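
For concreteness, here is a minimal sketch of the kind of edge-level precision/recall/F1 computation I have in mind; the file paths, the column layout, and the choice to skip tokens without a head are my own assumptions, not something taken from the repository:

```python
# Minimal sketch: labeled edge precision/recall/F1 from two CoNLL-U-style files.
# Assumes tab-separated token lines (ID, FORM, ..., HEAD in column 7, DEPREL in
# column 8) and blank lines between recipes; file paths are placeholders.

def read_edges(path):
    """Return a set of (recipe_idx, token_id, head_id, deprel) tuples."""
    edges, recipe_idx = set(), 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                 # blank line ends a recipe
                recipe_idx += 1
                continue
            if line.startswith("#"):     # skip comment lines
                continue
            cols = line.split("\t")
            token_id, head, deprel = cols[0], cols[6], cols[7]
            if head == "_":              # token with no head: not an edge
                continue
            edges.add((recipe_idx, token_id, head, deprel))
    return edges

gold = read_edges("gold.conllu")        # placeholder path
pred = read_edges("predicted.conllu")   # placeholder path

correct = len(gold & pred)
precision = correct / len(pred) if pred else 0.0
recall = correct / len(gold) if gold else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"P={precision:.4f}  R={recall:.4f}  F1={f1:.4f}")
```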

I also wanted to clarify something in this file. In lines 224, 239, 272 and 292, we write the ID of the token to the file. However, according to the CoNLL-U format, the token IDs must restart from 1 for each recipe. When I look at the generated file, the IDs run from 1 to 3827 for the tagger. If we feed this generated tagger output to the parser, it will treat the entire file as a single recipe. So we may have to reset the ID variable i in read_prediction.py for each recipe. Let me know if there is something wrong in my understanding.
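
To make that concrete, here is a minimal sketch of the per-recipe reset I mean when writing the tagger output in CoNLL-U format; the function and variable names are illustrative, not the actual code in read_prediction.py:

```python
# Minimal sketch of resetting the token ID for every recipe when writing CoNLL-U.
def write_conllu(recipes, out_path):
    """recipes: list of recipes, each a list of (form, tag) pairs."""
    with open(out_path, "w", encoding="utf-8") as out:
        for recipe in recipes:
            i = 1                        # reset the token ID for every recipe
            for form, tag in recipe:
                # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
                out.write(f"{i}\t{form}\t_\t{tag}\t_\t_\t_\t_\t_\t_\n")
                i += 1
            out.write("\n")              # blank line separates recipes
```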

@alexanderkoller (Contributor)

@TheresaSchmidt Can you take this one?

@dhaivat1729 (Author)

@TheresaSchmidt would it be possible to share all the folds used for training? Since the table reports numbers from Yamakata '20, is it safe to assume that 10-fold cross-validation was performed? I would like to make sure that the single fold available on GitHub is not biased, which could explain the performance drop. I still assume I am probably doing something wrong in my evaluation, so any direction on how to calculate precision, recall and F1 for the parser would be highly appreciated.
