perplexity.py & perplexity_mlm.py #23
Comments
Hi @fengyx8, thanks for raising the issue! You're right: for MLMs, we computed pseudo-perplexities. I believe I intended to clean up the pseudo-perplexity evaluation script before adding it to the repository (but had forgotten)! I've just added the rough experiment script we used (250539d). The code is brittle in some places, but it should give you an idea of the logic used to evaluate the perplexities for MLMs (see lines 668–708). Note that I didn't implement batched evaluation, so the code should throw an error if you try to run it with a batch size greater than one. Hope this helps! Happy to answer any more questions.
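For reference, the pseudo-perplexity logic described above looks roughly like the minimal sketch below. This is an illustration of the general technique (mask each token in turn, score the masked position, and exponentiate the average negative log-probability), not the script added in commit 250539d; the `bert-base-uncased` checkpoint and the example sentence are illustrative. Like the script, it processes one sequence at a time rather than batches.

```python
# Minimal pseudo-perplexity sketch for a masked language model.
# Illustrative only; not the repository's experiments/perplexity_mlm.py.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_perplexity(text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total_log_prob = 0.0
    n_scored = 0
    # Mask each position in turn, skipping [CLS] (first) and [SEP] (last).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        # Log-probability the model assigns to the true token at position i.
        log_probs = torch.log_softmax(logits[0, i], dim=-1)
        total_log_prob += log_probs[input_ids[i]].item()
        n_scored += 1
    # Pseudo-perplexity: exponentiated average negative log-likelihood.
    return float(torch.exp(torch.tensor(-total_log_prob / n_scored)))

print(pseudo_perplexity("The quick brown fox jumps over the lazy dog."))
```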
Thank you for your prompt reply and for updating the code! Could you advise on how I might accurately reproduce the findings presented in the publication? Thanks again! :)
Hi @fengyx8, just to confirm first: are you using the package versions provided in […]?
Hi @ncmeade! The package versions I am using are consistent with those specified in […]. I used the […]. For computing the […], […].
Hi @fengyx8, sorry for the delayed response! Could you try with the […]? The data used for computing the perplexities should have a substantial impact. However, I see that with your results you still obtain the same relative ordering of BERT, INLP, and SentenceDebias.
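On the point about evaluation data: a minimal sketch of loading WikiText-2 with the `datasets` library is below. The `wikitext-2-raw-v1` config and `test` split are assumptions on my part; the paper's exact preprocessing may differ.

```python
# Load WikiText-2 and join it into one evaluation string.
# The config and split chosen here are assumptions, not the paper's setup.
from datasets import load_dataset

wikitext = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
text = "\n\n".join(wikitext["text"])
print(f"{len(text)} characters of evaluation text")
```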
Hi, authors!

I tried to reproduce the evaluation results of perplexity on WikiText-2 over baseline models such as `bert-base-uncased`, `INLP`, and `Sentence-Debias`, but I failed.

For `bert-base-uncased`, I got a final perplexity of `2059538.375`, which is significantly different from the result of `4.469` reported in your paper. My script is: […]

For `INLP` and `Sentence-Debias`, I failed to get any outputs with the script: […]

By the way, I noticed that `batch_jobs/perplexity.sh` actually references two distinct files for evaluating perplexity: `experiments/perplexity.py` and `experiments/perplexity_mlm.py`. Maybe the former file is used for evaluating the GPT-based models, while the latter one is for the BERT-based models. There is no `experiments/perplexity_mlm.py` in this repository; could you share it with us?

Thank you for your open-source code!
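For contrast with the MLM case above, standard strided perplexity for a causal (GPT-style) model looks roughly like the sketch below; the `gpt2` checkpoint, context length, and stride are illustrative and not necessarily what `experiments/perplexity.py` uses. Applying causal-LM scoring like this to an MLM such as `bert-base-uncased` would plausibly produce the kind of inflated perplexity reported above, which is consistent with the guess that the two scripts target different model families.

```python
# Minimal strided perplexity sketch for a causal LM (GPT-style).
# Illustrative only; the repository's experiments/perplexity.py may use
# different models, context lengths, and strides.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str, max_length: int = 1024, stride: int = 512) -> float:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    nlls, n_tokens, prev_end = [], 0, 0
    for begin in range(0, input_ids.size(1), stride):
        end = min(begin + max_length, input_ids.size(1))
        target_len = end - prev_end  # only score tokens not scored before
        ids = input_ids[:, begin:end]
        targets = ids.clone()
        targets[:, :-target_len] = -100  # ignore the overlapping context
        with torch.no_grad():
            loss = model(ids, labels=targets).loss
        nlls.append(loss * target_len)
        n_tokens += target_len
        prev_end = end
        if end == input_ids.size(1):
            break
    return float(torch.exp(torch.stack(nlls).sum() / n_tokens))
```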