Potential issues with HF GPT2 Models #4

Open
aitorormazabal opened this issue Nov 24, 2022 · 0 comments

Hello,

I am using the GPT2 models available on HF and running into a few issues. First, there seems to be a problem with the tokenizer. Trying to calculate perplexity with the evaluate module as follows:

from evaluate import load
perplexity = load("perplexity", module_type="metric")
results = perplexity.compute(predictions=["Hola, como estas?"], model_id="PlanTL-GOB-ES/gpt2-base-bne", device="cpu")

Gives the following error:

 ...
  File "/ikerlariak/aormazabal024/PhD/Poetry-Generation/demo/poetry-env-traganarru/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

This seems to be related to the special tokens <pad>, <s>, </s> and <unk> not being properly set (even though they are used by the evaluate module): the only special token added in the tokenizer is <|endoftext|>. One can manually fix this for the local snapshot:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('[snapshot-path]')
tokenizer.pad_token = '<pad>'
tokenizer.bos_token = '</s>'
tokenizer.eos_token = '</s>'
tokenizer.unk_token = '<unk>'
tokenizer.save_pretrained('[snapshot-path]')
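
To double-check that the fix took, one can reload the saved tokenizer and inspect it (a quick check; special_tokens_map and pad_token are just the standard transformers attributes, and before the fix only <|endoftext|> shows up):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('[snapshot-path]')
print(tokenizer.special_tokens_map)  # should now include <pad>, </s> and <unk>
print(tokenizer.pad_token)           # '<pad>' after the fix, None before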

However, even after fixing this, I am getting quite high perplexities compared to the 10-13 reported in the paper, for every sentence I try (assuming per-word perplexity is what is reported). Is it possible something went wrong when converting from fairseq to HF, and are the original fairseq models available somewhere to compare against? Or maybe I am making a mistake when calculating the ppl: was there any tokenization applied to the text apart from BPE (e.g. replacing newlines with </s>, which is pretty standard in fairseq)?
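
For reference, this is roughly the manual computation I am comparing against (a minimal sketch; it assumes plain per-token perplexity over the BPE tokens, i.e. the exponential of the mean negative log-likelihood, which may well differ from whatever per-word normalization was used in the paper):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PlanTL-GOB-ES/gpt2-base-bne"  # or the locally fixed snapshot
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

input_ids = tokenizer("Hola, como estas?", return_tensors="pt").input_ids

with torch.no_grad():
    # With labels=input_ids the model returns the mean cross-entropy over the
    # shifted tokens, i.e. the average negative log-likelihood per BPE token.
    loss = model(input_ids, labels=input_ids).loss

print("per-token ppl:", torch.exp(loss).item())

If the reported numbers are per word rather than per BPE token, or the evaluation text was preprocessed differently, that alone could explain part of the gap.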
