A glitch in WN18RR data #13

Open
navdeepkjohal opened this issue May 4, 2022 · 1 comment

Comments

@navdeepkjohal

Dear Authors,

I found a small glitch in your updated WN18RR data. Although data/wn18rr/entity.dict lists 40943 entities, only 40559 of them actually occur in train.txt. The remaining 40943 - 40559 = 384 entities appear only in valid.txt and test.txt, so the model has to do zero-shot inference for them at validation/test time, which might have adversely affected the performance of your model. For instance, entity id 14501545 does not occur in train.txt although it is listed in the dictionary file.
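
A count like this can be reproduced with a short script. The following is only a sketch: it assumes train.txt holds one tab-separated (head, relation, tail) triple per line and that the dictionary file holds one tab-separated (index, entity) pair per line. Those layouts, and the helper names, are assumptions made for illustration and are not confirmed by the repository.

```python
def entities_in_triples(lines):
    """Collect head and tail entities from tab-separated triple lines (assumed layout)."""
    entities = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, _relation, tail = parts
        entities.add(head)
        entities.add(tail)
    return entities


def entities_in_dict(lines):
    """Collect entity names from lines assumed to be: index, TAB, entity."""
    entities = set()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            entities.add(parts[1])
    return entities


def unseen_entities(dict_lines, train_lines):
    """Return the entities listed in the dictionary that never occur in training triples."""
    return entities_in_dict(dict_lines) - entities_in_triples(train_lines)


def count_unseen(dict_path, train_path):
    """Hypothetical convenience wrapper that reads both files from disk."""
    with open(dict_path, encoding="utf-8") as d, open(train_path, encoding="utf-8") as t:
        return len(unseen_entities(d, t))
```

Calling `count_unseen("data/wn18rr/entity.dict", "data/wn18rr/train.txt")` would then give the number of dictionary entities that have no training triple, provided the files really follow the assumed layout.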

Apologies if I missed something, or my interpretation is wrong.

Best
Navdeep

@mnqu
Collaborator

mnqu commented May 31, 2022

Thanks for the information!

We also noticed that some entities appear only in the valid and test sets. However, all previous works used this dataset for evaluation, so we used it as well for a fair comparison, even though the dataset is not ideal.

Hope this helps.
