A glitch in WN18RR data #13

navdeepkjohal · 2022-05-04T16:02:47Z

Dear Authors,

I found a small glitch in the WN18RR data updated by you. Although data/wn18rr/entity.dict lists 40943 entities, only 40559 of them actually occur in train.txt. Hence 40943 - 40559 = 384 entities never appear in train.txt and occur only in valid.txt and test.txt, so the model is doing zero-shot inference for these entities at validation/test time, which might have adversely affected the performance of your model. For instance, entity id 14501545 does not occur in train.txt although it is listed in the entities.dict file.
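
A minimal sketch of how such a check could be done, assuming each line of the split files holds one triple as three whitespace-separated fields (head, relation, tail). The function names and the toy data are hypothetical and only illustrate the set difference; they are not taken from the repository.

```python
from typing import Iterable, Set


def entities_in_triples(lines: Iterable[str]) -> Set[str]:
    """Collect head and tail entities from 'head relation tail' lines."""
    entities: Set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, _relation, tail = parts
        entities.update((head, tail))
    return entities


def unseen_entities(train_lines: Iterable[str], eval_lines: Iterable[str]) -> Set[str]:
    """Entities that occur in the evaluation triples but never in training."""
    return entities_in_triples(eval_lines) - entities_in_triples(train_lines)


# Toy data standing in for train.txt and valid.txt (assumed format).
train = ["a\tr1\tb", "b\tr2\tc"]
valid = ["a\tr1\td", "c\tr2\tb"]
print(sorted(unseen_entities(train, valid)))  # → ['d']
```

On the real data, the same functions could be fed open file handles for train.txt and for valid.txt or test.txt in place of the toy lists, provided the files follow the assumed format.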

Apologies if I missed something, or my interpretation is wrong.

Best
Navdeep

mnqu · 2022-05-31T16:54:17Z

Thanks for the information!

We also noticed this: some entities appear only in the valid and test sets. But all previous works used this dataset for evaluation, so we used it as well for a fair comparison, although the dataset is not ideal.

Hope this helps.
