annotation inconformity problems #16

Open
alexnotes opened this issue Nov 3, 2018 · 4 comments

Comments

@alexnotes

Some of the annotations appear to be wrong. The annotation in the weiboNER_2nd_conll.* files is consistent with that in weiboNER.conll.*. There seem to be dozens of incorrect annotations. Can you fix them?
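To see exactly where the two sets of files agree or differ, the same split from each release can be compared token by token. This is only a sketch. It assumes every non-blank line holds a token followed by its tag, separated by whitespace, and that both files list the same tokens in the same order. `parse_rows` and `tag_differences` are hypothetical helper names, not part of the dataset's tooling.

```python
def parse_rows(lines):
    """Return (line_number, token, tag) for every line with at least two fields."""
    rows = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2:
            continue  # blank sentence separator or a line without a tag
        rows.append((number, parts[0], parts[-1]))
    return rows


def tag_differences(lines_a, lines_b):
    """List (line_number_in_a, token, tag_a, tag_b) where the tags disagree."""
    differences = []
    for row_a, row_b in zip(parse_rows(lines_a), parse_rows(lines_b)):
        number_a, token_a, tag_a = row_a
        number_b, token_b, tag_b = row_b
        if token_a != token_b:
            # Assumption violated: the files no longer line up.
            raise ValueError(f"tokens diverge at line {number_a} / {number_b}")
        if tag_a != tag_b:
            differences.append((number_a, token_a, tag_a, tag_b))
    return differences
```

Passing two file objects opened with `encoding="utf-8"` (one from each release) returns one tuple per token whose tag differs, which gives a concrete list to review.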

Also, can you provide word-level annotated data? It should look like this:

口腔	O
溃疡	O
加上	O
这	O
玩意	O
~	O
酸酸	O
甜甜	O
好	O
滋味	O
。	O

In addition, can you provide the annotated data in CoNLL-2003 format, where the second column is the POS tag? For example:

美国 NS -X- B-LOC
总统 N -X- O
特使 N -X- O
格尔巴德 NR -X- B-PER
表示 V -X- O
, W -X- O
他 R -X- O
同 P -X- O
克罗地亚 NS -X- B-LOC
总统 N -X- O
图季曼 NR -X- B-PER
的 U -X- O
会谈 VN -X- O
是 V -X- O
“ W -X- O
积极 A -X- O
和 C -X- O
有益 A -X- O
的 U -X- O
” W -X- O
, W -X- O
@alexnotes
Author

The annotated data also contains unrecognized characters, for example on line 1875 of the weiboNER_2nd_conll.dev file.
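A scan along the following lines could help find other affected lines. It is a rough sketch, not a definitive check: "unrecognized" is interpreted here as the Unicode replacement character U+FFFD plus control, format, private-use, and unassigned code points, and that interpretation is an assumption. `find_suspicious_characters` is a hypothetical helper name.

```python
import unicodedata

# General categories treated as suspicious (an assumption, adjust as needed):
# Cc = control, Cf = format, Co = private use, Cn = unassigned.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}


def find_suspicious_characters(lines):
    """Return (line_number, column, code_point, category) for each suspicious character."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for column, ch in enumerate(line.rstrip("\r\n"), start=1):
            if ch == "\t":
                continue  # the tab is assumed to be the column separator
            category = unicodedata.category(ch)
            if ch == "\ufffd" or category in SUSPICIOUS_CATEGORIES:
                hits.append((number, column, f"U+{ord(ch):04X}", category))
    return hits
```

Opening the file with `open(path, encoding="utf-8", errors="replace")` turns undecodable bytes into U+FFFD, so they show up in the result too. Which code points count as unassigned depends on the Unicode version bundled with the Python build, so very recent emoji may be flagged.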

@VioletPeng
Collaborator

We don't have these annotations. You can generate the word segmentation and POS tags with any software you like.
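Once a segmenter and tagger have produced `(word, POS)` pairs, the existing labels still have to be carried over to the words. Below is a minimal sketch of that step, assuming the released labels are per-character BIO-style tags and that the segmented words concatenate back to the original sentence. `project_to_words` is a hypothetical helper, and taking the first character's tag as the word's tag is one possible policy, not the dataset authors' method.

```python
def project_to_words(chars, char_tags, segmented):
    """Build 'word POS -X- tag' rows from per-character tags.

    chars      list of characters in the sentence
    char_tags  one tag per character, e.g. "B-LOC", "I-LOC", "O"
    segmented  list of (word, pos_tag) pairs from any segmenter/tagger

    Returns (rows, conflicts); conflicts lists words whose characters
    do not all share one entity type and need manual review.
    """
    if len(chars) != len(char_tags):
        raise ValueError("chars and char_tags must have the same length")
    rows, conflicts = [], []
    start = 0
    for word, pos_tag in segmented:
        if not word:
            raise ValueError("empty word in segmentation")
        end = start + len(word)
        if "".join(chars[start:end]) != word:
            raise ValueError(f"segmentation does not match text at offset {start}")
        span_tags = char_tags[start:end]
        # "B-LOC" -> "LOC", "O" -> "" ; more than one value means a clash.
        entity_types = {tag.partition("-")[2] for tag in span_tags}
        if len(entity_types) > 1:
            conflicts.append(word)
        # Policy (an assumption): the word inherits its first character's tag.
        rows.append(f"{word} {pos_tag} -X- {span_tags[0]}")
        start = end
    if start != len(chars):
        raise ValueError("segmentation is shorter than the text")
    return rows, conflicts
```

For instance, with illustrative character tags `["B-LOC", "I-LOC", "O", "O"]` for 美国总统 and the segmentation `[("美国", "NS"), ("总统", "N")]`, the rows come out as `美国 NS -X- B-LOC` and `总统 N -X- O`. Words reported in `conflicts` are cases where an entity boundary falls inside a word, which no automatic policy resolves cleanly.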

@alexnotes
Author

As for the other two issues, can you fix them?

@VioletPeng
Collaborator

Hi @kisslotus,

Thank you for pointing out the possible missing annotations in the data. We collected the data through crowdsourcing, and although we had three people annotate each sentence, local mistakes are still inevitable. We encourage you to make corrections to the dataset and send a pull request. Thanks!
