pynlpir/Data/FieldDict.pdat is faulty and does not match pynlpir's part-of-speech tag information #101

Open
dusmart opened this issue Nov 17, 2017 · 0 comments

dusmart commented Nov 17, 2017

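  1. The Data/FieldDict.pdat file in the NLPIR package bundled with pynlpir causes this problem. I suggest replacing Data/FieldDict.pdat with the copy from another NLPIR release. The file can be downloaded from: http://ictclas.nlpir.org/upload/20170314140452_ICTCLAS2016%E5%88%86%E8%AF%8D%E7%B3%BB%E7%BB%9F%E4%B8%8B%E8%BD%BD%E5%8C%85.zip

  2. Replacing the file is a temporary workaround for Issue 20, Issue 82, and Issue 98. The long-term fix should be to update POS_MAP, but that takes time and effort, and the part-of-speech tags in the new version may be overly detailed.

  3. pynlpir installed via pip is affected by this bug. Because the bug makes pynlpir hard to use, I suggest updating the pynlpir release on pip as soon as possible, replacing Data/FieldDict.pdat to fix the problem.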
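
The file swap described in the list above could be scripted roughly as follows. This is only a sketch: `replace_data_file` is a hypothetical helper, not part of pynlpir, and it assumes the data file sits at `Data/FieldDict.pdat` under the installed package directory, as the issue title suggests. It keeps a backup so the change can be undone.

```python
import os
import shutil


def replace_data_file(package_dir, new_file, name="FieldDict.pdat"):
    """Swap Data/<name> inside package_dir for new_file, keeping a backup."""
    target = os.path.join(package_dir, "Data", name)
    backup = target + ".bak"
    # Back up only once, so re-running never overwrites the original copy.
    if os.path.exists(target) and not os.path.exists(backup):
        shutil.copy2(target, backup)
    shutil.copy2(new_file, target)
    return target


# Usage (the source path is a placeholder for the file extracted from the
# downloaded archive):
#   import pynlpir
#   package_dir = os.path.dirname(os.path.abspath(pynlpir.__file__))
#   replace_data_file(package_dir, "/path/to/extracted/Data/FieldDict.pdat")
```

Restoring the original is then a matter of copying `FieldDict.pdat.bak` back over `FieldDict.pdat`.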

Test code:

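import pynlpir

pynlpir.open()  # load the NLPIR data files bundled with the package
s = '新华社报道:感谢您的开源项目。'
print(pynlpir.segment(s))  # list of (token, part-of-speech name) tuples
pynlpir.close()  # release the NLPIR resources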

Running the code above with the pip-installed pynlpir gives:

part of speech not recognized: 'gacn'
[('新华社', None), ('报道', 'verb'), (':', 'punctuation mark'), ('感谢', 'verb'), ('您', 'pronoun'), ('的', 'particle'), ('开源', 'verb'), ('项目', 'noun'), ('。', 'punctuation mark')]
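
The `None` for '新华社' appears because the tag code 'gacn' has no entry in the Python-side mapping. A toy illustration of that failure mode follows. `TOY_POS_MAP` and `lookup_pos` are hypothetical and much simpler than pynlpir's real POS_MAP and lookup code; the sketch also assumes, purely for illustration, that the first letter of a code selects its category.

```python
# Hypothetical, trimmed-down tag table; this is NOT pynlpir's real POS_MAP.
TOY_POS_MAP = {
    "n": "noun",
    "v": "verb",
    "w": "punctuation mark",
}


def lookup_pos(code, pos_map=TOY_POS_MAP):
    """Return the English name for a tag code, or None if it is unknown."""
    # Assumption: the first letter of the code selects the category.
    name = pos_map.get(code[:1]) if code else None
    if name is None:
        print("part of speech not recognized: %r" % code)
    return name


print(lookup_pos("v"))     # a known category
print(lookup_pos("gacn"))  # 'g' is missing from the toy table
```

Any code that the data file emits but the table does not cover falls through to `None`, which is why the data file and the mapping have to stay in sync.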

After replacing Data/FieldDict.pdat, the same code gives:

[('新华社', 'noun'), ('报道', 'verb'), (':', 'punctuation mark'), ('感谢', 'verb'), ('您', 'pronoun'), ('的', 'particle'), ('开源', 'verb'), ('项目', 'noun'), ('。', 'punctuation mark')]
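
To check whether an installation is affected, one option is to scan the segmentation result for tokens whose tag came back as `None`. The helper below is a hypothetical sketch, not a pynlpir function; the sample data is abbreviated from the two outputs in this issue.

```python
def untagged_tokens(segments):
    """Return the tokens whose part-of-speech name is None."""
    return [token for token, pos in segments if pos is None]


# Abbreviated from the two results reported in this issue.
before = [('新华社', None), ('报道', 'verb'), ('项目', 'noun')]
after = [('新华社', 'noun'), ('报道', 'verb'), ('项目', 'noun')]

print(untagged_tokens(before))  # ['新华社']
print(untagged_tokens(after))   # []
```

In practice the list passed in would be the return value of `pynlpir.segment(s)`.
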
@dusmart changed the title from "The Python side should not provide segmentation granularity control; POS_MAP should be removed" to "The Python side should call the NLPIR_SetPOSmap function in the C++ interface according to POS_MAP" Nov 17, 2017
@dusmart changed the title from "The Python side should call the NLPIR_SetPOSmap function in the C++ interface according to POS_MAP" to "We need to call function SetPOSmap in __init__.py. This fix Issue 20 in the right way." Nov 17, 2017
@dusmart changed the title from "We need to call function SetPOSmap in __init__.py. This fix Issue 20 in the right way." to "We need to call function SetPOSmap in __init__.py. And add more license file. This fix Issue 20 in the right way." Nov 17, 2017
@dusmart changed the title from "We need to call function SetPOSmap in __init__.py. And add more license file. This fix Issue 20 in the right way." to "We need to update file pynlpir/Data/FieldDict.pdat. This fix Issue 20 in the right way." Nov 17, 2017
@dusmart changed the title from "We need to update file pynlpir/Data/FieldDict.pdat. This fix Issue 20 in the right way." to "The pip-installed pynlpir and the develop branch use the new version of the NLPIR package but do not provide a new POS__MAP." Nov 17, 2017
@dusmart changed the title from "The pip-installed pynlpir and the develop branch use the new version of the NLPIR package but do not provide a new POS__MAP." to "pynlpir/Data/FieldDict.pdat is faulty and does not match pynlpir's part-of-speech tag information" Nov 17, 2017