Inconsistent word segmentation results #120

Open
piekey1994 opened this issue Dec 4, 2018 · 0 comments


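I have a file that I segment line by line. Processed this way, sentence 1595 yields a token list of 48 words.

If I read sentence 1595 directly and segment it on its own, the list has 50 words.

Printing both results shows where they differ: when the sentence is segmented on its own, the personal name 莱迪格 is split into three single characters. When the file is segmented line by line and reaches that sentence, 莱迪格 is not split into three characters, so the result has two fewer words.

Why do the segmentation results differ?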
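The comparison described above can be sketched as a small reproduction harness. This is only an illustration under stated assumptions: `compare_segmentation`, `token_diff`, and `make_toy_segmenter` are hypothetical names, not part of the library this issue is filed against. The toy segmenter keeps state between calls purely so the sketch runs and shows a two-token difference. Whether the real segmenter keeps any state between calls is not known from this report.

```python
import difflib
from typing import Callable, List, Sequence, Tuple

Segmenter = Callable[[str], List[str]]


def compare_segmentation(
    lines: Sequence[str],
    index: int,
    make_segmenter: Callable[[], Segmenter],
) -> Tuple[List[str], List[str]]:
    """Return (tokens when segmented in file order, tokens when segmented alone)."""
    sequential = make_segmenter()
    in_order: List[str] = []
    # Feed every line up to and including the target through one segmenter instance.
    for line in lines[: index + 1]:
        in_order = sequential(line)
    # Segment the same sentence with a fresh instance that has seen nothing else.
    alone = make_segmenter()(lines[index])
    return in_order, alone


def token_diff(a: List[str], b: List[str]) -> List[Tuple[List[str], List[str]]]:
    """List the token runs that differ between two segmentations."""
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def make_toy_segmenter() -> Segmenter:
    """Hypothetical stand-in for the real segmenter (an assumption, not its API).

    It splits text into single characters, except that any short line it has
    already seen is afterwards kept as one token.
    """
    known: List[str] = []

    def segment(text: str) -> List[str]:
        tokens: List[str] = []
        i = 0
        while i < len(text):
            # Prefer a previously seen word starting at position i, else one character.
            word = next((w for w in known if text.startswith(w, i)), text[i])
            tokens.append(word)
            i += len(word)
        # Remember short lines so later calls keep them intact.
        if 1 < len(text) <= 4 and text not in known:
            known.append(text)
        return tokens

    return segment


if __name__ == "__main__":
    sample = ["莱迪格", "莱迪格说话"]  # made-up lines, not the reporter's file
    in_order, alone = compare_segmentation(sample, 1, make_toy_segmenter)
    print(len(in_order), len(alone))
    print(token_diff(in_order, alone))
```

To run this against the real library, replace `make_toy_segmenter` with a function that builds and returns the library's segmentation callable, pass the file's lines, and use index 1594 for sentence 1595 (assuming the count starts at 1). The output of `token_diff` then shows exactly which tokens differ between the two runs.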
