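I have a file that I segment line by line. The 1595th sentence happens to produce a list of 48 words after segmentation.
If I read only the 1595th sentence and segment it on its own, the result is 50 words.
Printing both results shows the difference. When the sentence is segmented directly, a person's name, 莱迪格, is split into three single characters. When the file is segmented line by line, 莱迪格 is no longer split into three characters by the time that sentence is reached, so that result has two fewer words.
Why do the segmentation results differ?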
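One way to narrow this down is to reproduce both paths in a single script, each with its own freshly created segmenter. The sketch below does not depend on any particular library. `compare_segmentation`, `make_stateless` and `make_learning` are hypothetical names, and both toy segmenters split on spaces rather than handling real unsegmented Chinese text. `make_learning` keeps any word it has already seen as a single token. It shows one way that state carried across lines *could* produce a two-word gap like the one described. Whether the library in this issue actually keeps such state is an assumption to test. The issue itself does not establish it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical type: any function mapping one sentence to a list of tokens,
# e.g. a thin wrapper around whichever segmentation library is in use.
Segmenter = Callable[[str], List[str]]


def compare_segmentation(
    lines: Sequence[str], index: int, make_segmenter: Callable[[], Segmenter]
) -> Tuple[Optional[List[str]], List[str]]:
    """Segment lines[index] (0-based) in context and in isolation."""
    # In context: one segmenter processes every line up to `index` in order,
    # mirroring line-by-line segmentation of the whole file.
    seg = make_segmenter()
    in_context: Optional[List[str]] = None
    for i, line in enumerate(lines[: index + 1]):
        tokens = seg(line.strip())
        if i == index:
            in_context = tokens
    # In isolation: a brand-new segmenter sees only the target line.
    isolated = make_segmenter()(lines[index].strip())
    return in_context, isolated


def make_stateless() -> Segmenter:
    # Toy: splits on whitespace and keeps no state between calls.
    return lambda s: s.split()


def make_learning() -> Segmenter:
    # Toy with hidden state: an unseen multi-character word is split into
    # single characters; once seen, it is kept whole on later occurrences.
    seen = set()

    def seg(s: str) -> List[str]:
        out: List[str] = []
        for w in s.split():
            if w in seen or len(w) == 1:
                out.append(w)
            else:
                out.extend(w)  # split the unseen word into characters
            seen.add(w)
        return out

    return seg
```

With `lines = ["莱迪格 来了", "他 见到 莱迪格"]` and `index=1`, the stateless toy gives the same tokens both ways. The learning toy keeps `莱迪格` whole in context but splits it into three characters in isolation, so the isolated list is two tokens longer. For the file in the issue, the 1595th sentence corresponds to `index=1594`. If the library keeps module-level state, creating a new segmenter object may not reset it. Running the isolated case in a separate process is the safer check.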