Add tokenization rules #1082

Open
Jacksonary opened this issue Nov 28, 2024 · 1 comment

Jacksonary commented Nov 28, 2024

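I mainly use ik_max_word and have defined my own dictionary, but I would also like to define rules of my own that take part in tokenization. For example, I want to define a pair of opening and closing delimiters, {{ and }}, so that whatever they enclose is treated as a single unit. Given the text "前缀{{待匹配文本}}后缀" (prefix{{text to match}}suffix), the tokenizer should emit "待匹配文本" as one term instead of splitting it into "待匹配" and "文本". Alternatively, it may also split the content by its usual rules, as long as the whole term "待匹配文本" is kept as well. The content between the delimiters is not known in advance, so it cannot be defined through a dictionary. In short, is there a way to make ik always treat text that meets certain conditions as a single term?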

kin122 commented Jan 17, 2025

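Nothing this complex is supported for now. The dictionary tree basically just matches the text one-to-one against the stored dictionary entries.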
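
Since the rule is not available inside ik, one possible workaround is to apply it before the text reaches the tokenizer. The following is only a minimal sketch under stated assumptions, not part of ik or its API: the names `SPAN`, `tokenize_with_protected_spans`, and `placeholder_tokenizer` are hypothetical, and `placeholder_tokenizer` merely stands in for whatever tokenizer (for example a call into ik_max_word) handles the text outside the delimiters.

```python
import re
from typing import Callable, List

# Hypothetical pattern for the {{ ... }} delimiters from the question.
# The match is non-greedy, so two adjacent spans stay separate.
SPAN = re.compile(r"\{\{(.+?)\}\}", re.DOTALL)


def tokenize_with_protected_spans(
    text: str, fallback: Callable[[str], List[str]]
) -> List[str]:
    """Emit each {{...}} span as one token; tokenize the rest with `fallback`."""
    tokens: List[str] = []
    pos = 0
    for match in SPAN.finditer(text):
        before = text[pos:match.start()]
        if before:
            tokens.extend(fallback(before))
        # The enclosed content is kept whole, without the delimiters.
        tokens.append(match.group(1))
        pos = match.end()
    rest = text[pos:]
    if rest:
        tokens.extend(fallback(rest))
    return tokens


def placeholder_tokenizer(segment: str) -> List[str]:
    """Assumption: a stand-in for the real tokenizer; splits into characters."""
    return list(segment)


if __name__ == "__main__":
    print(tokenize_with_protected_spans("前缀{{待匹配文本}}后缀", placeholder_tokenizer))
```

This sketch covers only the first variant of the request (the span as a single term). Keeping the whole term and also splitting its content would mean additionally passing `match.group(1)` to the fallback tokenizer and appending those tokens too.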
