Searching for the keyword “珀” in ik_smart mode fails to find documents containing “琥珀” #271

t0ny-peng · 2016-09-05T02:09:52Z

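This is easy to reproduce. In some index, create a field with type: string and analyzer: ik_smart; assume it is named description. Then index this document: 主要经营缅甸琥珀蜜蜡各类产品翡翠各类成品及半成品 18k金镶嵌成品低中高价位齐全产品款式大量库存。
Inspect how the ik_smart analyzer tokenizes it (excerpt):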

...
{
      "token": "琥珀",
      "start_offset": 6,
      "end_offset": 8,
      "type": "CN_WORD",
      "position": 3
},
...

As you can see, “琥珀” is emitted as a single token. Now run the following query, searching only for the character “珀”:

{
    "query": {
        "match": {
            "feature": "珀"
        }
    }
}

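The query returns nothing. I believe the reason is that ik_smart treats “琥珀” as one word and indexes only that term, so a search for “珀” alone cannot match this document.
Testing shows that the built-in standard analyzer does find documents containing “琥珀” when searching for “珀”, evidently because the standard analyzer splits the text into individual Chinese characters.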
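The explanation above can be illustrated with a toy inverted index. This is only a sketch: `word_tokens` and `char_tokens` are hypothetical stand-ins for ik_smart and the standard analyzer, not the real tokenizers, and the sample text is shortened.

```python
from collections import defaultdict


def build_index(docs, tokenize):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def word_tokens(text):
    # Stand-in for ik_smart: hard-codes the one segmentation that matters
    # here, keeping "琥珀" whole and splitting everything else per character.
    tokens, i = [], 0
    while i < len(text):
        if text.startswith("琥珀", i):
            tokens.append("琥珀")
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def char_tokens(text):
    # Stand-in for the standard analyzer on Chinese text: one token per character.
    return list(text)


docs = {1: "缅甸琥珀蜜蜡"}
smart_index = build_index(docs, word_tokens)
char_index = build_index(docs, char_tokens)

print(smart_index.get("珀", set()))  # set(): no posting for the single character
print(char_index.get("珀", set()))   # {1}: the per-character index finds the document
```

A term query can only match tokens that were actually written to the index, which is why the word-level index has nothing to return for “珀”.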

How does everyone else handle this? Thanks.


medcl · 2016-09-05T03:05:01Z

Try ik_max_word, or add another field that uses a finer-grained analyzer.

t0ny-peng · 2016-09-05T03:11:27Z

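@medcl Thanks for the reply. I tried ik_max_word, but it still emits “琥珀” as a single token, with no finer-grained split.
By “add another field that uses a finer-grained analyzer”, do you mean creating a second field from the same data that uses the default analyzer? But then IK's strengths would not come into play. Or am I misunderstanding? 😐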

t0ny-peng · 2016-09-05T03:12:17Z

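It really is a dilemma: on the one hand I want the text segmented as sensibly as possible, on the other I want to be able to search for single characters.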

medcl · 2016-09-05T03:24:57Z

You can use multi-fields; a single field with a single analyzer cannot satisfy all of your requirements.
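A minimal sketch of what such a multi-field setup might look like, built as Python dicts and printed as JSON request bodies. The sub-field name `chars` is an assumption made for illustration, and the `text` type assumes Elasticsearch 5 or later; the 2.x setup described in this thread would use `string` instead.

```python
import json

# Hypothetical mapping: "description" is analyzed with ik_smart, and the
# sub-field "description.chars" (a name chosen for this sketch) is analyzed
# with the standard analyzer, which splits Chinese text per character.
mapping = {
    "properties": {
        "description": {
            "type": "text",
            "analyzer": "ik_smart",
            "fields": {
                "chars": {"type": "text", "analyzer": "standard"},
            },
        }
    }
}

# Search the word-level field and the per-character sub-field together.
query = {
    "query": {
        "multi_match": {
            "query": "珀",
            "fields": ["description", "description.chars"],
        }
    }
}

print(json.dumps({"mappings": mapping}, ensure_ascii=False, indent=2))
print(json.dumps(query, ensure_ascii=False, indent=2))
```

The same source text is indexed twice, so word-level matches still come from IK while single-character queries can fall back on the sub-field, at the cost of extra index size.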

t0ny-peng · 2016-09-05T03:32:04Z

OK, I'll go read up on that. Thanks.

ScsUndefined · 2016-09-23T12:33:58Z

Add “珀” to the dictionary?

dcais · 2017-01-06T05:39:20Z

I recommend segmenting as finely as possible at index time, for example by setting the analyzer to ik_max_word, and then using ik_smart at search time.
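As a rough sketch, the index-time and search-time split could be expressed with the `analyzer` and `search_analyzer` mapping parameters. The helper `ik_field` is hypothetical, the field name is taken from the reproduction above, and the `text` type again assumes Elasticsearch 5 or later.

```python
import json


def ik_field(index_analyzer="ik_max_word", search_analyzer="ik_smart"):
    """Return a field definition that indexes finely and searches coarsely."""
    return {
        "type": "text",
        "analyzer": index_analyzer,          # applied when documents are indexed
        "search_analyzer": search_analyzer,  # applied to the query string
    }


mapping = {"properties": {"description": ik_field()}}
print(json.dumps({"mappings": mapping}, indent=2))
```

This only changes which tokens are produced on each side; a single character is still searchable only if the index analyzer actually emits it as a token.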

t0ny-peng · 2017-01-14T15:04:59Z

@davidcai19840412 that's what I'm doing now

xupengrun · 2018-02-28T07:14:16Z

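There is a simple fix for this: add a single-character dictionary to the extension dictionaries. With ik_max_word as the index analyzer, single characters then get indexed; with ik_smart as the search_analyzer, this solves the problem of single characters missing from the index.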
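The effect of adding single characters to the dictionary can be sketched with a toy segmenter that emits every dictionary entry found in the text. This is an illustrative model only, not IK's actual algorithm; the function name and the tiny dictionaries are assumptions.

```python
def all_dictionary_matches(text, dictionary):
    """Emit every substring of `text` that is a dictionary entry (toy model)."""
    max_len = max(map(len, dictionary), default=0)
    tokens = []
    for start in range(len(text)):
        # Try every candidate length up to the longest dictionary entry.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            if text[start:end] in dictionary:
                tokens.append(text[start:end])
    return tokens


main_dict = {"缅甸", "琥珀", "蜜蜡"}
with_single_chars = main_dict | {"琥", "珀"}

print(all_dictionary_matches("缅甸琥珀蜜蜡", main_dict))
# ['缅甸', '琥珀', '蜜蜡']
print(all_dictionary_matches("缅甸琥珀蜜蜡", with_single_chars))
# ['缅甸', '琥', '琥珀', '珀', '蜜蜡']
```

Once “珀” is a dictionary entry, an exhaustive matcher emits it alongside “琥珀”, so the single character ends up in the index without giving up the word-level token.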

robzai · 2018-03-28T23:55:18Z

@xupengrun Where can I find a single-character dictionary? Is there a detailed tutorial? Thanks.

medcl · 2018-04-02T02:22:53Z

https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/extra_single_word_low_freq.dic
https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/extra_single_word_full.dic
https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/extra_single_word.dic
@robzai All of these are single-character dictionaries.

medcl mentioned this issue Jan 5, 2017

The ik_max_word tokenization result for [ 晋玮经商务咨询（上海）有限公司 ] is wrong #311

Closed
