Searching id:xiaoming666888 with the ik analyzer finds nothing, but searching id:xiaoming and id:666888 does find the document #258

Open
kubbo opened this issue Aug 9, 2016 · 3 comments

kubbo commented Aug 9, 2016

The id field is analyzed with ik. With the ik_max_word analyzer, xiaoming666888 is tokenized as follows:
$ curl "http://localhost/_analyze?analyzer=ik_max_word&pretty&text=xiaoming666888"

{
  "tokens" : [ {
    "token" : "xiaoming666888",
    "start_offset" : 0,
    "end_offset" : 14,
    "type" : "LETTER",
    "position" : 0
  }, {
    "token" : "xiaoming",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "666888",
    "start_offset" : 8,
    "end_offset" : 14,
    "type" : "ARABIC",
    "position" : 2
  } ]
}

For this search request:

POST /test/_search
{
  "filter" : {
    "and" : [
      { "term" : { "id" : "xiaoming" } },
      { "term" : { "id" : "666888" } }
    ]
  }
}

The request above does return xiaoming666888, but the following query does not:

GET /test/_search?q=id:'xiaoming666888'&default_operator=AND&analyzer=ik_max_word

Shouldn't ik_max_word split the id into xiaoming and 666888 and then filter on both with AND? Is there any difference between how ES handles the two queries above?
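
To make the question concrete, here is a toy Python model of the two lookups. It is only a sketch under assumptions, not a diagnosis: `toy_ik_max_word` is a hypothetical stand-in hard-coded to the three tokens shown above, the inverted index is a plain dict, and it assumes that `default_operator=AND` requires every token produced by the query-time analyzer. Under those assumptions, both queries match when the index holds all three tokens, and they diverge only if the composite token xiaoming666888 is missing from the index (for example, if the field were indexed with a different analyzer).

```python
# Toy inverted index: token -> set of document ids containing that token.
# All token lists here are illustrative assumptions, not output from a real cluster.

def build_index(docs):
    index = {}
    for doc_id, tokens in docs.items():
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index

def term_filter_and(index, terms):
    """Model of an `and` filter of `term` clauses: terms are looked up verbatim."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def analyzed_and_query(index, analyze, text):
    """Model of q=...&default_operator=AND: analyze the text, then require every token."""
    return term_filter_and(index, analyze(text))

def toy_ik_max_word(text):
    # Hypothetical stand-in for ik_max_word, hard-coded to the tokens reported above.
    if text == "xiaoming666888":
        return ["xiaoming666888", "xiaoming", "666888"]
    return [text]

# Case A: the field was indexed with all three tokens.
index_a = build_index({1: ["xiaoming666888", "xiaoming", "666888"]})
# Case B: the index holds only the two sub-tokens (assumed different index-time analysis).
index_b = build_index({1: ["xiaoming", "666888"]})

term_a = term_filter_and(index_a, ["xiaoming", "666888"])
query_a = analyzed_and_query(index_a, toy_ik_max_word, "xiaoming666888")
term_b = term_filter_and(index_b, ["xiaoming", "666888"])
query_b = analyzed_and_query(index_b, toy_ik_max_word, "xiaoming666888")

print("case A:", term_a, query_a)
print("case B:", term_b, query_b)
```

If this model is close to what ES does, comparing the field's mapping (its index-time analyzer) against the `analyzer=ik_max_word` passed at query time would be a reasonable first check.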

nathan-zhu commented Aug 15, 2016

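I have run into this too. Digits and pinyin or Chinese characters seem to get split apart, and adding a custom term for it did not help either; I am still looking into how to solve it.
Alternatively, try searching for xiaoming666888 as a phrase and see what happens.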

@ScsUndefined

@nathan-zhu
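English letters, Arabic numerals, and Chinese characters all belong to different scripts. Some tokenizers, such as icu and standard, split text based on Unicode, so could this be related to the script?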

Member

medcl commented Jan 5, 2017

I was not able to reproduce this locally.
