美文网首页
七、安装IK分词器

七、安装IK分词器

作者: 茶铺里的水 | 来源:发表于2017-11-22 18:29 被阅读25次
为什么要用IK分词器?
1. 不使用IK分词器

执行如下URL
http://localhost:9200/_analyze?text=中华人民共和国国歌

结果如下:

{
  "tokens": [
    {
      "token": "中",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "华",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "人",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "民",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "共",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    },
    {
      "token": "和",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<IDEOGRAPHIC>",
      "position": 5
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<IDEOGRAPHIC>",
      "position": 6
    },
    {
      "token": "国",
      "start_offset": 7,
      "end_offset": 8,
      "type": "<IDEOGRAPHIC>",
      "position": 7
    },
    {
      "token": "歌",
      "start_offset": 8,
      "end_offset": 9,
      "type": "<IDEOGRAPHIC>",
      "position": 8
    }
  ]
}
2. 使用IK分词器

执行如下URL
http://localhost:9200/_analyze?analyzer=ik_max_word&text=中华人民共和国国歌

结果如下:

{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 8
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 9
    }
  ]
}
IK分词器安装
  1. 下载IK分词器压缩包
  2. 到es的bin目录下执行

.\plugin install file:/D:/elasticsearch-analysis-ik-1.10.6.zip

  1. 需要重新生成索引
  2. 使用JavaAPI创建mapping
    @Test
    public void createMapping()throws Exception{
        client.admin().indices().prepareCreate(index).execute().actionGet();
        XContentBuilder builder=XContentFactory.jsonBuilder()
                .startObject()
                .startObject(type)
                .startObject("properties")
                .startObject("articleTitle").field("type", "string").field("store", "yes").field("analyzer","ik").field("index","analyzed").endObject()
                .startObject("articleContent").field("type", "string").field("store", "yes").field("analyzer","ik").field("index","analyzed").endObject()
                .endObject()
                .endObject()
                .endObject();
        PutMappingRequest mapping = Requests.putMappingRequest(index).type(type).source(builder);
        client.admin().indices().putMapping(mapping).actionGet();
    }

相关文章

网友评论

      本文标题:七、安装IK分词器

      本文链接:https://www.haomeiwen.com/subject/jryovxtx.html