为什么要用IK分词器?
1. 不使用IK分词器
执行如下URL
http://localhost:9200/_analyze?text=中华人民共和国国歌
结果如下:
{
"tokens": [
{
"token": "中",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "华",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "人",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "民",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "共",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
},
{
"token": "和",
"start_offset": 5,
"end_offset": 6,
"type": "<IDEOGRAPHIC>",
"position": 5
},
{
"token": "国",
"start_offset": 6,
"end_offset": 7,
"type": "<IDEOGRAPHIC>",
"position": 6
},
{
"token": "国",
"start_offset": 7,
"end_offset": 8,
"type": "<IDEOGRAPHIC>",
"position": 7
},
{
"token": "歌",
"start_offset": 8,
"end_offset": 9,
"type": "<IDEOGRAPHIC>",
"position": 8
}
]
}
2. 使用IK分词器
执行如下URL
http://localhost:9200/_analyze?analyzer=ik_max_word&text=中华人民共和国国歌
结果如下:
{
"tokens": [
{
"token": "中华人民共和国",
"start_offset": 0,
"end_offset": 7,
"type": "CN_WORD",
"position": 0
},
{
"token": "中华人民",
"start_offset": 0,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
},
{
"token": "中华",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 2
},
{
"token": "华人",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 3
},
{
"token": "人民共和国",
"start_offset": 2,
"end_offset": 7,
"type": "CN_WORD",
"position": 4
},
{
"token": "人民",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 5
},
{
"token": "共和国",
"start_offset": 4,
"end_offset": 7,
"type": "CN_WORD",
"position": 6
},
{
"token": "共和",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 7
},
{
"token": "国",
"start_offset": 6,
"end_offset": 7,
"type": "CN_CHAR",
"position": 8
},
{
"token": "国歌",
"start_offset": 7,
"end_offset": 9,
"type": "CN_WORD",
"position": 9
}
]
}
IK分词器安装
- 下载IK分词器压缩包
- 到es的bin目录下执行
.\plugin install file:/D:/elasticsearch-analysis-ik-1.10.6.zip
- 需要重新生成索引
- 使用JavaAPI创建mapping
@Test
public void createMapping()throws Exception{
client.admin().indices().prepareCreate(index).execute().actionGet();
XContentBuilder builder=XContentFactory.jsonBuilder()
.startObject()
.startObject(type)
.startObject("properties")
.startObject("articleTitle").field("type", "string").field("store", "yes").field("analyzer","ik").field("index","analyzed").endObject()
.startObject("articleContent").field("type", "string").field("store", "yes").field("analyzer","ik").field("index","analyzed").endObject()
.endObject()
.endObject()
.endObject();
PutMappingRequest mapping = Requests.putMappingRequest(index).type(type).source(builder);
client.admin().indices().putMapping(mapping).actionGet();
}
网友评论