Installing Chinese and Pinyin Analyzers in ES (IK + pinyin)

Author: nextbang | Published 2017-01-18 14:58 | 3519 views

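In the author's view ES is the most powerful full-text search tool, bar none, and Chinese and English word segmentation is close to a mandatory feature for it. The steps for installing the analyzers are outlined below (detailed walkthroughs are easy to find online; this article only gives the overall approach and steps):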

1. Download the Chinese/pinyin analyzers

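IK Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
(Both turn out to be the work of the same author, who also maintains mmseg and a Simplified/Traditional Chinese conversion library; quietly watching those as well.)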

2. Install

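On the releases page, find the zip file that matches your ES version, or download the source and build it yourself with mvn package; you can also build from the latest master.
Go to the plugins directory under the Elasticsearch installation directory; mkdir pinyin; cd pinyin;
Copy the zip file from the previous step into the pinyin directory and unzip it. The IK plugin is installed the same way in its own directory (for example, plugins/ik).
After deploying, remember to restart the ES node.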
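
To confirm after the restart that the nodes actually loaded both plugins, one option is to read `GET _cat/plugins` and compare it against the expected plugin names. The sketch below is a minimal Python example using only the standard library. It assumes ES is reachable at `http://localhost:9200`, that the plugins register under the names `analysis-ik` and `analysis-pinyin`, and that node names contain no spaces; the helper names are hypothetical.

```python
import urllib.request

# Assumed plugin names; check the actual names in your own _cat/plugins output.
REQUIRED_PLUGINS = ("analysis-ik", "analysis-pinyin")


def fetch_cat_plugins(base_url="http://localhost:9200"):
    """Return the plain-text body of GET _cat/plugins (one line per node/plugin pair)."""
    with urllib.request.urlopen(base_url + "/_cat/plugins") as resp:
        return resp.read().decode("utf-8")


def parse_cat_plugins(text):
    """Map each node name to the set of plugin names listed for it.

    Assumes whitespace-separated columns with the node name first and the
    plugin name second, and node names without spaces.
    """
    plugins_by_node = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        plugins_by_node.setdefault(fields[0], set()).add(fields[1])
    return plugins_by_node


def missing_plugins(plugins_by_node, required=REQUIRED_PLUGINS):
    """Return {node: [missing plugin names]} for nodes lacking a required plugin.

    A node with no plugins at all does not appear in _cat/plugins, so it
    cannot be detected from this output alone.
    """
    report = {}
    for node, installed in plugins_by_node.items():
        missing = sorted(set(required) - installed)
        if missing:
            report[node] = missing
    return report
```

For example, `missing_plugins(parse_cat_plugins(fetch_cat_plugins()))` returns an empty dict when every listed node reports both plugins.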

3. Configure

** settings configuration **

PUT my_index
{
  "settings" : {
    "index" : {
      "number_of_shards" : "3",
      "number_of_replicas" : "1",
      "analysis" : {
        "analyzer" : {
          "default" : {
            "tokenizer" : "ik_max_word"
          },
          "pinyin_analyzer" : {
            "tokenizer" : "my_pinyin"
          }
        },
        "tokenizer" : {
          "my_pinyin" : {
            "keep_separate_first_letter" : "false",
            "lowercase" : "true",
            "type" : "pinyin",
            "limit_first_letter_length" : "16",
            "keep_original" : "true",
            "keep_full_pinyin" : "true"
          }
        }
      }
    }
  }
}

** mapping configuration **

PUT my_index/index_type/_mapping
{
  "index_type" : {
    "_all" : {
      "analyzer" : "ik_max_word"
    },
    "properties" : {
      "name" : {
        "type" : "text",
        "analyzer" : "ik_max_word",
        "include_in_all" : true,
        "fields" : {
          "pinyin" : {
            "type" : "text",
            "term_vector" : "with_positions_offsets",
            "analyzer" : "pinyin_analyzer",
            "boost" : 10.0
          }
        }
      }
    }
  }
}
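
The analysis settings and the mapping can also be sent together in one request when the index is first created. The following Python sketch builds such a request body from the values above and sends it with the standard library. It assumes an ES 5.x-style typed mapping (as used in this article) and a node at `http://localhost:9200`; `build_index_body` and `create_index` are hypothetical helper names.

```python
import json
import urllib.request


def build_index_body(doc_type="index_type"):
    """Build a create-index body combining the analysis settings and the mapping.

    Values the article writes as strings ("3", "true") are given here as
    native JSON numbers and booleans.
    """
    settings = {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
            "analysis": {
                "analyzer": {
                    "default": {"tokenizer": "ik_max_word"},
                    "pinyin_analyzer": {"tokenizer": "my_pinyin"},
                },
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "keep_separate_first_letter": False,
                        "keep_full_pinyin": True,
                        "keep_original": True,
                        "limit_first_letter_length": 16,
                        "lowercase": True,
                    }
                },
            },
        }
    }
    mapping = {
        "_all": {"analyzer": "ik_max_word"},
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "ik_max_word",
                "include_in_all": True,
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "pinyin_analyzer",
                        "boost": 10.0,
                    }
                },
            }
        },
    }
    return {"settings": settings, "mappings": {doc_type: mapping}}


def create_index(index="my_index", base_url="http://localhost:9200"):
    """Send PUT /<index> with the combined body and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url + "/" + index,
        data=json.dumps(build_index_body()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the body as a plain dict keeps it easy to inspect or tweak before calling `create_index()`.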

4. Test

Use _analyze to check that the analyzer runs correctly:

GET my_index/_analyze
{
    "text":["刘德华"],
    "analyzer":"pinyin_analyzer"
}
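
The same check can be scripted. The sketch below is a minimal Python example, assuming ES listens on `http://localhost:9200`: it posts the text to `_analyze` and pulls the token strings out of the reply's `tokens` array. `analyze` and `token_texts` are hypothetical helper names, and the exact tokens returned depend on the plugin version and the tokenizer options.

```python
import json
import urllib.request


def analyze(text, analyzer="pinyin_analyzer", index="my_index",
            base_url="http://localhost:9200"):
    """POST the text to <index>/_analyze and return the decoded JSON reply."""
    payload = json.dumps({"analyzer": analyzer, "text": [text]}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/" + index + "/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def token_texts(analyze_reply):
    """Return the token strings from an _analyze reply, in the order returned."""
    return [entry["token"] for entry in analyze_reply.get("tokens", [])]
```

A call such as `token_texts(analyze("刘德华"))` then gives a flat list of tokens that is easy to eyeball or compare in a script.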

Index a Chinese document:

POST my_index/index_type
{
    "name":"刘德华"
}

Chinese analysis test (via query string):
curl "http://localhost:9200/my_index/index_type/_search?q=name:刘"
curl "http://localhost:9200/my_index/index_type/_search?q=name:刘德"

Pinyin test (via query string):
curl "http://localhost:9200/my_index/index_type/_search?q=name.pinyin:liu"
curl "http://localhost:9200/my_index/index_type/_search?q=name.pinyin:ldh"
curl "http://localhost:9200/my_index/index_type/_search?q=name.pinyin:de+hua"
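
Because these query strings contain non-ASCII characters, colons and spaces, it can be safer to percent-encode them before placing them in a URL. A minimal Python sketch using `urllib.parse.urlencode` follows; `search_url` is a hypothetical helper, and the host, index and type names are the ones assumed throughout this article.

```python
from urllib.parse import urlencode


def search_url(field, text, index="my_index", doc_type="index_type",
               base_url="http://localhost:9200"):
    """Build a _search URL whose q parameter is percent-encoded as UTF-8."""
    query = urlencode({"q": field + ":" + text})
    return base_url + "/" + index + "/" + doc_type + "/_search?" + query


# Prints .../_search?q=name%3A%E5%88%98%E5%BE%B7
print(search_url("name", "刘德"))
# Prints .../_search?q=name.pinyin%3Ade+hua (the space becomes "+")
print(search_url("name.pinyin", "de hua"))
```

The resulting URL can be passed to curl in quotes or opened with `urllib.request.urlopen`.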


Article link: https://www.haomeiwen.com/subject/bptbbttx.html
