Integrating the IK Analyzer with Elasticsearch

Author: geekAppke | Published 2018-12-20 11:08


Crawl the data → Java API (clean the data and convert it into documents) → upload to the distributed ES cluster (inverted index) → search


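1. Crawler: fetch the site's HTML data
    Options: Nutch, Python (the mainstream choice), wget (written in C)
    Install: yum install wget
    Blocking run (in the foreground): wget -o /tmp/wget.log -P /root/data  --no-parent --no-verbose -m -D news.cctv.com   -N --convert-links --random-wait -A html,HTML,shtml,SHTML http://news.cctv.com
    Follow the log live: tail -f /tmp/wget.log
2. Data extraction: pull the data out of the web pages
    Example page: news.cctv.com/2017/10/06/ARTIZbHyClb2f7DMTDr1uDO9171006.shtml
3. Build an index in ES from the extracted data
4. Search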


Fetched pages are not fed into the inverted index right away; the useless data is stripped out first.
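
To make the cleaning step concrete, here is a rough shell sketch. The pipeline above does this work through the Java API; the sed-based filter below is a swapped-in illustration, not that code. The function name `strip_html` is hypothetical, and the sketch assumes lowercase `script`/`style` tags and that the byte `\001` never occurs in a page. It is a crude regex filter, not an HTML parser.

```shell
#!/bin/sh
# strip_html (hypothetical helper): read HTML on stdin, write rough plain
# text on stdout. Drops <script>/<style> blocks, then all remaining tags.
strip_html() {
    marker=$(printf '\001')   # assumption: this byte never appears in the page
    tr '\n' ' ' |             # join lines so blocks spanning lines can match
    sed -e "s/<\/script>/$marker/g" -e "s/<script[^$marker]*$marker//g" \
        -e "s/<\/style>/$marker/g"  -e "s/<style[^$marker]*$marker//g" \
        -e 's/<[^>]*>/ /g' \
        -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
    echo                      # terminate the output with a newline
}

# Usage (the path is an assumption about where wget -P /root/data put a page):
#   strip_html < /root/data/news.cctv.com/index.html
```

Replacing each closing tag with a single marker byte first lets a plain bracket expression stand in for a non-greedy match, which sed does not offer.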

Installing the IK analyzer on the cluster

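1. Shut down the ES cluster
2. In shared mode, as the `hadoop` user, create the `ik` directory under `plugins` on all nodes at once
3. Upload the IK analyzer archive (2.2.1) to the `ik` directory on node002
4. As the `hadoop` user, unpack it with unzip (-d sets the target directory)
5. Edit the IK plugin descriptor with `vi plugin-descriptor.properties` and set the version to 2.2.1 (search for it in vi with /version)
    # plugins with the incorrect elasticsearch.version.
    elasticsearch.version=2.2.1
6. Distribute ik to the other nodes: [hadoop@node002 plugins]~ scp -r ik/ hadoop@node003:`pwd`
7. Restart the cluster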
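
Step 5 and a check after the restart can also be scripted. The sketch below is only an illustration. The function names `set_es_version` and `check_ik` are hypothetical. `check_ik` assumes the node listens for HTTP on port 9200 and that the plugin registers an analyzer named `ik`; adjust both if your setup differs.

```shell
#!/bin/sh
# set_es_version FILE VERSION (hypothetical helper): rewrite only the
# elasticsearch.version line of a plugin descriptor, as step 5 does in vi.
set_es_version() {
    descriptor=$1
    version=$2
    tmp="$descriptor.tmp"
    sed "s/^elasticsearch\.version=.*/elasticsearch.version=$version/" \
        "$descriptor" > "$tmp" && mv "$tmp" "$descriptor"
}

# check_ik HOST (hypothetical helper): ask a running node to tokenize a
# sample sentence. Assumptions: HTTP port 9200, analyzer name "ik".
check_ik() {
    curl -s -XPOST "http://$1:9200/_analyze?pretty" \
        -d '{"analyzer": "ik", "text": "中华人民共和国"}'
}

# Usage:
#   set_es_version plugin-descriptor.properties 2.2.1
#   check_ik node002
```

If the plugin loaded, the response from `check_ik` should list tokens for the sample text; an error naming the analyzer suggests the plugin was not picked up on that node.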