This article follows Kuangshen's video: https://www.bilibili.com/video/BV17a4y1x7zq?p=19&spm_id_from=pageDriver
ElasticSearch:https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D
logstash:https://mirrors.huaweicloud.com/logstash/?C=N&O=D
kibana:https://mirrors.huaweicloud.com/kibana/?C=N&O=D
elasticsearch-analysis-ik:https://github.com/medcl/elasticsearch-analysis-ik/releases
cerebro:https://github.com/lmenezes/cerebro/releases
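1. ES (Elasticsearch): used for search. Baidu highlights the matched keywords in its search results, and with Elasticsearch we can give our own site the same effect.
2. Doug Cutting: creator of Hadoop and Lucene. Lucene is only a retrieval toolkit and does not include a search engine; Elasticsearch wraps and extends Lucene.
ELK: Elasticsearch, Logstash, Kibana
Installing and starting Elasticsearch and head
The bin directory of Elasticsearch contains a .bat startup file; double-click it to start Elasticsearch. head, once downloaded, is a front-end project, and running it requires Node.js and Python to be installed first. In the head project directory, run cnpm install to install the dependencies, then npm run start to start it. Connecting to Elasticsearch at this point fails with a cross-origin (CORS) error: stop Elasticsearch, add the two lines shown below to elasticsearch.yml, and start it again. Once connected you can create an index, which is roughly the equivalent of a database.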
http.cors.enabled: true
http.cors.allow-origin: "*"
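Logstash cleans the data and ships it to Elasticsearch, and Kibana then displays it.
Download Kibana, start it, then click the wrench icon (🔧, Dev Tools) to run queries. Data in Elasticsearch is stored as documents, each comparable to one row of data, and each document contains fields.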
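Inverted index: take two articles as an example. Every keyword in both articles (as produced by the analyzer) is listed together with the articles it appears in. When a user searches, the results are ordered by how many times the search terms occur in each article, or by their weight.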
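To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is not how Lucene or Elasticsearch implement it: the class name InvertedIndexSketch, the split-on-punctuation tokenizer, and the rank-by-raw-count rule are all assumptions chosen for illustration (Elasticsearch scores with weights such as BM25 rather than raw counts).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index (illustrative only): maps each term to the documents containing it.
class InvertedIndexSketch {
    // term -> (document id -> number of times the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexes one document by recording every term occurrence.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Returns the ids of matching documents, highest total occurrence count first.
    List<Integer> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.getOrDefault(term, Collections.emptyMap());
            docs.forEach((doc, count) -> scores.merge(doc, count, Integer::sum));
        }
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher count first
            return byScore != 0 ? byScore : Integer.compare(a, b);      // ties: lower id first
        });
        return ids;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "study every day, good good study");
        index.add(2, "search engines make study easier");
        System.out.println(index.search("study"));  // [1, 2]
        System.out.println(index.search("search")); // [2]
    }
}
```

Document 1 contains "study" twice and document 2 only once, so document 1 is listed first. The lookup goes from term to documents, which is the reverse of scanning each document for the term.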
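IK analyzer: after downloading, unzip it into Elasticsearch's plugins directory (it must sit inside its own folder), then restart. If Elasticsearch exits immediately on startup, the cause is usually a version mismatch between the plugin and Elasticsearch; the logs will show it. In elasticsearch/bin, run elasticsearch-plugin list to see the installed plugins.
The IK analyzer has two modes: ik_smart (the coarsest-grained split) and ik_max_word (the finest-grained split). Below is the basic syntax for testing the analysis: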
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "我超级喜欢狂神说"
}
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "我超级喜欢狂神说"
}
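Adding your own dictionary words:
1. Find IKAnalyzer.cfg.xml and register your own dictionary file in it
2. Create your own dictionary file in the same directory
3. Restart everything and test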
RESTful style:
Data types:
String types: text, keyword
Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
Date type: date
Boolean type: boolean
Binary type: binary
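//Create an index and add one document to it
PUT /test1/type1/1
{
  "name": "狂神说",
  "age": 23
}
//Create an index with explicit field mappings
PUT /test3
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}
//A GET request returns the index details
GET test3
//The bluntest way to modify a document is to PUT the whole document again; the drawback is that any field left out of the new body is lost
//Use POST to update only the listed fields (in 7.x the typeless form of this endpoint is POST /test4/_update/1)
POST /test4/_doc/1/_update
{
  "doc": {
    "name": "法外狂徒"
  }
}
//Delete an index; other operations can be explored along the same lines
DELETE test3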
Document operations
PUT /moxuan/user/1
{
  "name": "狂神说",
  "age": 23,
  "desc": "天下之大,唯舞独尊",
  "tags": ["技术宅","温暖","直男"]
}
Simple search
GET moxuan/user/_search?q=name:张三
More elaborate searches
//Select which fields are returned
GET moxuan/user/_search
{
  "query": {
    "match": {
      "name": "三"
    }
  },
  "_source": ["name","desc"]
}
//Sort the results by a field: desc is descending, asc is ascending
GET moxuan/user/_search
{
  "query": {
    "match": {
      "name": "三"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
//Paginated query: from is the offset of the first hit to return, size is the number of hits per page
GET moxuan/user/_search
{
  "query": {
    "match": {
      "name": "三"
    }
  },
  "from": 0,
  "size": 1
}
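Because from is a zero-based offset into the hits and not a page number, a caller that thinks in pages has to convert. The helper below is a small sketch of that arithmetic; the name fromOffset and the choice to clamp page numbers below 1 to the first page are assumptions for illustration, not part of any Elasticsearch API.

```java
// Converts a 1-based page number into the zero-based "from" offset used together with "size".
class PageOffsetSketch {
    static int fromOffset(int pageNo, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageNo, 1); // assumption: anything below 1 means the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 8)); // 0
        System.out.println(fromOffset(3, 8)); // 16
    }
}
```

With a page size of 8, page 1 starts at hit 0 and page 3 starts at hit 16.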
//bool query with several conditions: every must clause has to match (AND); should clauses behave like OR
GET moxuan/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "三"
          }
        },
        {
          "match": {
            "age": "57"
          }
        }
      ]
    }
  }
}
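//Use filter to narrow the results with a range condition; the example below means age greater than 10 and less than 57. gt: greater than, lt: less than, gte: greater than or equal, lte: less than or equal
GET moxuan/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "三"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gt": 10,
            "lt": 57
          }
        }
      }
    }
  }
}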
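The way must and filter combine can be modelled in memory with plain Java predicates. This is only a sketch of the yes/no matching logic, under the assumption of a hypothetical User type and Latin sample names; it ignores scoring entirely (in Elasticsearch, filter clauses do not contribute to the score, while must clauses do).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class BoolQuerySketch {
    // Hypothetical document type, for illustration only.
    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Range clause with exclusive bounds, like {"gt": low, "lt": high}.
    static Predicate<User> ageBetweenExclusive(int low, int high) {
        return u -> u.age > low && u.age < high;
    }

    // A document is returned only if it passes every must clause and every filter clause.
    static List<String> search(List<User> docs,
                               List<Predicate<User>> must,
                               List<Predicate<User>> filter) {
        return docs.stream()
                .filter(u -> must.stream().allMatch(p -> p.test(u)))
                .filter(u -> filter.stream().allMatch(p -> p.test(u)))
                .map(u -> u.name)
                .collect(Collectors.toList());
    }

    // Sample data: only "zhangsan" has a matching name and an age strictly between 10 and 57.
    static List<String> demo() {
        List<User> docs = Arrays.asList(
                new User("zhangsan", 23),
                new User("lisan", 57),
                new User("wangwu", 30),
                new User("zhaosan", 10));
        Predicate<User> nameHasSan = u -> u.name.contains("san"); // stand-in for a match on name
        return search(docs, Arrays.asList(nameHasSan), Arrays.asList(ageBetweenExclusive(10, 57)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [zhangsan]
    }
}
```

The documents aged exactly 10 and 57 are excluded because gt and lt are strict; gte and lte would include them.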
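//Multi-value match: separate several terms with spaces; a document is returned if it matches any one of them, and the results can then be ranked by score
GET moxuan/user/_search
{
  "query": {
    "match": {
      "tags": "男 技术"
    }
  }
}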
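Exact queries
A term query looks up the given term directly in the inverted index, so it matches exactly.
About analysis:
term: looks up the exact value, with no analysis
match: runs the analyzer first (the text is analyzed, then the query runs against the analyzed tokens)
//keyword fields are not analyzed; text fields are
GET _analyze
{
  "analyzer": "standard",
  "text": "狂神说java"
}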
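To summarize the tests above: a keyword field is not run through the analyzer, while a text field is. After analysis, the search keyword is matched against the tokens the analyzer produced; if one matches, the document is returned, otherwise it is not.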
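The difference can be modelled in a few lines of Java. This is a rough sketch, not Elasticsearch code: analyze is a hand-written approximation of what the standard analyzer does to this kind of input (each Chinese character becomes its own token, a run of Latin letters or digits becomes one lowercase token), and the class and method names are made up for illustration. The escaped string in main is the same test text as above, 狂神说java, written with Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TermVsMatchSketch {
    // Rough stand-in for the standard analyzer: one token per ideograph,
    // one lowercase token per run of other letters and digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens);
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word, if any, as one token.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // term: the query string is looked up as-is among the indexed terms.
    static boolean term(Set<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match (on a text field): the query string is analyzed first; any shared token is a hit.
    static boolean match(Set<String> indexedTerms, String query) {
        for (String token : analyze(query)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "\u72c2\u795e\u8bf4java";
        Set<String> textField = new HashSet<>(analyze(value));   // analyzed, like a text field
        Set<String> keywordField = Collections.singleton(value); // kept whole, like a keyword field
        System.out.println(term(textField, value));    // false: no single token equals the whole string
        System.out.println(match(textField, value));   // true: the query is split into tokens first
        System.out.println(term(keywordField, value)); // true: the whole string is the only term
    }
}
```

A term query for the full string misses the text field because the index only holds the individual tokens, which is the behaviour the summary above describes.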
//Highlight the matches in the search results
GET moxuan/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
//Customize the tags that wrap each highlighted match
GET moxuan/user/_search
{
  "query": {
    "match": {
      "name": "狂神"
    }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}
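What pre_tags and post_tags do to the returned text can be pictured with a tiny string helper. This is only an illustration of the shape of the output, assuming a literal substring match; the real highlighter works on analyzed tokens and returns fragments, and the class and method names here are made up. Without custom tags, Elasticsearch wraps matches in <em> and </em>.

```java
class HighlightSketch {
    // Wraps every literal occurrence of term in text with the given tags.
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        // Default-style tags
        System.out.println(highlight("java in action, java basics", "java", "<em>", "</em>"));
        // Custom tags, as in the request above
        System.out.println(highlight("learn java", "java", "<p class='key' style='color:red'>", "</p>"));
    }
}
```

The front end can then render the returned string as HTML so the tags take effect.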
Integrating Elasticsearch with Spring Boot
First add the dependencies
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.6.1</version>
</dependency>
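Write the Elasticsearch configuration class
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClient {
    //Register a RestHighLevelClient bean that talks to the local node over HTTP
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http"))
        );
    }
}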
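Our first test demo
//Find the object
//Register it in the container
//Spring Boot's source code is worth a read to see how this is wired
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

//User is the project's own POJO, constructed from a name and an age (not shown in these notes)
@SpringBootTest
class DemoApplicationTests {
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    @Test //Create an index
    void createIndex() throws IOException {
        //1. Build the create-index request; nothing is executed yet
        CreateIndexRequest request = new CreateIndexRequest("kuang_index");
        //2. Execute the request
        CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

    @Test //Check whether the index exists
    void getIndex() throws IOException {
        //1. Build the get-index request; nothing is executed yet
        GetIndexRequest request = new GetIndexRequest("kuang_index");
        //2. Execute the request
        boolean response = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

    @Test //Delete the index
    void delIndex() throws IOException {
        //1. Build the delete-index request; nothing is executed yet
        DeleteIndexRequest request = new DeleteIndexRequest("kuang_index");
        //2. Execute the request
        AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }

    @Test //Add a document
    void addDoc() throws IOException {
        //1. Create the object
        User user = new User("狂神说", 23);
        //Build the request
        IndexRequest request = new IndexRequest("kuang_index");
        //Equivalent to PUT /kuang_index/_doc/1
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1));
        //Put our data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);
        //The client sends the request and receives the response
        IndexResponse response = client.index(request, RequestOptions.DEFAULT);
        System.out.println(response);
        System.out.println(response.status());
    }

    @Test //Get a document
    void getDoc() throws IOException {
        GetRequest request = new GetRequest("kuang_index", "1");
        /* Check whether the document exists, without fetching the _source context
        request.fetchSourceContext(new FetchSourceContext(false));
        request.storedFields("_none_");
        boolean exists = client.exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
        */
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        System.out.println(response.getSourceAsString()); //Print the document body
        System.out.println(response); //The full response is the same as what the console returns
    }

    @Test //Update a document
    void updateDoc() throws IOException {
        UpdateRequest request = new UpdateRequest("kuang_index", "1");
        request.timeout("1s");
        User user = new User("牛牛牛", 22);
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

    @Test //Delete a document
    void delDoc() throws IOException {
        DeleteRequest request = new DeleteRequest("kuang_index", "1");
        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        System.out.println(response.status());
    }

    @Test //Bulk-insert documents
    void bulk() throws IOException {
        BulkRequest request = new BulkRequest();
        request.timeout("2s");
        ArrayList<User> list = new ArrayList<>();
        list.add(new User("moxuan1", 23));
        list.add(new User("moxuan2", 22));
        list.add(new User("moxuan3", 21));
        //One index action per user, with ids 1, 2, 3
        for (int i = 0; i < list.size(); i++) {
            request.add(
                    new IndexRequest("kuang_index")
                            .id("" + (i + 1))
                            .source(JSON.toJSONString(list.get(i)), XContentType.JSON));
        }
        BulkResponse responses = client.bulk(request, RequestOptions.DEFAULT);
        System.out.println(responses.status());
    }

    //Query
    //SearchRequest: the search request
    //SearchSourceBuilder: builds the body of the search request
    @Test
    void query() throws Exception {
        SearchRequest searchRequest = new SearchRequest("kuang_index");
        //Build the search conditions
        SearchSourceBuilder searchBuilder = new SearchSourceBuilder();
        //Query conditions can be built with the QueryBuilders helper
        //QueryBuilders.termQuery is an exact match; QueryBuilders.matchAllQuery matches everything
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "moxuan1");
        //MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchBuilder.query(termQueryBuilder);
        searchBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        searchRequest.source(searchBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits()) {
            System.out.println(hit.getSourceAsMap());
        }
    }
}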
Hands-on project:
Import the basic dependencies
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.79</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
#######1. Crawl the data and parse the web page
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
Crawler code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.util.ArrayList;

//Content is the project's own POJO holding title, img and price (not shown in these notes)
@Component
public class HtmlParseUtil {
    public static void main(String[] args) throws Exception {
        //new HtmlParseUtil().parseJD("数学").forEach(System.out::println);
    }

    public ArrayList<Content> parseJD(String keyword) throws Exception {
        //URL of the search page to request
        String url = "https://search.jd.com/Search?keyword=" + keyword;
        //Fetch and parse the page; the result is the page's HTML document
        Document document = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 5.1; zh-CN) AppleWebKit/535.12 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/535.12").timeout(30000).get();
        //System.out.println(document);
        //The DOM lookups available in JavaScript can be used here as well
        Element element = document.getElementById("J_goodsList");
        Elements li = element.getElementsByTag("li");
        //System.out.println(li);
        ArrayList<Content> goodList = new ArrayList<>();
        //Walk every li element and pull out the image, price and title
        for (Element el : li) {
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            Content content = new Content(title, img, price);
            goodList.add(content);
        }
        return goodList;
    }
}
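One detail worth guarding against: the keyword is concatenated into the URL as-is, so a Chinese keyword, a space, or a character such as & ends up raw in the query string. Whether the HTTP client repairs this is not something these notes establish, so encoding it explicitly is the conservative choice. The sketch below uses only the JDK (Java 10 or later for this overload of URLEncoder.encode); the class and method names are made up, and the escaped string in main is the keyword 数学 from the commented-out call above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlSketch {
    // Percent-encodes the keyword as UTF-8 before appending it to the query string.
    static String searchUrl(String keyword) {
        return "https://search.jd.com/Search?keyword="
                + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("java book"));    // space becomes +
        System.out.println(searchUrl("\u6570\u5b66")); // non-ASCII becomes %XX bytes
    }
}
```

The result can be passed to Jsoup.connect in place of the hand-built string.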
Store the data in Elasticsearch
@Autowired
private RestHighLevelClient restHighLevelClient;

public Boolean parseContent(String keyword) throws Exception {
    ArrayList<Content> contents = new HtmlParseUtil().parseJD(keyword);
    //Put the crawled data into ES with one bulk request
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("2s");
    for (Content content : contents) {
        bulkRequest.add(
                new IndexRequest("jd_goods")
                        .source(JSON.toJSONString(content), XContentType.JSON));
    }
    BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    //true only if every item in the bulk request succeeded
    return !bulk.hasFailures();
}
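For reference, what a bulk request looks like on the wire can be sketched without the client library. The _bulk REST endpoint takes a newline-delimited body: one action line, then one source line, per document, with a newline after every line including the last. The helper below builds such a body; the class and method names are made up, and it assumes the index name is a plain string that needs no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

class BulkBodySketch {
    // Builds the newline-delimited body of a _bulk request that indexes each JSON document.
    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: index into the given index, letting Elasticsearch generate the id
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Source line: the document itself, on a single line
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"title\":\"java basics\",\"price\":\"59.00\"}",
                "{\"title\":\"java in depth\",\"price\":\"89.00\"}");
        System.out.print(bulkBody("jd_goods", docs));
    }
}
```

Sending everything in one request is what makes the bulk call cheaper than indexing the crawled items one by one.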
Fetch the data for the front end, with highlighting
public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws Exception {
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    //from is a zero-based hit offset, so convert the 1-based page number
    sourceBuilder.from((Math.max(pageNo, 1) - 1) * pageSize);
    sourceBuilder.size(pageSize);
    TermQueryBuilder termQueryBuilder = new TermQueryBuilder("title", keyword);
    sourceBuilder.query(termQueryBuilder);
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    //Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    highlightBuilder.requireFieldMatch(false); //Do not require the highlighted field to be the queried field
    highlightBuilder.preTags("<span style='color:red'>");
    highlightBuilder.postTags("</span>");
    sourceBuilder.highlighter(highlightBuilder);
    searchRequest.source(sourceBuilder);
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    ArrayList<Map<String, Object>> results = new ArrayList<>();
    for (SearchHit hit : response.getHits()) {
        //Read the highlighted version of the title, if there is one
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField title = highlightFields.get("title");
        Map<String, Object> sourceAsMap = hit.getSourceAsMap(); //The original result
        //Join the highlight fragments back into one string
        if (title != null) {
            Text[] fragments = title.fragments();
            StringBuilder newTitle = new StringBuilder();
            for (Text text : fragments) {
                newTitle.append(text);
            }
            sourceAsMap.put("title", newTitle.toString()); //Replace the original title with the highlighted one
        }
        results.add(sourceAsMap);
    }
    return results;
}
Simple front-end code, demo only
<template>
    <div>
        <input type="text" v-model="keyword"><button @click="searchKey">Search</button>
        <div id="app">
            <div class="item" v-for="item in results">
                <img v-bind:src="item.img"/>
                <p>Price: {{item.price}}</p>
                <div v-html="item.title"></div>
            </div>
        </div>
    </div>
</template>
<script>
export default {
    name: "index",
    data: function () {
        return {
            keyword: 'java',
            pageNo: 1,
            pageSize: 8,
            results: []
        }
    },
    methods: {
        searchKey() {
            var url = "http://localhost:8081/search/" + this.keyword + "/" + this.pageNo + "/" + this.pageSize;
            this.$http.get(url)
                .then(res => {
                    console.log(res)
                    this.results = res.data;
                })
        }
    }
}
</script>
<style scoped>
img {
    width: 400px;
    height: 300px;
}
.item {
    width: 400px;
    height: 350px;
    float: left;
    margin: 40px;
}
</style>
That's all. This is a personal study log, so it is rough in places; thanks for bearing with it.