1. terms: multi-value search, the equivalent of IN in SQL
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"tag_cnt": 1
}
},
{
"terms": {
"tag": ["java", "c++"]
}
}
]
}
}
}
}
}
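The filter semantics above can be sketched in plain Python. This is only an illustration under assumptions: the sample documents and the helper names terms_match and term_match are hypothetical, and a real terms filter works on indexed terms rather than on Python lists.

```python
# Hypothetical sample documents, loosely modelled on the forum/article index.
docs = [
    {"id": 1, "tag": ["java"], "tag_cnt": 1},
    {"id": 2, "tag": ["java", "hadoop"], "tag_cnt": 2},
    {"id": 3, "tag": ["c++"], "tag_cnt": 1},
    {"id": 4, "tag": ["python"], "tag_cnt": 1},
]

def terms_match(doc, field, values):
    """Like SQL IN: true if the field holds at least one of the values."""
    raw = doc[field]
    field_values = raw if isinstance(raw, list) else [raw]
    return any(v in values for v in field_values)

def term_match(doc, field, value):
    """Exact match against a single value."""
    return terms_match(doc, field, [value])

# bool/must: both filters have to match.
hits = [
    d["id"] for d in docs
    if term_match(d, "tag_cnt", 1) and terms_match(d, "tag", ["java", "c++"])
]
print(hits)  # [1, 3]
```

Combining tag_cnt = 1 with terms is what restricts the hits to posts whose only tag is java or c++; without it, doc 2 (java plus hadoop) would match as well.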
2. Search for posts with a view count between 30 and 60
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"range": {
"view_cnt": {
"gt": 30,
"lt": 60
}
}
}
}
}
}
gt and lt exclude the boundary values; use gte and lte to include them.
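The difference between the exclusive and inclusive bounds can be sketched as follows. The helper in_range and the sample view counts are hypothetical and only mimic the comparison logic of the range filter.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Return True if value satisfies every bound that was supplied."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

view_counts = [30, 50, 60, 80]  # hypothetical view_cnt values

exclusive = [v for v in view_counts if in_range(v, gt=30, lt=60)]
inclusive = [v for v in view_counts if in_range(v, gte=30, lte=60)]
print(exclusive)  # [50]
print(inclusive)  # [30, 50, 60]
```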
3. Search for posts whose title contains java or elasticsearch
GET /forum/article/_search
{
"query": {
"match": {
"title": "java elasticsearch"
}
}
}
To require that both terms match, add operator:
GET /forum/article/_search
{
"query": {
"match": {
"title": {
"query": "java elasticsearch",
"operator": "and"
}
}
}
}
Internally, ES converts this into the following bool query and executes that:
{
"bool": {
"must": [
{ "term": { "title": "java" }},
{ "term": { "title": "elasticsearch" }}
]
}
}
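A minimal sketch of that rewrite, assuming the analyzer can be approximated by lowercasing and splitting on whitespace (a real analyzer may produce different terms). The function name rewrite_match is hypothetical, not an ES API.

```python
def rewrite_match(field, text, operator="or"):
    """Map a match query onto a bool query made of term queries.

    Assumption: analysis is approximated by lowercase + whitespace split.
    """
    terms = text.lower().split()
    clauses = [{"term": {field: t}} for t in terms]
    # operator "and" requires every term (must); "or" needs any of them (should).
    occur = "must" if operator == "and" else "should"
    return {"bool": {occur: clauses}}

print(rewrite_match("title", "java elasticsearch", operator="and"))
```

With the default operator the same terms would land in a should clause instead, which is why the plain match query in this section returns posts containing either word.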
Use minimum_should_match to specify how many of the keywords must match, at a minimum, for a document to be returned. The value can be a number or a percentage.
GET /forum/article/_search
{
"query": {
"match": {
"title": {
"query": "java elasticsearch hadoop spark",
"minimum_should_match": "75%"
}
}
}
}
You can also write this with should, as below. In fact, ES internally converts the syntax above into this form before running the query.
GET /forum/article/_search
{
"query": {
"bool": {
"should": [
{"match": {
"title": "java"
}},
{"match": {
"title": "elasticsearch"
}},
{
"match": {
"title": "hadoop"
}
},
{
"match": {
"title": "spark"
}
}
],
"minimum_should_match":3
}
}
}
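The link between "75%" and the 3 used in the should version can be checked with a little arithmetic. This sketch assumes that a percentage is converted to a clause count by rounding down, and it handles only non-negative integers and plain percentages; required_matches is a hypothetical helper.

```python
def required_matches(clause_count, minimum_should_match):
    """Convert a minimum_should_match value into a number of clauses.

    Handles only non-negative integers and percentages such as "75%".
    Assumption: the percentage result is rounded down.
    """
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return clause_count * percent // 100
    return int(minimum_should_match)

# Four keywords: java, elasticsearch, hadoop, spark
print(required_matches(4, "75%"))  # 3
```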
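4. Controlling the weight of search clauses with boost
boost specifies which clause's matches should be ranked first. By default every clause has the same weight. The rule of thumb used here is to set the value to the number of clauses in the query plus 1, so that matches on the boosted clause come out on top.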
GET /forum/article/_search
{
"query": {
"bool": {
"should": [
{"match": {
"title": {
"query":"java"
}
}},
{"match": {
"title": {"query":"elasticsearch"}
}},
{
"match": {
"title": {"query":"hadoop"}
}
},
{
"match": {
"title":{"query":"spark",
"boost":5}
}
}
]
}
}
}
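A rough way to see the effect of the boost is the simplified model below. It assumes that each matching clause contributes its score multiplied by its boost and that the contributions are summed; real scoring has more factors, and the base scores of 1.0 are hypothetical.

```python
def should_score(clause_scores, boosts):
    """Simplified model: sum of each clause's score times its boost."""
    return sum(score * boost for score, boost in zip(clause_scores, boosts))

# Clause order: java, elasticsearch, hadoop, spark (spark has boost 5).
boosts = [1, 1, 1, 5]

doc_a = [1.0, 1.0, 1.0, 0.0]  # hypothetical: matches the three unboosted terms
doc_b = [0.0, 0.0, 0.0, 1.0]  # hypothetical: matches only spark

print(should_score(doc_a, boosts))  # 3.0
print(should_score(doc_b, boosts))  # 5.0
```

Under this model a post that matches only spark still outranks a post that matches all three other terms.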
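5. Why relevance scores can be inaccurate with multiple shards
If an index has multiple shards, the ranking of search results may be inaccurate, because each shard computes IDF from its own local documents only.
Solutions:
(1) In production, where the data volume is large, distribute the data across shards as evenly as possible.
(2) In a test environment, give the index a single primary shard: number_of_shards=1
(3) In a test environment, add the search_type=dfs_query_then_fetch parameter to the search. It collects the local IDF from every shard and recomputes the score from them. This parameter is not recommended in production because its performance is poor.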
GET /forum/article/_search?search_type=dfs_query_then_fetch
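The effect of local IDF can be illustrated with a small calculation. The formula below is an assumed classic TF-IDF style IDF used only for illustration (the similarity in your ES version may use a different one), and the shard statistics are hypothetical.

```python
import math

def idf(total_docs, docs_with_term):
    """Assumed IDF formula, for illustration only."""
    return 1 + math.log(total_docs / (docs_with_term + 1))

# Hypothetical, unevenly distributed shards:
# (total documents, documents containing "java")
shard_a = (100, 50)
shard_b = (100, 1)

local_a = idf(*shard_a)
local_b = idf(*shard_b)
# What a global view (as with dfs_query_then_fetch) would compute.
global_idf = idf(shard_a[0] + shard_b[0], shard_a[1] + shard_b[1])

print(local_a, local_b, global_idf)
```

The same term is treated as common on shard_a and rare on shard_b, so an identical document would score differently depending on the shard it lives on. With evenly distributed data the local values converge towards the global one.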
6. Multi-field search with the best fields strategy, using dis_max
(1) Search for posts whose title or content contains java or solution
GET /forum/article/_search
{
"query": {
"bool": {
"should": [
{"match": { "title": "java solution"}},
{"match": {
"content": "java solution"
}}
]
}
}
}
{
"_index": "forum",
"_type": "article",
"_id": "4",
"_score": 0.7120095,
"_source": {
"articleID": "QQPX-R-3956-#aD8",
"userID": 2,
"hidden": true,
"postDate": "2017-01-02",
"tag": [
"java",
"elasticsearch"
],
"tag_cnt": 2,
"view_cnt": 80,
"title": "this is java, elasticsearch, hadoop blog",
"content": "elasticsearch and hadoop are all very good solution, i am a beginner"
}
},
{
"_index": "forum",
"_type": "article",
"_id": "5",
"_score": 0.56008905,
"_source": {
"articleID": "DHJK-B-1395-#Ky5",
"userID": 3,
"hidden": false,
"postDate": "2017-03-01",
"tag": [
"elasticsearch"
],
"tag_cnt": 1,
"view_cnt": 10,
"title": "this is spark blog",
"content": "spark is best big data solution based on scala ,an programming language similar to java"
}
},
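(2) Analyzing the result
We expected doc5 to rank first, but doc2 and doc4 were ranked ahead of it.
The relevance score of each document is computed as: the sum of the scores of the matching queries, multiplied by the number of matched queries, divided by the total number of queries.
Computing the score of doc4:
{ "match": { "title": "java solution" }} produces a score for doc4
{ "match": { "content": "java solution" }} also produces a score for doc4
So the two scores are added together, for example 1.1 + 1.2 = 2.3
Number of matched queries = 2
Total number of queries = 2
2.3 * 2 / 2 = 2.3
Computing the score of doc5:
{ "match": { "title": "java solution" }} produces no score for doc5
{ "match": { "content": "java solution" }} produces a score for doc5
So only one query contributes a score, for example 2.3
Number of matched queries = 1
Total number of queries = 2
2.3 * 1 / 2 = 1.15
Score of doc5 = 1.15 < score of doc4 = 2.3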
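The arithmetic above can be reproduced with a few lines of Python. This is a sketch of the formula as stated in this section, using the example scores from the text; bool_should_score is a hypothetical helper, and None marks a query that did not match.

```python
def bool_should_score(clause_scores):
    """Sum of matching scores * matched queries / total queries.

    A score of None means the query did not match the document.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * len(matched) / len(clause_scores)

doc4 = bool_should_score([1.1, 1.2])   # title and content both match
doc5 = bool_should_score([None, 2.3])  # only content matches
print(round(doc4, 2), round(doc5, 2))  # 2.3 1.15
```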
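(3) The best fields strategy, dis_max
The best fields strategy means that a result in which a single field matches as many of the keywords as possible should be ranked first, rather than a result in which many fields each match only a few keywords.
dis_max simply takes the score of the highest-scoring query among its queries.
{ "match": { "title": "java solution" }} produces a score for doc4: 1.1
{ "match": { "content": "java solution" }} also produces a score for doc4: 1.2
Take the maximum: 1.2
{ "match": { "title": "java solution" }} produces no score for doc5
{ "match": { "content": "java solution" }} produces a score for doc5: 2.3
Take the maximum: 2.3
Now the score of doc4 = 1.2 < the score of doc5 = 2.3, so doc5 is ranked higher, which is what we want.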
GET /forum/article/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "java solution" }},
{ "match": { "content": "java solution" }}
]
}
}
}
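The dis_max calculation with the example scores from the text comes down to taking a maximum. A minimal sketch, where dis_max_score is a hypothetical helper and None marks a query that did not match:

```python
def dis_max_score(clause_scores):
    """Score of the best matching query; None means no match."""
    matched = [s for s in clause_scores if s is not None]
    return max(matched) if matched else 0.0

doc4 = dis_max_score([1.1, 1.2])   # title 1.1, content 1.2
doc5 = dis_max_score([None, 2.3])  # only content matches
print(doc4, doc5)  # 1.2 2.3
```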
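(4) Tuning dis_max results with the tie_breaker parameter
dis_max takes only the highest score among the queries and ignores the scores of the other queries completely, which can produce a ranking that is not quite right. Use tie_breaker to take the scores of the other queries into account as well.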
GET /forum/article/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "java solution" }},
{ "match": { "content": "java solution" }}
],
"tie_breaker": 0.3
}
}
}
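With tie_breaker, the other matching queries contribute a fraction of their scores on top of the best one. The sketch below assumes the formula best score + tie_breaker * (sum of the other matching scores) and reuses the example scores from the text; dis_max_with_tie_breaker is a hypothetical helper.

```python
def dis_max_with_tie_breaker(clause_scores, tie_breaker=0.0):
    """Best score plus tie_breaker times the other matching scores.

    A score of None means the query did not match the document.
    """
    matched = sorted(s for s in clause_scores if s is not None)
    if not matched:
        return 0.0
    best, others = matched[-1], matched[:-1]
    return best + tie_breaker * sum(others)

doc4 = dis_max_with_tie_breaker([1.1, 1.2], tie_breaker=0.3)
doc5 = dis_max_with_tie_breaker([None, 2.3], tie_breaker=0.3)
print(round(doc4, 2), round(doc5, 2))  # 1.53 2.3
```

With tie_breaker = 0 this reduces to plain dis_max, and with tie_breaker = 1 it becomes the plain sum of all matching scores.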
7. Implementing dis_max + tie_breaker with the multi_match syntax
GET /forum/article/_search
{
"query": {
"multi_match": {
"query": "java solution",
"fields": ["title^2","content"],
"type": "best_fields",
"tie_breaker": 0.3,
"minimum_should_match":"70%"
}
}
}
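To make the relationship to dis_max explicit, the sketch below expands a best_fields multi_match body into a roughly equivalent dis_max body. It is an illustration, not the code ES runs: expand_best_fields is a hypothetical helper, and it assumes that title^2 becomes a boost of 2 on the title clause and that minimum_should_match is applied to each per-field match query.

```python
import json

def expand_best_fields(multi_match):
    """Expand a best_fields multi_match body into a dis_max body (sketch)."""
    queries = []
    for spec in multi_match["fields"]:
        # "title^2" -> field "title" with boost 2; "content" -> no boost.
        field, _, boost = spec.partition("^")
        clause = {"query": multi_match["query"]}
        if boost:
            clause["boost"] = float(boost)
        if "minimum_should_match" in multi_match:
            clause["minimum_should_match"] = multi_match["minimum_should_match"]
        queries.append({"match": {field: clause}})
    dis_max = {"queries": queries}
    if "tie_breaker" in multi_match:
        dis_max["tie_breaker"] = multi_match["tie_breaker"]
    return {"dis_max": dis_max}

body = {
    "query": "java solution",
    "fields": ["title^2", "content"],
    "type": "best_fields",
    "tie_breaker": 0.3,
    "minimum_should_match": "70%",
}
expanded = expand_best_fields(body)
print(json.dumps(expanded, indent=2))
```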