19. Approximate matching in Elasticsearch with the slop parameter

Author: 编程界的小学生 | Published: 2017-07-17 11:33

    1. Basic syntax

    GET /forum/article/_search
    {
      "query": {
        "match_phrase": {
          "title": {
            "query": "java spark",
            "slop" : 1
          }
        }
      }
    }
    

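    2. What slop means

    The terms of the query string may have to be moved some number of times before they line up with a document. That number of moves is the slop.

    3. A slop example

    Work out how many moves a query string needs before it matches a document, then set slop accordingly.

    doc: hello world, java is very good, spark is also very good.

    A plain match_phrase search for java spark finds nothing.

    Once slop is specified, the terms java and spark are allowed to move in an attempt to match the doc.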

    java    is      very    good    spark   is
    java    spark
    java    -->     spark
    java    -->     -->     spark
    java    -->     -->     -->     spark

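    The table above shows that the first move shifts spark one position to the right, onto the slot held by very, and that after three moves it lands exactly on the slot held by spark in the doc. The slop here is therefore 3: in the phrase java spark, the term spark has to move three times before the phrase matches the doc.

    slop is not only the number of moves that the query terms need in order to match a doc. It is the maximum number of moves the query terms are allowed to make while trying to match a doc. Here, any slop of 3 or more is fine.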
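The move counting in the table can be sketched in a few lines of Python. This is a simplified model, not Lucene's actual implementation: the helper names `tokenize` and `required_slop` are made up for illustration, and the regular-expression tokenizer is only a rough stand-in for the analyzer configured on the field.

```python
import re
from itertools import product


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_slop(query, doc):
    """Smallest slop at which `query` would match `doc` under this simplified model.

    Returns None when some query term does not occur in the document.
    """
    query_terms = tokenize(query)
    doc_terms = tokenize(doc)
    # Positions at which each query term occurs in the document.
    positions = [
        [i for i, term in enumerate(doc_terms) if term == q] for q in query_terms
    ]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Shift each position back by the term's offset inside the query;
        # an exact phrase yields identical shifted values (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


doc = "hello world, java is very good, spark is also very good."
print(required_slop("java spark", doc))
```

For the doc used in this section the sketch prints 3, which agrees with the three moves counted in the table.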

    A plain match_phrase search certainly finds nothing here, so how can we get a hit?

    GET /forum/article/_search
    {
        "query": {
            "match_phrase": {
                "title": {
                    "query": "java spark",
                    "slop":  3
                }
            }
        }
    }
    

    Any slop of 3 or more works, for the reason shown in the table above.
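When the search is issued from a script rather than from the console, the same request body can be assembled as a plain dictionary. The following is a minimal sketch: `match_phrase_body` is a hypothetical helper, not part of any Elasticsearch client, and it only builds and prints the JSON without sending it anywhere.

```python
import json


def match_phrase_body(field, phrase, slop=0):
    """Build a match_phrase request body; slop=0 asks for the exact phrase."""
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# Same request as above: allow up to 3 moves when matching "java spark".
body = match_phrase_body("title", "java spark", slop=3)
print(json.dumps(body, indent=2))
```

Keeping slop as a parameter makes it easy to retry the same phrase with a larger value when a smaller one returns no hits.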

    4. In a slop search, the closer the terms are to each other, the higher the relevance score

    GET /forum/article/_search
    {
      "query": {
        "match_phrase": {
          "content": {
            "query": "java best",
            "slop": 15
          }
        }
      }
    }
    

    Result:

    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.65380025,
        "hits": [
          {
            "_index": "forum",
            "_type": "article",
            "_id": "2",
            "_score": 0.65380025,
            "_source": {
              "articleID": "KDKE-B-9947-#kL5",
              "userID": 1,
              "hidden": false,
              "postDate": "2017-01-02",
              "tag": [
                "java"
              ],
              "tag_cnt": 1,
              "view_cnt": 50,
              "title": "this is java blog",
              "content": "i think java is the best programming language",
              "sub_title": "learned a lot of course",
              "author_first_name": "Smith",
              "author_last_name": "Williams",
              "new_author_last_name": "Williams",
              "new_author_first_name": "Smith"
            }
          },
          {
            "_index": "forum",
            "_type": "article",
            "_id": "5",
            "_score": 0.07111243,
            "_source": {
              "articleID": "DHJK-B-1395-#Ky5",
              "userID": 3,
              "hidden": false,
              "postDate": "2017-03-01",
              "tag": [
                "elasticsearch"
              ],
              "tag_cnt": 1,
              "view_cnt": 10,
              "title": "this is spark blog",
              "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
              "sub_title": "haha, hello world",
              "author_first_name": "Tonny",
              "author_last_name": "Peter Smith",
              "new_author_last_name": "Peter Smith",
              "new_author_first_name": "Tonny"
            }
          }
        ]
      }
    }
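The two hits differ in how far apart java and best sit in the content field: in document 2 they are close together, while in document 5 best comes long before java. The sketch below counts the moves for both documents with the same simplified position model (hypothetical helper `phrase_distance`, rough regular-expression tokenizer). It also applies a 1 / (distance + 1) weight, which, as far as I know, is how Lucene's built-in similarities scale a sloppy match; treat that formula as an assumption. The sketch does not try to reproduce the `_score` values shown above.

```python
import re
from itertools import product


def phrase_distance(query, doc):
    """Fewest moves needed to line the query terms up with the doc (None if no match)."""
    split = lambda text: re.findall(r"[a-z0-9]+", text.lower())
    query_terms, doc_terms = split(query), split(doc)
    positions = [[i for i, t in enumerate(doc_terms) if t == q] for q in query_terms]
    spreads = []
    for combo in product(*positions):
        # Subtract each term's offset in the query, then measure the spread.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spreads.append(max(shifted) - min(shifted))
    return min(spreads) if spreads else None


contents = {
    "2": "i think java is the best programming language",
    "5": "spark is best big data solution based on scala ,an programming "
         "language similar to java spark",
}

distances = {doc_id: phrase_distance("java best", text) for doc_id, text in contents.items()}
# Assumed proximity weight: closer terms give a larger factor.
weights = {doc_id: 1.0 / (1 + d) for doc_id, d in distances.items()}

for doc_id in contents:
    print(doc_id, distances[doc_id], round(weights[doc_id], 3))
```

Under this model document 2 needs 2 moves and document 5 needs 13, so both fit within slop 15, and the closer pair gets the larger weight, in line with the ranking in the result.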
    
