Elasticsearch Study Notes 1

Author: 第八共同体 | Published: 2017-12-09 22:06

    1. Basic concepts

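    An Elasticsearch cluster can contain multiple indices, and each index can contain multiple types. Each type holds multiple documents, and each document has multiple fields.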

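    • An index is roughly equivalent to a database in a relational database system.
    • A type is roughly equivalent to a table.
    • A document is roughly equivalent to a row (record).
    • A field (property) is roughly equivalent to a column.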
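
To make the hierarchy concrete, here is a minimal sketch that models it as nested Python dictionaries. It is only an analogy for the index / type / document / field nesting, not how Elasticsearch stores data. The names `megacorp` and `employee` are borrowed from the example used in these notes; `get_field` is a hypothetical helper.

```python
# Hypothetical in-memory model of the hierarchy (an analogy only).
cluster = {
    "megacorp": {                      # index    ~ database
        "employee": {                  # type     ~ table
            "1": {                     # document ~ row, keyed by its ID
                "first_name": "John",  # field    ~ column
                "age": 25,
            },
        },
    },
}


def get_field(cluster, index, doc_type, doc_id, field):
    """Walk index -> type -> document -> field and return the value."""
    return cluster[index][doc_type][doc_id][field]


print(get_field(cluster, "megacorp", "employee", "1", "first_name"))  # John
```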

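    2. A worked example

    For an employee directory, we will do the following:

    • Index one document per employee, containing all the information about that employee.
    • Give every document the type employee.
    • Put that type in the index megacorp.
    • Store that index in our Elasticsearch cluster.

    Note: the commands in the examples below are shorthand for curl invocations.
    The omitted part is the curl command line itself, for example curl -XPUT 'http://192.168.0.103:9200/megacorp/employee/1'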
    PUT /megacorp/employee/1
    {
        "first_name" : "John",
        "last_name" :  "Smith",
        "age" :        25,
        "about" :      "I love to go rock climbing",
        "interests": [ "sports", "music" ]
    }
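
The shorthand can be expanded back into a full curl command mechanically. The sketch below is one way to do that; `to_curl` is a hypothetical helper, the host is the one from the note above, and the `Content-Type` header is added on the assumption that the target cluster requires it (Elasticsearch 6.0 and later do).

```python
import json
import shlex


def to_curl(method, path, body=None, host="http://192.168.0.103:9200"):
    """Expand shorthand such as `PUT /megacorp/employee/1` into a curl command line."""
    parts = ["curl", "-X" + method, host + path]
    if body is not None:
        # Send the JSON document as the request body.
        parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    # shlex.quote adds shell quoting only where an argument needs it.
    return " ".join(shlex.quote(p) for p in parts)


print(to_curl("GET", "/megacorp/employee/1"))
print(to_curl("PUT", "/megacorp/employee/1", {"first_name": "John", "age": 25}))
```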
    

    Note that the path /megacorp/employee/1 carries three pieces of information:

    • megacorp: the index name
    • employee: the type name
    • 1: the ID of this particular employee
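
As a small illustration of that path layout, the following sketch splits a document path into its three parts. `parse_doc_path` is a hypothetical helper written for these notes, not part of any Elasticsearch client.

```python
def parse_doc_path(path):
    """Split '/index/type/id' into a dict with keys 'index', 'type' and 'id'."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError("expected /index/type/id, got %r" % path)
    index, doc_type, doc_id = parts
    return {"index": index, "type": doc_type, "id": doc_id}


print(parse_doc_path("/megacorp/employee/1"))
```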

    Let's add more employees to the directory:

    PUT /megacorp/employee/2
    {
        "first_name" :  "Jane",
        "last_name" :   "Smith",
        "age" :         32,
        "about" :       "I like to collect rock albums",
        "interests":  [ "music" ]
    }
    

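    3. Retrieving a document

    Putting the document ID in the request path retrieves that document. The request is followed by its response: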

    GET /megacorp/employee/1
    {
      "_index" :   "megacorp",
      "_type" :    "employee",
      "_id" :      "1",
      "_version" : 1,
      "found" :    true,
      "_source" :  {
          "first_name" :  "John",
          "last_name" :   "Smith",
          "age" :         25,
          "about" :       "I love to go rock climbing",
          "interests":  [ "sports", "music" ]
      }
    }
    

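    4. Lightweight search

    Appending _search to the path with no other parameters returns all documents. Here the response holds all three (including a third employee, Douglas Fir, whose indexing request is not shown above) in the hits array. A search returns ten results by default. The response not only says which documents matched but also includes each whole document, which is everything needed to display search results to the end user.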

    GET /megacorp/employee/_search
    {
       "took":      6,
       "timed_out": false,
       "_shards": { ... },
       "hits": {
          "total":      3,
          "max_score":  1,
          "hits": [
             {
                "_index":         "megacorp",
                "_type":          "employee",
                "_id":            "3",
                "_score":         1,
                "_source": {
                   "first_name":  "Douglas",
                   "last_name":   "Fir",
                   "age":         35,
                   "about":       "I like to build cabinets",
                   "interests": [ "forestry" ]
                }
             },
             {
                "_index":         "megacorp",
                "_type":          "employee",
                "_id":            "1",
                "_score":         1,
                "_source": {
                   "first_name":  "John",
                   "last_name":   "Smith",
                   "age":         25,
                   "about":       "I love to go rock climbing",
                   "interests": [ "sports", "music" ]
                }
             },
             {
                "_index":         "megacorp",
                "_type":          "employee",
                "_id":            "2",
                "_score":         1,
                "_source": {
                   "first_name":  "Jane",
                   "last_name":   "Smith",
                   "age":         32,
                   "about":       "I like to collect rock albums",
                   "interests": [ "music" ]
                }
             }
          ]
       }
    }
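
Since each hit carries the whole document under `_source`, pulling the documents out of a response is a short loop. The sketch below works on a hand-trimmed copy of the response above (only the name fields are kept); `extract_sources` is a hypothetical helper.

```python
def extract_sources(response):
    """Return the `_source` of every hit, in the order the hits were returned."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed copy of the response shown above.
sample_response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
        ],
    }
}

names = [doc["first_name"] for doc in extract_sources(sample_response)]
print(names)  # ['Douglas', 'John', 'Jane']
```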
    

    Query-string search: append the search parameter q= after _search

    GET /megacorp/employee/_search?q=last_name:Smith
    {
       ...
       "hits": {
          "total":      2,
          "max_score":  0.30685282,
          "hits": [
             {
                ...
                "_source": {
                   "first_name":  "John",
                   "last_name":   "Smith",
                   "age":         25,
                   "about":       "I love to go rock climbing",
                   "interests": [ "sports", "music" ]
                }
             },
             {
                ...
                "_source": {
                   "first_name":  "Jane",
                   "last_name":   "Smith",
                   "age":         32,
                   "about":       "I like to collect rock albums",
                   "interests": [ "music" ]
                }
             }
          ]
       }
    }
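
When building such a URL in code, the query string should be URL-encoded. A minimal sketch using only the standard library follows; `search_url` is a hypothetical helper, and the encoded colon (`%3A`) is equivalent to the literal `:` in the shorthand above.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, query):
    """Build the path and query string for a `q=` search."""
    return "/{}/{}/_search?{}".format(index, doc_type, urlencode({"q": query}))


print(search_url("megacorp", "employee", "last_name:Smith"))
# /megacorp/employee/_search?q=last_name%3ASmith
```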
    

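    5. Searching with the query DSL

    The query DSL (domain-specific language) expresses a search as a JSON request body. We can rewrite the previous search for everyone named Smith like this: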

    GET /megacorp/employee/_search
    {
        "query" : {
            "match" : {
                "last_name" : "Smith"
            }
        }
    }
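
In Python the same request body is just a dictionary, so it can be produced by a small function and serialized with `json.dumps`. This is a sketch; `match_query` is a hypothetical helper, and the resulting dict is the kind of value passed as `body=` to `es.search` in the Python examples in these notes.

```python
import json


def match_query(field, text):
    """Build a request body containing a single `match` query on one field."""
    return {"query": {"match": {field: text}}}


body = match_query("last_name", "Smith")
print(json.dumps(body, indent=4))
```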
    

    A slightly more complex query, for employees whose last name is Smith and who are older than 30:

    GET /megacorp/employee/_search
    {
        "query" : {
            "bool": {
                "must": {
                    "match" : {
                            "last_name" : "smith"
                     }
                },
                "filter": {
                    "range" : {
                               "age" : { "gt" : 30 }
                     }
                }
            }
        }
    }
    

    The result now contains only one employee: Jane Smith, aged 32.

    {
       ...
       "hits": {
          "total":      1,
          "max_score":  0.30685282,
          "hits": [
             {
                ...
                "_source": {
                   "first_name":  "Jane",
                   "last_name":   "Smith",
                   "age":         32,
                   "about":       "I like to collect rock albums",
                   "interests": [ "music" ]
                }
             }
          ]
       }
    }
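
To see why only Jane is returned, the same two conditions can be applied to the three sample employees in plain Python. This is only an in-memory analogy, not how Elasticsearch evaluates the query: in the real query the `must` clause contributes to the relevance score, while the `filter` clause only includes or excludes documents. The case-insensitive comparison stands in for the lowercasing done by the default analyzer, and `smiths_older_than` is a hypothetical helper.

```python
employees = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def smiths_older_than(employees, min_age):
    """Keep employees whose last name is Smith (the must clause) and whose age exceeds min_age (the filter clause)."""
    return [
        e for e in employees
        if e["last_name"].lower() == "smith" and e["age"] > min_age
    ]


print([e["first_name"] for e in smiths_older_than(employees, 30)])  # ['Jane']
```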
    

    7. Full-text search

    Everything so far has been a simple query. Now let's try something slightly more advanced: full-text search.

    GET /megacorp/employee/_search
    {
        "query" : {
            "match" : {
                "about" : "rock climbing"
            }
        }
    }
    

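    A more concrete Python example:

    from pprint import pprint

    from elasticsearch import Elasticsearch

    # es_hosts, es_auth and index_name are assumed to be defined elsewhere
    # (cluster addresses, credentials and the target index).


    def main():
        # es = Elasticsearch(es_hosts)  # connection without authentication
        es = Elasticsearch(es_hosts, http_auth=es_auth)
        # Fetch a single document by ID:
        # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
        # Match every document:
        # res = es.search(index=index_name, body={"query": {"match_all": {}}})
        res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
        pprint(res)
    
    
    if __name__ == '__main__':
        main()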
    

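    The example above returns every document whose consignee field matches one or more of the individual characters 张, 前 and 程 (the default standard analyzer splits Chinese text into single-character tokens).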
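
That "any token matches" behaviour can be imitated in a few lines of Python. This is a rough sketch of the matching rule only, assuming one token per character as the standard analyzer produces for Chinese text; it ignores relevance scoring. `char_tokens`, `match_any` and the sample values 张三 and 李四 are hypothetical.

```python
def char_tokens(text):
    """Split text into one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def match_any(field_value, query):
    """True if at least one query token occurs in the field (the default OR behaviour of match)."""
    field_tokens = set(char_tokens(field_value))
    return any(token in field_tokens for token in char_tokens(query))


print(match_any("张三", "张前程"))  # True, shares the token 张
print(match_any("李四", "张前程"))  # False, no token in common
```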

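    8. Phrase search

    Finding individual words in a field is fine, but sometimes you want to match an exact sequence of words, that is, a phrase. For instance, we may want a query that matches only employee records that contain both "rock" and "climbing" and in which the two words sit next to each other as the phrase "rock climbing".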

    GET /megacorp/employee/_search
    {
        "query" : {
            "match_phrase" : {
                "about" : "rock climbing"
            }
        }
    }
    

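    The same Python example again, this time using match_phrase:

    from pprint import pprint

    from elasticsearch import Elasticsearch

    # es_hosts, es_auth and index_name are assumed to be defined elsewhere.


    def main():
        # es = Elasticsearch(es_hosts)
        es = Elasticsearch(es_hosts, http_auth=es_auth)
        # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
        # res = es.search(index=index_name, body={"query": {"match_all": {}}})
        # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
        # match_phrase requires the tokens to appear together and in order.
        res = es.search(index=index_name, body={"query": {"match_phrase": {"consignee": "张前程"}}})
        pprint(res)
    
    
    if __name__ == '__main__':
        main()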
    

    The example above matches only documents whose consignee field contains 张前程 as a contiguous phrase.
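
The difference from a plain match is that the tokens must be adjacent and in the same order. A rough Python imitation of that rule, again assuming one token per character and ignoring scoring, could look like this; `char_tokens`, `match_phrase` and the sample strings are hypothetical.

```python
def char_tokens(text):
    """Split text into one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def match_phrase(field_value, phrase):
    """True if the phrase tokens occur in the field consecutively and in order."""
    doc = char_tokens(field_value)
    wanted = char_tokens(phrase)
    if not wanted:
        return False
    n = len(wanted)
    # Slide a window of n tokens over the field and compare.
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))


print(match_phrase("收货人张前程", "张前程"))  # True, the phrase appears intact
print(match_phrase("张程前", "张前程"))        # False, same tokens but wrong order
```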

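    9. Highlighted search

    Many applications like to highlight snippets of text in each search result so that the user can see why the document matched the query. Retrieving highlighted fragments is easy in Elasticsearch: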

    GET /megacorp/employee/_search
    {
        "query" : {
            "match_phrase" : {
                "about" : "rock climbing"
            }
        },
        "highlight": {
            "fields" : {
                "about" : {}
            }
        }
    }
    

    The Python version of the example above:

    from pprint import pprint

    from elasticsearch import Elasticsearch

    # es_hosts, es_auth and index_name are assumed to be defined elsewhere.


    def main():
        # es = Elasticsearch(es_hosts)
        es = Elasticsearch(es_hosts, http_auth=es_auth)
        # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
        # res = es.search(index=index_name, body={"query": {"match_all": {}}})
        # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
        body = {
            "query": {"match_phrase": {"consignee": "张前程"}},
            "highlight": {"fields": {"consignee": {}}},
        }
        res = es.search(index=index_name, body=body)
        pprint(res)
    
    
    if __name__ == '__main__':
        main()
    

    Each hit in the result now contains an additional section called highlight.
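
The highlight section sits next to `_source` inside each hit and maps a field name to a list of fragments. The sketch below collects those fragments per document. The sample response is hand-written for the "rock climbing" query from the megacorp example and uses the default `<em>` tags; `collect_highlights` is a hypothetical helper.

```python
def collect_highlights(response, field):
    """Map each hit's _id to its highlighted fragments for the given field."""
    result = {}
    for hit in response["hits"]["hits"]:
        # A hit without highlighted fragments simply has no such entry.
        result[hit["_id"]] = hit.get("highlight", {}).get(field, [])
    return result


# Hand-written sample of a highlighted response (trimmed).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"about": "I love to go rock climbing"},
                "highlight": {"about": ["I love to go <em>rock</em> <em>climbing</em>"]},
            }
        ]
    }
}

print(collect_highlights(sample_response, "about"))
```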



    We can also specify the tags that wrap the highlighted text:

    from pprint import pprint

    from elasticsearch import Elasticsearch

    # es_hosts, es_auth and index_name are assumed to be defined elsewhere.


    def main():
        # es = Elasticsearch(es_hosts)
        es = Elasticsearch(es_hosts, http_auth=es_auth)
        # res = es.get(index=index_name, doc_type=doc_type, id='AWA7wc7KqvKHwbnCSiGf')['_source']
        # res = es.search(index=index_name, body={"query": {"match_all": {}}})
        # res = es.search(index=index_name, body={"query": {"match": {"consignee": "张前程"}}})
        body = {
            "query": {"match_phrase": {"consignee": "张前程"}},
            "highlight": {
                "pre_tags": ["<span>"],
                "post_tags": ["</span>"],
                "fields": {"consignee": {}},
            },
        }
        res = es.search(index=index_name, body=body)
        pprint(res)
    
    
    if __name__ == '__main__':
        main()
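
If several queries need different tags, the request body can be produced by a small function instead of being written out each time. This is a sketch; `phrase_query_with_highlight` is a hypothetical helper whose defaults mirror the `<em>` tags Elasticsearch uses when none are given.

```python
def phrase_query_with_highlight(field, phrase, pre_tag="<em>", post_tag="</em>"):
    """Build a match_phrase request body that highlights the same field with the given tags."""
    return {
        "query": {"match_phrase": {field: phrase}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


body = phrase_query_with_highlight("consignee", "张前程", "<span>", "</span>")
print(body["highlight"]["pre_tags"])  # ['<span>']
```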
    

    For more advanced usage, see the Elasticsearch documentation on highlighting.
