美文网首页
ElasticSearch_笔记

ElasticSearch_笔记

作者: bboymonk | 来源:发表于2017-06-08 11:46 被阅读0次

    索引:

    在Elasticsearch中存储数据的行为就叫做索引(indexing),不过在索引之前,我们需要明确数据应该存储在哪里。文档归属于一种类型(type),而这些类型存在于索引(index)中
    数据库与ElasticSearch对比图:
    DB -> Databases -> Tables -> Rows -> Columns
    Elasticsearch -> Indices -> Types -> Documents -> Fields

    Elasticsearch集群可以包含多个索引(indices)(数据库),每一个索引可以包含多个类型(types)(表),每一个类型包含多个文档(documents)(行),然后每个文档包含多个字段(Fields)(列)。

    索引含义的区分:

    你可能已经注意到索引(index)这个词在Elasticsearch中有着不同的含义,所以有必要在此做一下区分:
    索引(名词) 如上文所述,一个索引(index)就像是传统关系数据库中的数据库,它是相关文档存储的地方,index的复数是indices 或indexes。
    索引(动词) 「索引一个文档」表示把一个文档存储到索引(名词)里,以便它可以被检索或者查询。这很像SQL中的INSERT关键字,差别是,如果文档已经存在,新的文档将覆盖旧的文档。
    倒排索引 传统数据库为特定列增加一个索引,例如B-Tree索引来加速检索。Elasticsearch和Lucene使用一种叫做倒排索引(inverted index)的数据结构来达到相同目的。

    默认情况下,文档中的所有字段都会被索引(拥有一个倒排索引),只有这样他们才是可被搜索的。

    为每个员工的文档(document)建立索引,每个文档包含了相应员工的所有信息。
    每个文档的类型为employee。
    employee类型归属于索引megacorp。
    megacorp索引存储在Elasticsearch集群中。
    我们能通过一个命令执行完成的操作:

    PUT /megacorp/employee/1
    {
        "first_name" : "John",
        "last_name" :  "Smith",
        "age" :        25,
        "about" :      "I love to go rock climbing",
        "interests": [ "sports", "music" ]
    }
    

    我们看到path:/megacorp/employee/1包含三部分信息:
    megacorp 索引名
    employee 类型名
    1 这个员工的ID
    请求实体(JSON文档),包含了这个员工的所有信息。他的名字叫“John Smith”,25岁,喜欢攀岩。

    注意:

    如果让ID自动增长,可以像下面这样:
    (原来是把文档存储到某个ID对应的空间,现在是把这个文档添加到某个_type下)

    POST /website/blog/
    {
      "title": "My second blog entry",
      "text":  "Still trying this out...",
      "date":  "2014/01/01"
    }
    

    检索:

    GET /website/blog/123?pretty 
    

    pretty的意思是对输出的格式进行美化。
    返回:

    {
      "_index": "website",
      "_type": "blog",
      "_id": "123",
      "_version": 2,
      "found": true,
      "_source": {
        "title": "My first blog entry",
        "text": "Just trying this out...",
        "date": "2014/01/01"
      }
    }
    
    GET /website/blog/123?_source=title,text 
    

    对_source字段里的属性进行过滤。
    返回:

    {
      "_index": "website",
      "_type": "blog",
      "_id": "123",
      "_version": 2,
      "found": true,
      "_source": {
        "text": "Just trying this out...",
        "title": "My first blog entry"
      }
    }
    
    GET /website/blog/123/_source 
    

    只查询_source字段。
    返回:

    {
      "title": "My first blog entry",
      "text": "Just trying this out...",
      "date": "2014/01/01"
    }
    

    =========================================================================================

    GET /website/blog/_search 
    

    检索全部信息
    返回:

    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
          {
            "_index": "website",
            "_type": "blog",
            "_id": "123",
            "_score": 1,
            "_source": {
              "title": "My first blog entry",
              "text": "Just trying this out...",
              "date": "2014/01/01"
            }
          }
        ]
      }
    }
    
    GET /megacorp/employee/_search?q=first_name:Jane 
    

    检索全部信息并且first_name是Jane。
    返回:

    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "first_name": "Jane",
              "last_name": "Smith",
              "age": 32,
              "about": "I like to collect rock albums",
              "interests": [
                "music"
              ]
            }
          }
        ]
      }
    }
    

    使用DSL语句查询

    GET /megacorp/employee/_search
    {
        "query" : {
            "match" : {
                "last_name" : "Smith"
            }
        }
    }
    

    返回:

    {
      "took": 3,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "first_name": "Jane",
              "last_name": "Smith",
              "age": 32,
              "about": "I like to collect rock albums",
              "interests": [
                "music"
              ]
            }
          },
          {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "first_name": "John",
              "last_name": "Smith",
              "age": 25,
              "about": "I love to go rock climbing",
              "interests": [
                "sports",
                "music"
              ]
            }
          }
        ]
      }
    }
    

    全文检索,会检索about属性里包含love字符的员工。

    GET /megacorp/employee/_search
    {
        "query" : {
            "match" : {
                "about" : "love"
            }
        }
    }
    

    返回:

    {
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.26742277,
        "hits": [
          {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.26742277,
            "_source": {
              "first_name": "John",
              "last_name": "Smith",
              "age": 25,
              "about": "I love to go rock climbing",
              "interests": [
                "sports",
                "music"
              ]
            }
          }
        ]
      }
    }
    

    短语搜索

    目前我们可以在字段中搜索单独的一个词,这挺好的,但是有时候你想要确切的匹配若干个单词或者短语(phrases)。例如我们想要查询同时包含"rock"和"climbing"(并且是相邻的)的员工记录。
    要做到这个,我们只要将match查询变更为match_phrase查询即可:

    GET /megacorp/employee/_search
    {
        "query" : {
            "match_phrase" : {
                "about" : "rock climbing"
            }
        }
    }
    

    返回:

    {
       ...
       "hits": {
          "total":      1,
          "max_score":  0.23013961,
          "hits": [
             {
                ...
                "_score":         0.23013961,
                "_source": {
                   "first_name":  "John",
                   "last_name":   "Smith",
                   "age":         25,
                   "about":       "I love to go rock climbing",
                   "interests": [ "sports", "music" ]
                }
             }
          ]
       }
    }
    

    高亮我们的搜索

    很多应用喜欢从每个搜索结果中高亮(highlight)匹配到的关键字,这样用户可以知道为什么这些文档和查询相匹配。在Elasticsearch中高亮片段是非常容易的。
    让我们在之前的语句上增加highlight参数:

    GET /megacorp/employee/_search
    {
        "query" : {
            "match_phrase" : {
                "about" : "rock climbing"
            }
        },
        "highlight": {
            "fields" : {
                "about" : {}
            }
        }
    }
    

    返回:

    {
       ...
       "hits": {
          "total":      1,
          "max_score":  0.23013961,
          "hits": [
             {
                ...
                "_score":         0.23013961,
                "_source": {
                   "first_name":  "John",
                   "last_name":   "Smith",
                   "age":         25,
                   "about":       "I love to go rock climbing",
                   "interests": [ "sports", "music" ]
                },
                "highlight": {
                   "about": [
                      "I love to go <em>rock</em> <em>climbing</em>" <1>
                   ]
                }
             }
          ]
       }
    }
    

    相关文章

      网友评论

          本文标题:ElasticSearch_笔记

          本文链接:https://www.haomeiwen.com/subject/gpwzfxtx.html