Mapping Analysis

Author: 潘大的笔记 | Published: 2019-08-07 20:15

Mapping and Analysis

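Fields are indexed differently depending on what they represent:
Fields that represent exact values (which can include string fields)
Fields that represent full text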

Exact values vs. full text

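Exact value: exactly what it sounds like. The value either matches exactly or it does not.
Full text: textual data, usually unstructured.
A full-text query usually asks "How well does this document match the query?"
For full-text search, Elasticsearch first analyzes the document and then builds an inverted index from the result.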

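Inverted index

An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents that contain it.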
github:https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/blob/cn/052_Mapping_Analysis/35_Inverted_index.asciidoc
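
The structure described above can be sketched in a few lines of Python. This is only an illustration, not how Elasticsearch stores its index: the function name build_inverted_index, the sample documents, and the naive lowercase-and-split analysis are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase the text and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sort terms and ids so the result is deterministic.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


docs = {
    1: "The quick brown fox",
    2: "Quick brown foxes leap",
}
inverted = build_inverted_index(docs)
print(inverted["quick"])  # [1, 2]
print(inverted["fox"])    # [1]
```

Note that "fox" and "foxes" end up as separate terms here. Normalizing such variants is the job of the analysis step.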

Analysis and analyzers

github:https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/blob/cn/052_Mapping_Analysis/40_Analysis.asciidoc

Testing analyzers
analyze: shows how a piece of text is analyzed. You specify the analyzer to use and the text to analyze.

GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze"
}

Each element in the result represents a single term:

{
   "tokens": [
      {
         "token":        "text",
         "start_offset": 0,
         "end_offset":   4,
         "type":         "<ALPHANUM>",
         "position":     1
      },
      {
         "token":        "to",
         "start_offset": 5,
         "end_offset":   7,
         "type":         "<ALPHANUM>",
         "position":     2
      },
      {
         "token":        "analyze",
         "start_offset": 8,
         "end_offset":   15,
         "type":         "<ALPHANUM>",
         "position":     3
      }
   ]
}

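position: the order in which the term appeared in the original text
start_offset: the character position where the term starts
end_offset: the character position where the term ends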
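
To make these three fields concrete, here is a rough Python stand-in for the analysis step. It is a sketch under assumptions: simple_analyze is a hypothetical helper, and a regular expression plus lowercasing only approximates what the standard analyzer does. Positions are numbered from 1 to match the sample output above; other Elasticsearch versions may number them from 0.

```python
import re


def simple_analyze(text):
    """Approximate an analyzer: split on non-word characters and lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "position": position,           # 1-based, as in the sample output
        })
    return tokens


tokens = simple_analyze("Text to analyze")
for tok in tokens:
    print(tok)
```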

Mapping

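Core simple field types:
1. Strings: string (can be treated as either full text or an exact value)
2. Whole numbers: byte, short, integer, long
3. Floating-point numbers: float, double
4. Booleans: boolean
5. Dates: date
To view the mapping of type tweet in index gb: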

GET /gb/_mapping/tweet

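Custom field mappings let you:
1. Distinguish between full-text string fields and exact-value string fields
2. Use language-specific analyzers
3. Optimize a field for partial matching
4. Specify custom data formats, such as date formats
5. And much more
By default, a field of type string is assumed to contain full text. The two most important mapping attributes for string fields are index and analyzer.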

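The index attribute accepts three values:
analyzed (default): first analyze the string, then index it. In other words, index this field as full text.
not_analyzed: index the field as an exact value, without analyzing it.
no: do not index this field at all. The field will not be searchable.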
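
The effect of the three settings can be sketched as follows. This is a toy model, not Elasticsearch code: index_string_field is a hypothetical function, and lowercase-and-split stands in for a real analyzer.

```python
def index_string_field(value, index="analyzed"):
    """Return the terms that would be placed in the inverted index for a string field."""
    if index == "analyzed":
        # Full text: analyze first (here: lowercase + whitespace split), then index.
        return value.lower().split()
    if index == "not_analyzed":
        # Exact value: index the string unchanged, as a single term.
        return [value]
    if index == "no":
        # Not indexed at all, so nothing can be searched.
        return []
    raise ValueError(f"unknown index setting: {index!r}")


print(index_string_field("Quick Brown Fox"))                  # ['quick', 'brown', 'fox']
print(index_string_field("Quick Brown Fox", "not_analyzed"))  # ['Quick Brown Fox']
print(index_string_field("Quick Brown Fox", "no"))            # []
```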

analyzer: specifies which analyzer to use, for example:
standard/whitespace/simple/english

You can add new fields to an existing mapping, but you cannot change the mapping of a field that already exists.

Complex core field types

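Multi-value fields: a field can hold several values, such as a JSON array.
Empty fields are not indexed: null, [], and [null] are all treated as empty.
Multilevel objects (the native JSON data type is the object, also known as a hash, map, and so on):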

{
    "tweet":            "Elasticsearch is very flexible",
    "user": {
        "id":           "@johnsmith",
        "gender":       "male",
        "age":          26,
        "name": {
            "full":     "John Smith",
            "first":    "John",
            "last":     "Smith"
        }
    }
}

Mapping for inner objects

{
  "gb": {
    "tweet": {
      "properties": {
        "tweet":            { "type": "string" },
        "user": {
          "type":             "object",
          "properties": {
            "id":           { "type": "string" },
            "gender":       { "type": "string" },
            "age":          { "type": "long"   },
            "name":   {
              "type":         "object",
              "properties": {
                "full":     { "type": "string" },
                "first":    { "type": "string" },
                "last":     { "type": "string" }
              }
            }
          }
        }
      }
    }
  }
}

How are inner objects indexed?

{
    "tweet":            [elasticsearch, flexible, very],
    "user.id":          [@johnsmith],
    "user.gender":      [male],
    "user.age":         [26],
    "user.name.full":   [john, smith],
    "user.name.first":  [john],
    "user.name.last":   [smith]
}

Inner fields can be referred to by their full path, for example user.name.last. The type name can also be prefixed, for example tweet.user.name.first.
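
The flattening into dotted paths can be sketched in Python. This is an illustration only: flatten is a hypothetical helper, the sample document is a trimmed copy of the one above, and the values are left unanalyzed to keep the focus on the path names.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted full paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the inner object, extending the path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


doc = {
    "tweet": "Elasticsearch is very flexible",
    "user": {
        "id": "@johnsmith",
        "age": 26,
        "name": {"full": "John Smith", "first": "John", "last": "Smith"},
    },
}
flat = flatten(doc)
print(sorted(flat))
print(flat["user.name.last"])  # Smith
```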

Arrays of inner objects

{
    "followers": [
        { "age": 35, "name": "Mary White"},
        { "age": 26, "name": "Alex Jones"},
        { "age": 19, "name": "Lisa Smith"}
    ]
}

After flattening, the document is indexed as:

{
    "followers.age":    [19, 26, 35],
    "followers.name":   [alex, jones, lisa, smith, mary, white]
}

There is a problem here: the correlation between each follower's age and name has been lost.
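
A small Python sketch shows what losing the correlation means in practice. It is a toy model rather than Elasticsearch behavior verbatim: flatten_followers, flat_match, and per_object_match are hypothetical helpers, and lowercase-and-split stands in for real analysis.

```python
followers = [
    {"age": 35, "name": "Mary White"},
    {"age": 26, "name": "Alex Jones"},
    {"age": 19, "name": "Lisa Smith"},
]


def flatten_followers(items):
    """Collapse an array of objects into one multi-value list per field."""
    ages = sorted(item["age"] for item in items)
    names = sorted(term for item in items for term in item["name"].lower().split())
    return {"followers.age": ages, "followers.name": names}


def flat_match(flat, age, name_term):
    """Check each condition independently against the flattened fields."""
    return age in flat["followers.age"] and name_term in flat["followers.name"]


def per_object_match(items, age, name_term):
    """Match only when a single follower satisfies both conditions."""
    return any(
        item["age"] == age and name_term in item["name"].lower().split()
        for item in items
    )


flat = flatten_followers(followers)
print(flat_match(flat, 26, "mary"))             # True, although Mary is 35
print(per_object_match(followers, 26, "mary"))  # False
```

The flattened form answers "is there a follower aged 26?" and "is there a follower named Mary?" separately, so it cannot tell that no single follower is both.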
