7.1 - Objects and Nested Objects

Author: 落日彼岸 | Published: 2020-04-08 15:07

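    Relationships in data

    • The real world is full of important relationships

      • Blogs / authors / comments

      • A bank account has many transaction records

      • A customer has multiple bank accounts

      • A directory contains multiple files and subdirectories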

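    Normalized design in relational databases

    1NF – every attribute holds a single, atomic value
    2NF – eliminates partial functional dependencies of non-prime attributes on a key
    3NF – eliminates transitive functional dependencies of non-prime attributes on a key
    BCNF – every non-trivial functional dependency has a superkey as its determinant
    • The main goal of normalization is to reduce unnecessary updates

    • Side effect: a fully normalized database often suffers from slow queries

    • The more normalized the database, the more tables have to be joined

    • Normalization saves storage space, but storage keeps getting cheaper

    • Normalization simplifies updates, but reads may take more operations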

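    Denormalization

    • Denormalized design

      • "Flattening" the data: instead of using relationships, keep redundant copies of the data inside each document
    • Pros: no joins to process, so read performance is good

      • Elasticsearch compresses the _source field, which reduces the disk cost of the redundant copies
    • Cons: a poor fit when data changes frequently

      • Changing one piece of data (such as a username) may require updating many documents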
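The trade-off described above can be sketched in plain Python. This is only an in-memory illustration, not Elasticsearch code: the sample posts and the helper names `read_post` and `rename_user` are made up for this example.

```python
# Normalized: the author is stored once and referenced by id.
users = {1: {"username": "Jack", "city": "Shanghai"}}
posts_normalized = [
    {"content": "I like Elasticsearch", "userid": 1},
    {"content": "Nested objects explained", "userid": 1},
]

# Denormalized: every post carries its own copy of the author data.
posts_denormalized = [
    {"content": p["content"], "user": {"userid": p["userid"], **users[p["userid"]]}}
    for p in posts_normalized
]


def read_post(post):
    """Reading a denormalized post needs no lookup in `users` (no join)."""
    return post["content"], post["user"]["username"]


def rename_user(posts, userid, new_name):
    """Renaming a user has to touch every post that embeds that user."""
    touched = 0
    for post in posts:
        if post["user"]["userid"] == userid:
            post["user"]["username"] = new_name
            touched += 1
    return touched


# One logical change (a new username) rewrites every embedded copy.
touched = rename_user(posts_denormalized, 1, "John")
first = read_post(posts_denormalized[0])
```

With only two posts the cost is trivial, but the number of rewritten documents grows with the number of posts per author, which is why frequent updates are the weak spot of this design.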

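    Handling relationships in Elasticsearch

    • Relational databases usually normalize data; in Elasticsearch you usually denormalize it

      • Benefits of denormalizing: faster reads / no table joins / no row locks
    • Elasticsearch is not good at handling relationships. There are four common approaches

      • Object type

      • Nested objects (Nested Object)

      • Parent / child relationships (Parent / Child)

      • Application-side joins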

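    Case 1: a blog post and its author

    • Object type

      • Every blog document keeps a copy of the author's information

      • If the author's information changes, every related blog document has to be updated

    # Index a blog document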
    PUT blog/_doc/1
    {
      "content":"I like Elasticsearch",
      "time":"2019-01-01T00:00:00",
      "user":{
        "userid":1,
        "username":"Jack",
        "city":"Shanghai"
      }
    }
    
    • A single query returns both the blog post and its author
    # Search for the blog post
    POST blog/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"content": "Elasticsearch"}},
            {"match": {"user.username": "Jack"}}
          ]
        }
      }
    }
    
    Response (excerpt):
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "blog",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "content" : "I like Elasticsearch",
          "time" : "2019-01-01T00:00:00",
          "user" : {
            "userid" : 1,
            "username" : "Jack",
            "city" : "Shanghai"
          }
        }
      }
    ]
    
    

    Case 2: a document containing an array of objects

    DELETE my_movies
    
    # Mapping for the movies index
    PUT my_movies
    {
      "mappings" : {
        "properties" : {
          "actors" : {
            "properties" : {
              "first_name" : {
                "type" : "keyword"
              },
              "last_name" : {
                "type" : "keyword"
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
    
    
    # Index one movie
    POST my_movies/_doc/1
    {
      "title":"Speed",
      "actors":[
        {
          "first_name":"Keanu",
          "last_name":"Reeves"
        },
    
        {
          "first_name":"Dennis",
          "last_name":"Hopper"
        }
    
      ]
    }
    
    # Search for the movie: "Keanu" and "Hopper" belong to different actors
    POST my_movies/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"actors.first_name": "Keanu"}},
            {"match": {"actors.last_name": "Hopper"}}
          ]
        }
      }
    
    }
    Response (excerpt):
    
    "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.723315,
        "hits" : [
          {
            "_index" : "my_movies",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.723315,
            "_source" : {
              "title" : "Speed",
              "actors" : [
                {
                  "first_name" : "Keanu",
                  "last_name" : "Reeves"
                },
                {
                  "first_name" : "Dennis",
                  "last_name" : "Hopper"
                }
              ]
            }
          }
        ]
      }
    
    
    

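    Why does the search return an unwanted result?

    • At index time the boundaries between inner objects are lost: the JSON is flattened into key-value pairs, so actors.first_name holds ["Keanu", "Dennis"] and actors.last_name holds ["Reeves", "Hopper"]

    • A query on several fields can therefore match values that come from different objects, which is why "Keanu" + "Hopper" matched

    • The Nested data type solves this problem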
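The flattening described above can be imitated in a few lines of Python. This is a simplified model of the behaviour, not how Elasticsearch is implemented; `flatten` and `matches_all` are hypothetical helpers written for this example.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten a JSON-like document into dotted field name -> set of values."""
    fields = defaultdict(set)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects are merged into the parent; their boundaries are lost.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path] |= sub_values
            else:
                fields[path].add(item)
    return fields


def matches_all(fields, conditions):
    """Emulate a bool/must of exact matches against the flattened fields."""
    return all(value in fields.get(field, set()) for field, value in conditions.items())


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten(movie)
# "Keanu" and "Hopper" come from different actors, yet both conditions hold.
hit = matches_all(flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"})
```

After flattening, nothing records that "Keanu" belongs with "Reeves", so the cross-object combination is indistinguishable from a real actor.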

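    What is the Nested data type?

    • Nested data type: lets each object in an array of objects be indexed independently

    • Using the nested and properties keywords, each actor is indexed as a separate document

    • Internally, each nested object is stored as its own hidden Lucene document next to the root document, and they are joined at query time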

    DELETE my_movies
    # Create the mapping with a nested field
    PUT my_movies
    {
      "mappings" : {
        "properties" : {
          "actors" : {
            "type" : "nested",
            "properties" : {
              "first_name" : {"type" : "keyword"},
              "last_name" : {"type" : "keyword"}
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {"keyword" : {"type" : "keyword", "ignore_above" : 256}}
          }
        }
      }
    }
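To see what the nested type changes, the per-object matching can be sketched in Python. Again this only models the idea that every condition must hold within the same object; `nested_match` is a hypothetical helper, not an Elasticsearch API.

```python
def nested_match(doc, path, conditions):
    """Return True if a single object under `path` satisfies every condition.

    Condition keys use the full dotted name (e.g. "actors.first_name"),
    mirroring how fields are written inside a nested query.
    """
    prefix = path + "."
    for obj in doc.get(path, []):
        # Each object is checked on its own, so values cannot mix across objects.
        if all(obj.get(field[len(prefix):]) == value for field, value in conditions.items()):
            return True
    return False


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

# Values taken from two different actors: no single object matches.
cross = nested_match(movie, "actors", {"actors.first_name": "Keanu", "actors.last_name": "Hopper"})
# Values taken from the same actor: the first object matches.
same = nested_match(movie, "actors", {"actors.first_name": "Keanu", "actors.last_name": "Reeves"})
```

Here `cross` is False and `same` is True, which is the behaviour the nested mapping is meant to provide.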
    

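    Nested query

    • A nested query runs against the hidden nested documents and joins the matches back to the root document at query time
    # Nested query: "Keanu" + "Hopper" no longer matches, because no single actor has both values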
    POST my_movies/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"title": "Speed"}},
            {
              "nested": {
                "path": "actors",
                "query": {
                  "bool": {
                    "must": [
                      {"match": {
                        "actors.first_name": "Keanu"
                      }},
    
                      {"match": {
                        "actors.last_name": "Hopper"
                      }}
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
    
    

    Nested aggregation

    # A plain terms aggregation does not work on a nested field (it returns no buckets)
    POST my_movies/_search
    {
      "size": 0,
      "aggs": {
        "NAME": {
          "terms": {
            "field": "actors.first_name",
            "size": 10
          }
        }
      }
    }
    
    # Nested Aggregation
    POST my_movies/_search
    {
      "size": 0,
      "aggs": {
        "actors": {
          "nested": {
            "path": "actors"
          },
          "aggs": {
            "actor_name": {
              "terms": {
                "field": "actors.first_name",
                "size": 10
              }
            }
          }
        }
      }
    }
    
    Response (excerpt):
    "aggregations" : {
        "actors" : {
          "doc_count" : 2,
          "actor_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Dennis",
                "doc_count" : 1
              },
              {
                "key" : "Keanu",
                "doc_count" : 1
              }
            ]
          }
        }
      }
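The numbers in this response can be reproduced with a small Python model: the nested aggregation first switches to the per-actor objects, then the terms sub-aggregation counts values across them. This is an illustrative sketch only; sorting the buckets by key is a simplification of the real ordering.

```python
from collections import Counter

movies = [
    {
        "title": "Speed",
        "actors": [
            {"first_name": "Keanu", "last_name": "Reeves"},
            {"first_name": "Dennis", "last_name": "Hopper"},
        ],
    }
]

# Step 1 (nested): collect every actor object, one entry per nested document.
actor_docs = [actor for movie in movies for actor in movie["actors"]]

# Step 2 (terms): count first_name values across those nested documents.
counts = Counter(actor["first_name"] for actor in actor_docs)

result = {"doc_count": len(actor_docs), "buckets": sorted(counts.items())}
```

The outer `doc_count` of 2 counts actor objects rather than movies, which matches the single indexed movie with two actors.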
    

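    Key points of this section

    • In Elasticsearch, data is usually modeled in a denormalized way (using objects)

      • Benefits: faster reads / no table joins / no row locks
    • If documents are not updated frequently, objects can be embedded in the documents

    • When a field holds an array of objects

      • Use nested objects (Nested Object) to keep query results correct

    Course demos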

    DELETE blog
    # Set the mapping for blog
    PUT /blog
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "text"
          },
          "time": {
            "type": "date"
          },
          "user": {
            "properties": {
              "city": {
                "type": "text"
              },
              "userid": {
                "type": "long"
              },
              "username": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
    
    
    # Index a blog document
    PUT blog/_doc/1
    {
      "content":"I like Elasticsearch",
      "time":"2019-01-01T00:00:00",
      "user":{
        "userid":1,
        "username":"Jack",
        "city":"Shanghai"
      }
    }
    
    
    # Search for the blog post
    POST blog/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"content": "Elasticsearch"}},
            {"match": {"user.username": "Jack"}}
          ]
        }
      }
    }
    
    
    DELETE my_movies
    
    # Mapping for the movies index
    PUT my_movies
    {
      "mappings" : {
        "properties" : {
          "actors" : {
            "properties" : {
              "first_name" : {
                "type" : "keyword"
              },
              "last_name" : {
                "type" : "keyword"
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
    
    
    # Index one movie
    POST my_movies/_doc/1
    {
      "title":"Speed",
      "actors":[
        {
          "first_name":"Keanu",
          "last_name":"Reeves"
        },
    
        {
          "first_name":"Dennis",
          "last_name":"Hopper"
        }
    
      ]
    }
    
    # Search for the movie
    POST my_movies/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"actors.first_name": "Keanu"}},
            {"match": {"actors.last_name": "Hopper"}}
          ]
        }
      }
    
    }
    
    DELETE my_movies
    # Create the mapping with a nested field
    PUT my_movies
    {
      "mappings" : {
        "properties" : {
          "actors" : {
            "type" : "nested",
            "properties" : {
              "first_name" : {"type" : "keyword"},
              "last_name" : {"type" : "keyword"}
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {"keyword" : {"type" : "keyword", "ignore_above" : 256}}
          }
        }
      }
    }
    
    
    POST my_movies/_doc/1
    {
      "title":"Speed",
      "actors":[
        {
          "first_name":"Keanu",
          "last_name":"Reeves"
        },
    
        {
          "first_name":"Dennis",
          "last_name":"Hopper"
        }
    
      ]
    }
    
    # Nested query
    POST my_movies/_search
    {
      "query": {
        "bool": {
          "must": [
            {"match": {"title": "Speed"}},
            {
              "nested": {
                "path": "actors",
                "query": {
                  "bool": {
                    "must": [
                      {"match": {
                        "actors.first_name": "Keanu"
                      }},
    
                      {"match": {
                        "actors.last_name": "Hopper"
                      }}
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
    
    
    # Nested Aggregation
    POST my_movies/_search
    {
      "size": 0,
      "aggs": {
        "actors": {
          "nested": {
            "path": "actors"
          },
          "aggs": {
            "actor_name": {
              "terms": {
                "field": "actors.first_name",
                "size": 10
              }
            }
          }
        }
      }
    }
    
    
    # A plain terms aggregation does not work on a nested field
    POST my_movies/_search
    {
      "size": 0,
      "aggs": {
        "NAME": {
          "terms": {
            "field": "actors.first_name",
            "size": 10
          }
        }
      }
    }
    

    Article link: https://www.haomeiwen.com/subject/dlgaphtx.html