MySQL：如何对待分布偏移的数据

作者: 重庆八怪 | 来源:发表于2020-07-27 22:37 被阅读0次

MySQL：如何对待分布偏移的数据
数据存在不同系统，怎么统一做分析？
MYSQL的一些总结
分库分表，读写分离，数据库架构
ElasticSearch技术资料
by-梯度消失梯度爆炸
主从复制、双主复制及半同步复制、以及基于SSL的复制
高性能Mysql主从架构的复制原理及配置详解
一文了解Mysql读写分离原理及配置
旧瓶新酒怎么喝？协变量偏移（covariate shift）之评

作者微信:gp_22389860

btr_estimate_n_rows_in_range_low 函数用于非唯一访问的行数评估也就是在执行计划生成阶段会去访问实际的数据，尽可能的得到正确的执行计划，这个过程叫做dive。其和参数eq_range_index_dive_limit有关，如果参数设置1那么将禁用dive功能，使用统计数据进行执行计划的生成。但某些时候MySQL 5.7根据统计数据并不能得到正确的执行计划，因此才有了dive，但是dive会产生额外的代价，这个过程是需要实际访问数据块的。

常用的比如非唯一索引的数据访问，还有> <等这些范围扫描都是需要用到这个函数。下面来简单的看一下。其输入的参数包含了index 索引的相关的信息，范围的起始tuple和结束tuple首先通过btr_cur_search_to_nth_level函数进行定位获取起点记录
然后通过btr_cur_search_to_nth_level函数定位到终点记录通过cursor获取到path，假设起点的path为path1 终点的path为path2path为一个数组，其中保存了查询的路径，然后会循环的从顶层节点到叶子节点进行判断，其中包含了

/** A slot in the path array. We store here info on a search path down the
tree. Each slot contains data on a single level of the tree. */
struct btr_path_t {
    /* Assume a page like:
    records:             (inf, a, b, c, d, sup)
    index of the record:    0, 1, 2, 3, 4, 5
    */

    /** Index of the record where the page cursor stopped on this level
    (index in alphabetical order). Value ULINT_UNDEFINED denotes array
    end. In the above example, if the search stopped on record 'c', then
    nth_rec will be 3. */
    ulint   nth_rec;

    /** Number of the records on the page, not counting inf and sup.
    In the above example n_recs will be 4. */
    ulint   n_recs;

    /** Number of the page containing the record. */
    ulint   page_no;

    /** Level of the page. If later we fetch the page under page_no
    and it is no different level then we know that the tree has been
    reorganized. */
    ulint   page_level;
};

最终获取最后的估计得行数，如果它们起点和终点的行数过多，那么只会精确统计10个块的数据（N_PAGES_READ_LIMIT），这个在btr_estimate_n_rows_in_range_low调用的btr_estimate_n_rows_in_range_on_level函数中，剩下就是取得平均值了大概为：

        n_rows = n_rows_on_prev_level
            * n_rows / n_pages_read;

n_rows_on_prev_level就是实际的指向的块数，因为每一个上层节点记录指向叶子节点的一个block，n_pages_read就是大概10个块，n_rows就是10个块包含的行数。最终得到的n_rows就是一个大概的统计值了。这还是很有代表性的。
最终我们看到对于2层结构的索引，可能需要访问10(采样块)+2(leaf)+1(root)个块，这是有一定代价的。默认MySQL是开启了这个功能的，因此我们会发现如下的表可以得到正确的执行计划,比如性别列索引，其中30行，29行为男性，1行为女性，下面是执行计划示例，可以看到禁用dive后不能得到正确的执行计划：

mysql> set eq_range_index_dive_limit=100;
Query OK, 0 rows affected (0.00 sec)
mysql> desc select * from testdvi3 where sex='M';
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | testdvi3 | NULL       | ALL  | sex           | NULL | NULL    | NULL |   30 |    96.67 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (2.74 sec)
mysql> desc select * from testdvi3 where sex='W';
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | testdvi3 | NULL       | ref  | sex           | sex  | 9       | const |    1 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (2.00 sec)
mysql> set eq_range_index_dive_limit=1;
Query OK, 0 rows affected (0.00 sec)
mysql> desc select * from testdvi3 where sex='W';
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | testdvi3 | NULL       | ref  | sex           | sex  | 9       | const |   15 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> desc select * from testdvi3 where sex='M';
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | testdvi3 | NULL       | ref  | sex           | sex  | 9       | const |   15 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+-------+------+----------+-------+

网友评论

本文标题：MySQL：如何对待分布偏移的数据

本文链接：https://www.haomeiwen.com/subject/pkztrktx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

MySQL：如何对待分布偏移的数据

相关文章