MySQL: Optimizer Statistics May Be Stale

Author: 重庆八怪 | Published 2022-04-28 22:34

The Problem

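Recently I came across a case where a large table held roughly 20 million rows, yet its statistics still reported 15 million rows and had last been collected the previous year. As we know, statistics are normally recollected once more than 10% of a table's rows have changed. Whenever the optimizer does not use index dives (governed by the eq_range_index_dive_limit parameter), it relies on these statistics to choose an execution plan.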

When Statistics Are Collected

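Here we look only at how persistent (PERSISTENT) statistics are collected under the default configuration.
On every row modification, InnoDB increments the table's modification counter by 1 and then checks whether the statistics need to be recollected. The logic is roughly as follows: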

    if (dict_stats_is_persistent_enabled(table)) {
        if (counter > n_rows / 10 /* 10% */
            && dict_stats_auto_recalc_is_enabled(table)) {

            dict_stats_recalc_pool_add(table);
            table->stat_modified_counter = 0;
        }
        return;
    }

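The other conditions are essentially fixed, since we are using the default parameter settings. That leaves counter as the only variable; it comes from the InnoDB structure member dict_table_t.stat_modified_counter. If counter exceeds 1/10 of the row count recorded in the current statistics, the table is handed to the background statistics thread for recollection.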
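The threshold check can be modelled outside the server to see when it fires. The following is only a sketch, not InnoDB code: TableModel, note_row_modified and modify_rows are hypothetical names, the two booleans stand in for the dict_stats_*_enabled() checks, and the counter is assumed to be incremented before it is compared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the few dict_table_t fields that matter here.
struct TableModel {
    std::uint64_t n_rows;                 // row count recorded by the last stats collection
    std::uint64_t stat_modified_counter;  // modifications since the counter was last zeroed
    bool persistent_enabled;              // models dict_stats_is_persistent_enabled()
    bool auto_recalc_enabled;             // models dict_stats_auto_recalc_is_enabled()
};

// Registers one modified row. Returns true when the table would be queued
// for a background recalculation, in which case the counter is zeroed.
// Assumption: the counter is incremented first and then compared.
bool note_row_modified(TableModel &t) {
    const std::uint64_t counter = ++t.stat_modified_counter;
    if (t.persistent_enabled && t.auto_recalc_enabled &&
        counter > t.n_rows / 10 /* 10% */) {
        t.stat_modified_counter = 0;
        return true;
    }
    return false;
}

// Applies `rows` modifications and returns how many recalculations were queued.
int modify_rows(TableModel &t, int rows) {
    assert(rows >= 0);
    int queued = 0;
    for (int i = 0; i < rows; ++i) {
        if (note_row_modified(t)) {
            ++queued;
        }
    }
    return queued;
}
```

With n_rows = 64 the threshold is 64 / 10 = 6, so in this model a batch of 5 modifications never queues a recalculation on its own; only the 7th modification since the last zeroing does.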

The Catch: The Counter Gets Reset

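The catch is that this counter goes back to 0 when the table share expires and the table is loaded again. Either of the following will cause that:

FLUSH TABLES (or FLUSH TABLES WITH READ LOCK), which explicitly closes every table share
a table_definition_cache that is set too small, so table shares get evicted

FTWRL is needed by most of our backups. xtrabackup for 8.0 no longer needs it: it uses LOCK INSTANCE FOR BACKUP and reads performance_schema.log_status instead of FTWRL, which reduces the locking impact of a backup. For details, see: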

WL#9451 Backup Lock
WL#9452 Log Position Lock

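Consider a 20-million-row table that sees 500,000 modifications a day and is backed up once a day. Its statistics may never be recollected, because every night the modification counter goes back to 0, far short of the 10% threshold. I have reported this upstream. Performance problems tend to involve large tables, and stale statistics on such tables can lead to wrong execution plans. That said, this is not a bug so much as a request for a new feature: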
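To make the arithmetic concrete, here is a coarse day-level sketch of that scenario. It rests on several assumptions: count_recalcs and its parameters are hypothetical names, the threshold is checked once per day rather than per row, and n_rows is held fixed instead of being refreshed after each recalculation.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many automatic recalculations would be queued over `days` days.
// nightly_reset models a nightly event (for example a backup issuing FTWRL)
// that zeroes the modification counter.
int count_recalcs(std::uint64_t n_rows, std::uint64_t daily_changes,
                  int days, bool nightly_reset) {
    assert(days >= 0);
    std::uint64_t counter = 0;
    int recalcs = 0;
    for (int d = 0; d < days; ++d) {
        counter += daily_changes;      // modifications accumulated during the day
        if (counter > n_rows / 10) {   // same 10% rule as in the excerpt above
            ++recalcs;
            counter = 0;               // zeroed when the table is queued
        }
        if (nightly_reset) {
            counter = 0;               // counter lost when the table is reloaded
        }
    }
    return recalcs;
}
```

Under these assumptions, 20,000,000 rows with 500,000 changes per day cross the 2,000,000 threshold on every fifth day when nothing resets the counter, and never cross it when the counter is zeroed every night.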
https://bugs.mysql.com/bug.php?id=107145

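There may be other situations that leave statistics stale, but I am not aware of any so far. If you know of one, please let me know. Thanks.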

Takeaway

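During idle periods, it is worth reviewing the statistics of large, frequently modified tables: checking the last_update column of mysql.innodb_table_stats is enough. Pay particular attention to statistics that have not been updated for a long time, and run ANALYZE TABLE on those tables manually (doing so during an idle period is the safest).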

Test (8.0.28)

session 1:

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.28    |
+-----------+

mysql> create table test(id int);
Query OK, 0 rows affected (0.34 sec)

mysql> insert into test values(10);
Query OK, 1 row affected (0.00 sec)

mysql> insert into test select * from test;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into test select * from test;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> insert into test select * from test;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into test select * from test;
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> insert into test select * from test;
Query OK, 16 rows affected (0.01 sec)
Records: 16  Duplicates: 0  Warnings: 0

mysql> insert into test select * from test;
Query OK, 32 rows affected (0.00 sec)
Records: 32  Duplicates: 0  Warnings: 0

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|       64 |
+----------+
1 row in set (0.01 sec)

session 2:

mysql> select * from innodb_table_stats where table_name='test' and database_name='testup';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| testup        | test       | 2022-04-27 23:08:11 |     64 |                    1 |                        0 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
1 row in set (0.00 sec)

session 1:

Insert 5 rows, then run flush tables; repeat:

mysql> insert into test values(10),(10),(10),(10),(10);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> flush tables;
Query OK, 0 rows affected (0.01 sec)

....

mysql> insert into test values(10),(10),(10),(10),(10);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test values(10),(10),(10),(10),(10);
Query OK, 5 rows affected (0.00 sec)
Records: 5  Duplicates: 0  Warnings: 0

mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)

Now the table has 139 rows.

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|      139 |
+----------+
1 row in set (0.01 sec)

session 2:


mysql> select * from innodb_table_stats where table_name='test' and database_name='testup';
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| database_name | table_name | last_update         | n_rows | clustered_index_size | sum_of_other_index_sizes |
+---------------+------------+---------------------+--------+----------------------+--------------------------+
| testup        | test       | 2022-04-27 23:08:11 |     64 |                    1 |                        0 |
+---------------+------------+---------------------+--------+----------------------+--------------------------+

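So although the table holds 139 rows, the statistics still report only 64. This setup exists purely to reproduce the problem; in practice, natural eviction of table shares leads to the same behavior, as described above.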

Debugging Tips

If you want to debug this, I suggest starting at the following point in the dict_stats_update function.

        t = dict_stats_table_clone_create(table);

        dberr_t err = dict_stats_fetch_from_ps(t);

        t->stats_last_recalc = table->stats_last_recalc;
        t->stat_modified_counter = 0; // reset the counter to 0
