美文网首页
搞事情:MySQL性能问题两头堵

搞事情:MySQL性能问题两头堵

作者: HowUger | 来源:发表于2020-01-20 21:10 被阅读0次

    尴尴尬尬

    作为一个菜鸡数据库运维(AKA:删库大魔头),MySQL性能问题几乎成了我们最恐惧的梦魇。无论是zabbix还是prometheus+grafana,即便加载了再多percona的“套娃”,监控数据依然只是个五彩斑斓但看着却非常闹心的趋势图,程式化的性能指标真很难帮助我们准确得找出问题根因。而面对MySQL性能问题,如何快速有效的定位和分析几乎成为了每一位菜鸡绕不过的挑战。

    简单粗暴且有效

    面对挑战,我们需要的应该是“一把40米长的大刀”......虽然这话听起来有些戏谑,但真实的情况就是我们正需要这种简单粗暴且有效的方法论,而我喜欢把这种方法论称为:两头堵
    即从系统进程入手,尝试分析找到问题症结。

    先堵:系统

    系统层面的分析思路是基于进程调用栈的信息分析,这个领域大家肯定听说过火焰图
    作为一只菜鸡中的战斗鸡,火焰图有这么几个知识点:

    啥原理

    1. 硬件中的PMU单元(performance monitor unit)
      瞎解释:PMU可以简单理解为硬件中的一个“事件计数阀门”,允许软件针对某类硬件事件设置阀门数值,随后处理器开始对该类事件计数。当计数值超过预制数值后,阀门关闭产生中断。
      举板栗: cache miss 达到某个值后,PMU 便能产生相应的中断。捕获这些中断,就可以考察程序对这些硬件特性的利用效率。

    2. Tracepoint
      瞎解释:即在内核源代码中的一些 hook,在特定的代码被运行时触发,这一特性可以被各种 trace或debug 工具所利用。

    用啥整

    大概都知道Perf

    怎么整

    Perf的使用说明、指导文档、安装指南一类的东西网上已经有太多太多了,此处不再赘述。
    有道是:内事不决问度娘,外事不决问Go爹!

    重点看“平头”

    即看火焰图顶层的哪个函数占据的宽度最大,这就是问题所在。而常见的火焰图类型包括On-CPU、Off-CPU和Memory等等。

    1. On-CPU
      cpu占用过高,执行中的时间通常又分为用户态时间user和系统态时间sys。
    2. Off-CPU
      cpu过低,利用率不高。等待下一轮CPU,或者等待I/O、锁、换页等等。
    3. Memory
      程序如果出现内存泄漏,同样也可以使用内存级别火焰图快速分析问题。

    上个满大街都是的示图

    CPU-MySQL-Crop-500

    二堵:进程

    这个层面的分析思路其实就是基于MySQL的资源消耗分布展开的,从多个角度分析可能存在的性能瓶颈。

    CPU

    大消耗场景
    大多数MySQL进程CPU消耗较大都是因为慢事务造成的,具体场景分为:

    • SQL扫全表
    • SQL扫描数据量过大
    • 内存排序
    • 磁盘排序
    • 锁争用等待
    • 等待磁盘IO

    SQL执行状态
    短平快的通过Show Processlist查看MySQL中正在执行的SQL,通过SQL执行状态值即可初步识别问题SQL

    • Sending data:长期处于此状态,可能是查询未命中有效索引,导致SQL执行慢得一批
    • Copying to tmp table:长期处于此状态,可能是临时结果集过大
    • Copying to tmp table to disk:而出现此状态时,说明临时结果集过大,并超出临时内存大小已经需要落盘
    • Sorting resule:长期处于此状态,说明排序操作会引起过多资源消耗
    • Locked:这个如果我也要解释,就只能证明我脑残的认为其他人也脑残
    • Using filesort:同Sorting result

    SQL定位
    以上的操作只是简单的缩小问题SQL范围,而最终在面对大并发排序至CPU狂飙的场景时,可以通过top sql快速定位耗时和消耗分布,进而为问题定位找到有力证据

    • 统计耗时最长的SQL Top10
    13:16:21  HowUger@HowUger_MySQL_t01:mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL
                                              _TEXT FROM performance_schema.events_statements_history_long order by
                                              Duration desc limit 10;
    
    
    • 根据event_id查看SQL的具体耗时阶段
    13:16:47 HowUger@HowUger_MySQL_t01:mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Du
                                              ration FROM performance_schema.events_stages_history_long WHERE NESTIN
                                              G_EVENT_ID=1576;
    +--------------------------------+----------+
    | Stage                          | Duration |
    +--------------------------------+----------+
    | stage/sql/starting             | 0.000097 |
    | stage/sql/checking permissions | 0.000014 |
    | stage/sql/Opening tables       | 0.000019 |
    | stage/sql/init                 | 0.000026 |
    | stage/sql/System lock          | 0.000009 |
    | stage/sql/optimizing           | 0.000006 |
    | stage/sql/statistics           | 0.000014 |
    | stage/sql/preparing            | 0.000012 |
    | stage/sql/Sorting result       | 0.000007 |
    | stage/sql/executing            | 0.000005 |
    | stage/sql/Sending data         | 0.000009 |
    | stage/sql/Creating sort index  | 0.028948 |
    | stage/sql/end                  | 0.000005 |
    | stage/sql/query end            | 0.000006 |
    | stage/sql/closing tables       | 0.000008 |
    | stage/sql/freeing items        | 0.000019 |
    | stage/sql/cleaning up          | 0.000001 |
    +--------------------------------+----------+
    
    • 统计排序耗时SQL Top10
    13:25:22 HowUger@HowUger_MySQL_t01:mysql> SELECT DIGEST_TEXT, COUNT_STAR,SUM_SORT_ROWS,TRUNCATE(AVG_TIMER_WAIT/1
                                              000000000000,6) as AVG_TIMER_WAIT,SUM_ROWS_EXAMINED,SUM_CREATED_TMP_TA
                                              BLES,SUM_CREATED_TMP_DISK_TABLES,SUM_ROWS_SENT,FIRST_SEEN, LAST_SEEN F
                                              ROM performance_schema.events_statements_summary_by_digest ORDER BY SU
                                              M_SORT_ROWS DESC limit 10 \G
    ***************************[ 1. row ]***************************
    DIGEST_TEXT                 | SELECT TABLE_NAME , COLUMN_NAME FROM `information_schema` . `columns` WHERE `table_schema` = ? ORDER BY TABLE_NAME , `ordinal_position`
    COUNT_STAR                  | 5
    SUM_SORT_ROWS               | 1430
    AVG_TIMER_WAIT              | 0.001987
    SUM_ROWS_EXAMINED           | 2860
    SUM_CREATED_TMP_TABLES      | 5
    SUM_CREATED_TMP_DISK_TABLES | 5
    SUM_ROWS_SENT               | 1430
    FIRST_SEEN                  | 2020-01-17 14:23:56
    LAST_SEEN                   | 2020-01-20 14:44:44
    ...
    

    Memory

    初入行时经常遇到的问题是:MySQL进程OOM,然后被系统Kill了
    这对于新人菜鸡来说真的很不友好,而且我们往往又没有什么定位思路。

    随着工作逐步深入,我们会发现:MySQL的驻留内存问题其实主要集中在下面两个场景中:

    • 独享线程内存
    • 全局共享内存

    作为高级玩家的你看到这里可能会说:这还用你说!

    是的,MySQL的驻留内存其实就是由这两部分组成。

    Mysql Server Memory Usage= Sum of Global Buffers + (number of Connection * Per thread memory variables)

    独享线程内存
    先看一看独享线程内存相关的内存参数配置,有以下这么几个:

    thread_stack
    sort_buffer_size
    join_buffer_size
    read_buffer_size
    read_rnd_buffer_size
    net_buffer_length
    bulk_insert_buffer_size
    tmp_table_size
    ...

    面对内存高耗场景,我们可以使用下面的SQL快速找出哪条线程在搞事情:

    • 统计线程的内存消耗
    12:30:05 HowUger@HowUger_MySQL_t01:sys> select b.thd_id, b.user, current_count_used,current_allocated, current_avg_alloc, current_max_alloc,total_allocated,current_statement from memory_by_thread_by_current
                                            _bytes a,session b where a.thread_id = b.thd_id limit 5;
    +--------+--------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+-------------------------------------------------------------------+
    | thd_id | user                     | current_count_used | current_allocated | current_avg_alloc | current_max_alloc | total_allocated | current_statement                                                 |
    +--------+--------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+-------------------------------------------------------------------+
    | 61     | test_user@192.168.11.234 | 0                  | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         | <null>                                                            |
    | 91     | test_user@192.168.11.228 | 0                  | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         | <null>                                                            |
    | 157    | HowUger@HowUger_Cli_vm01 | 0                  | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         | select b.thd_id, b.user, curre ... a.thread_id = b.thd_id limit 5 |
    | 26     | sql/event_scheduler      | 0                  | 0 bytes           | 0 bytes           | 0 bytes           | 0 bytes         | <null>                                                            |
    +--------+--------------------------+--------------------+-------------------+-------------------+-------------------+-----------------+-------------------------------------------------------------------+
    
    • 一个隐藏的配置坑:net_buffer_length
      这个值是一个根据需求可动态调整的配置,最大可到为max_allowed_packet。所以,这就是为什么我们总是强调SQL的结果集要尽量小,同时别把max_allowed_packet配置的过大。
      因为你真的会遇到,一个线程吃掉max_allowed_packet等值大小内存的情况。

    全局共享内存
    关于全局共享内存,其实可以罗列很多很多内存参数配置,例如

    thread_cache_size
    table_open_cache
    table_definition_cache
    binlog_cache_size
    innodb_log_buffer_size
    innodb_buffer_pool_size
    innodb_additional_mem_pool_size
    back_log
    ...

    但是,这部分最主要的内存消耗是在:innodb_buffer_pool_size!

    MySQL 的 Buffer Pool机制是比较特别的,通常情况下Buffer Pool驻留内存会逐渐增大,且最终无限接近配置文件中的innodb_buffer_pool_size大小。

    即使我们找到了占用buffer较多的表,并且TrunCate了它们,但是被吃掉的Buffer Pool也很难被吐出来(淡然5.7以后可以动态调整这个倒霉参数了)。

    • 统计Buffer Pool内存占用表Top10**
    12:35:03 HowUger@HowUger_MySQL_t01:sys> select * from innodb_buffer_stats_by_table order by pages desc limit 10;
    +---------------+--------------------+------------+------------+-------+--------------+-----------+-------------+
    | object_schema | object_name        | allocated  | data       | pages | pages_hashed | pages_old | rows_cached |
    +---------------+--------------------+------------+------------+-------+--------------+-----------+-------------+
    | InnoDB System | SYS_TABLES         | 45.66 MiB  | 42.14 MiB  | 2922  | 0            | 495       | 146593      |
    | testserv      | variable           | 960.00 KiB | 823.27 KiB | 60    | 0            | 60        | 3181        |
    | testserv      | testcase           | 928.00 KiB | 599.99 KiB | 58    | 0            | 25        | 1560        |
    | mysql         | help_topic         | 496.00 KiB | 392.08 KiB | 31    | 0            | 31        | 515         |
    | mysql         | help_keyword       | 144.00 KiB | 80.22 KiB  | 9     | 0            | 9         | 512         |
    | mysql         | innodb_index_stats | 128.00 KiB | 61.33 KiB  | 8     | 0            | 8         | 569         |
    | InnoDB System | SYS_COLUMNS        | 96.00 KiB  | 52.12 KiB  | 6     | 0            | 5         | 804         |
    | mysql         | help_relation      | 80.00 KiB  | 35.44 KiB  | 5     | 0            | 5         | 1397        |
    | testserv      | t_generallock      | 48.00 KiB  | 419 bytes  | 3     | 0            | 3         | 6           |
    | InnoDB System | SYS_FOREIGN        | 32.00 KiB  | 0 bytes    | 2     | 0            | 2         | 0           |
    +---------------+--------------------+------------+------------+-------+--------------+-----------+-------------+
    

    如您所见所想,最好的控制就是量身定制的buffer pool size,不大不小刚刚好。

    IO

    对于MySQL的IO性能,主要由每张表的访问频率和具体读写数据组成。

    MySQL内的逻辑IO请求
    即统计业务在对应表上的访问频率

    • 上SQL
    13:15:58 HowUger@HowUger_MySQL_t01:performance_schema> SELECT object_schema AS tb_schema,            object_name AS table_name,            count_star AS rows_io_total,            count_read AS rows_read,
                                                                    count_write AS rows_write,            count_fetch AS rows_fetchs,            count_insert AS rows_inserts,            count_update AS rows_upd
                                                           ates,            count_delete AS rows_deletes,             CONCAT(ROUND(sum_timer_fetch / 3600000000000000, 2), 'h') AS fetch_latency,             CONC
                                                           AT(ROUND(sum_timer_insert / 3600000000000000, 2), 'h') AS insert_latency,             CONCAT(ROUND(sum_timer_update / 3600000000000000, 2), 'h') AS upd
                                                           ate_latency,             CONCAT(ROUND(sum_timer_delete / 3600000000000000, 2), 'h') AS delete_latency     FROM table_io_waits_summary_by_table        O
                                                           RDER BY sum_timer_wait DESC limit 10 ;
    +-----------+-------------------+---------------+-----------+------------+-------------+--------------+--------------+--------------+---------------+----------------+----------------+----------------+
    | tb_schema | table_name        | rows_io_total | rows_read | rows_write | rows_fetchs | rows_inserts | rows_updates | rows_deletes | fetch_latency | insert_latency | update_latency | delete_latency |
    +-----------+-------------------+---------------+-----------+------------+-------------+--------------+--------------+--------------+---------------+----------------+----------------+----------------+
    | testserv  | testcase          | 6015          | 6014      | 1          | 6014        | 0            | 1            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | variable          | 13718         | 13603     | 115        | 13603       | 0            | 115          | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | aw_name_select    | 45105         | 45105     | 0          | 45105       | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | t_generallock     | 8             | 7         | 1          | 7           | 1            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | sys       | sys_config        | 1             | 1         | 0          | 1           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | aw_instance_back  | 0             | 0         | 0          | 0           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | aw_instance_new   | 0             | 0         | 0          | 0           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | test_1220 | t_test01          | 0             | 0         | 0          | 0           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | aw_lib_repository | 0             | 0         | 0          | 0           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    | testserv  | basic_aw_back     | 0             | 0         | 0          | 0           | 0            | 0            | 0            | 0.00h         | 0.00h          | 0.00h          | 0.00h          |
    +-----------+-------------------+---------------+-----------+------------+-------------+--------------+--------------+--------------+---------------+----------------+----------------+----------------+
    
    

    MySQL内的物理IO请求
    即统计MySQL相关物理文件的实际IO读写情况

    • 上SQL
    13:16:13 HowUger@HowUger_MySQL_t01:sys> select * from io_global_by_file_by_bytes limit 10;
    +----------------------------------------+------------+------------+-----------+-------------+---------------+-----------+------------+-----------+
    | file                                   | count_read | total_read | avg_read  | count_write | total_written | avg_write | total      | write_pct |
    +----------------------------------------+------------+------------+-----------+-------------+---------------+-----------+------------+-----------+
    | @@datadir/master-bin.000003            | 39115      | 764.94 MiB | 20.03 KiB | 1           | 1 bytes       | 1 bytes   | 764.94 MiB | 0.00      |
    | @@datadir/ibtmp1                       | 0          | 0 bytes    | 0 bytes   | 7232        | 124.81 MiB    | 17.67 KiB | 124.81 MiB | 100.00    |
    | @@datadir/ibdata1                      | 328        | 7.14 MiB   | 22.29 KiB | 446         | 13.28 MiB     | 30.49 KiB | 20.42 MiB  | 65.03     |
    | @@datadir/mysql/proc.MYD               | 3180       | 9.35 MiB   | 3.01 KiB  | 0           | 0 bytes       | 0 bytes   | 9.35 MiB   | 0.00      |
    | @@datadir/testserv/aw_name_select.ibd  | 166        | 2.64 MiB   | 16.29 KiB | 4           | 64.00 KiB     | 16.00 KiB | 2.70 MiB   | 2.31      |
    | @@datadir/testserv/variable.ibd        | 63         | 1.03 MiB   | 16.76 KiB | 52          | 832.00 KiB    | 16.00 KiB | 1.84 MiB   | 44.07     |
    | @@datadir/testserv/testcase.ibd        | 61         | 1.00 MiB   | 16.79 KiB | 1           | 16.00 KiB     | 16.00 KiB | 1.02 MiB   | 1.54      |
    | @@datadir/ib_logfile0                  | 7          | 68.50 KiB  | 9.79 KiB  | 267         | 600.00 KiB    | 2.25 KiB  | 668.50 KiB | 89.75     |
    | @@datadir/mysql/help_topic.ibd         | 38         | 656.00 KiB | 17.26 KiB | 0           | 0 bytes       | 0 bytes   | 656.00 KiB | 0.00      |
    | @@datadir/mysql/help_keyword.ibd       | 13         | 256.00 KiB | 19.69 KiB | 0           | 0 bytes       | 0 bytes   | 256.00 KiB | 0.00      |
    +----------------------------------------+------------+------------+-----------+-------------+---------------+-----------+------------+-----------+
    

    当访问频率与物理文件读写量的统计信息结合后,找到IO问题点也就不难了。

    挖坑学习两不误

    关于MySQL内存,可学习的东西还有很多。而从源码角度解读,将有助于菜鸡更准确的理解MySQL的内存分配机制。

    相关文章

      网友评论

          本文标题:搞事情:MySQL性能问题两头堵

          本文链接:https://www.haomeiwen.com/subject/eaqrzctx.html