美文网首页
一步一步从Linux线程到MySQL慢sql定位

一步一步从Linux线程到MySQL慢sql定位

作者: 若有所思11 | 来源:发表于2020-08-28 15:01 被阅读0次

    中午了,正在吃着黄焖鸡外卖,突然手机短信声音响了,一看一台mysql数据库服务器的cpu,IO使用率查过告警阀值了,都快到100%。赶紧放下黄焖鸡,快速登录到有问题的mysql数据库服务器。

    下面模拟一下当时排查的步骤,使用的mysql5.7.26版本
    用iostat看一下IO情况

    [mysql@localhost ~]$ iostat -mxt 1
    Linux 3.10.0-1062.9.1.el7.x86_64 (localhost.localdomain)        08/23/2020      _x86_64_        (1 CPU)
    
    08/23/2020 03:32:41 PM
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               5.49    0.00   78.02    6.59    0.00    9.89
    
    Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    sda               0.00     0.00 3460.44    0.00    54.07     0.00    32.00     0.89    0.26    0.26    0.00   0.26  89.45
    scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-0              0.00     0.00 3460.44    0.00    54.07     0.00    32.00     0.90    0.26    0.26    0.00   0.26  89.56
    dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
    

    用top看看一下cpu使用情况,看看什么进程导致cpu高

    top - 15:37:21 up  8:50,  5 users,  load average: 0.45, 0.19, 0.12
    Tasks: 144 total,   1 running, 143 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 99.0 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  2175376 total,   663640 free,   599256 used,   912480 buff/cache
    KiB Swap:  2097148 total,  2096884 free,      264 used.  1417460 avail Mem 
    
       PID USER      PR   NI    VIRT    RES        SHR S      %CPU %MEM     TIME+ COMMAND                                                                              
     40474 mysql     20   0 1266400 324264   9496 S    96.7 14.9   3:42.00 mysqld                                                                               
     40916 root      20   0  204492  12108   4304 S       2.7  0.6   1:22.59 iotop                                                                                
         1   root      20   0  125520   3716   2344 S  0.0  0.2   0:05.60 systemd                                                                              
         2   root      20   0       0      0      0 S  0.0  0.0   0:00.01 kthreadd                                                                             
         4   root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H                                                                         
         6  root      20   0       0      0      0 S  0.0  0.0   0:01.45 ksoftirqd/0                       
    

    果然是mysql进程导致cpu使用高,知道了是mysql数据库导致的,可是还是不知道什么原因,如果是新手,这个时候是不是有点慌,无从下手的感觉,如果是老手,肯定知道大概率是慢sql的问题,其它的原因概率比较低,例如mysql的bug,业务qps暴增等等。

    于是用show看看mysql线程情况

    mysql> show full processlist;
    +----+------+-----------+--------+---------+------+----------+-----------------------+
    | Id | User | Host      | db     | Command | Time | State    | Info                  |
    +----+------+-----------+--------+---------+------+----------+-----------------------+
    |  8 | root | localhost | sbtest | Query   |    0 | starting | show full processlist |
    | 10 | root | localhost | NULL   | Sleep   |   28 |          | NULL                  |
    +----+------+-----------+--------+---------+------+----------+-----------------------+
    2 rows in set (0.00 sec)
    

    在实际的生产上,结果可能有好几十,上百个结果,甚至有上千,要从这么多的线程里排查出导致cpu高的那个,简直是折磨人,这个不仅仅要经验足,还得对跑的业务非常熟悉,知道那些业务,那些大表可能会导致cpu高,你想想公司里有多少人满足这个条件,估计屈指可数。

    那么有没有更直接,更直观的工具进行精确定位呢,在mysql5.7以后的版本,performance_schema.threads这个性能视图中,就有linxu系统的的线程ID,有了这个ID,就把mysql和操作系统精密的关联在一起了,下面就带大家,一步一步从Linux线程到MySQL慢sql精确定位

    1.找到mysql服务的进程PID
    [mysql@localhost ~]$ ps -ef|grep -i mysqld|grep -v grep
    mysql     39463  39441  0 14:21 pts/0    00:00:00 /bin/sh /u02/mysql/bin/mysqld_safe --defaults-file=/u02/conf/my3308.cnf
    mysql     40474  39463  7 14:21 pts/0    00:07:57 /u02/mysql/bin/mysqld --defaults-file=/u02/conf/my3308.cnf --basedir=/u02/mysql --datadir=/u02/data/3308 --plugin-dir=/u02/mysql/lib/plugin --log-error=/u02/log/3308/error.log --open-files-limit=65535 --pid-file=/u02/run/3308/mysqld.pid --socket=/u02/run/3308/mysql.sock --port=3308
    

    在这里mysql的PID为:40474

    2.用top找到cpu消耗最大的线程ID
    [mysql@localhost ~]$ top -H -p 40474
    Threads:  33 total,   1 running,  32 sleeping,   0 stopped,   0 zombie
    %Cpu(s):100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  2175376 total,   670468 free,   592304 used,   912604 buff/cache
    KiB Swap:  2097148 total,  2096884 free,      264 used.  1424364 avail Mem 
    
       PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                              
     40624 mysql     20   0 1266400 324264   9496 R 99.3 14.9   5:14.97 mysqld                                                                               
     40474 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:05.29 mysqld                                                                               
     40475 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld                                                                               
     40476 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld                                                                               
     40477 mysql     20   0 1266400 324264   9496 S  0.0 14.9   0:00.00 mysqld 
    

    消耗cpu资源最高的线程PID为:40624

    3.根据系统的线程PID,精确定位慢sql

    连接mysql数据库,使用以下sql语句,进行慢sql的精确定位

    mysql> select PROCESSLIST_ID,THREAD_OS_ID,PROCESSLIST_USER,PROCESSLIST_HOST,PROCESSLIST_DB,PROCESSLIST_COMMAND,PROCESSLIST_INFO from performance_schema.threads where THREAD_OS_ID=40624;
    +----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
    | PROCESSLIST_ID | THREAD_OS_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB | PROCESSLIST_COMMAND | PROCESSLIST_INFO                          |
    +----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
    |              8 |        40624 | root             | localhost        | sbtest         | Query               | select count(*) from sbtest1 a ,sbtest1 b |
    +----------------+--------------+------------------+------------------+----------------+---------------------+-------------------------------------------+
    1 row in set (0.00 sec)
    

    “select count(*) from sbtest1 a ,sbtest1 b”这条sql语句就是导致cpu暴增的元凶,找到之后,该怎么办,当然是先快速恢复业务,将这个select语句的连接kill掉。

    [mysql@localhost ~]$ /u02/mysql/bin/mysql -uroot -proot --socket=/u02/run/3308/mysql.sock
    mysql: [Warning] Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 12
    Server version: 5.7.26-log Source distribution
    
    Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
    
    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.
    
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    mysql>kill 8;
    

    到这里,整个cpu使用率100%导致业务响应速度变慢,定位到慢sql,恢复业务的全过程,这个技能你get了吗

    相关文章

      网友评论

          本文标题:一步一步从Linux线程到MySQL慢sql定位

          本文链接:https://www.haomeiwen.com/subject/prelsktx.html