MySQL Dirty-Page Flushing Causes Query Performance Jitter

Author: 肥兔子爱豆畜子 | Published: 2021-12-23 10:06

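When MySQL updates data, it modifies the page in the in-memory buffer pool, writes the redo log, and then returns. Until a background thread flushes that page to disk, its in-memory content differs from what is on disk; such a page is called a dirty page.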

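Dirty pages are flushed in four main situations:

1. The database shuts down normally, and all dirty pages in memory must be synced to disk.
2. MySQL decides the system is idle and flushes dirty pages with background threads.
3. The redo log runs out of space. At that point no writes can proceed, so some dirty pages must be synced to disk to free redo log space.
4. The buffer pool runs out of free pages and must evict some according to LRU. A clean page can simply be dropped, but a dirty page has to be written to disk first.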

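If cases 3 and 4 occur during busy query periods, flushing competes for I/O and noticeably affects business queries. The symptom is that queries which are normally fast occasionally become slow, i.e. "performance jitter".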
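The fourth situation (evicting a page from a full buffer pool) can be illustrated with a toy model. This is only a sketch to show why evicting a dirty page costs an extra disk write; the class ToyBufferPool and its fields are hypothetical and do not reflect InnoDB's real LRU implementation.

```python
from collections import OrderedDict


class ToyBufferPool:
    """Toy LRU buffer pool; a teaching model, not InnoDB's real algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> (data, dirty), least recently used first
        self.disk = {}              # stands in for the data files
        self.flushes = 0            # flushes forced by eviction

    def _evict_if_full(self):
        # Make room for one more page by evicting from the LRU end.
        while len(self.pages) >= self.capacity:
            victim, (data, dirty) = self.pages.popitem(last=False)
            if dirty:
                # A dirty page must reach disk before its frame can be reused.
                self.disk[victim] = data
                self.flushes += 1
            # A clean page is simply dropped.

    def _touch(self, page_id, data, dirty):
        self.pages[page_id] = (data, dirty)
        self.pages.move_to_end(page_id)  # mark as most recently used

    def read(self, page_id):
        if page_id in self.pages:
            data, dirty = self.pages[page_id]
        else:
            self._evict_if_full()
            data, dirty = self.disk.get(page_id), False
        self._touch(page_id, data, dirty)
        return data

    def write(self, page_id, data):
        if page_id not in self.pages:
            self._evict_if_full()
        self._touch(page_id, data, True)  # the page is now dirty


pool = ToyBufferPool(capacity=2)
pool.write(1, "a")  # page 1 is dirty
pool.read(2)        # page 2 is clean
pool.read(3)        # evicts page 1, which forces a flush
pool.read(4)        # evicts page 2, which is clean, so no flush
print(pool.flushes, pool.disk)
```

In this model the query that reads page 3 is the one that pays for flushing page 1, which is the mechanism behind the jitter described above.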

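Countermeasures

Set the number of I/O operations per second available to InnoDB background threads

First, InnoDB needs to know the I/O capability of the host it runs on, i.e. how many IOPS the host can actually deliver. The relevant parameters are: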

mysql> show variables like '%innodb_io_capacity%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_io_capacity     | 200   |
| innodb_io_capacity_max | 2000  |
+------------------------+-------+
2 rows in set (0.09 sec)

Then measure the host's IOPS with the fio tool:

fio -filename=/data/iotest -direct=1 -iodepth 1 -thread -rw=randrw -ioengine=psync -bs=16k -size=500M -numjobs=10 -runtime=10 -group_reporting -name=mytest

[root@VM_0_11_centos ~]# fio --version
fio-3.7

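Parameter notes (fio --cmd; see the original text):

-filename=/data/iotest name of the test file; usually a file under the data directory of the disk to be tested.
-direct=1 bypass the operating system's buffer cache during the test, which makes the result more realistic.
-iodepth 1 I/O queue depth (Number of IO buffers to keep in flight)
-thread Use threads instead of processes
-rw=randrw test mixed random reads and writes
-ioengine=psync use the psync I/O engine
-bs=16k block size of a single I/O is 16k
-size=500M size of the test file is 500M
-numjobs=10 run 10 test threads
-runtime=10 run for 10 seconds; if omitted, the test runs until the whole 500M file has been processed in 16k blocks
-group_reporting report aggregated results for the group instead of per job
-name=mytest name of the job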

Test result:

[root@VM_0_11_centos /]# fio -filename=/data/iotest -direct=1 -iodepth 1 -thread -rw=randrw -ioengine=psync -bs=16k -size=500M -numjobs=10 -runtime=10 -group_reporting -name=mytest
mytest: (g=0): rw=randrw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.7
Starting 10 threads
mytest: Laying out IO file (1 file / 500MiB)
Jobs: 10 (f=10): [m(10)][100.0%][r=11.9MiB/s,w=12.2MiB/s][r=762,w=777 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=30934: Wed Dec 22 21:10:36 2021
   read: IOPS=763, BW=11.9MiB/s (12.5MB/s)(119MiB/10006msec)
    clat (usec): min=344, max=49611, avg=6088.99, stdev=7312.79
     lat (usec): min=344, max=49612, avg=6090.03, stdev=7312.78
    clat percentiles (usec):
     |  1.00th=[  400],  5.00th=[  445], 10.00th=[  474], 20.00th=[  515],
     | 30.00th=[  562], 40.00th=[  619], 50.00th=[  979], 60.00th=[ 6194],
     | 70.00th=[ 9110], 80.00th=[11863], 90.00th=[16188], 95.00th=[20317],
     | 99.00th=[29754], 99.50th=[34866], 99.90th=[43254], 99.95th=[47449],
     | 99.99th=[49546]
   bw (  KiB/s): min=  928, max= 1536, per=10.00%, avg=1221.28, stdev=154.68, samples=199
   iops        : min=   58, max=   96, avg=76.28, stdev= 9.66, samples=199
  write: IOPS=792, BW=12.4MiB/s (12.0MB/s)(124MiB/10006msec)
    clat (usec): min=659, max=51786, avg=6740.67, stdev=7323.13
     lat (usec): min=660, max=51788, avg=6742.18, stdev=7323.09
    clat percentiles (usec):
     |  1.00th=[  758],  5.00th=[  824], 10.00th=[  881], 20.00th=[  963],
     | 30.00th=[ 1057], 40.00th=[ 1205], 50.00th=[ 3163], 60.00th=[ 6980],
     | 70.00th=[ 9634], 80.00th=[12649], 90.00th=[16909], 95.00th=[20841],
     | 99.00th=[30278], 99.50th=[35390], 99.90th=[41157], 99.95th=[45876],
     | 99.99th=[51643]
   bw (  KiB/s): min=  608, max= 2176, per=10.00%, avg=1267.11, stdev=294.57, samples=199
   iops        : min=   38, max=  136, avg=79.15, stdev=18.41, samples=199
  lat (usec)   : 500=8.13%, 750=15.78%, 1000=13.08%
  lat (msec)   : 2=12.66%, 4=3.46%, 10=19.34%, 20=21.87%, 50=5.69%
  lat (msec)   : 100=0.01%
  cpu          : usr=0.11%, sys=0.34%, ctx=27231, majf=0, minf=6
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=7637,7925,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=11.9MiB/s (12.5MB/s), 11.9MiB/s-11.9MiB/s (12.5MB/s-12.5MB/s), io=119MiB (125MB), run=10006-10006msec
  WRITE: bw=12.4MiB/s (12.0MB/s), 12.4MiB/s-12.4MiB/s (12.0MB/s-12.0MB/s), io=124MiB (130MB), run=10006-10006msec

Disk stats (read/write):
  vda: ios=8681/7804, merge=0/75, ticks=11308/9649, in_queue=20850, util=96.29%

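Judging from the result, the 10 threads together reach about 763 read and 792 write IOPS, so a single thread averages roughly 75 to 80 (the per-job iops lines show avg=76.28 for reads and avg=79.15 for writes). My machine has a single core, so in theory one I/O thread gets about that many IOPS, which suggests the MySQL settings above are indeed on the high side.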
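The IOPS figures can be rechecked from the raw counters in the fio report (the issued rwts line and the run time). The following is a small sketch of that arithmetic; the helper name iops is only illustrative.

```python
def iops(issued_ios: int, runtime_msec: int) -> float:
    """Average I/O operations per second over the whole run."""
    return issued_ios / (runtime_msec / 1000)


# Figures copied from the fio report above.
RUNTIME_MSEC = 10006   # run=10006-10006msec
READS_ISSUED = 7637    # issued rwts: total=7637,7925,...
WRITES_ISSUED = 7925
NUM_JOBS = 10          # -numjobs=10

read_iops = iops(READS_ISSUED, RUNTIME_MSEC)
write_iops = iops(WRITES_ISSUED, RUNTIME_MSEC)

print(f"aggregate read IOPS:   {read_iops:.0f}")
print(f"aggregate write IOPS:  {write_iops:.0f}")
print(f"per-thread read IOPS:  {read_iops / NUM_JOBS:.1f}")
print(f"per-thread write IOPS: {write_iops / NUM_JOBS:.1f}")
```

The aggregate values match the IOPS=763 and IOPS=792 lines of the report; the per-thread values differ slightly from fio's own avg figures because fio averages periodic samples.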

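Adjust the parameters. The documented minimum for both variables is 100, so a lower value such as 50 is not accepted as-is; on this host the lowest permitted setting is used:

set global innodb_io_capacity = 100;
set global innodb_io_capacity_max = 100;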

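Set this parameter correctly. If it is too low, for example an SSD configured with only 100 or 200, InnoDB assumes the host's I/O capability really is that low and flushes dirty pages slowly; dirty pages pile up, writes slow down, and TPS drops. If it is too high, flushing consumes all the I/O and ordinary queries cannot get enough of it, so requests become slow.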

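Number of background I/O threads

In addition, the number of background asynchronous I/O threads used for writing dirty pages can also be specified: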

mysql> show variables like '%innodb_write_io_threads%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| innodb_write_io_threads | 4     |
+-------------------------+-------+
1 row in set (0.00 sec)

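Frankly, for my 1-core machine 4 threads is a bit much; 1 would be enough. Note that innodb_write_io_threads is not a dynamic variable, so it has to be changed in the configuration file and takes effect after a restart.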

Dirty page ratio and redo log write speed

Check the current number of dirty pages and the total number of pages in the buffer pool:

select VARIABLE_VALUE  from performance_schema.global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty';
select VARIABLE_VALUE  from performance_schema.global_status where VARIABLE_NAME = 'Innodb_buffer_pool_pages_total';
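The two counters are easier to judge as a single percentage. A minimal sketch of that calculation follows; the sample counter values are hypothetical and are not taken from the author's server.

```python
def dirty_page_pct(pages_dirty: int, pages_total: int) -> float:
    """Dirty pages as a percentage of all buffer pool pages."""
    if pages_total <= 0:
        raise ValueError("pages_total must be positive")
    return 100.0 * pages_dirty / pages_total


# Hypothetical values of Innodb_buffer_pool_pages_dirty and
# Innodb_buffer_pool_pages_total, as returned by the two queries above.
sample_dirty = 1200
sample_total = 8192

sample_pct = dirty_page_pct(sample_dirty, sample_total)
print(f"dirty page ratio: {sample_pct:.2f}%")
```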

The maximum dirty page ratio InnoDB allows by default:

mysql> show variables like '%innodb_max_dirty_pages_pct%';
+--------------------------------+-----------+
| Variable_name                  | Value     |
+--------------------------------+-----------+
| innodb_max_dirty_pages_pct     | 75.000000 |
| innodb_max_dirty_pages_pct_lwm | 0.000000  |
+--------------------------------+-----------+
2 rows in set (0.00 sec)

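Besides this ratio, the storage engine also takes the current redo log write rate into account when estimating how fast to flush dirty pages.

Roughly, the relationship is: flush rate = innodb_io_capacity * max(dirty page ratio, redo log write rate), where both terms act as normalized factors rather than raw values.

So in normal operation, make sure the dirty page ratio does not exceed 75%. If the ratio is high, work out whether the redo log is short of space or memory is insufficient, and whether the flush rate can be raised further (by adding threads or raising the configured IOPS).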
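The rough relationship above can be turned into a small worked example. This is a simplified model under stated assumptions, not InnoDB's actual adaptive flushing code: the dirty ratio is assumed to map linearly to a 0 to 100 score that saturates at innodb_max_dirty_pages_pct, and the redo log term is passed in as an already computed 0 to 100 score because the article does not give its formula. All function and parameter names are hypothetical.

```python
def dirty_ratio_score(dirty_pct: float, max_dirty_pct: float) -> float:
    """Assumed mapping: linear up to max_dirty_pct, then capped at 100."""
    if dirty_pct >= max_dirty_pct:
        return 100.0
    return 100.0 * dirty_pct / max_dirty_pct


def estimated_flush_rate(io_capacity: int, dirty_pct: float,
                         max_dirty_pct: float, redo_score: float) -> float:
    """Pages per second under the simplified model.

    redo_score is a 0..100 value standing in for the redo log write
    pressure; how InnoDB derives it is outside this sketch.
    """
    factor = max(dirty_ratio_score(dirty_pct, max_dirty_pct), redo_score)
    return io_capacity * factor / 100.0


# Example: io_capacity 200, 30% dirty pages, limit 75%, redo score 25.
rate = estimated_flush_rate(200, 30.0, 75.0, 25.0)
print(f"estimated flush rate: {rate:.0f} pages/s")
```

In this example the dirty ratio term (score 40) dominates the redo term (score 25), so the model flushes at 40% of innodb_io_capacity. Once the dirty ratio reaches the limit, the score is 100 and flushing runs at the full configured capacity.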

Flushing a dirty page's neighbors along with it

mysql> show variables like '%innodb_flush_neighbors%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_flush_neighbors | 1     |
+------------------------+-------+
1 row in set (0.00 sec)

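When innodb_flush_neighbors is set to 1, flushing a dirty page also checks whether the adjacent page is dirty and, if so, flushes it too. The check keeps propagating: if the neighbor's neighbor is dirty it is flushed as well, then the next one is checked, and so on in a chain reaction.

On a mechanical hard disk, flushing adjacent dirty pages together is sequential disk I/O, which reduces random I/O and improves performance.

On an SSD, however, IOPS is usually not the bottleneck, so it is recommended to turn this parameter off by setting it to 0. Flushing only the page itself finishes sooner and can improve SQL response time.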

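Summary

With WAL (write-ahead logging), the database turns random writes into sequential writes, which greatly improves performance. The price is dirty pages in memory. Dirty pages are flushed automatically by background threads and are also flushed when data pages are evicted; because flushing consumes resources, it can make your update and query statements respond more slowly.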
