InnoDB Storage Engine

Author: xiaolyuh | Published: 2020-04-21 09:22

InnoDB Architecture

Background Threads

Master Thread

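The Master Thread is mainly responsible for asynchronously flushing data from the buffer pool to disk, which keeps data consistent.

The Master Thread has the highest thread priority. Internally it consists of several loops: the main loop (loop), the background loop, the flush loop, and the suspend loop. Depending on the state of the running database, the Master Thread switches among loop, background loop, flush loop, and suspend loop.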
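The switching between these loops can be pictured as a small state machine. The sketch below is a simplified model under stated assumptions: it follows the transition rules commonly described for older InnoDB versions (leave the main loop when there is no user activity, keep flushing while the dirty-page ratio is above a threshold, then suspend until woken), the names `Loop` and `next_loop` are hypothetical, `max_dirty_pct` is an assumed stand-in for `innodb_max_dirty_pages_pct`, and the real work done inside each loop (log flushes, insert buffer merges, purge) is left out.

```python
from enum import Enum


class Loop(Enum):
    MAIN = "loop"
    BACKGROUND = "background loop"
    FLUSH = "flush loop"
    SUSPEND = "suspend loop"


def next_loop(current, user_activity, dirty_pct, max_dirty_pct=75, woken=False):
    """Return the loop the Master Thread switches to next (simplified model).

    max_dirty_pct is an assumed stand-in for innodb_max_dirty_pages_pct.
    """
    if current is Loop.MAIN:
        # No user activity: leave the main loop and do background work.
        return Loop.MAIN if user_activity else Loop.BACKGROUND
    if current is Loop.BACKGROUND:
        # Activity resumed: go back to the main loop; otherwise start flushing.
        return Loop.MAIN if user_activity else Loop.FLUSH
    if current is Loop.FLUSH:
        # Keep flushing while too many pages are dirty, then go to sleep.
        return Loop.FLUSH if dirty_pct > max_dirty_pct else Loop.SUSPEND
    # SUSPEND: stay asleep until an event wakes the thread.
    return Loop.MAIN if woken else Loop.SUSPEND


# An idle server drifting from the main loop down to the suspend loop.
state, path = Loop.MAIN, []
for dirty in (80, 80, 80, 40):
    state = next_loop(state, user_activity=False, dirty_pct=dirty)
    path.append(state.value)
print(path)  # prints: ['background loop', 'flush loop', 'flush loop', 'suspend loop']
```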

IO Thread

The IO Threads mainly handle the callbacks of AIO requests. There are four kinds: write thread, read thread, insert buffer thread, and log IO thread.

Check the number of read and write IO threads:

mysql> show variables like 'innodb_%io_threads';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| innodb_read_io_threads  | 4     |
| innodb_write_io_threads | 4     |
+-------------------------+-------+
2 rows in set

Purge Thread

After a transaction commits, the undo log it used may no longer be needed, so the Purge Thread reclaims undo pages that have been allocated and used.
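"May no longer be needed" depends on whether some consistent read could still want the old row versions. The sketch below uses an assumed, simplified rule (not InnoDB's exact algorithm): a committed transaction's undo log can be reclaimed once every open read view already sees that transaction. The function `purgeable` and the "view limit" numbers are hypothetical. Undo logs that are waiting for purge are, roughly, what the "History list length" line of the status output later in this article counts.

```python
def purgeable(committed_trx_nos, open_view_limits):
    """Return committed transactions whose undo pages can be reclaimed.

    Assumed, simplified rule: an undo log is no longer needed once every open
    read view already sees its transaction, i.e. its number is below the limit
    of the oldest open view. With no open views, everything committed qualifies.
    """
    if not open_view_limits:
        return sorted(committed_trx_nos)
    oldest_limit = min(open_view_limits)
    return sorted(t for t in committed_trx_nos if t < oldest_limit)


# Transactions 10, 20 and 30 have committed; two consistent reads are still open.
ready = purgeable({10, 20, 30}, open_view_limits=[25, 40])
print(ready)  # prints: [10, 20]
```

Transaction 30 is kept because the view with limit 25 may still need the row versions it replaced.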

Page Cleaner Thread

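The Page Cleaner Thread moves the dirty-page flushing that earlier versions performed in the Master Thread into a separate thread. The goal is to reduce the Master Thread's workload and the blocking of user query threads, further improving InnoDB performance.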

Memory Structures

InnoDB's in-memory structures are divided as follows:

[Figure: InnoDB memory structures]

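Buffer Pool

Because of the gap between CPU speed and disk speed, disk-based database systems usually rely on a buffer pool to improve overall performance.

Put simply, the buffer pool is a region of memory that uses the speed of memory to offset the impact of slow disks on database performance. It mainly holds index pages, data pages, undo pages, the insert buffer, lock info, the adaptive hash index, and data dictionary information.

To read a page, the database first places the page read from disk into the buffer pool; this is called "fixing" the page in the buffer pool. The next time the same page is read, it first checks whether the page is in the buffer pool. If it is, the page is said to be hit in the buffer pool and is read directly from memory. Otherwise, the page is read from disk.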
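The hit/miss path above can be sketched in a few lines. This is a toy model under stated assumptions: a dict stands in for the data file, `BufferPool` and `read_page` are hypothetical names, and there is no eviction (the LRU list described below handles that in the real engine).

```python
class BufferPool:
    """Toy buffer pool in front of a dict that stands in for the data file."""

    def __init__(self, disk):
        self.disk = disk     # page_no -> bytes, plays the role of the disk
        self.pages = {}      # pages currently fixed in the buffer pool
        self.hits = 0
        self.misses = 0

    def read_page(self, page_no):
        if page_no in self.pages:
            self.hits += 1                             # hit: served from memory
        else:
            self.misses += 1                           # miss: one disk read...
            self.pages[page_no] = self.disk[page_no]   # ...then fix it in the pool
        return self.pages[page_no]


pool = BufferPool(disk={1: b"page-1", 2: b"page-2"})
for page_no in (1, 1, 2, 1):
    pool.read_page(page_no)
print(pool.hits, pool.misses)  # prints: 2 2
```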

A modification to a page is first applied to the copy in the buffer pool, which is then flushed to disk at a certain frequency.

Check the number of buffer pool instances:

mysql> show variables like 'innodb_buffer_pool_instances';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| innodb_buffer_pool_instances | 1     |
+------------------------------+-------+
1 row in set

Check the buffer pool size (in bytes):

mysql> show variables like 'innodb_buffer_pool_size';
+-------------------------+---------+
| Variable_name           | Value   |
+-------------------------+---------+
| innodb_buffer_pool_size | 8388608 |
+-------------------------+---------+
1 row in set
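The value is in bytes. Dividing it by the default 16KB page size gives the number of pages the pool can hold, which matches the "Buffer pool size 512" line (counted in pages) in the SHOW ENGINE INNODB STATUS output later in this article. A quick check, assuming the default page size:

```python
PAGE_SIZE = 16 * 1024          # default innodb_page_size, in bytes
buffer_pool_size = 8388608     # innodb_buffer_pool_size from the output above

pages = buffer_pool_size // PAGE_SIZE
print(f"{buffer_pool_size} bytes = {buffer_pool_size // 2**20} MB = {pages} pages")
# prints: 8388608 bytes = 8 MB = 512 pages
```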

Redo Log Buffer

InnoDB first writes redo log information into this buffer, then flushes it to the redo log files at a certain frequency.

Check the redo log buffer size:

mysql> show variables like 'innodb_log_buffer_size';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| innodb_log_buffer_size | 1048576 |
+------------------------+---------+
1 row in set
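The 1048576 bytes shown here are 1 MB. A rough model of how the buffer fills and drains, assuming the flush triggers commonly described for InnoDB: a flush on commit (as with `innodb_flush_log_at_trx_commit = 1`), a flush when less than half of the buffer is free, and a once-per-second flush by the Master Thread (which would simply call `flush()` on a timer and is not simulated). The class and method names are hypothetical; the LSN here is just a byte count, mirroring the "Log sequence number" and "Log flushed up to" lines of the status output.

```python
class RedoLogBuffer:
    """Toy redo log buffer that tracks LSNs (names are illustrative only)."""

    def __init__(self, size=1048576):       # innodb_log_buffer_size from above
        self.size = size
        self.pending = 0         # bytes in the buffer, not yet on disk
        self.lsn = 0             # like "Log sequence number"
        self.flushed_lsn = 0     # like "Log flushed up to"

    def append(self, record: bytes):
        self.pending += len(record)
        self.lsn += len(record)
        # Flush early once less than half of the buffer is free.
        if self.size - self.pending < self.size // 2:
            self.flush()

    def commit(self, flush_at_commit=True):
        # flush_at_commit=True mirrors innodb_flush_log_at_trx_commit = 1.
        if flush_at_commit:
            self.flush()

    def flush(self):
        # Writing to the redo log file and fsync are omitted in this sketch.
        self.flushed_lsn = self.lsn
        self.pending = 0


redo = RedoLogBuffer(size=100)       # tiny buffer to make the threshold visible
redo.append(b"x" * 30)               # 70 bytes free: stays buffered
redo.append(b"x" * 30)               # 40 bytes free (< half): flushed
redo.append(b"y" * 10)
redo.commit()                        # commit flushes the rest
print(redo.lsn, redo.flushed_lsn)    # prints: 70 70
```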

LRU List

The buffer pool is managed with the LRU (Least Recently Used) algorithm. In InnoDB, pages in the buffer pool are 16KB by default.
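InnoDB's list is not a plain LRU but a midpoint-insertion variant: a newly read page goes to the head of the "old" sublist (by default the last 37% of the list, set by `innodb_old_blocks_pct`) and only moves to the head of the whole list ("made young", as counted by "Pages made young" in the status output) when it is accessed again. The sketch below models that with two ordered dicts. It is a simplification under stated assumptions: the class name is hypothetical, `innodb_old_blocks_time` is ignored, and the young/old boundary is only rebalanced in one direction.

```python
from collections import OrderedDict


class MidpointLRU:
    """LRU list split into a young sublist (head) and an old sublist (tail)."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_target = max(1, capacity * old_pct // 100)
        self.young = OrderedDict()   # first key = head of the whole list
        self.old = OrderedDict()     # last key = tail, evicted first
        self.made_young = 0

    def access(self, page_no):
        if page_no in self.young:
            self.young.move_to_end(page_no, last=False)      # back to the head
        elif page_no in self.old:
            del self.old[page_no]                            # re-accessed: promote
            self.young[page_no] = True
            self.young.move_to_end(page_no, last=False)
            self.made_young += 1
        else:
            if len(self.young) + len(self.old) >= self.capacity:
                victim = self.old if self.old else self.young
                victim.popitem(last=True)                    # evict the list tail
            self.old[page_no] = True                         # insert at the midpoint
            self.old.move_to_end(page_no, last=False)
        self._rebalance()

    def _rebalance(self):
        # Refill the old sublist from the tail of the young one when it runs short.
        while len(self.old) < self.old_target and self.young:
            page_no, _ = self.young.popitem(last=True)
            self.old[page_no] = True
            self.old.move_to_end(page_no, last=False)

    def order(self):
        """Pages from the head (most recently used) to the tail."""
        return list(self.young) + list(self.old)


lru = MidpointLRU(capacity=8)
for page_no in range(1, 9):
    lru.access(page_no)                  # first reads land at the midpoint
lru.access(3)                            # second access: page 3 is made young
for page_no in range(100, 120):
    lru.access(page_no)                  # a one-off scan churns the old sublist only
print(lru.order()[0], lru.made_young)    # prints: 3 1
```

The design choice shows up in the last loop: a one-off scan of 20 pages only cycles through the old sublist, so the hot page 3 survives, whereas a plain LRU of 8 slots would have evicted it.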

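Dirty Pages

When a page on the LRU list is modified, it is called a dirty page: the copy in the buffer pool and the page on disk no longer hold the same data.

Flush List

The Flush list holds the dirty pages. A dirty page is on both the LRU list and the Flush list: the LRU list manages which pages are available in the buffer pool, while the Flush list manages which pages must be flushed back to disk. The two lists do not affect each other.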
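A minimal sketch of the Flush list idea, assuming each dirty page remembers the LSN of its first modification and that pages are written back oldest-modification first. `FlushList`, `mark_dirty`, and `flush_oldest` are hypothetical names, and a plain set stands in for LRU membership to show that flushing a page removes it from the Flush list but leaves it on the LRU list.

```python
from collections import OrderedDict


class FlushList:
    """Dirty pages ordered by the LSN of their first modification."""

    def __init__(self):
        self.dirty = OrderedDict()   # page_no -> oldest modification LSN

    def mark_dirty(self, page_no, lsn):
        # Assumes LSNs arrive in increasing order, so insertion order is LSN order.
        # Only the first modification sets the position; later ones keep it.
        if page_no not in self.dirty:
            self.dirty[page_no] = lsn

    def flush_oldest(self, n, write_page):
        """Write back up to n pages, oldest modification first."""
        flushed = []
        while self.dirty and len(flushed) < n:
            page_no, _ = self.dirty.popitem(last=False)
            write_page(page_no)
            flushed.append(page_no)
        return flushed


lru_pages = {1, 2, 3}                  # pages on the LRU list
flush_list = FlushList()
flush_list.mark_dirty(2, lsn=100)
flush_list.mark_dirty(1, lsn=120)
flush_list.mark_dirty(2, lsn=130)      # page 2 keeps its original position
written = []
flush_list.flush_oldest(1, write_page=written.append)
print(written, list(flush_list.dirty), sorted(lru_pages))
# prints: [2] [1] [1, 2, 3]
```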

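Key InnoDB Features

Insert Buffer

The insert buffer mainly addresses the performance of inserts into InnoDB secondary indexes.

Applications usually insert rows in increasing primary key order, so inserts into the clustered index (primary key) are generally sequential, need no random disk reads, and perform well.

However, if the table also has a non-unique secondary index, inserts into the leaf nodes of that non-clustered index are no longer sequential. The non-clustered index pages then have to be accessed at scattered locations, and these random reads degrade insert performance.

InnoDB introduced the Insert Buffer to solve this problem. For an insert or update on a non-clustered index, InnoDB does not always write directly to the index page. It first checks whether the target index page is in the buffer pool: if it is, the change is applied directly; if not, the change is first placed in an Insert Buffer object. The Insert Buffer is later merged into the leaf pages of the secondary index at a certain frequency and under certain conditions. Because several inserts often target the same index page, they can usually be merged into a single operation, which greatly improves insert performance for non-clustered indexes. The Insert Buffer itself is implemented as a B+ tree.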

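Conditions

The Insert Buffer is used only when both of the following conditions hold:

the index is a secondary index;
the index is not unique.

Applicable scenario

The Insert Buffer applies to inserts into non-unique secondary indexes. For a unique index, InnoDB would have to read the index page anyway to check uniqueness, which defeats the purpose of buffering.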
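The mechanism and its conditions can be sketched as follows. This is a toy model under stated assumptions: a set of resident page numbers stands in for the buffer pool, `leaf_page_for` is a hypothetical function mapping a secondary-index key to its leaf page, and a dict of lists replaces the real Insert Buffer, which is a B+ tree stored in the system tablespace.

```python
from collections import defaultdict


def can_use_insert_buffer(is_secondary: bool, is_unique: bool) -> bool:
    # Both conditions from the text: a secondary index that is not unique.
    return is_secondary and not is_unique


class InsertBuffer:
    def __init__(self, resident_pages, leaf_page_for):
        self.resident = resident_pages          # pages currently in the buffer pool
        self.leaf_page_for = leaf_page_for      # hypothetical key -> leaf page mapping
        self.pending = defaultdict(list)        # page_no -> buffered keys
        self.direct_inserts = 0
        self.merges = 0

    def insert(self, key):
        page_no = self.leaf_page_for(key)
        if page_no in self.resident:
            self.direct_inserts += 1            # page cached: insert directly
        else:
            self.pending[page_no].append(key)   # page not cached: buffer the change

    def merge(self, apply):
        """Apply buffered keys with one merge (one page access) per page."""
        for page_no, keys in self.pending.items():
            apply(page_no, sorted(keys))
            self.merges += 1
        self.pending.clear()


ib = InsertBuffer(resident_pages={0}, leaf_page_for=lambda key: key // 100)
for key in (5, 150, 120, 199, 250):
    ib.insert(key)
applied = []
ib.merge(lambda page_no, keys: applied.append((page_no, keys)))
print(ib.direct_inserts, applied)  # prints: 1 [(1, [120, 150, 199]), (2, [250])]
```

The three keys destined for page 1 cost one page access at merge time instead of three random reads at insert time.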

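Doublewrite

Doublewrite addresses the reliability of InnoDB data pages.

If the server crashes while a page is being written, the page on disk may be only partially written. Redo log records describe physical changes to a page, so replaying them on a page that is itself corrupted does not help. Therefore, before applying the redo log, InnoDB needs a copy of the page: when a write failure occurs, it first restores the page from that copy and then applies the redo log. This is doublewrite; its main purpose is to protect against a page being corrupted before redo-based recovery can run.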
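A minimal sketch of the write and recovery order, under stated assumptions: dicts stand in for the doublewrite area and the data file, fsync calls are omitted, and each page carries a CRC32 from `zlib.crc32` (InnoDB's own page checksum algorithm differs). The function names are hypothetical. The copy is written first; a page whose checksum no longer matches is restored from that copy before redo would be applied.

```python
import zlib


def with_checksum(data: bytes) -> bytes:
    # Append a 4-byte CRC32 so a torn page can be detected later.
    return data + zlib.crc32(data).to_bytes(4, "big")


def is_intact(page: bytes) -> bool:
    return zlib.crc32(page[:-4]).to_bytes(4, "big") == page[-4:]


def flush_page(page_no, data, doublewrite, datafile, crash_midway=False):
    page = with_checksum(data)
    doublewrite[page_no] = page             # 1) write the copy to the doublewrite area
    if crash_midway:
        torn = page[: len(page) // 2]       # simulate a crash halfway through
        datafile[page_no] = torn + bytes(len(page) - len(torn))
        return
    datafile[page_no] = page                # 2) then write the page in place


def recover_page(page_no, doublewrite, datafile):
    page = datafile[page_no]
    if not is_intact(page) and page_no in doublewrite:
        page = doublewrite[page_no]         # restore the torn page from its copy
        datafile[page_no] = page
    return page[:-4]                        # redo would now be applied to this page


doublewrite, datafile = {}, {}
flush_page(7, b"old contents", doublewrite, datafile)
flush_page(7, b"new contents", doublewrite, datafile, crash_midway=True)
print(recover_page(7, doublewrite, datafile))  # prints: b'new contents'
```

If the crash had instead hit while writing the doublewrite copy, the in-place page would still be the intact old version, so one of the two copies is always usable.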

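Adaptive Hash Index

InnoDB supports hash indexes, but you cannot create one directly when defining an index; whether to build a hash index is decided by InnoDB itself.

InnoDB monitors lookups on the index pages of each table. If it observes that a hash index would speed those lookups up, it builds one; this is called the adaptive hash index (AHI).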
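A toy model of the "monitor, then build" behaviour. It assumes a hypothetical fixed threshold of lookups per key before a hash entry is created; InnoDB's real heuristics also consider the access pattern and how often each page is hit. `btree_search` stands in for a normal B+ tree descent, and the two counters mirror the "hash searches/s" and "non-hash searches/s" lines in the status output.

```python
class AdaptiveHashIndex:
    def __init__(self, btree_search, threshold=3):
        self.btree_search = btree_search   # key -> leaf page via the normal B+ tree path
        self.threshold = threshold         # hypothetical build threshold
        self.counts = {}
        self.hash = {}                     # key -> leaf page, built on demand
        self.hash_searches = 0
        self.non_hash_searches = 0

    def lookup(self, key):
        if key in self.hash:
            self.hash_searches += 1        # O(1) hash probe
            return self.hash[key]
        self.non_hash_searches += 1        # full B+ tree descent
        page = self.btree_search(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            self.hash[key] = page          # hot key: remember where it lives
        return page


ahi = AdaptiveHashIndex(btree_search=lambda key: key // 100, threshold=3)
pages = [ahi.lookup(250) for _ in range(5)]
print(ahi.hash_searches, ahi.non_hash_searches)  # prints: 2 3
```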

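Asynchronous IO (Async IO)

With asynchronous IO, a user can issue one IO request and immediately issue another without waiting for the first to finish; once all requests have been sent, it waits for all of them to complete. This is AIO.

Advantages of AIO

Tasks become asynchronous: the caller does not block on each individual request;
IO Merge: several IO requests can be merged into one.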
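The "submit everything, then wait for all" pattern can be illustrated with Python's asyncio. This is only an analogy: InnoDB uses native asynchronous IO (for example Windows aio, as the I/O thread lines of the status output later show), while here `asyncio.sleep` stands in for device latency and `read_page` is a hypothetical function.

```python
import asyncio
import time


async def read_page(page_no: int, latency: float = 0.05) -> str:
    # Stand-in for a disk read; asyncio.sleep simulates device latency.
    await asyncio.sleep(latency)
    return f"page {page_no}"


async def read_all(page_nos):
    # Submit every request first, then wait for all of them to complete.
    tasks = [asyncio.create_task(read_page(p)) for p in page_nos]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(read_all(range(8)))
elapsed = time.perf_counter() - start
print(f"{len(results)} pages in {elapsed:.2f}s")
# roughly 0.05s: the eight reads overlap instead of taking ~0.40s one after another
```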

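Flush Neighbor Page

When flushing a dirty page, InnoDB checks all pages in the same extent and flushes any that are also dirty along with it. The benefit is clear: with AIO, several write operations can be merged into a single IO, so this mechanism brings a significant advantage on traditional mechanical disks.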
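A sketch of picking the neighbors to flush, assuming 16KB pages so that a 1MB extent holds 64 consecutive pages, and assuming adjacent dirty pages are coalesced into one write. The function name is hypothetical. In recent MySQL versions this behaviour is controlled by `innodb_flush_neighbors`, which is often turned off on SSDs where random IO is cheap.

```python
PAGES_PER_EXTENT = 64   # 1MB extent / 16KB pages


def neighbor_flush_batches(page_no, dirty_pages):
    """Group the dirty pages in page_no's extent into contiguous write batches."""
    first = page_no - page_no % PAGES_PER_EXTENT
    in_extent = sorted(p for p in dirty_pages if first <= p < first + PAGES_PER_EXTENT)
    batches = []
    for p in in_extent:
        if batches and p == batches[-1][-1] + 1:
            batches[-1].append(p)      # adjacent page: extend the current write
        else:
            batches.append([p])        # gap: start a new write
    return batches


dirty = {3, 4, 5, 9, 70}
print(neighbor_flush_batches(4, dirty))
# prints: [[3, 4, 5], [9]]
```

Page 70 belongs to the next extent (pages 64 to 127), so it is not flushed together with page 4.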

InnoDB Status Output

Command to view the InnoDB status:

mysql> show engine innodb status;
+--------+------+--------+
| Type   | Name | Status |
+--------+------+--------+
| InnoDB |      |
=====================================
2020-04-20 13:53:40 0x1d70 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 49 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1 srv_active, 0 srv_shutdown, 14861 srv_idle
srv_master_thread log flush and writes: 14862
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10
OS WAIT ARRAY INFO: signal count 10
RW-shared spins 0, rounds 4, OS waits 2
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 4.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 27907
Purge done for trx's n:o < 0 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 283762304370480, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
394 OS file reads, 54 OS file writes, 7 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 203, seg size 205, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2267, node heap has 1 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 2885896575
Log flushed up to   2885896575
Pages flushed up to 2885896575
Last checkpoint at  2885896566
0 pending log flushes, 0 pending chkp writes
10 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 8585216
Dictionary memory allocated 935461
Buffer pool size   512
Free buffers       256
Database pages     254
Old database pages 0
Modified db pages  0
Pending reads      0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 364, created 35, written 37
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 254, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=17188, Main thread ID=23600, state: sleeping
Number of rows inserted 0, updated 0, deleted 0, read 8
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
 |
+--------+------+--------+
1 row in set

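The output of show engine innodb status is not the current state but the state of InnoDB over a recent time window. In the example above, "Per second averages calculated from the last 49 seconds" means the per-second figures cover the past 49 seconds.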
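In effect, each per-second figure is a counter delta divided by the length of that window. A sketch assuming two snapshots of cumulative counters; the counter names and values below are made up for illustration and are not taken from a real server.

```python
def per_second(prev, curr, seconds):
    """Average rate of each cumulative counter over the sampling window."""
    return {name: (curr[name] - prev[name]) / seconds for name in curr}


# Hypothetical counter snapshots taken 49 seconds apart.
before = {"reads": 1000, "writes": 200}
after = {"reads": 1098, "writes": 200}
rates = per_second(before, after, 49)
print(rates)  # prints: {'reads': 2.0, 'writes': 0.0}
```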

BACKGROUND THREAD

Master thread activity:

BACKGROUND THREAD
-----------------
srv_master_thread loops: 1 srv_active, 0 srv_shutdown, 14861 srv_idle
srv_master_thread log flush and writes: 14862
-----------------

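Parameter	Description
srv_master_thread loops	The number of Master thread loops. In each loop the Master thread sleeps for 1 second and then runs in one of three states: active, shutdown, or idle. Because the Master thread loops continuously, this value grows over time.
srv_active	Loops in which the Master thread ran in the active state. This count grows with changes to tables and databases, such as inserting data, updating data, or altering tables; queries do not increase it.
srv_shutdown	Stays at 0 during normal operation, because srv_shutdown only increases while the MySQL server is shutting down.
srv_idle	Incremented when the Master thread is idle, that is, when there are no database modifications.
srv_master_thread log flush and writes	The Master thread periodically flushes the log in the background; the interval between flushes is controlled by innodb_flush_log_at_timeout.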

SEMAPHORES

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 10
OS WAIT ARRAY INFO: signal count 10
RW-shared spins 0, rounds 4, OS waits 2
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 4.00 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------

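The SEMAPHORES output has two parts. The first lists the current waits; it only shows up under high concurrency and contains all current wait records (excluding sleeping threads). The second part is event statistics. Compared with OS waits, spinning is cheap, but it is an active wait that consumes CPU: a large number of spin waits and spin rounds wastes a lot of CPU.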
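The spin-then-block behaviour behind these counters can be imitated with a Python lock. This is an analogy, not InnoDB's implementation: `SpinThenWaitLock` is a hypothetical class, `spin_rounds` is an assumed knob loosely playing the role of InnoDB's spin settings (such as `innodb_sync_spin_loops`), and Python's blocking `acquire` stands in for an OS wait.

```python
import threading
import time


class SpinThenWaitLock:
    """Spin a bounded number of rounds, then block; counts like SEMAPHORES."""

    def __init__(self, spin_rounds=30):
        self._lock = threading.Lock()
        self.spin_rounds = spin_rounds
        self.spin_waits = 0    # acquisitions that had to wait
        self.rounds = 0        # total spin iterations
        self.os_waits = 0      # waits that ended up blocking in the OS

    def acquire(self):
        if self._lock.acquire(blocking=False):
            return                           # uncontended: no wait recorded
        self.spin_waits += 1
        for _ in range(self.spin_rounds):
            self.rounds += 1
            if self._lock.acquire(blocking=False):
                return                       # acquired while spinning
        self.os_waits += 1
        self._lock.acquire()                 # give up the CPU and block

    def release(self):
        self._lock.release()


lock = SpinThenWaitLock(spin_rounds=30)
lock.acquire()                               # main thread holds the lock
worker = threading.Thread(target=lambda: (lock.acquire(), lock.release()))
worker.start()
while lock.os_waits == 0:                    # wait until the worker has to block
    time.sleep(0.001)
lock.release()
worker.join()
print(lock.spin_waits, lock.rounds, lock.os_waits)  # prints: 1 30 1
```

The worker burns all 30 rounds before blocking, which is exactly the CPU cost the paragraph above warns about when spin counts grow large.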

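Parameter	Description
OS WAIT ARRAY INFO	Information about the OS wait array.
Reservation count	The number of times threads tried to get a slot in the OS wait array. It is greater than or equal to the number of threads that entered the OS wait state, because an attempt to enter the array can fail and still increments the reservation count.
Signal count	The number of times threads were woken up. When the thread holding a mutex releases it, threads in OS wait are woken by a signal.
Mutex spin wait	The number of spin waits. If a thread fails to acquire a mutex, it first spins, and mutex spin wait is incremented.
Mutex Rounds	A spinning thread loops repeatedly while waiting for the mutex; each iteration is one round (see Figure 4).
Mutex Os wait	The number of times threads entered an OS wait. If a thread cannot acquire the mutex at once, it spins; if it still has not acquired the mutex when the spin limit is reached, it enters an OS wait and Os wait is incremented.
Spin rounds per waits mutex	Rounds per mutex spin wait: mutex rounds / mutex spin wait.
Spin rounds per waits rw-share (shared/read lock)	Rounds per rw-share spin wait: rw-share rounds / rw-share spin wait.
Spin rounds per waits rw-excl (exclusive/write lock)	Rounds per rw-excl spin wait: rw-excl rounds / rw-excl spin wait.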
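Applied literally to the sample output (RW-shared spins 0, rounds 4), the formula would divide by zero, yet the output prints 4.00. That is consistent with the divisor being treated as 1 when no spin waits were recorded; the sketch below assumes this, as an inference from the sample rather than documented behaviour. The function name is hypothetical.

```python
def spin_rounds_per_wait(rounds: int, spins: int) -> float:
    # Assumed: the divisor is clamped to 1 when no spin waits were recorded.
    return rounds / (spins if spins else 1)


print(f"{spin_rounds_per_wait(4, 0):.2f} RW-shared, {spin_rounds_per_wait(0, 0):.2f} RW-excl")
# prints: 4.00 RW-shared, 0.00 RW-excl
```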