MySQL Online DDL Implementation Analysis

Author: 真之棒2016 | Published 2020-04-30 22:28


Author: sufei
MySQL source version: 8.0.18

Note: if you are not yet familiar with online DDL, you can refer to:

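1. Introduction to Online DDL

1.1 Classification of Online DDL

In MySQL 8.0, Online DDL operations are classified from two angles:

- by lock level, which determines how much concurrency a DDL allows with DML;
- by whether table data is copied, which determines the execution logic.

Classification by lock level and concurrency

First, consider concurrency with DML statements. DDL falls into the categories below. The LOCK clause of a DDL statement specifies the lock level held during the DDL, and its possible values are: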

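Value	Meaning
NONE	Permits concurrent queries and DML
SHARED	Permits concurrent queries but blocks DML
DEFAULT	Lets the server pick the maximum concurrency the operation allows; same as omitting the LOCK clause
EXCLUSIVE	Blocks both queries and DML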

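By default, MySQL uses as little locking as possible during a DDL operation, to maximize concurrency. A LOCK clause can request a more restrictive lock. However, if the LOCK clause specifies a less restrictive level than the particular DDL operation permits, the statement fails with an error.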
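The rule just described can be sketched as a small validation function. This is a minimal, hypothetical model, not MySQL code. The `LockLevel` enum, `resolve_lock()`, and the idea that each operation has a "minimum required" lock level are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical model of the LOCK clause, ordered from most to least concurrent.
enum class LockLevel { NONE = 0, SHARED = 1, EXCLUSIVE = 2 };

// min_required: the least restrictive level this DDL operation can run with.
// requested: the LOCK clause, or std::nullopt for LOCK=DEFAULT / no clause.
LockLevel resolve_lock(LockLevel min_required,
                       std::optional<LockLevel> requested) {
  if (!requested) {
    // DEFAULT: allow as much concurrency as the operation permits.
    return min_required;
  }
  if (*requested < min_required) {
    // Less restrictive than the operation allows: the statement fails.
    throw std::invalid_argument("LOCK level too permissive for this operation");
  }
  // A stricter level than necessary is accepted.
  return *requested;
}
```

Take an operation that needs at least SHARED. Calling `resolve_lock(LockLevel::SHARED, LockLevel::NONE)` throws, while omitting the clause yields SHARED.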

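Classification by whether data is copied

The other classification is by whether table data is copied. It is specified with the ALGORITHM clause, whose possible values are:

Value	Meaning
COPY	Changes the table by copying it to a new table; concurrent DML is not permitted during the operation
INPLACE	Avoids copying the table through the Server layer: the storage engine performs the change in place, rebuilding the table only if the operation requires it. An exclusive metadata lock on the table may be taken briefly during the prepare and execute phases. Concurrent DML is usually permitted.
INSTANT	Only modifies metadata in the data dictionary. No exclusive metadata lock is taken during prepare and execute, and table data is unaffected, so the operation is instantaneous. Concurrent DML is permitted. For ADD COLUMN (as of 8.0.18), the new column can only be appended as the last column. A few other metadata-only changes, such as setting or dropping a column default, also support INSTANT.
DEFAULT	Lets the server choose the best algorithm for the DDL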

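If no ALGORITHM clause is given, the server chooses the best algorithm for the DDL. Users may request one of the algorithms above, but the choice is limited by the type of DDL: if the specified algorithm cannot perform the DDL, the ALTER statement fails with an error.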
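A similar sketch for ALGORITHM follows. It assumes, for illustration only, that each DDL operation exposes the set of algorithms it supports, and that DEFAULT picks the cheapest one in the order INSTANT, INPLACE, COPY. `Algorithm` and `resolve_algorithm()` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <stdexcept>

// Hypothetical model, ordered from cheapest to most expensive.
enum class Algorithm { INSTANT = 0, INPLACE = 1, COPY = 2 };

// supported: the algorithms this particular DDL operation can use.
// requested: the ALGORITHM clause, or std::nullopt for DEFAULT / no clause.
Algorithm resolve_algorithm(const std::set<Algorithm>& supported,
                            std::optional<Algorithm> requested) {
  if (supported.empty()) {
    throw std::invalid_argument("operation supports no algorithm");
  }
  if (!requested) {
    // DEFAULT: std::set is ordered, so begin() is the cheapest supported one.
    return *supported.begin();
  }
  if (supported.count(*requested) == 0) {
    // The requested algorithm cannot perform this DDL: ALTER fails.
    throw std::invalid_argument("ALGORITHM not supported for this operation");
  }
  return *requested;
}
```

The design choice here is simply that an explicit request is honored only when it is in the supported set. Requesting a slower algorithm than necessary is therefore allowed.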

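1.2 Execution flow

Online DDL execution can be divided into three phases:

Initialization phase

During initialization, the server determines the best LOCK and ALGORITHM settings the operation supports. It reports an error if the specified LOCK or ALGORITHM cannot be satisfied. In this phase it mainly holds a shareable, upgradable metadata lock to protect the current table definition.

Execution phase

This phase covers the statement's prepare and execute steps. Whether the metadata lock is upgraded to exclusive depends on the evaluation made during initialization. If an exclusive metadata lock is required, it is held only briefly during preparation, not during execution. If the table has to be copied or the engine's data modified, this is the longest phase.

Commit table definition phase

In this phase the metadata lock is upgraded to exclusive in order to evict the old table definition and commit the new one. The exclusive lock is held only briefly.

This outline shows that even with INSTANT, an exclusive metadata lock must still be acquired in the commit phase. Maximizing concurrency only works by shortening the time the exclusive lock is held, so in some situations the DDL can still cause blocking.

For example:

The typical case is a DDL statement waiting for an exclusive (X) metadata lock, with subsequent DML statements queued up behind it.

The official documentation gives the following example:

Session 1: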

mysql> CREATE TABLE t1 (c1 INT) ENGINE=InnoDB;
mysql> START TRANSACTION;
mysql> SELECT * FROM t1;

Because the transaction has not committed, session 1 holds a shared metadata lock on t1.

Session 2:

mysql> ALTER TABLE t1 ADD COLUMN x INT, ALGORITHM=INPLACE, LOCK=NONE;

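This statement can itself finish quickly, but it still needs an exclusive metadata lock on t1 in its commit phase. It therefore blocks until the transaction in session 1 commits or rolls back.

Session 3: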

mysql> SELECT * FROM t1;

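Session 3, and any later session that accesses t1, is now blocked behind session 2's pending exclusive lock request.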
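The blocking chain in this example can be reproduced with a toy lock queue. This is a deliberately simplified, hypothetical model. Real MDL lets the ALTER hold an upgradable shared lock and later request an upgrade, and it arbitrates with lock-type priorities rather than a plain FIFO. The sketch only captures the observed effect: a pending exclusive request makes later shared requests wait.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

enum class Mode { SHARED, EXCLUSIVE };

struct Request {
  std::string session;
  Mode mode;
};

// Toy metadata-lock queue for a single table (hypothetical, not MySQL's MDL).
class MdlQueue {
 public:
  // Returns true if granted immediately, false if the request has to wait.
  bool acquire(const std::string& session, Mode mode) {
    Request req{session, mode};
    // Wait if the request conflicts with a granted lock, or if someone is
    // already queued: a pending EXCLUSIVE request thus blocks later SHARED ones.
    if (!waiting_.empty() || conflicts(req)) {
      waiting_.push_back(req);
      return false;
    }
    granted_.push_back(req);
    return true;
  }

  // Drops every lock held by `session`, then grants waiters in FIFO order.
  void release(const std::string& session) {
    std::vector<Request> kept;
    for (const Request& r : granted_) {
      if (r.session != session) kept.push_back(r);
    }
    granted_.swap(kept);
    while (!waiting_.empty() && !conflicts(waiting_.front())) {
      granted_.push_back(waiting_.front());
      waiting_.pop_front();
    }
  }

  bool holds(const std::string& session) const {
    for (const Request& r : granted_) {
      if (r.session == session) return true;
    }
    return false;
  }

 private:
  // SHARED is compatible only with SHARED; EXCLUSIVE conflicts with anything.
  bool conflicts(const Request& req) const {
    for (const Request& r : granted_) {
      if (r.mode == Mode::EXCLUSIVE || req.mode == Mode::EXCLUSIVE) return true;
    }
    return false;
  }

  std::vector<Request> granted_;
  std::deque<Request> waiting_;
};
```

To replay the scenario:

1. Session "s1" acquires SHARED.
2. Session "s2" (the ALTER) asks for EXCLUSIVE and waits.
3. Session "s3" asks for SHARED and also waits, even though SHARED is compatible with s1's lock.
4. Only after s1 and then s2 release does s3 get its lock.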

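That is the overall Online DDL flow. The following sections analyze the execution in detail, based on the source code.

2. Source code analysis

As described above, when MySQL creates an index online it automatically picks the best available method, so it rarely has a serious impact on the workload. Here we start from the MySQL source to see how this is implemented. We also want to determine whether replaying DML can report a duplicate key error.

2.1 Core structures

The core code for online processing lives in row0log.cc, which interested readers can study in detail. In brief, MySQL supports online DDL by keeping the DML operations executed during the DDL in a log buffer and then replaying that log. This is similar to the gh-ost tool, except that gh-ost replays DML from the binlog. MySQL instead internally maintains a dedicated buffer structure, row_log_t.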

row_log_t

/** @brief Buffer for logging modifications during online index creation

All modifications to an index that is being created will be logged by
row_log_online_op() to this buffer.

All modifications to a table that is being rebuilt will be logged by
row_log_table_delete(), row_log_table_update(), row_log_table_insert()
to this buffer.

When head.blocks == tail.blocks, the reader will access tail.block
directly. When also head.bytes == tail.bytes, both counts will be
reset to 0 and the file will be truncated. */
struct row_log_t {
  int fd;              /*!< file descriptor */
  ib_mutex_t mutex;    /*!< mutex protecting error,
                       max_trx and tail */
  page_no_map *blobs;  /*!< map of page numbers of off-page columns
                       that have been freed during table-rebuilding
                       ALTER TABLE (row_log_table_*); protected by
                       index->lock X-latch only */
  dict_table_t *table; /*!< table that is being rebuilt,
                       or NULL when this is a secondary
                       index that is being created online */
  bool same_pk;        /*!< whether the definition of the PRIMARY KEY
                       has remained the same */
  const dtuple_t *add_cols;
  /*!< default values of added columns, or NULL */
  const ulint *col_map; /*!< mapping of old column numbers to
                        new ones, or NULL if !table */
  dberr_t error;        /*!< error that occurred during online
                        table rebuild */
  trx_id_t max_trx;     /*!< biggest observed trx_id in
                        row_log_online_op();
                        protected by mutex and index->lock S-latch,
                        or by index->lock X-latch only */
  row_log_buf_t tail;   /*!< writer context;
                        protected by mutex and index->lock S-latch,
                        or by index->lock X-latch only */
  row_log_buf_t head;   /*!< reader context; protected by MDL only;
                        modifiable by row_log_apply_ops() */
  ulint n_old_col;
  /*!< number of non-virtual column in
  old table */
  ulint n_old_vcol;
  /*!< number of virtual column in old table */
  const char *path; /*!< where to create temporary file during
                    log operation */
};

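As the comment shows, MySQL internally divides online DDL into two categories:

- Adding a secondary index: concurrent DML is logged into the buffer by row_log_online_op().
- DDL that rebuilds the table: concurrent DML is logged by row_log_table_delete(), row_log_table_update() and row_log_table_insert().

The fields of the core structure row_log_t mean the following:

- fd, path: fd is the handle of the temporary file that holds the DML logged during the DDL, and path is the directory in which that file is created. The directory comes from innodb_tmpdir, or from tmpdir when innodb_tmpdir is empty. The function that determines it is innobase_mysql_tmpdir().
- blobs: a map of the page numbers of off-page columns freed during a table-rebuilding ALTER TABLE (used only by the row_log_table_* functions).
- table: non-NULL when the table is being rebuilt; NULL when a secondary index is being created online.
- tail, head: the log blocks used for writing and for replay, respectively. Their type, row_log_buf_t, is described below.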

row_log_buf_t

/** Log block for modifications during online ALTER TABLE */
struct row_log_buf_t {
  byte *block;            /*!< file block buffer */
  ut_new_pfx_t block_pfx; /*!< opaque descriptor of "block". Set
                       by ut_allocator::allocate_large() and fed to
                       ut_allocator::deallocate_large(). */
  mrec_buf_t buf;         /*!< buffer for accessing a record
                          that spans two blocks */
  ulint blocks;           /*!< current position in blocks */
  ulint bytes;            /*!< current position within block */
  ulonglong total;        /*!< logical position, in bytes from
                          the start of the row_log_table log;
                          0 for row_log_online_op() and
                          row_log_apply(). */
};

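This is the in-memory structure of an online DDL log block. The temporary file is written and read in units of blocks. One block can hold one or more DML log records, and a single record may span two blocks.

- block: the block currently being read or written.
- bytes: for the writer, the number of bytes already used in the block; for the reader, the read position within the block.
- blocks: for the writer, the number of blocks already written to the temporary file; for the reader, the number of blocks already applied.
- buf: a buffer for a DML record that spans two blocks.
- total: the logical position, in bytes, from the start of a table-rebuild (row_log_table) log. It stays 0 for online index creation (row_log_online_op() and row_log_apply()).

The block size is set by the innodb_sort_buffer_size parameter (srv_sort_buf_size in the code), as the allocation function shows: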

/** Allocate the memory for the log buffer.
@param[in,out]  log_buf Buffer used for log operation
@return true if success, false if not */
static MY_ATTRIBUTE((warn_unused_result)) bool row_log_block_allocate(
    row_log_buf_t &log_buf) {
  DBUG_TRACE;
  if (log_buf.block == NULL) {
    DBUG_EXECUTE_IF("simulate_row_log_allocation_failure", return false;);
    // Allocate the block; its size is innodb_sort_buffer_size (srv_sort_buf_size)
    log_buf.block = ut_allocator<byte>(mem_key_row_log_buf)
                        .allocate_large(srv_sort_buf_size, &log_buf.block_pfx);

    if (log_buf.block == NULL) {
      return false;
    }
  }
  return true;
}

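The rest of this section uses adding an index as the example of how online DDL executes. Other kinds of DDL can be analyzed in the same way.

2.2 Logging of concurrent DML

As noted above, index creation ultimately calls row_log_online_op(). Its implementation is as follows: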

/** Logs an operation to a secondary index that is (or was) being created. */
void row_log_online_op(
    dict_index_t *index,   /*!< in/out: index, S or X latched */
    const dtuple_t *tuple, /*!< in: index tuple */
    trx_id_t trx_id)       /*!< in: transaction ID for insert,
                           or 0 for delete */
{
  byte *b;
  ulint extra_size;
  ulint size;
  ulint mrec_size;
  ulint avail_size;
  row_log_t *log;

  ut_ad(dtuple_validate(tuple));
  ut_ad(dtuple_get_n_fields(tuple) == dict_index_get_n_fields(index));
  // Assert that the caller holds the index S-latch or X-latch
  ut_ad(rw_lock_own(dict_index_get_lock(index), RW_LOCK_S) ||
        rw_lock_own(dict_index_get_lock(index), RW_LOCK_X));
  // Skip logging if the index has been marked corrupted
  if (index->is_corrupted()) {
    return;
  }
  // Assert that the index is undergoing online DDL (e.g. ONLINE_INDEX_CREATION)
  ut_ad(dict_index_is_online_ddl(index));

  /* Compute the size of the record. This differs from
  row_merge_buf_encode(), because here we do not encode
  extra_size+1 (and reserve 0 as the end-of-chunk marker). */
  // Size of the record in temporary format (size) and of its extra part (extra_size)
  size = rec_get_converted_size_temp(index, tuple->fields, tuple->n_fields,
                                     NULL, &extra_size);
  ut_ad(size >= extra_size);
  ut_ad(size <= sizeof log->tail.buf);
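  /*
  The record actually written to the temporary file consists of:
    the header (ROW_LOG_HEADER_SIZE = 2 bytes: op and extra_size,
      plus one more byte when extra_size >= 0x80)
    the record itself (size bytes)
    the transaction id (only when trx_id != 0, i.e. not for a delete)
  */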
  mrec_size = ROW_LOG_HEADER_SIZE + (extra_size >= 0x80) + size +
              (trx_id ? DATA_TRX_ID_LEN : 0);

  log = index->online_log;
  mutex_enter(&log->mutex);  // acquire the log mutex
  // Track the largest trx_id seen so far in log->max_trx
  if (trx_id > log->max_trx) {
    log->max_trx = trx_id;
  }
  // Allocate log->tail.block if it has not been allocated yet
  if (!row_log_block_allocate(log->tail)) {
    log->error = DB_OUT_OF_MEMORY;
    goto err_exit;
  }

  UNIV_MEM_INVALID(log->tail.buf, sizeof log->tail.buf);
  // Free space left in the current block
  ut_ad(log->tail.bytes < srv_sort_buf_size);
  avail_size = srv_sort_buf_size - log->tail.bytes;
  // If the record does not fit, build it in tail.buf; otherwise write it directly into the block
  if (mrec_size > avail_size) {
    b = log->tail.buf;
  } else {
    b = log->tail.block + log->tail.bytes;
  }
  // Write the op code (plus trx_id for an insert), extra_size, then the record
  if (trx_id != 0) {
    *b++ = ROW_OP_INSERT;
    trx_write_trx_id(b, trx_id);
    b += DATA_TRX_ID_LEN;
  } else {
    *b++ = ROW_OP_DELETE;
  }

  if (extra_size < 0x80) {
    *b++ = (byte)extra_size;
  } else {
    ut_ad(extra_size < 0x8000);
    *b++ = (byte)(0x80 | (extra_size >> 8));
    *b++ = (byte)extra_size;
  }

  rec_convert_dtuple_to_temp(b + extra_size, index, tuple->fields,
                             tuple->n_fields, NULL);
  b += size;
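  /*
  If the record fills or overflows the current block: copy the part that fits
  into the block (unless it was written there directly), write the full block
  to the temporary file with os_file_write_int_fd, and move the remainder of
  the record to the start of the now-free block.
  Otherwise the record is already in the block; just advance tail.bytes.
  */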
  if (mrec_size >= avail_size) {
    dberr_t err;
    IORequest request(IORequest::WRITE);
    const os_offset_t byte_offset =
        (os_offset_t)log->tail.blocks * srv_sort_buf_size;
    // Fail if the file would reach innodb_online_alter_log_max_size after this write
    if (byte_offset + srv_sort_buf_size >= srv_online_max_size) {
      goto write_failed;
    }

    if (mrec_size == avail_size) {
      ut_ad(b == &log->tail.block[srv_sort_buf_size]);
    } else {
      ut_ad(b == log->tail.buf + mrec_size);
      memcpy(log->tail.block + log->tail.bytes, log->tail.buf, avail_size);
    }

    UNIV_MEM_ASSERT_RW(log->tail.block, srv_sort_buf_size);

    if (row_log_tmpfile(log) < 0) {
      log->error = DB_OUT_OF_MEMORY;
      goto err_exit;
    }
    // Write the full block to the temporary file
    err = os_file_write_int_fd(request, "(modification log)", log->fd,
                               log->tail.block, byte_offset, srv_sort_buf_size);

    log->tail.blocks++;
    if (err != DB_SUCCESS) {
    write_failed:
      /* We set the flag directly instead of
      invoking dict_set_corrupted() here,
      because the index is not "public" yet. */
      index->type |= DICT_CORRUPT;
    }
    UNIV_MEM_INVALID(log->tail.block, srv_sort_buf_size);
    // Move the rest of the record to the start of the now-free block
    memcpy(log->tail.block, log->tail.buf + avail_size, mrec_size - avail_size);
    log->tail.bytes = mrec_size - avail_size;
  } else {
    log->tail.bytes += mrec_size;
    ut_ad(b == log->tail.block + log->tail.bytes);
  }

  UNIV_MEM_INVALID(log->tail.buf, sizeof log->tail.buf);
err_exit:
  mutex_exit(&log->mutex); // release the log mutex
}
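The record header that mrec_size accounts for can be sketched on its own. The sketch follows the field order written above: op, then an optional trx_id, then a 1- or 2-byte extra_size. Two details are assumptions. First, `kOpInsert`/`kOpDelete` are placeholder values, not the real ROW_OP_* constants. Second, writing the 6-byte trx_id most-significant byte first is assumed to match trx_write_trx_id(). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder op codes (NOT the real ROW_OP_INSERT / ROW_OP_DELETE values).
constexpr std::uint8_t kOpInsert = 1;
constexpr std::uint8_t kOpDelete = 2;
constexpr std::size_t kHeaderSize = 2;  // op + 1-byte extra_size (ROW_LOG_HEADER_SIZE)
constexpr std::size_t kTrxIdLen = 6;    // DATA_TRX_ID_LEN

// Same arithmetic as mrec_size in row_log_online_op().
std::size_t log_record_size(std::size_t extra_size, std::size_t size,
                            std::uint64_t trx_id) {
  return kHeaderSize + (extra_size >= 0x80 ? 1 : 0) + size +
         (trx_id != 0 ? kTrxIdLen : 0);
}

// Appends the header: op, the 6-byte trx_id for inserts, then extra_size
// in one byte (< 0x80) or two bytes with the top bit of the first set.
void encode_header(std::vector<std::uint8_t>& out, std::uint64_t trx_id,
                   std::size_t extra_size) {
  assert(extra_size < 0x8000);
  if (trx_id != 0) {
    out.push_back(kOpInsert);
    for (int shift = 40; shift >= 0; shift -= 8) {
      out.push_back(static_cast<std::uint8_t>(trx_id >> shift));  // MSB first
    }
  } else {
    out.push_back(kOpDelete);
  }
  if (extra_size < 0x80) {
    out.push_back(static_cast<std::uint8_t>(extra_size));
  } else {
    out.push_back(static_cast<std::uint8_t>(0x80 | (extra_size >> 8)));
    out.push_back(static_cast<std::uint8_t>(extra_size & 0xFF));
  }
}

// Reads extra_size back; `pos` points at its first byte and is advanced.
std::size_t decode_extra_size(const std::vector<std::uint8_t>& in,
                              std::size_t& pos) {
  std::size_t first = in[pos++];
  if (first < 0x80) return first;
  std::size_t second = in[pos++];
  return ((first & 0x7F) << 8) | second;
}
```

For example, an insert with extra_size 0x1234 and a 100-byte record takes 2 + 1 + 100 + 6 = 109 bytes in the log.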

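From this code we can see the following:

- When the pending DML record size, mrec_size, exceeds the free space avail_size of the current block, the record is first built in tail.buf. Its leading part is copied into the current block to fill it, and the block is written to the temporary file with os_file_write_int_fd. The rest of the record is then copied to the now-free tail.block.
- If mrec_size equals avail_size, the record is written directly into the current block, which is then exactly full and is flushed to the file in the same way. If mrec_size is smaller, the record is simply appended to the block.
- DML records are therefore not all kept in memory. They are written to the temporary file, and only the last block stays in memory.
- As a result, a long-running DDL does not consume excessive memory. The disk space used by the temporary file is a much smaller concern, and it is capped by innodb_online_alter_log_max_size.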
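The spill logic described above can be modeled with a simplified writer. It is a single-threaded sketch under stated assumptions:

- an in-memory vector stands in for the temporary file;
- there is no mutex;
- a flush that would reach the size limit just returns false instead of marking the index corrupted.

`OnlineLogWriter` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, single-threaded model of the writer side (log->tail).
class OnlineLogWriter {
 public:
  OnlineLogWriter(std::size_t block_size, std::size_t max_file_size)
      : block_size_(block_size), max_file_size_(max_file_size),
        tail_(block_size) {}

  // Appends one record; returns false when flushing a block would reach
  // max_file_size (the real code marks the index corrupted instead).
  bool append(const std::vector<std::uint8_t>& rec) {
    // A record may span at most two blocks.
    assert(!rec.empty() && rec.size() <= block_size_);
    const std::size_t avail = block_size_ - bytes_;
    if (rec.size() < avail) {
      std::copy(rec.begin(), rec.end(), tail_.begin() + bytes_);
      bytes_ += rec.size();
      return true;
    }
    // The record fills or overflows the block: copy the part that fits ...
    std::copy(rec.begin(), rec.begin() + avail, tail_.begin() + bytes_);
    // ... check the file size limit, then "write" the full block.
    if ((file_.size() + 1) * block_size_ >= max_file_size_) return false;
    file_.push_back(tail_);  // stands in for os_file_write_int_fd()
    // The remainder of the record starts the now-free block.
    std::copy(rec.begin() + avail, rec.end(), tail_.begin());
    bytes_ = rec.size() - avail;
    return true;
  }

  std::size_t blocks_on_disk() const { return file_.size(); }
  std::size_t tail_bytes() const { return bytes_; }

 private:
  std::size_t block_size_;
  std::size_t max_file_size_;
  std::vector<std::uint8_t> tail_;               // in-memory block (tail.block)
  std::size_t bytes_ = 0;                        // used bytes (tail.bytes)
  std::vector<std::vector<std::uint8_t>> file_;  // blocks already flushed
};
```

Consider an 8-byte block and records of 5, 6 and 5 bytes. The second call flushes one block, because that record spans two blocks. The third call flushes another, because that record exactly fills the remaining space.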

Next, the callers of row_log_online_op(). Two of them are examined here. row_log_online_op_try() is used when records are inserted or deleted, and row_upd_sec_index_entry_low() is used when records are updated:

/** Try to log an operation to a secondary index that is
 (or was) being created.
 @retval true if the operation was logged or can be ignored
 @retval false if online index creation is not taking place */
UNIV_INLINE
bool row_log_online_op_try(
    dict_index_t *index,   /*!< in/out: index, S or X latched */
    const dtuple_t *tuple, /*!< in: index tuple */
    trx_id_t trx_id)       /*!< in: transaction ID for insert,
                           or 0 for delete */
{
  // Assert that the index latch is held (S, X or SX)
  ut_ad(rw_lock_own_flagged(dict_index_get_lock(index),
                            RW_LOCK_FLAG_S | RW_LOCK_FLAG_X | RW_LOCK_FLAG_SX));

  switch (dict_index_get_online_status(index)) {
    case ONLINE_INDEX_COMPLETE:
      /* This is a normal index. Do not log anything.
      The caller must perform the operation on the
      index tree directly. */
      return (false);
    case ONLINE_INDEX_CREATION:
      /* The index is being created online. Log the
      operation. */
      row_log_online_op(index, tuple, trx_id);
      break;
    case ONLINE_INDEX_ABORTED:
    case ONLINE_INDEX_ABORTED_DROPPED:
      /* The index was created online, but the operation was
      aborted. Do not log the operation and tell the caller
      to skip the operation. */
      break;
  }

  return (true);
}

/** Updates a secondary index entry of a row.
@param[in]  node        row update node
@param[in]  old_entry   the old entry to search, or nullptr then it
                                has to be created in this function
@param[in]  thr     query thread
@return DB_SUCCESS if operation successfully completed, else error
code or DB_LOCK_WAIT */
static MY_ATTRIBUTE((warn_unused_result)) dberr_t
    row_upd_sec_index_entry_low(upd_node_t *node, dtuple_t *old_entry,
                                que_thr_t *thr) {
    …………
    // S-latch the index
    mtr_s_lock(dict_index_get_lock(index), &mtr);

    switch (dict_index_get_online_status(index)) {
      case ONLINE_INDEX_COMPLETE:
        /* This is a normal index. Do not log anything.
        Perform the update on the index tree directly. */
        break;
      case ONLINE_INDEX_CREATION:
        /* Log a DELETE and optionally INSERT. */
        // First log a DELETE of the old entry
        row_log_online_op(index, entry, 0);

        if (!node->is_delete) {
          mem_heap_empty(heap);
          entry =
              row_build_index_entry(node->upd_row, node->upd_ext, index, heap);
          ut_a(entry);
          // Then log an INSERT of the updated entry
          row_log_online_op(index, entry, trx->id);
        }
        /* fall through */
      case ONLINE_INDEX_ABORTED:
      case ONLINE_INDEX_ABORTED_DROPPED:
        mtr_commit(&mtr);
        goto func_exit;
    }
    ……
}

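From the code above:

- Inserts and deletes on the secondary index go through row_log_online_op_try().
- Updates on the secondary index go through row_upd_sec_index_entry_low(), which first logs a delete of the old entry and then an insert of the new one.
- All of this happens under the protection of the index latch (index->lock).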
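Logging an update as a DELETE followed by an INSERT can be checked with a toy replay. Here a secondary index is modeled as a `std::set` of (value, primary key) pairs. `LoggedOp`, `log_update()` and `replay_log()` are hypothetical helpers. They only mirror the branch in row_upd_sec_index_entry_low(); InnoDB's record format is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A secondary index entry: (indexed value, primary key).
using Entry = std::pair<std::string, int>;

struct LoggedOp {
  bool is_insert;        // DELETE ops are logged with trx_id == 0
  Entry entry;
  std::uint64_t trx_id;
};

// Mirrors the ONLINE_INDEX_CREATION branch of row_upd_sec_index_entry_low():
// log a DELETE of the old entry, then an INSERT of the new entry unless the
// row itself is being deleted (node->is_delete).
void log_update(std::vector<LoggedOp>& oplog, const Entry& old_entry,
                const Entry& new_entry, bool row_deleted,
                std::uint64_t trx_id) {
  oplog.push_back({false, old_entry, 0});
  if (!row_deleted) oplog.push_back({true, new_entry, trx_id});
}

// Replays the log, in order, against an index built from an earlier scan.
void replay_log(std::set<Entry>& index, const std::vector<LoggedOp>& oplog) {
  for (const LoggedOp& op : oplog) {
    if (op.is_insert) {
      index.insert(op.entry);
    } else {
      index.erase(op.entry);
    }
  }
}
```

Updating ("a", 1) to ("c", 1) and deleting row 2 produces three log entries. Replaying them against {("a", 1), ("b", 2)} leaves only ("c", 1).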

Question: judging from the above, there should be no duplicate-key conflicts. Why, then, does the official documentation warn that they can occur?

When running an in-place online DDL operation, the thread that runs the ALTER TABLE statement applies an online log of DML operations that were run concurrently on the same table from other connection threads. When the DML operations are applied, it is possible to encounter a duplicate key entry error (ERROR 1062 (23000): Duplicate entry), even if the duplicate entry is only temporary and would be reverted by a later entry in the online log. This is similar to the idea of a foreign key constraint check in InnoDB in which constraints must hold during a transaction.
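Here is one plausible way such a transient duplicate can arise. Assume that concurrent DML on an index under construction is only logged: row_log_online_op_try() returns true and the caller skips the index tree, so no uniqueness check runs at DML time. Then:

1. Transaction T1 inserts row B with value k while row A still has k.
2. T2 deletes A.
3. The log now holds INSERT(k, B) before DELETE(k, A).
4. Replaying this against an index that already contains (k, A) fails on the first entry, even though the final state is unique.

The sketch models this with a `std::map`; `replay_unique()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct Op {
  bool is_insert;
  std::string key;  // value of the column covered by the new UNIQUE index
  int pk;           // primary key of the row
};

// Replays ops into a UNIQUE index under construction (key -> pk), checking
// uniqueness on every insert. Returns the position of the first op that hits
// a duplicate, or std::nullopt if the whole log applies cleanly.
std::optional<std::size_t> replay_unique(std::map<std::string, int>& index,
                                         const std::vector<Op>& ops) {
  for (std::size_t i = 0; i < ops.size(); ++i) {
    const Op& op = ops[i];
    auto it = index.find(op.key);
    if (op.is_insert) {
      if (it != index.end() && it->second != op.pk) return i;  // duplicate
      index[op.key] = op.pk;
    } else if (it != index.end() && it->second == op.pk) {
      index.erase(it);
    }
  }
  return std::nullopt;
}
```

The same two operations in the opposite order apply cleanly. This matches the documentation's point that the duplicate is only temporary.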

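This can be analyzed from where row_log_online_op_try() is called in the secondary-index insert path (row_ins_sec_index_entry_low()):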

 if (check) {
    DEBUG_SYNC_C("row_ins_sec_index_enter");
    if (mode == BTR_MODIFY_LEAF) {
      search_mode |= BTR_ALREADY_S_LATCHED;
      mtr_s_lock(dict_index_get_lock(index), &mtr);
    } else {
      mtr_sx_lock(dict_index_get_lock(index), &mtr);
    }
    // If the index is being built online, log the operation and skip the index tree
    if (row_log_online_op_try(index, entry, thr_get_trx(thr)->id)) {
      goto func_exit;
    }
  }
  ……
  if (row_ins_sec_mtr_start_and_check_if_aborted(&mtr, index, check,
                                                   search_mode)) {
      goto func_exit;
    }
    // Check the unique secondary index for a duplicate key
    err = row_ins_scan_sec_index_for_duplicate(flags, index, entry, thr, check,
                                               &mtr, offsets_heap);

    mtr_commit(&mtr);

    switch (err) {
      case DB_SUCCESS:
        break;
      case DB_DUPLICATE_KEY:  // duplicate in a not-yet-committed index: mark it corrupted, return DB_SUCCESS
        if (!index->is_committed()) {
          ut_ad(!thr_get_trx(thr)->dict_operation_lock_mode);

          dict_set_corrupted(index);
          /* Do not return any error to the
          caller. The duplicate will be reported
          by ALTER TABLE or CREATE UNIQUE INDEX.
          Unfortunately we cannot report the
          duplicate key value to the DDL thread,
          because the altered_table object is
          private to its call stack. */
          err = DB_SUCCESS;
        }
        /* fall through */
      default:
        if (dict_index_is_spatial(index)) {
          rtr_clean_rtr_info(&rtr_info, true);
        }
        return err;
    }

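The code shows that when a duplicate key is detected, the DML client does not get an error. Instead the index is marked corrupted, and the duplicate is reported by the ALTER TABLE (or CREATE UNIQUE INDEX), because the DML replay stops as soon as it finds the index corrupted.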
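The hand-off described here can be sketched with a shared flag: the DML thread swallows the error, and the DDL thread discovers it later. `IndexState`, `dml_insert_result()` and `ddl_check_before_block()` are hypothetical. A `std::atomic<bool>` stands in for marking the index corrupted (dict_set_corrupted()); InnoDB's actual synchronization of that state is not modeled.

```cpp
#include <atomic>
#include <cassert>

enum class Err { SUCCESS, DUPLICATE_KEY, INDEX_CORRUPT };

// Hypothetical stand-in for the parts of dict_index_t used above.
struct IndexState {
  std::atomic<bool> corrupted{false};  // models marking the index corrupted
  bool committed = false;              // true once the DDL has committed it
};

// DML side: a duplicate found in a not-yet-committed index is not returned to
// the client; the index is flagged instead (cf. dict_set_corrupted()).
Err dml_insert_result(IndexState& idx, Err scan_result) {
  if (scan_result == Err::DUPLICATE_KEY && !idx.committed) {
    idx.corrupted.store(true);
    return Err::SUCCESS;
  }
  return scan_result;
}

// DDL side: checked before each block is applied, as in row_log_apply_ops().
Err ddl_check_before_block(const IndexState& idx) {
  return idx.corrupted.load() ? Err::INDEX_CORRUPT : Err::SUCCESS;
}
```

Once an index has been committed, a duplicate is returned to the client as usual, which is why the sketch keys the behavior on `committed`.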

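2.3 Replay of the logged DML

The entry point for replaying the DML log is row_log_apply(), and the actual work is done in row_log_apply_ops(). A simplified excerpt of that function follows: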

next_block:
  ut_ad(has_index_lock);
  // Assert that the index X-latch is held
  ut_ad(rw_lock_own(dict_index_get_lock(index), RW_LOCK_X));
  ut_ad(index->online_log->head.bytes == 0);

  // Stop if the transaction has been interrupted (e.g. by KILL)
  if (trx_is_interrupted(trx)) {
    goto interrupted;
  }

  error = index->online_log->error;
  if (error != DB_SUCCESS) {
    goto func_exit;
  }
  // Abort if the index has been marked corrupted
  if (index->is_corrupted()) {
    error = DB_INDEX_CORRUPT;
    goto func_exit;
  }

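  /*
  Last block: take it directly from memory, truncate the temporary file, and
  keep holding the index latch.
  Any other block: read it from the temporary file and release the index
  latch while applying it.
  */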
  if (index->online_log->head.blocks == index->online_log->tail.blocks) {
    if (index->online_log->head.blocks) {
#ifdef HAVE_FTRUNCATE
      /* Truncate the file in order to save space. */
      if (index->online_log->fd > 0 &&
          ftruncate(index->online_log->fd, 0) == -1) {
        perror("ftruncate");
      }
#endif /* HAVE_FTRUNCATE */
      index->online_log->head.blocks = index->online_log->tail.blocks = 0;
    }
    // The last block is taken from memory
    next_mrec = index->online_log->tail.block;
    next_mrec_end = next_mrec + index->online_log->tail.bytes;
   // Not the last block
  } else {
    os_offset_t ofs;

    ofs = (os_offset_t)index->online_log->head.blocks * srv_sort_buf_size;

    ut_ad(has_index_lock);
    has_index_lock = false;
    // Release the index latch while applying a block that is not the last
    rw_lock_x_unlock(dict_index_get_lock(index));
    // Allocate head.block if needed
    if (!row_log_block_allocate(index->online_log->head)) {
      error = DB_OUT_OF_MEMORY;
      goto func_exit;
    }
    // Read the block from the temporary file
    IORequest request;
    dberr_t err = os_file_read_no_error_handling_int_fd(
        request, index->online_log->path, index->online_log->fd,
        index->online_log->head.block, ofs, srv_sort_buf_size, NULL);

    if (err != DB_SUCCESS) {
      ib::error(ER_IB_MSG_963) << "Unable to read temporary file"
                                  " for index "
                               << index->name;
      goto corruption;
    }
    next_mrec = index->online_log->head.block;
    next_mrec_end = next_mrec + srv_sort_buf_size;
  }
 /*
 Call row_log_apply_op in a loop to apply the records one by one; when a
 block has been consumed, jump to next_block to process the next one.
 */
  while (!trx_is_interrupted(trx)) {
    mrec = next_mrec;
    ut_ad(mrec < mrec_end);

    if (!has_index_lock) {
      /* We are applying operations from a different
      block than the one that is being written to.
      We do not hold index->lock in order to
      allow other threads to concurrently buffer
      modifications. */
      ut_ad(mrec >= index->online_log->head.block);
      ut_ad(mrec_end == index->online_log->head.block + srv_sort_buf_size);
      ut_ad(index->online_log->head.bytes < srv_sort_buf_size);

      /* Take the opportunity to do a redo log
      checkpoint if needed. */
      log_free_check();
    } else {
      /* We are applying operations from the last block.
      Do not allow other threads to buffer anything,
      so that we can finally catch up and synchronize. */
      ut_ad(index->online_log->head.blocks == 0);
      ut_ad(index->online_log->tail.blocks == 0);
      ut_ad(mrec_end ==
            index->online_log->tail.block + index->online_log->tail.bytes);
      ut_ad(mrec >= index->online_log->tail.block);
    }

    next_mrec = row_log_apply_op(index, dup, &error, offsets_heap, heap,
                                 has_index_lock, mrec, mrec_end, offsets);

    if (error != DB_SUCCESS) {
      goto func_exit;
    } else if (next_mrec == next_mrec_end) {
      /* The record happened to end on a block boundary.
      Do we have more blocks left? */
      if (has_index_lock) {
        /* The index will be locked while
        applying the last block. */
        goto all_done;
      }

      mrec = NULL;
    process_next_block:
      rw_lock_x_lock(dict_index_get_lock(index));
      has_index_lock = true;

      index->online_log->head.bytes = 0;
      index->online_log->head.blocks++;
      goto next_block;
    } else if (next_mrec != NULL) {
      ut_ad(next_mrec < next_mrec_end);
      index->online_log->head.bytes += next_mrec - mrec;
    } else if (has_index_lock) {
      /* When mrec is within tail.block, it should
      be a complete record, because we are holding
      index->lock and thus excluding the writer. */
      ut_ad(index->online_log->tail.blocks == 0);
      ut_ad(mrec_end ==
            index->online_log->tail.block + index->online_log->tail.bytes);
      ut_ad(0);
      goto unexpected_eof;
    } else {
      memcpy(index->online_log->head.buf, mrec, mrec_end - mrec);
      mrec_end += index->online_log->head.buf - mrec;
      mrec = index->online_log->head.buf;
      goto process_next_block;
    }
  }

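From this code we can conclude the following:

- The index latch is held on entry to the function, but it is released while a block other than the last is read from the file and applied. The latch is held throughout only while the last (in-memory) block is applied.
- The last block does not need to be read from the log file, because it is still cached in memory. Before processing it, the temporary file holding the DML log is therefore truncated to avoid wasting storage.
- When creating a secondary index, trx_is_interrupted() is checked to see whether the operation has been interrupted. This means index creation can be stopped with KILL or similar means.
- During replay, index->is_corrupted() is checked to verify that the index is healthy. In the duplicate-key case above the index was marked corrupted, so the replay is aborted.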
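The block-by-block replay can be modeled as follows. This single-threaded sketch uses assumed details:

- each record is a 1-byte length plus payload;
- a vector of blocks stands in for the temporary file;
- `lock_held` only records whether each record would have been applied with the index latch held.

A record that straddles two blocks is carried over in a pending buffer, playing the role of head.buf. `OnlineLog` and `replay()` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical log: each record is a 1-byte length followed by its payload.
struct OnlineLog {
  std::size_t block_size;
  std::vector<std::vector<std::uint8_t>> file;  // full blocks written out
  std::vector<std::uint8_t> tail;               // last, partially filled block

  void append(const std::string& payload) {
    assert(payload.size() < 256);
    std::vector<std::uint8_t> rec{static_cast<std::uint8_t>(payload.size())};
    rec.insert(rec.end(), payload.begin(), payload.end());
    for (std::uint8_t b : rec) {
      tail.push_back(b);
      if (tail.size() == block_size) {  // block full: "write" it to the file
        file.push_back(tail);
        tail.clear();
      }
    }
  }
};

// Replays all records in order. File blocks are applied "without the index
// latch"; the in-memory tail is applied "with it", after the file has been
// "truncated". lock_held gets one entry per applied record.
std::vector<std::string> replay(OnlineLog& olog, std::vector<bool>& lock_held) {
  std::vector<std::string> applied;
  std::vector<std::uint8_t> pending;  // plays the role of head.buf
  auto consume = [&](const std::vector<std::uint8_t>& block, bool latched) {
    pending.insert(pending.end(), block.begin(), block.end());
    std::size_t pos = 0;
    // Apply every complete record; keep a partial one for the next block.
    while (pos < pending.size() && pos + 1 + pending[pos] <= pending.size()) {
      const std::size_t len = pending[pos];
      applied.emplace_back(pending.begin() + pos + 1,
                           pending.begin() + pos + 1 + len);
      lock_held.push_back(latched);
      pos += 1 + len;
    }
    pending.erase(pending.begin(), pending.begin() + pos);
  };
  for (const auto& block : olog.file) consume(block, false);
  olog.file.clear();  // like ftruncate(): the file blocks are no longer needed
  consume(olog.tail, true);
  olog.tail.clear();
  assert(pending.empty());  // the tail always ends on a record boundary
  return applied;
}
```

With 4-byte blocks, appending "abc" (4 bytes including its length byte) and "hello" (6 bytes) leaves two blocks in the file and 2 bytes in the tail. Replay then applies "abc" without the latch. It applies "hello", whose last bytes are in the tail, with the latch held.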
