美文网首页
一次MYSQL频繁宕机的排查

一次MYSQL频繁宕机的排查

作者: 数齐 | 来源:发表于2024-09-12 21:12 被阅读0次

    背景

    近期,兄弟部门反馈关于我们产品自带的MYSQL数据库的问题。他们指出,数据库在每天凌晨12点会出现宕机现象。这一现象也是最新一段才出现的,这段时间只是增加了一些自动化程序。频繁宕机的问题影响了兄弟部门的工作流程,也产生了一定的负面影响。为了解决这一问题,我们需要深入调查并给出解决措施。

    排查过程

    确认MYSQL所在的服务器资源情况

    我们了解到这段时间做的变更中提到增加了一些自动化的程序,是不是自动化程序中涉及到操作数据库,造成系统资源不足,被系统杀掉?于是查看各类系统资源。无论是磁盘还是内存都有较大剩余,也查看了部分系统日志,均没有因为资源不足杀进程的日志,所以应该不是此原因导致宕机。

    结论:不是因为系统资源导致MYSQL宕机

    排查MYSQL错误日志

    在my.ini中指定当mysql发生故障时的的错误日志地址,具体配置方式如下:

    # start error log

    log-error=mysql.err

    mysql.err内容如下

    2024-01-08  9:30:46 0 [Note] InnoDB: Uses event mutexes

    2024-01-08  9:30:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

    2024-01-08  9:30:46 0 [Note] InnoDB: Number of pools: 1

    2024-01-08  9:30:46 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

    2024-01-08  9:30:46 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)

    2024-01-08  9:30:46 0 [Note] InnoDB: Using Linux native AIO

    2024-01-08  9:30:46 0 [Note] InnoDB: Initializing buffer pool, total size = 536870912, chunk size = 134217728

    2024-01-08  9:30:46 0 [Note] InnoDB: Completed initialization of buffer pool

    2024-01-08  9:30:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=296938092837,296938109566

    2024-01-08  9:30:47 0 [Note] InnoDB: Transaction 57863035 was in the XA prepared state.

    2024-01-08  9:30:47 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

    2024-01-08  9:30:47 0 [Note] InnoDB: Trx id counter is 57863036

    2024-01-08  9:30:47 0 [Note] InnoDB: Starting final batch to recover 54726 pages from redo log.

    2024-01-08  9:30:49 0 [Note] InnoDB: Last binlog file './mysql-bin.000078', position 111348635

    2024-01-08  9:30:49 0 [Note] InnoDB: 128 rollback segments are active.

    2024-01-08  9:30:49 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

    2024-01-08  9:30:49 0 [Note] InnoDB: Rollback of non-prepared transactions completed

    2024-01-08  9:30:49 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

    2024-01-08  9:30:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables

    2024-01-08  9:30:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

    2024-01-08  9:30:49 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

    2024-01-08  9:30:49 0 [Note] InnoDB: 10.5.11 started; log sequence number 297037605262; transaction id 57863037

    2024-01-08  9:30:49 0 [Note] InnoDB: Loading buffer pool(s) from /home/hsrisk/disk01/see/mysql/data/ib_buffer_pool

    2024-01-08  9:30:49 0 [Note] Plugin 'FEEDBACK' is disabled.

    2024-01-08  9:30:49 0 [Note] Recovering after a crash using mysql-bin

    2024-01-08  9:30:49 0 [Note] InnoDB: Buffer pool(s) load completed at 240108  9:30:49

    2024-01-08  9:30:49 0 [Note] Starting crash recovery...

    2024-01-08  9:30:49 0 [Note] InnoDB: Starting recovery for XA transactions...

    2024-01-08  9:30:49 0 [Note] InnoDB: Transaction 57863035 in prepared state after recovery

    2024-01-08  9:30:49 0 [Note] InnoDB: Transaction contains changes to 1 rows

    2024-01-08  9:30:49 0 [Note] InnoDB: 1 transactions in prepared state after recovery

    2024-01-08  9:30:49 0 [Note] Found 1 prepared transaction(s) in InnoDB

    2024-01-08  9:30:49 0 [Note] Crash recovery finished.

    2024-01-08  9:30:49 0 [Note] Server socket created on IP: '::'.

    2024-01-08  9:30:49 0 [Note] Reading of all Master_info entries succeeded

    2024-01-08  9:30:49 0 [Note] Added new Master_info '' to hash table

    2024-01-08  9:30:49 0 [Note] ./mysqld: ready for connections.

    Version: '10.5.11-MariaDB-log'  socket: '/tmp/see20/mysql.sock'  port: 3306  MariaDB Server

    这段日志是客户因为数据库宕机后手动重启的日志。这个日志中发现有Recovering after a crash using mysql-bin,Starting crash recovery,Crash recovery finished 等字样,表示数据库之前确实是异常关闭的,但是这种异常是MYSQL SERVER预料之外的。预料之外的含义是这种异常数据库未定义,如果是意料之中的异常的话,日志中会留下更多的信息,例如下图,虽然没写明是什么原因导致的数据库崩溃,但是他清楚的告诉了报错的地点以及保留了帮助定位的信息。

    2023-09-17 21:36:12 0x7f384413a700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/lock/lock0lock.cc line 7014

    InnoDB: We intentionally generate a memory trap.

    InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

    InnoDB: If you get repeated assertion failures or crashes, even

    InnoDB: immediately after the mysqld startup, there may be

    InnoDB: corruption in the InnoDB tablespace. Please refer to

    InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/

    InnoDB: about forcing recovery.

    230917 21:36:12 [ERROR] mysqld got signal 6 ;

    This could be because you hit a bug. It is also possible that this binary

    or one of the libraries it was linked against is corrupt, improperly built,

    or misconfigured. This error can also be caused by malfunctioning hardware.

    To report this bug, see https://mariadb.com/kb/en/reporting-bugs

    We will try our best to scrape up some info that will hopefully help

    diagnose the problem, but since we have already crashed,

    something is definitely wrong and this may fail.

    Server version: 10.3.10-MariaDB-log

    key_buffer_size=134217728

    read_buffer_size=131072

    max_used_connections=322

    max_threads=6002

    thread_count=261

    It is possible that mysqld could use up to

    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 13325602 K  bytes of memory

    Hope that's ok; if not, decrease some variables in the equation.

    Thread pointer: 0x7f35980009a8

    Attempting backtrace. You can use the following information to find out

    where mysqld died. If you see no messages after this, something went

    terribly wrong...

    stack_bottom = 0x7f3844139e30 thread_stack 0x49000

    mysys/stacktrace.c:270(my_print_stacktrace)[0x556c2c650b4b]

    sql/signal_handler.cc:168(handle_fatal_signal)[0x556c2c13bba7]

    sigaction.c:0(__restore_rt)[0x7f3849863680]

    :0(__GI_raise)[0x7f384866f207]

    :0(__GI_abort)[0x7f38486708f8]

    ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x556c2bea97b9]

    lock/lock0lock.cc:7014(DeadlockChecker::check_and_resolve(ib_lock_t const*, trx_t*))[0x556c2c319ec2]

    lock/lock0lock.cc:1741(lock_rec_enqueue_waiting(ib_lock_t*, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*, lock_prdt*))[0x556c2c31aaf4]

    lock/lock0lock.cc:1990(lock_rec_lock(bool, unsigned long, buf_block_t const*, unsigned long, dict_index_t*, que_thr_t*))[0x556c2c31df48]

    row/row0sel.cc:1284(sel_set_rec_lock(btr_pcur_t*, unsigned char const*, dict_index_t*, unsigned long const*, unsigned long, unsigned long, que_thr_t*, mtr_t*))[0x556c2c3b783c]

    row/row0sel.cc:5047(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x556c2c3b8a10]

    handler/ha_innodb.cc:9264(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x556c2c2e11fc]

    sql/handler.cc:2798(handler::ha_rnd_pos(unsigned char*, unsigned char*))[0x556c2c140c82]

    sql/handler.h:3568(handler::rnd_pos_by_record(unsigned char*))[0x556c2c14855b]

    sql/sql_class.h:6384(handler::ha_rnd_pos_by_record(unsigned char*))[0x556c2c219170]

    sql/log_event.cc:14265(Update_rows_log_event::do_exec_row(rpl_group_info*))[0x556c2c2194f4]

    sql/log_event.cc:11514(Rows_log_event::do_apply_event(rpl_group_info*))[0x556c2c20ddcc]

    sql/log_event.h:5159(Log_event::apply_event(rpl_group_info*))[0x556c2bef5c5c]

    sql/slave.cc:3920(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x556c2bee83db]

    sql/mysqld.h:808(thread_safe_increment64)[0x556c2c09073e]

    sql/rpl_parallel.cc:1300(handle_rpl_parallel_thread)[0x556c2c0931fb]

    pthread_create.c:0(start_thread)[0x7f384985bdd5]

    *** stack smashing detected ***: /home/hundsun/mysql/mysql/bin/mysqld terminated

    结论:是异常宕机,并且该异常不是系统定义,无法被MYSQL SERVER所捕获。

    排查MYSQL BINLOG日志

    在my.ini中指定当开启binlog日志,具体配置方式如下:

    log-bin=mysql-bin  #开启binlog

    重启MYSQL后会创建如下格式的binlog文件

    mysql-bin.000001

    mysql-bin.000002

    MYSQL生成的binlog是二进制文件,无法直接查看。可以使用mysqlbinlog工具将binlog二进制数据导出进行查看,例如

    ./mysqlbinlog ../data/mysql-bin.000004 --start-datetime='2020-02-12 23:30:00' --stop-datetime='2025-02-12:23:50:00' --base64-output=DECODE-ROWS  > 4.txt

    可以按照时间维度将数据导出到文件中进行查看。下面是2个binlog的导出数据

    mysql-bin.000041这个文件有203M,最后的修改时间是2023-12-24 23:59:15。此binlog导出文件的末尾数据如下

    #231224 23:59:15 server id 1  end_log_pos 83409660 CRC32 0x595aa790 Table_map: `acm`.`tb_lock` mapped to number 58

    # at 83409660

    #231224 23:59:15 server id 1  end_log_pos 83410156 CRC32 0xed3ad18a Update_rows: table id 58 flags: STMT_END_F

    # Number of rows: 1

    # at 83410156

    #231224 23:59:15 server id 1  end_log_pos 83410187 CRC32 0x539c3bfa Xid = 1904727

    COMMIT/*!*/;

    DELIMITER ;

    # End of log file

    ROLLBACK /* added by mysqlbinlog */;

    /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;

    /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

    mysql-bin.000042这个文件有1G,最后的修改时间是2023-12-25 19:03:03。此binlog导出文件的末尾数据如下

    相关文章

      网友评论

          本文标题:一次MYSQL频繁宕机的排查

          本文链接:https://www.haomeiwen.com/subject/zjzuljtx.html