Still shaken: looking back on the night MySQL went down


Author: 小胖子善轩 | Published: 2018-02-02 12:13

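The problem was fixed five minutes ago, but it is hard to reproduce. I am writing it up on Jianshu while it is still fresh, in the hope that it helps whoever runs into this next. (Environment: CentOS.)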

What happened

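Around 11 p.m. on February 1, 2018, having settled on a new architecture, I sat down to design the database, only to find that I could not connect to MySQL. Right: MySQL was down.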

Analysis

First, let's see whether the service can be started by hand.

sudo service mysqld restart

The result (note "Active: failed (Result: start-limit)"):

MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled)
   Active: failed (Result: start-limit) since Sun 2015-12-06 03:14:54 GMT; 4min 7s ago
  Process: 6992 ExecStart=/usr/sbin/mysqld --daemonize $MYSQLD_OPTS (code=exited, status=1/FAILURE)
  Process: 6971 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Dec 06 03:14:54 localhost.localdomain systemd[1]: mysqld.service: control process exited, code=exited status=1 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Failed to start MySQL Server. 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Unit  mysqld.service entered failed state. 
Dec 06 03:14:54 localhost.localdomain systemd[1]: mysqld.service holdoff time over, scheduling restart. 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Stopping MySQL Server... 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Starting MySQL Server...   
Dec 06 03:14:54 localhost.localdomain systemd[1]: mysqld.service start request repeated too quickly, refusing to start. 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Failed to start MySQL Server. 
Dec 06 03:14:54 localhost.localdomain systemd[1]: Unit mysqld.service entered failed state.
The journal command reads Failed to start MySQL Server

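I did not have much MySQL operations experience, so this error puzzled me. For one thing, I had not known that the mysqld service is configured to restart automatically: if MySQL dies it is started again, but once it has been restarted too many times in quick succession, the restarts stop and you get this error (start-limit).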
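
As a rough illustration of what to look for, the sketch below scans the text printed by the status command for the `Active:` line and reports whether the service has hit the restart limit. The function name `parse_active_state` is hypothetical, and the sample string is copied from the output above; this is only a minimal sketch, not a general parser for systemd output.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Active: failed (Result: start-limit) since ..."
ACTIVE_RE = re.compile(r"Active:\s+(\S+)(?:\s+\(Result:\s+([^)]+)\))?")


def parse_active_state(status_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (state, result) from the 'Active:' line, or None if it is absent."""
    match = ACTIVE_RE.search(status_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample line taken from the status output shown in this post.
SAMPLE = "Active: failed (Result: start-limit) since Sun 2015-12-06 03:14:54 GMT; 4min 7s ago"

parsed = parse_active_state(SAMPLE)
if parsed == ("failed", "start-limit"):
    print("restart limit reached; check the MySQL error log for the real cause")
```

Once the underlying cause is fixed, `systemctl reset-failed mysqld` should clear the failed state so that the service can be started again.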

The message from the system says very little. I decided to check the error log, so first I needed to find out where it is kept.

vim /etc/my.cnf
# result
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

lower_case_table_names=1
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

max_allowed_packet=100M
interactive_timeout=28800000
wait_timeout=28800000
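
The setting that matters in the listing above is `log-error`. Instead of reading the file by eye, the value can be pulled out with a few lines of Python. This is a simplified sketch under stated assumptions: the helper name `find_option` is hypothetical, and it ignores `[section]` scoping, include directives, and the fact that MySQL treats `log-error` and `log_error` as the same option.

```python
from typing import Optional


def find_option(cnf_text: str, name: str) -> Optional[str]:
    """Return the last value assigned to `name` in my.cnf-style text, or None."""
    value = None
    for raw_line in cnf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# A few lines copied from the configuration shown in this post.
SAMPLE_CNF = """\
datadir=/var/lib/mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
"""

print(find_option(SAMPLE_CNF, "log-error"))
```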

There it is: "/var/log/mysqld.log". Open mysqld.log and take a look.

vim /var/log/mysqld.log

Then came the scary part.

2018-02-01T10:20:32.366585Z 189 [Note] Access denied for user 'root'@'140.205.225.192' (using password: NO)
2018-02-01T10:20:32.501455Z 191 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:32.630537Z 192 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:32.766954Z 193 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:32.898277Z 194 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.032151Z 195 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.164272Z 196 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.286177Z 197 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: NO)
2018-02-01T10:20:33.412762Z 198 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.546427Z 199 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.674729Z 200 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.802997Z 201 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:33.932708Z 202 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.064128Z 203 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.191768Z 204 [Note] Access denied for user 'test'@'140.205.225.192' (using password: NO)
2018-02-01T10:20:34.317222Z 205 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.450647Z 206 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.584794Z 207 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.712521Z 208 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.843531Z 209 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T10:20:34.974842Z 210 [Note] Access denied for user 'test'@'140.205.225.192' (using password: YES)
2018-02-01T12:31:02.081525Z 211 [Warning] IP address '58.62.52.180' could not be resolved: Name or service not known
2018-02-01T13:07:58.557754Z 0 [Note] Giving 6 client threads a chance to die gracefully
2018-02-01T13:07:58.644307Z 0 [Note] Shutting down slave threads
2018-02-01T13:08:00.744418Z 0 [Note] Forcefully disconnecting 5 remaining clients
2018-02-01T13:08:00.744494Z 0 [Warning] /usr/sbin/mysqld: Forcing close of thread 104  user: 'root'

2018-02-01T13:08:00.744555Z 0 [Warning] /usr/sbin/mysqld: Forcing close of thread 211  user: 'root'

Shocking.

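Having some experience with brute-force attacks, I realized that the MySQL instance on my test server had been the target of a brute-force login attempt, though evidently an unsuccessful one. I regret not having restricted access to the MySQL port by IP (now fixed). The log, however, showed no sign at all of an abnormal exit. At this point I was stuck, thinking it over while searching Google.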
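
To get a quick picture of such an attack, the failed logins can be counted per user and source address. The sketch below is a minimal example assuming the MySQL 5.7 log line format shown above; `count_denied` is a hypothetical helper name, and the sample lines are copied from the log excerpt.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches: Access denied for user 'root'@'140.205.225.192'
DENIED_RE = re.compile(r"Access denied for user '([^']*)'@'([^']*)'")


def count_denied(log_lines: Iterable[str]) -> Counter:
    """Count 'Access denied' entries per (user, host) pair."""
    counts: Counter = Counter()
    for line in log_lines:
        match = DENIED_RE.search(line)
        if match:
            pair: Tuple[str, str] = (match.group(1), match.group(2))
            counts[pair] += 1
    return counts


# Three lines copied from the log excerpt in this post.
SAMPLE_LOG = [
    "2018-02-01T10:20:32.366585Z 189 [Note] Access denied for user 'root'@'140.205.225.192' (using password: NO)",
    "2018-02-01T10:20:32.501455Z 191 [Note] Access denied for user 'root'@'140.205.225.192' (using password: YES)",
    "2018-02-01T10:20:33.286177Z 197 [Note] Access denied for user 'admin'@'140.205.225.192' (using password: NO)",
]

for (user, host), attempts in count_denied(SAMPLE_LOG).most_common():
    print(f"{host} tried user {user!r} {attempts} time(s)")
```

In practice the function would be fed the lines of /var/log/mysqld.log, for example `count_denied(open("/var/log/mysqld.log"))`.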

Analysis, continued

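The attack could wait; restoring service came first. Since nothing was showing up in the log, I suspected that MySQL was dying before its logging had even started. Without any log output the fault was very hard to locate, so all I could rely on was the experience I had built up over the years.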

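"In a large program, logging is usually one of the very first things to be brought up. If even logging fails to start, the likely culprits are memory allocation, disk space, or a port problem. The port had already been ruled out."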

Of course, that was only a guess. In the output of sudo service mysqld status, I saw two process entries for the MySQL server.

  Process: 12686 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
  Process: 12669 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)

I then ran each of the two commands by hand and, to my surprise, log entries appeared.

2018-02-02T03:07:56.226901Z 0 [ERROR] InnoDB: Write to file ./ibtmp1failed at offset 9437184, 1048576 bytes should have been written, only 221184 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
2018-02-02T03:07:56.226938Z 0 [ERROR] InnoDB: Error number 28 means 'No space left on device'
2018-02-02T03:07:56.226952Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2018-02-02T03:07:56.226967Z 0 [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
2018-02-02T03:07:56.226980Z 0 [ERROR] InnoDB: Unable to create the shared innodb_temporary
2018-02-02T03:07:56.226994Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2018-02-02T03:07:56.829843Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-02-02T03:07:56.829883Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2018-02-02T03:07:56.829897Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2018-02-02T03:07:56.829912Z 0 [ERROR] Failed to initialize plugins.
2018-02-02T03:07:56.829922Z 0 [ERROR] Aborting

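So that was it: MySQL started, ran out of disk space, and shut down again, was restarted automatically, and got stuck in that loop. The root cause was a full disk.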
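
The "Operating system error number 28" in the log is an ordinary OS errno value, and Python's standard library can translate it. A minimal sketch, assuming a Linux host (errno numbers and message text are platform-specific); `describe_errno` is a hypothetical helper name.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS error number such as the 28 in the log."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# 28 is the error number InnoDB reported in the log above.
print(describe_errno(28))
```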

The odd part: how could a 40 GB disk run out of space? I checked:

[root@iZ2ze0pmkgrozc32e9vq3iZ ~]# df
Filesystem         1K-blocks    Used     Available Use% Mounted on
/dev/xvda1          41152832 4140124      34899224  100% /
devtmpfs              497684       0        497684   0% /dev
tmpfs                 507292       0        507292   0% /dev/shm
tmpfs                 507292     420        506872   1% /run
tmpfs                 507292       0        507292   0% /sys/fs/cgroup
cloudfs        4398046511104       0 4398046511104   0% /root/CouldDisk
tmpfs                 101460       0        101460   0% /run/user/0

Usage of the root filesystem was at 100%, so there was no room left for MySQL's InnoDB to allocate space.
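
A check like this is easy to automate so that a full disk is noticed before MySQL falls over. The sketch below uses `shutil.disk_usage` from the Python standard library; the helper names and the 90% threshold are assumptions chosen for illustration. Note that the percentage computed here can differ slightly from the Use% column of df, which as far as I know excludes blocks reserved for root.

```python
import shutil


def percent_used(path: str) -> float:
    """Return how full the filesystem containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """True if usage is at or above `threshold` percent (an arbitrary default)."""
    return percent_used(path) >= threshold


print(f"/ is {percent_used('/'):.1f}% used")
if is_nearly_full("/"):
    print("warning: root filesystem is nearly full")
```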

Resolution

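The cause was a bug in a downloader I had written: it never deleted files after uploading them. Deleting those large files solved the problem. The unexpected shock was discovering the brute-force attack on MySQL, and that the attacker was using a pool of proxy IPs inside China...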
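
The downloader itself is not shown in this post, so the following is only a hypothetical sketch of the kind of fix involved: remove the local copy as soon as the upload has succeeded. The names `upload_then_delete` and `fake_upload` are made up for illustration, and `fake_upload` merely stands in for whatever the real upload step does.

```python
import os
import tempfile
from typing import Callable


def upload_then_delete(path: str, upload: Callable[[str], None]) -> None:
    """Upload the file at `path`, then remove the local copy.

    The file is removed only if `upload` returns without raising, so a
    failed upload leaves the data in place for a retry.
    """
    upload(path)
    os.remove(path)


def fake_upload(path: str) -> None:
    """Stand-in for a real upload; it only checks that the file is readable."""
    with open(path, "rb") as handle:
        handle.read()


# Demo with a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
upload_then_delete(demo_path, fake_upload)
print("local copy still present:", os.path.exists(demo_path))
```

Files left behind by failed uploads still accumulate with this approach, so a disk usage check or a periodic cleanup remains worthwhile.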

