Redis Troubleshooting

Author: _大叔_ | Published 2020-09-28 10:04


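My company recently needed a cache that handles on the order of 800 million records per day, so I was asked to run performance tests against Redis. Today I found that Redis had suddenly stopped running and was no longer using any memory. The first thing I checked was the log: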

                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 6.0.8 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in standalone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 14135
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

14135:M 27 Sep 2020 20:36:47.498 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
14135:M 27 Sep 2020 20:36:47.498 # Server initialized
14135:M 27 Sep 2020 20:36:47.498 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
14135:M 27 Sep 2020 20:36:47.498 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
14135:M 27 Sep 2020 20:36:47.498 * Ready to accept connections

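Apart from the startup messages, the log contains nothing about the process going down. When Redis is shut down through redis-cli, it writes a "bye bye" message to the log. There is none here, so the process was most likely killed. The following command checks the kernel log for that: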

[root@localhost log]# dmesg | egrep -i 'killed process'
[ 1620.429506] Killed process 9696 (redis-server), UID 0, total-vm:23432912kB, anon-rss:18961524kB, file-rss:68kB, shmem-rss:0kB
[338063.393365] Killed process 13956 (redis-server), UID 0, total-vm:33468100kB, anon-rss:26147740kB, file-rss:0kB, shmem-rss:0kB
[367433.290406] Killed process 14135 (redis-server), UID 0, total-vm:39923260kB, anon-rss:34407376kB, file-rss:0kB, shmem-rss:0kB
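
The anon-rss figures are easier to compare in GiB. As a rough sketch, the helper below reads dmesg-style lines on stdin and prints the PID, process name, and anonymous resident memory at kill time. The name summarize_kills is made up for illustration, and the parsing assumes the message format shown above; other kernel versions word these lines differently.

```shell
# Summarize "Killed process" lines from the kernel log.
# Reads dmesg-style text on stdin; prints "<pid> <name> <anon-rss in GiB>".
# Assumes the format "Killed process <pid> (<name>), ... anon-rss:<n>kB, ...".
summarize_kills() {
  grep -i 'killed process' | awk '
    {
      pid = ""; name = ""; rss_kb = 0
      for (i = 1; i <= NF; i++) {
        # The PID and "(name)," follow the word "process".
        if ($i == "process") { pid = $(i + 1); name = $(i + 2) }
        # Keep only the digits of "anon-rss:<n>kB,".
        if ($i ~ /^anon-rss:/) { rss_kb = $i; gsub(/[^0-9]/, "", rss_kb) }
      }
      gsub(/[(),]/, "", name)
      printf "%s %s %.1f GiB\n", pid, name, rss_kb / 1048576
    }'
}
```

Run as `dmesg | summarize_kills`. For the last entry above it reports PID 14135 at about 32.8 GiB of anonymous memory.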

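Redis was indeed killed: the instance I started had PID 14135, and there is a "Killed process" entry for exactly that PID. I am the only one who knows about this server, so a manual kill can be ruled out, which leaves Linux's own policies. Linux has an OOM (out-of-memory) killer that picks its victim by each process's OOM score, which can be tuned through oom_score_adj. So let's check directly whether the OOM killer was responsible: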

[root@localhost log]# grep "Out of memory" /var/log/messages  
Sep 27 16:45:11 localhost kernel: Out of memory: Kill process 13445 (redis-server) score 747 or sacrifice child
Sep 28 00:54:43 localhost kernel: Out of memory: Kill process 14135 (redis-server) score 951 or sacrifice child
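
To see how the kernel currently ranks a running process, the score and its adjustment can be read from /proc. This is a minimal sketch that assumes a Linux system with /proc mounted; show_oom_score is an illustrative helper name, not an existing tool.

```shell
# Print the OOM-killer score and adjustment for a PID.
# Assumes the Linux /proc interface (/proc/<pid>/oom_score and oom_score_adj).
show_oom_score() {
  pid="$1"
  if [ ! -r "/proc/$pid/oom_score" ]; then
    echo "no such process or /proc not available: $pid" >&2
    return 1
  fi
  printf 'pid=%s oom_score=%s oom_score_adj=%s\n' \
    "$pid" "$(cat "/proc/$pid/oom_score")" "$(cat "/proc/$pid/oom_score_adj")"
}
```

Calling it with the PID of a running redis-server shows the score the OOM killer would use. A higher score makes the process a more likely victim.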

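So Redis really was killed because the machine ran out of memory. But how would we investigate if another user had killed it?
First, check which users have logged in recently: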

[root@localhost log]# last
root     pts/3        192.168.200.89   Sun Sep 27 17:18   still logged in   
root     pts/1        192.168.200.89   Sun Sep 27 12:59   still logged in

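Field	Description
root	User
pts/3	Terminal
192.168.200.89	IP address the user logged in from
Sun Sep 27 17:18	Login time
still logged in	Session status: still online, or the logout time and session duration (in parentheses) for sessions that have ended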

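Once you know that, you can also look at a single user. This machine only has the root user. In practice everyone should have their own account, with root reserved for the super administrator. If everybody shares root, there is no way to tell who did what.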

[root@localhost log]# last root
root     pts/3        192.168.200.89   Sun Sep 27 17:18   still logged in   
root     pts/1        192.168.200.89   Sun Sep 27 12:59   still logged in   
root     pts/2        192.168.200.89   Sun Sep 27 09:47   still logged in   
root     pts/1        192.168.200.89   Sun Sep 27 09:46 - 12:59  (03:12)
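
On a busier machine the output of last gets long, so a per-user, per-address count can be a quicker first look. The helper below is only a sketch: logins_by_source is a hypothetical name, and it assumes the column layout shown above (user, terminal, source address).

```shell
# Count login sessions per (user, source address) from `last` output on stdin.
# Skips blank lines, "reboot" pseudo-entries, and the trailing "wtmp begins" line.
logins_by_source() {
  awk 'NF >= 3 && $1 != "reboot" && $1 != "wtmp" { print $1, $3 }' \
    | sort | uniq -c | sort -rn
}
```

Run as `last | logins_by_source`. Each output line is a session count followed by the user and the address.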

The history command lists the commands a user has run. Each user has their own history file (for root, /root/.bash_history).

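Option	Description
-c	Clear the history list of the current session
-a	Append the current session's new history lines to the history file (/root/.bash_history)
-r	Read the history file and append its contents to the current history list
-w	Write the current history list to the history file (/root/.bash_history), overwriting it
n	Print the last n commands, e.g. n=3 prints the last 3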

[root@localhost log]# history 10
 1030  w rott
 1031  w root
 1032  history
 1033  last
 1034  last root
 1035  w
 1036  lastlog
 1037  history
 1038  history -h
 1039  history 10
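
History files can also be searched directly for kill-style commands. The sketch below is one way to do that. find_kill_commands is a hypothetical helper, and it only sees commands that have already been written to the given files (bash normally writes them when the session exits, or on history -a / history -w).

```shell
# Search the given history files for kill, pkill, or killall invocations.
# Prints "<file>:<line number>:<command>" for each match; unreadable files are skipped.
find_kill_commands() {
  for f in "$@"; do
    [ -r "$f" ] || continue
    # Match the command at line start or after a separator, so "skill" is not a hit.
    grep -nHE '(^|[;&|[:space:]])(kill|pkill|killall)([[:space:]]|$)' "$f"
  done
}
```

For example, `find_kill_commands /root/.bash_history /home/*/.bash_history` checks root and every home directory under /home.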

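By default history does not record when each command ran, so when several people share one account there is no way to tell who ran the kill that caused the damage.
To make history show timestamps, which can then be matched against the login sessions reported by last: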

echo 'export HISTTIMEFORMAT="%F %T  "' >> /etc/bashrc
source /etc/bashrc

[root@localhost log]# history 6
 1048  2020-09-28 11:03:21  echo 'export HISTTIMEFORMAT="%F %T  "' >> /etc/bashrc
 1049  2020-09-28 11:03:25  source /etc/bashrc
 1050  2020-09-28 11:03:27  history 10
 1051  2020-09-28 11:04:27  ls
 1052  2020-09-28 11:04:30  history 10
 1053  2020-09-28 11:04:45  history 6
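
Once HISTTIMEFORMAT is set, bash also stores a "#<epoch seconds>" comment line before each command in the history file, so a file can be read with timestamps outside the owning session. This is a minimal sketch that assumes GNU date (for `date -d @...`); render_history is an illustrative name.

```shell
# Print a bash history file with human-readable timestamps.
# Assumes entries written with HISTTIMEFORMAT set, i.e. a "#<epoch>" line
# before each command, and GNU date for the epoch conversion.
render_history() {
  ts=""
  while IFS= read -r line; do
    case "$line" in
      '#'[0-9]*)
        # Timestamp marker: remember it for the command on the next line.
        ts="${line#\#}" ;;
      *)
        if [ -n "$ts" ]; then
          printf '%s  %s\n' "$(date -d "@$ts" '+%F %T')" "$line"
        else
          printf '%s  %s\n' '(no timestamp)' "$line"
        fi
        ts="" ;;
    esac
  done < "$1"
}
```

Commands recorded before the setting was enabled have no marker line and are printed with "(no timestamp)".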


Article link: https://www.haomeiwen.com/subject/evxduktx.html
