最近公司要做一天8亿级数据的缓存,然后让我对redis进行一波性能测试,但是今天发现redis突然没有在运行,并且内存没有任何占用情况。然后我就想到先查看日志,如下
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 6.0.8 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 14135
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
14135:M 27 Sep 2020 20:36:47.498 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
14135:M 27 Sep 2020 20:36:47.498 # Server initialized
14135:M 27 Sep 2020 20:36:47.498 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
14135:M 27 Sep 2020 20:36:47.498 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo madvise > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled (set to 'madvise' or 'never').
14135:M 27 Sep 2020 20:36:47.498 * Ready to accept connections
日志除了启动信息以外并没有输出任何挂掉的信息。如果 redis 是被 cli 关掉的话,会在日志信息中有bye bye的信息,但日志没有,那就是可能是被kill的,通过如下命令可以查看情况。
[root@localhost log]# dmesg | egrep -i 'killed process'
[ 1620.429506] Killed process 9696 (redis-server), UID 0, total-vm:23432912kB, anon-rss:18961524kB, file-rss:68kB, shmem-rss:0kB
[338063.393365] Killed process 13956 (redis-server), UID 0, total-vm:33468100kB, anon-rss:26147740kB, file-rss:0kB, shmem-rss:0kB
[367433.290406] Killed process 14135 (redis-server), UID 0, total-vm:39923260kB, anon-rss:34407376kB, file-rss:0kB, shmem-rss:0kB
发现 redis 的确是被kill掉的,我启动redis的进程是 14135,刚好查到 killed process 是有 14135。这台服务器只有我自己知道,所以可以直接排除是人为的情况。那程序被kill调就只有 linux 自己的策略了,我们知道 linux 是有oom的策略具体根据设置的 oom_score_adj 的值有关,那我们直接查下是不是 oom 原因杀死,命令如下
[root@localhost log]# grep "Out of memory" /var/log/messages
Sep 27 16:45:11 localhost kernel: Out of memory: Kill process 13445 (redis-server) score 747 or sacrifice child
Sep 28 00:54:43 localhost kernel: Out of memory: Kill process 14135 (redis-server) score 951 or sacrifice child
看来真的是内存不够用把redis 给kill 了。如果是被其他用户 kill掉的话我们该怎么排查?
先查询最近哪些用户登录
[root@localhost log]# last
root pts/3 192.168.200.89 Sun Sep 27 17:18 still logged in
root pts/1 192.168.200.89 Sun Sep 27 12:59 still logged in
符号 | 描述 |
---|---|
root | 用户 |
pts/3 | 终端 |
192.168.200.89 | 登录者IP |
Sun Sep 27 17:18 | 登录时间 |
still logged in (还在线) | 登录状态(距离上次登录时间) |
知道了以后也可以只查看某个用户,我这里只有root用户,实际情况中,每个人都应该有一个账户,root只有超级管理员拥有,否则都用root用户是无法排查出来的。
[root@localhost log]# last root
root pts/3 192.168.200.89 Sun Sep 27 17:18 still logged in
root pts/1 192.168.200.89 Sun Sep 27 12:59 still logged in
root pts/2 192.168.200.89 Sun Sep 27 09:47 still logged in
root pts/1 192.168.200.89 Sun Sep 27 09:46 - 12:59 (03:12)
history 命令,可以把用户所用过的历史命令查出来,每个用户都会有这样一个文件。
指令 | 描述 |
---|---|
-c | 清空当前历史命令 |
-a | 将历史命令缓冲区中命令写入历史命令文件【/root/.bash_history】 |
-r | 将历史命令文件中的命令读入当前历史命令缓冲区 |
-w | 将当前历史命令缓冲区命令写入历史命令文件中【/root/.bash_history】 |
n | 如果n=3 打印最近3条历史命令 |
[root@localhost log]# history 10
1030 w rott
1031 w root
1032 history
1033 last
1034 last root
1035 w
1036 lastlog
1037 history
1038 history -h
1039 history 10
默认 history 不带执行时间,所以如果是同一个用户没办法区分是谁使用了 kill 造成破坏。
让 history 带有时间
echo 'export HISTTIMEFORMAT="%F %T "' >> /etc/bashrc
source /etc/bashrc
[root@localhost log]# history 6
1048 2020-09-28 11:03:21 echo 'export HISTTIMEFORMAT="%F %T "' >> /etc/bashrc
1049 2020-09-28 11:03:25 source /etc/bashrc
1050 2020-09-28 11:03:27 history 10
1051 2020-09-28 11:04:27 ls
1052 2020-09-28 11:04:30 history 10
1053 2020-09-28 11:04:45 history 6
网友评论