问题
线上Java进程无缘无故被干掉,重启任务又为啥没生效,这究竟是道德的沦丧,还是人性的扭曲?
- 问题1. 线上Java进程没了,没有dmup文件(java_pidxxx.hprof),只有日志(hs_err_pidxxx.log)
- 问题2. crontab任务没有自动重启java进程
解决
- 问题1. 缩小jvm内存分配 -Xmx
- 问题2. 在crontab执行的脚本里,手动source环境变量
原因和定位
问题1
java进程占用内存过大,威胁到linux,触发了linux的OOM-Killer,被linux kill,还来不及输出dump文件;如果是自身oom,会输出dump文件;缩小jvm内存分配后,线上有dmup文件生成
- 查看hs_err_pidxxx.log,日志开头有提示(Decrease Java heap size (-Xmx/-Xms))
-bash-4.2$ cat hs_err_pid8777.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 249561088 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2640), pid=8777, tid=0x00002abdf7403700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
- 查看/var/log/messages(Out of memory: Kill process 29164 (java) score 908 or sacrifice child)
/var/log/message包括整体系统信息,其中也包含系统启动期间的日志。此外,mail,cron,daemon,kern和auth等内容也记录在var/log/messages日志中。
- 查看/var/log/messages(Out of memory: Kill process 29164 (java) score 908 or sacrifice child)
Jul 18 14:35:19 izbp1edkkjhg2sy911wtycz systemd: Starting Session 126131 of user admin.
Jul 18 14:38:43 izbp1edkkjhg2sy911wtycz kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: java cpuset=/ mems_allowed=0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: CPU: 1 PID: 29168 Comm: java Not tainted 3.10.0-693.17.1.el7.x86_64 #1
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.
org 04/01/2014
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Call Trace:
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816a6071>] dump_stack+0x19/0x1b
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816a1466>] dump_header+0x90/0x229
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff810eb28c>] ? ktime_get_ts64+0x4c/0xf0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff8113f24f>] ? delayacct_end+0x8f/0xb0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff81188274>] oom_kill_process+0x254/0x3d0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff81187d1d>] ? oom_unkillable_task+0xcd/0x120
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff81187dc6>] ? find_lock_task_mm+0x56/0xc0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff81188ab6>] out_of_memory+0x4b6/0x4f0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816a1f6a>] __alloc_pages_slowpath+0x5d6/0x724
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff8118ec85>] __alloc_pages_nodemask+0x405/0x420
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff811d6255>] alloc_pages_vma+0xb5/0x200
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff811b4480>] handle_mm_fault+0xb60/0xfa0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816b37e4>] __do_page_fault+0x154/0x450
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816b3bc6>] trace_do_page_fault+0x56/0x150
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816b325a>] do_async_page_fault+0x1a/0xd0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [<ffffffff816af928>] async_page_fault+0x28/0x30
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Mem-Info:
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: active_anon:915301 inactive_anon:36 isolated_anon:0#012 active_file:963 inactive_file:995 isolated_file:0#012
unevictable:0 dirty:11 writeback:0 unstable:0#012 slab_reclaimable:12666 slab_unreclaimable:4309#012 mapped:426 shmem:110 pagetables:3300 bounce:0#012 free:21
938 free_pcp:261 free_cma:0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 DMA free:15348kB min:276kB low:344kB high:412kB active_anon:420kB inactive_anon:0kB active_file:0kB ina
ctive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: lowmem_reserve[]: 0 2814 3773 3773
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 DMA32 free:54208kB min:50208kB low:62760kB high:75312kB active_anon:2756212kB inactive_anon:88kB active_file:2004kB inactive_file:2048kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2884196kB mlocked:0kB dirty:36kB writeback:0kB mapped:1076kB shmem:296kB slab_reclaimable:38172kB slab_unreclaimable:10040kB kernel_stack:2912kB pagetables:8532kB unstable:0kB bounce:0kB free_pcp:148kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: lowmem_reserve[]: 0 0 958 958
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 Normal free:16980kB min:17096kB low:21368kB high:25644kB active_anon:904572kB inactive_anon:56kB active_file:1592kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):100kB present:1048576kB managed:981532kB mlocked:0kB dirty:8kB writeback:0kB mapped:728kB shmem:144kB slab_reclaimable:12488kB slab_unreclaimable:7176kB kernel_stack:1552kB pagetables:4636kB unstable:0kB bounce:0kB free_pcp:204kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7 all_unreclaimable? no
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: lowmem_reserve[]: 0 0 0 0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 DMA: 2*4kB (UE) 7*8kB (UEM) 2*16kB (UE) 2*32kB (UE) 3*64kB (UEM) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 1*2048kB (E) 2*4096kB (M) = 15328kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 DMA32: 432*4kB (UEM) 1022*8kB (UEM) 866*16kB (UEM) 436*32kB (UEM) 120*64kB (UEM) 29*128kB (UEM) 10*256kB (UE) 4*512kB (UE) 1*1024kB (M) 0*2048kB 0*4096kB = 54736kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 Normal: 102*4kB (EM) 364*8kB (UEM) 274*16kB (UEM) 101*32kB (UEM) 43*64kB (UEM) 12*128kB (UEM) 3*256kB (UM) 2*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 17016kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: 325 total pagecache pages
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: 0 pages in swap cache
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Swap cache stats: add 0, delete 0, find 0/0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Free swap = 0kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Total swap = 0kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: 1048446 pages RAM
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: 0 pages HighMem/MovableOnly
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: 78037 pages reserved
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 336] 0 336 22802 104 50 0 0 systemd-journal
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 359] 0 359 11092 262 22 0 -1000 systemd-udevd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 393] 0 393 13863 114 27 0 -1000 auditd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 412] 81 412 6137 118 17 0 -900 dbus-daemon
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 414] 998 414 133562 1422 58 0 0 polkitd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 415] 0 415 5369 62 14 0 0 irqbalance
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 417] 0 417 6129 167 16 0 0 systemd-logind
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 419] 0 419 171693 444 154 0 0 rsyslogd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 427] 0 427 6464 51 18 0 0 atd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 442] 0 442 27511 34 10 0 0 agetty
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 444] 0 444 27511 32 11 0 0 agetty
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 713] 0 713 140599 3186 93 0 0 tuned
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 738] 38 738 7476 159 19 0 0 ntpd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1105] 0 1105 26499 245 54 0 -1000 sshd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1138] 0 1138 16599 204 16 0 0 aliyun-service
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1293] 0 1293 8657 313 20 0 0 staragentd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1294] 0 1294 323022 3276 68 0 0 staragent-core
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1295] 0 1295 200147 15278 71 0 0 staragent-ppf
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [18215] 0 18215 30529 734 61 0 0 nginx
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [28759] 0 28759 31558 155 18 0 0 crond
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1327] 1000 1327 30530 835 60 0 0 nginx
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 1328] 1000 1328 30530 835 60 0 0 nginx
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 3947] 0 3947 8179 415 20 0 0 AliYunDunUpdate
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [ 4051] 0 4051 31625 697 59 0 0 AliYunDun
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [20736] 0 20736 56938 867 25 0 0 CmsGoAgent.linu
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [20742] 0 20742 199942 2717 40 0 0 exe
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [21951] 0 21951 36554 327 73 0 0 sshd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [21955] 1000 21955 36554 326 71 0 0 sshd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [21956] 1000 21956 28901 126 15 0 0 bash
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [29164] 1000 29164 1550710 879523 1872 0 0 java
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [29499] 0 29499 36554 326 73 0 0 sshd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [29501] 1000 29501 36554 332 72 0 0 sshd
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [29502] 1000 29502 28871 126 13 0 0 bash
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: [29617] 1000 29617 26986 25 10 0 0 tail
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Out of memory: Kill process 29164 (java) score 908 or sacrifice child
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: Killed process 29164 (java) total-vm:6202840kB, anon-rss:3518092kB, file-rss:0kB, shmem-rss:0kB
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: exe invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: exe cpuset=/ mems_allowed=0
Jul 18 14:38:44 izbp1edkkjhg2sy911wtycz kernel: CPU: 0 PID: 20750 Comm: exe Not tainted 3.10.0-693.17.1.el7.x86_64 #1
- 机器是8G内存,java分配了6G堆内存,导致后续运行机器内存不足;修改JVM参数成4G,-Xms4g -Xmx4g
-
修改参数后,线上有出现过dmup文件,使用MAT可以分析如下,Excel解析问题
image.png
-
问题2
直接执行重启脚本能重启,crontab触发就不行,原来是java命令找不到
- 查看/var/log/cron,定时任务有调用脚本
- 把脚本调用的日志重定向到指定文件,发现问题
网友评论