美文网首页
OOM问题排查方法

OOM问题排查方法

作者: 定金喜 | 来源:发表于2020-05-20 20:34 被阅读0次

    根据日志确定发生OOM的原因和区域,以下几个内存区域都可能发生OOM,先找到打印出的OOM错误日志和dump文件(linux系统的OOM不会产生),错误日志一般可以通过

    1.堆heap

    错误提示有:
    java.lang.OutOfMemoryError: Java heap space
    java.lang.OutOfMemoryError: unable to create new native thread
    java.lang.OutOfMemoryError:GC overhead limit exceeded
    这三种情况产生的原因都是因为JVM设置的堆内存较小,或者代码中产生了内存泄漏导致耗光了内存,定位步骤可以参照网上的攻略,步骤基本相同
    1.先查看设置的xms和xmx大小,如果太小增大该参数;
    2.如果增大了参数还是会出现此问题,可能是程序中存在内存泄漏,通过工具(jvisualvm或jconsole或mat或jmap等)进行定位:
    ps -ef|grep java 查看所有的java进程,找到出现问题的java进程,例如此pid为45338
    jmap -histo:live 45338|head -100 找到占用内存最大的对象

     num     #instances         #bytes  class name
    ----------------------------------------------
       1:        211963       25960888  [C
       2:        141554       12456752  java.lang.reflect.Method
       3:        262890        8412480  java.util.concurrent.ConcurrentHashMap$Node
       4:        151455        7269840  org.aspectj.weaver.reflect.ShadowMatchImpl
       5:        209866        5036784  java.lang.String
       6:        151455        4846560  org.aspectj.weaver.patterns.ExposedState
       7:         93251        3730040  java.util.LinkedHashMap$Entry
       8:         60755        3402280  java.util.LinkedHashMap
       9:           801        2688000  [Ljava.util.concurrent.ConcurrentHashMap$Node;
      10:         40718        2631792  [Ljava.lang.Object;
      11:         39048        2623560  [Ljava.util.HashMap$Node;
      12:         19925        2468120  [B
      13:         30926        2226672  java.lang.reflect.Field
      14:         18914        2155672  java.lang.Class
      15:         88266        2022544  [Ljava.lang.Class;
      16:         14117        1692472  [I
      17:         39147        1252704  java.util.HashMap$Node
      18:         34709         833016  org.springframework.core.MethodClassKey
      19:         24055         769760  java.lang.ref.WeakReference
      20:         30624         734976  java.util.ArrayList
      21:         27200         684208  [Z
      22:         26754         642088  [Lorg.aspectj.weaver.ast.Var;
      23:         38527         616432  java.lang.Object
      24:         15276         611040  java.util.WeakHashMap$Entry
      25:          7266         547248  [Ljava.lang.reflect.Method;
      26:          6573         525840  java.lang.reflect.Constructor
      27:          7198         518256  org.springframework.core.annotation.AnnotationAttributes
      28:         16154         516928  java.util.LinkedList
      29:         16787         402888  java.util.LinkedList$Node
      30:         24436         390976  java.util.LinkedHashSet
      31:          7998         383904  java.util.HashMap
      32:         12668         366872  [Ljava.lang.String;
      33:          4630         355088  [S
      34:          7749         309960  java.lang.ref.SoftReference
      35:          3059         293664  org.springframework.beans.GenericTypeAwarePropertyDescriptor
      36:          4000         288000  org.springframework.core.type.classreading.AnnotationMetadataReadingVisitor
      37:          5054         283024  java.lang.Class$ReflectionData
      38:          6275         251000  com.sun.org.apache.xerces.internal.dom.DeferredAttrImpl
      39:         10072         241728  sun.reflect.generics.tree.SimpleClassTypeSignature
      40:          5895         235800  com.sun.org.apache.xerces.internal.dom.DeferredTextImpl
      41:          9744         233856  sun.reflect.annotation.AnnotationInvocationHandler
      42:          4727         226896  org.springframework.core.ResolvableType
      43:          9152         219648  org.springframework.cglib.core.Signature
      44:          8533         204792  java.beans.MethodRef
      45:          8972         199592  [Ljava.lang.reflect.Type;
      46:           451         198640  [Ljava.util.WeakHashMap$Entry;
      47:         10072         191568  [Lsun.reflect.generics.tree.TypeArgument;
      48:          3129         175224  com.sun.org.apache.xerces.internal.dom.DeferredElementImpl
      49:         10371         165936  java.util.LinkedHashMap$LinkedValues
      50:           976         156160  org.springframework.beans.factory.support.RootBeanDefinition
      51:          2615         152992  [Ljava.lang.reflect.Field;
      52:          6364         152736  java.util.Collections$UnmodifiableRandomAccessList
    

    找到异常的对象实例后进行定位,排查出代码存在的问题

    2.栈

    java.lang.StackOverflowError
    java.lang.OutOfMemoryError: Java heap space

    StackOverflowError出现的原因一般是递归的条件控制不好导致的无限递归调用而抛出的异常,后面异常同堆异常的排查方法。

    3.直接内存

    Exception in thread"main"java.lang.OutOfMemoryError
    at sun.misc.Unsafe.allocateMemory(Native Method)
    at org.fenixsoft.oom.DMOOM.main(DMOOM.java:20)
    一般有如下三种方法可以申请直接内存(堆外内存)
    1.Unsafe.allocateMemory()
    2.ByteBuffer.allocateDirect()
    3.native方法
    具体可以参考这篇文章:
    https://blog.csdn.net/zhuguang10/article/details/88966563

    4. linux系统 OOM-Killer

    例如jvm配置为

    -Xms6g -Xmx6g -Xss512k -XX:MetaspaceSize=256m 
    -XX:MaxMetaspaceSize=256m
    -XX:NewRatio=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
    -XX:+CMSClassUnloadingEnabled -XX:+DisableExplicitGC 
    -XX:MaxTenuringThreshold=2 -XX:CMSInitiatingOccupancyFraction=75
    -verbose:gc -Xloggc:/Users/ding/Desktop/logs/gc.log -XX:+PrintGCDetails 
    -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/Users/ding/Desktop/logs
    -XX:ErrorFile=/Users/ding/Desktop/logs/hs_err_pid%p.log
    -Dons.client.logRoot=/Users/ding/Desktop/logs -Dons.client.logLevel=WARN 
    -Dons.client.logFileMaxIndex=2
    

    如果此时业务日志文件没有报错日志,也没有生成dump文件,但是hs_err_pid文件会会产生此类日志

    -bash-4.2$ cat hs_err_pid8777.log 
    #
    # There is insufficient memory for the Java Runtime Environment to continue.
    # Native memory allocation (mmap) failed to map 249561088 bytes for committing reserved memory.
    # Possible reasons:
    #   The system is out of physical RAM or swap space
    #   In 32 bit mode, the process size limit was hit
    # Possible solutions:
    #   Reduce memory load on the system
    #   Increase physical memory or swap space
    #   Check if swap backing store is full
    #   Use 64 bit Java on a 64 bit OS
    #   Decrease Java heap size (-Xmx/-Xms)
    #   Decrease number of Java threads
    #   Decrease Java thread stack sizes (-Xss)
    #   Set larger code cache with -XX:ReservedCodeCacheSize=
    # This output file may be truncated or incomplete.
    #
    #  Out of Memory Error (os_linux.cpp:2640), pid=8777, tid=0x00002abdf7403700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    

    此时大概可以定位到是触发了linux的oom-killer,它是Linux内核的一种内存管理机制,在系统可用内存较少的情况下,内核为保证系统还能够继续运行下去,会选择杀掉一些进程释放掉一些内存。通常oom_killer的触发流程是:进程A想要分配物理内存(通常是读写内存)->触发缺页异常->内核去分配物理内存->物理内存不足,触发OOM;
    查看/var/log/message(Out of memory: Kill process 29164 (java) score 908 or sacrifice child)
    /var/log/message包括整体系统信息,其中也包含系统启动期间的日志。此外,mail,cron,daemon,kern和auth等内容也记录在var/log/messages日志中

    Jul  18  14:35:19  izbp764664httdtdff11wtycz  systemd:  Starting  Session  126131  of  user  admin.
    Jul  18  14:38:43  izbp764664httdtdff11wtycz  kernel:  java  invoked  oom-killer:  gfp_mask=0x280da,  order=0,  oom_score_adj=0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  java  cpuset=/  mems_allowed=0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  CPU:  1  PID:  29168  Comm:  java  Not  tainted  3.10.0-693.17.1.el7.x86_64  #1
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  Hardware  name:  Alibaba  Cloud  Alibaba  Cloud  ECS,  BIOS  rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.
    org  04/01/2014
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  Call  Trace:
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816a6071>]  dump_stack+0x19/0x1b
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816a1466>]  dump_header+0x90/0x229
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff810eb28c>]  ?  ktime_get_ts64+0x4c/0xf0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff8113f24f>]  ?  delayacct_end+0x8f/0xb0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff81188274>]  oom_kill_process+0x254/0x3d0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff81187d1d>]  ?  oom_unkillable_task+0xcd/0x120
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff81187dc6>]  ?  find_lock_task_mm+0x56/0xc0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff81188ab6>]  out_of_memory+0x4b6/0x4f0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816a1f6a>]  __alloc_pages_slowpath+0x5d6/0x724
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff8118ec85>]  __alloc_pages_nodemask+0x405/0x420
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff811d6255>]  alloc_pages_vma+0xb5/0x200
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff811b4480>]  handle_mm_fault+0xb60/0xfa0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816b37e4>]  __do_page_fault+0x154/0x450
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816b3bc6>]  trace_do_page_fault+0x56/0x150
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816b325a>]  do_async_page_fault+0x1a/0xd0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  [<ffffffff816af928>]  async_page_fault+0x28/0x30
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  Mem-Info:
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  active_anon:915301  inactive_anon:36  isolated_anon:0#012  active_file:963  inactive_file:995  isolated_file:0#012
    unevictable:0  dirty:11  writeback:0  unstable:0#012  slab_reclaimable:12666  slab_unreclaimable:4309#012  mapped:426  shmem:110  pagetables:3300  bounce:0#012  free:21
    938  free_pcp:261  free_cma:0
    Jul  18  14:38:44  izbp764664httdtdff11wtycz  kernel:  Node  0 DMA free:15348kB min:276kB low:344kB high:412kB active_anon:420kB inactive_anon:0kB active_file:0kB ina
    ctive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: lowmem_reserve[]: 0 2814 3773 3773
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 DMA32 free:54208kB min:50208kB low:62760kB high:75312kB active_anon:2756212kB inactive_anon:88kB active_file:2004kB inactive_file:2048kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2884196kB mlocked:0kB dirty:36kB writeback:0kB mapped:1076kB shmem:296kB slab_reclaimable:38172kB slab_unreclaimable:10040kB kernel_stack:2912kB pagetables:8532kB unstable:0kB bounce:0kB free_pcp:148kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: lowmem_reserve[]: 0 0 958 958
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 Normal free:16980kB min:17096kB low:21368kB high:25644kB active_anon:904572kB inactive_anon:56kB active_file:1592kB inactive_file:1428kB unevictable:0kB isolated(anon):0kB isolated(file):100kB present:1048576kB managed:981532kB mlocked:0kB dirty:8kB writeback:0kB mapped:728kB shmem:144kB slab_reclaimable:12488kB slab_unreclaimable:7176kB kernel_stack:1552kB pagetables:4636kB unstable:0kB bounce:0kB free_pcp:204kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7 all_unreclaimable? no
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: lowmem_reserve[]: 0 0 0 0
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 DMA: 2*4kB (UE) 7*8kB (UEM) 2*16kB (UE) 2*32kB (UE) 3*64kB (UEM) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 1*2048kB (E) 2*4096kB (M) = 15328kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 DMA32: 432*4kB (UEM) 1022*8kB (UEM) 866*16kB (UEM) 436*32kB (UEM) 120*64kB (UEM) 29*128kB (UEM) 10*256kB (UE) 4*512kB (UE) 1*1024kB (M) 0*2048kB 0*4096kB = 54736kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 Normal: 102*4kB (EM) 364*8kB (UEM) 274*16kB (UEM) 101*32kB (UEM) 43*64kB (UEM) 12*128kB (UEM) 3*256kB (UM) 2*512kB (UM) 0*1024kB 0*2048kB 0*4096kB = 17016kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: 325 total pagecache pages
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: 0 pages in swap cache
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Swap cache stats: add 0, delete 0, find 0/0
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Free swap  = 0kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Total swap = 0kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: 1048446 pages RAM
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: 0 pages HighMem/MovableOnly
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: 78037 pages reserved
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  336]     0   336    22802      104      50        0             0 systemd-journal
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  359]     0   359    11092      262      22        0         -1000 systemd-udevd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  393]     0   393    13863      114      27        0         -1000 auditd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  412]    81   412     6137      118      17        0          -900 dbus-daemon
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  414]   998   414   133562     1422      58        0             0 polkitd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  415]     0   415     5369       62      14        0             0 irqbalance
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  417]     0   417     6129      167      16        0             0 systemd-logind
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  419]     0   419   171693      444     154        0             0 rsyslogd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  427]     0   427     6464       51      18        0             0 atd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  442]     0   442    27511       34      10        0             0 agetty
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  444]     0   444    27511       32      11        0             0 agetty
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  713]     0   713   140599     3186      93        0             0 tuned
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [  738]    38   738     7476      159      19        0             0 ntpd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1105]     0  1105    26499      245      54        0         -1000 sshd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1138]     0  1138    16599      204      16        0             0 aliyun-service
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1293]     0  1293     8657      313      20        0             0 staragentd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1294]     0  1294   323022     3276      68        0             0 staragent-core
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1295]     0  1295   200147    15278      71        0             0 staragent-ppf
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [18215]     0 18215    30529      734      61        0             0 nginx
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [28759]     0 28759    31558      155      18        0             0 crond
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1327]  1000  1327    30530      835      60        0             0 nginx
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 1328]  1000  1328    30530      835      60        0             0 nginx
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 3947]     0  3947     8179      415      20        0             0 AliYunDunUpdate
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [ 4051]     0  4051    31625      697      59        0             0 AliYunDun
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [20736]     0 20736    56938      867      25        0             0 CmsGoAgent.linu
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [20742]     0 20742   199942     2717      40        0             0 exe
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [21951]     0 21951    36554      327      73        0             0 sshd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [21955]  1000 21955    36554      326      71        0             0 sshd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [21956]  1000 21956    28901      126      15        0             0 bash
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [29164]  1000 29164  1550710   879523    1872        0             0 java
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [29499]     0 29499    36554      326      73        0             0 sshd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [29501]  1000 29501    36554      332      72        0             0 sshd
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [29502]  1000 29502    28871      126      13        0             0 bash
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: [29617]  1000 29617    26986       25      10        0             0 tail
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Out of memory: Kill process 29164 (java) score 908 or sacrifice child
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: Killed process 29164 (java) total-vm:6202840kB, anon-rss:3518092kB, file-rss:0kB, shmem-rss:0kB
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: exe invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: exe cpuset=/ mems_allowed=0
    Jul 18 14:38:44 izbp764664httdtdff11wtycz kernel: CPU: 0 PID: 20750 Comm: exe Not tainted 3.10.0-693.17.1.el7.x86_64 #1
    

    机器是8G内存,java分配了6G堆内存,导致后续运行机器内存不足;修改JVM参数成4G,-Xms4g -Xmx4g,修改参数后再次oom时生成了dump文件,经过分析是exel解析出现问题,导致内存不够出现oom

    5.永久代或者元数据空间

    jdk1.8以后移除永久代,增加了元数据空间,常量保存在堆中,方法区放在元数据空间中,运行时出现此异常一般是动态代理生成class文件导致,例如jsp运行时的动态编译产生

    参考文章
    https://blog.csdn.net/windforce828/article/details/104819055/

    相关文章

      网友评论

          本文标题:OOM问题排查方法

          本文链接:https://www.haomeiwen.com/subject/spigohtx.html