美文网首页
G1 to-space exhausted 的问题

G1 to-space exhausted 的问题

作者: 小菜Yang | 来源:发表于2020-10-22 12:37 被阅读0次

问题描述

线上支付元数据系统出现告警,堆内存使用率超过95%

应用背景

应用说明:该系统存储了很多账户-银行的配置信息,供业务系统查询使用,业务系统调用是全量获取的,所以会产生些大对象(将list转化为json返回)
JVM参数如下:

-Xms6144m -Xmx6144m -XX:+UseG1GC -XX:MaxGCPauseMillis=500 
-Xloggc:/home/finance/Logs/xxx/xxx-gc.log 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=50M 

GC日志如下:

2020-09-16T19:29:36.138+0800: 1113411.729: [GC pause (G1 Humongous Allocation) (young) (initial-mark), 0.0486270 secs]
   [Parallel Time: 44.3 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 1113411731.0, Avg: 1113411731.0, Max: 1113411731.0, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 3.8, Avg: 4.2, Max: 4.5, Diff: 0.6, Sum: 16.6]
      [Update RS (ms): Min: 34.6, Avg: 34.7, Max: 34.7, Diff: 0.1, Sum: 138.7]
         [Processed Buffers: Min: 184, Avg: 205.5, Max: 229, Diff: 45, Sum: 822]
      [Scan RS (ms): Min: 1.0, Avg: 1.1, Max: 1.2, Diff: 0.2, Sum: 4.5]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 3.6, Avg: 4.0, Max: 4.3, Diff: 0.7, Sum: 15.9]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.2, Max: 2, Diff: 1, Sum: 5]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 43.9, Avg: 44.0, Max: 44.0, Diff: 0.1, Sum: 175.8]
      [GC Worker End (ms): Min: 1113411775.0, Avg: 1113411775.0, Max: 1113411775.0, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 4.3 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 1.2 ms]
      [Free CSet: 0.2 ms]
   [Eden: 14.0M(3682.0M)->0.0B(3680.0M) Survivors: 4096.0K->6144.0K Heap: 2634.4M(6144.0M)->1491.6M(6144.0M)]
 [Times: user=0.18 sys=0.00, real=0.04 secs]
2020-09-16T19:29:36.187+0800: 1113411.778: [GC concurrent-root-region-scan-start]
2020-09-16T19:29:36.203+0800: 1113411.793: [GC concurrent-root-region-scan-end, 0.0154486 secs]
2020-09-16T19:29:36.203+0800: 1113411.793: [GC concurrent-mark-start]
2020-09-16T19:29:36.637+0800: 1113412.227: [GC concurrent-mark-end, 0.4340025 secs]

2020-09-16T19:29:36.651+0800: 1113412.241: [GC remark 2020-09-16T19:29:36.651+0800: 1113412.241: [Finalize Marking, 0.0014060 secs] 2020-09-16T19:29:366
.652+0800: 1113412.243: [GC ref-proc, 0.0011151 secs] 2020-09-16T19:29:36.653+0800: 1113412.244: [Unloading, 0.0640499 secs], 0.0700085 secs]
 [Times: user=0.24 sys=0.00, real=0.07 secs]
2020-09-16T19:29:36.723+0800: 1113412.313: [GC cleanup 2085M->1645M(6144M), 0.0041280 secs]
 [Times: user=0.02 sys=0.00, real=0.00 secs]
2020-09-16T19:29:36.727+0800: 1113412.318: [GC concurrent-cleanup-start]
2020-09-16T19:29:36.728+0800: 1113412.318: [GC concurrent-cleanup-end, 0.0003667 secs]

2020-09-16T21:34:35.527+0800: 1120911.117: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 1.2081112 secs]
   [Parallel Time: 226.4 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 1120911125.5, Avg: 1120911125.6, Max: 1120911125.7, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 4.0, Avg: 4.6, Max: 5.7, Diff: 1.6, Sum: 18.4]
      [Update RS (ms): Min: 23.9, Avg: 24.0, Max: 24.2, Diff: 0.3, Sum: 96.1]
         [Processed Buffers: Min: 98, Avg: 109.5, Max: 119, Diff: 21, Sum: 438]
      [Scan RS (ms): Min: 2.4, Avg: 2.5, Max: 2.6, Diff: 0.2, Sum: 10.1]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.3]
      [Object Copy (ms): Min: 190.3, Avg: 193.9, Max: 195.4, Diff: 5.1, Sum: 775.5]
      [Termination (ms): Min: 0.0, Avg: 0.9, Max: 3.7, Diff: 3.7, Sum: 3.8]
         [Termination Attempts: Min: 1, Avg: 1.2, Max: 2, Diff: 1, Sum: 5]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 226.0, Avg: 226.1, Max: 226.3, Diff: 0.3, Sum: 904.3]
      [GC Worker End (ms): Min: 1120911351.6, Avg: 1120911351.7, Max: 1120911351.8, Diff: 0.2]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.8 ms]
   [Other: 980.8 ms]
      [Evacuation Failure: 968.8 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 7.7 ms]
      [Humongous Reclaim: 2.2 ms]
      [Free CSet: 1.2 ms]
   [Eden: 2942.0M(3680.0M)->0.0B(306.0M) Survivors: 6144.0K->0.0B Heap: 5889.8M(6144.0M)->2864.0M(6144.0M)]
 [Times: user=4.17 sys=0.03, real=1.21 secs] 

2020-09-16T21:56:31.801+0800: 1122227.392: [GC pause (G1 Evacuation Pause) (mixed), 0.0330733 secs]
   [Parallel Time: 29.8 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 1122227392.3, Avg: 1122227392.4, Max: 1122227392.4, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 3.4, Avg: 3.8, Max: 4.5, Diff: 1.2, Sum: 15.2]
      [Update RS (ms): Min: 21.0, Avg: 21.4, Max: 21.5, Diff: 0.5, Sum: 85.6]
         [Processed Buffers: Min: 35, Avg: 50.8, Max: 59, Diff: 24, Sum: 203]
      [Scan RS (ms): Min: 1.0, Avg: 1.2, Max: 1.5, Diff: 0.5, Sum: 4.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 2.6, Avg: 3.2, Max: 3.7, Diff: 1.1, Sum: 12.7]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
      [GC Worker Total (ms): Min: 29.6, Avg: 29.6, Max: 29.7, Diff: 0.1, Sum: 118.5]
      [GC Worker End (ms): Min: 1122227422.0, Avg: 1122227422.0, Max: 1122227422.0, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.3 ms]
   [Other: 3.0 ms]
      [Choose CSet: 0.1 ms]
      [Ref Proc: 1.4 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.7 ms]
   [Eden: 306.0M(306.0M)->0.0B(2744.0M) Survivors: 0.0B->2048.0K Heap: 3170.0M(6144.0M)->2602.0M(6144.0M)]
 [Times: user=0.11 sys=0.01, real=0.03 secs] 

2020-09-16T21:34:35.527+0800: 1120911.117: [GC pause (G1 Evacuation Pause) (young) (to-space exhausted), 1.2081112 secs]
这个暂停时间达到1.2s之长

巨型对象过多导致的内存碎片,引起G1 to-space exhausted

根据上述的GC日志,可以发现几个关键字: to-space exhausted, G1 Humongous Allocation。
查找相关的资料,如:
https://docs.oracle.com/en/java/javase/11/gctuning/garbage-first-garbage-collector-tuning.html#GUID-2428DA90-B93D-48E6-B336-A849ADF1C552

文章中描述了巨型对象是如何定义的,只要对象大小>=1/2 region,那么这个对象就会被JVM标记为Humongous object

Humongous objects are objects larger or equal the size of half a region. The current region size is determined ergonomically as described in the Ergonomic Defaults for G1 GC section, unless set using the -XX:G1HeapRegionSize option.

每一个巨型对象是一组连续的region,分配在老年代。 如:如果region为1m,对象大小为600KB,那么剩余的空间(1024kb - 600kb)将成为碎片,不在参与分配。

Every humongous object gets allocated as a sequence of contiguous regions in the old generation. The start of the object itself is always located at the start of the first region in that sequence. Any leftover space in the last region of the sequence will be lost for allocation until the entire object is reclaimed.

G1的巨型对象内存碎片的验证

package GC;

import java.util.ArrayList;
import java.util.List;

/**
 * -verbose:gc -Xms56m -Xmx56m -XX:G1HeapRegionSize=1M -XX:+UseG1GC -Xloggc:e:\g1\g1-gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpBeforeFullGC -XX:HeapDumpPath=e:\g1\fullgc.dump
 * Created by yongzheng.yang on 2017/7/20.
 */
public class G1GCTest {
    static List<byte[]> list = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        // 10M内存Eden区容不下, S区也容不下, 直接进入Old区, 但是还不够30*50%=15M CMS GC触发条件
        for (int i = 0; i < 56; i++) {
            /*Heap
            garbage-first heap   total 57344K, used 31931K [0x00000000fc800000, 0x00000000fc9001c0, 0x0000000100000000)
            region size 1024K, 2 young (2048K), 1 survivors (1024K)
            */
            // region为1m 1024k,分配的内存为 612K >= 1/2m 所以该对象是Humongous-obj,占用1个region
            // 当分配到53个后,老年代占53个区(Humongous Obj) , Eden 占1个,Survivor占2个,所以53以后没有空间在继续分配大对象了
            // 就会报
            byte[] byte10m1 = new byte[1 * 612 * 1024];
            list.add(byte10m1); // 一直持有,不让回收
        }
        // 分配1kb 但此次内存已经使用完毕了,会触发一次java.lang.OutOfMemoryError: Java heap space
        // 如果配置-XX:G1HeapRegionSize=2  指定每个region为2M,则此处不会抛异常
        byte[] byte10m2 = new byte[1 * 1024];
        Thread.sleep(30000);
    }
}

执行到第52次的时候

C:\Program Files\Java\jdk1.8.0_172\bin>jmap -heap 22784
Attaching to process ID 22784, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.172-b11

using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 58720256 (56.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 34603008 (33.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 56
   capacity = 58720256 (56.0MB)
   used     = 33761088 (32.19708251953125MB)
   free     = 24959168 (23.80291748046875MB)
   57.49479021344866% used
G1 Young Generation:
Eden Space:
   regions  = 1
   capacity = 11534336 (11.0MB)
   used     = 1048576 (1.0MB)
   free     = 10485760 (10.0MB)
   9.090909090909092% used
Survivor Space:
   regions  = 2
   capacity = 2097152 (2.0MB)
   used     = 2097152 (2.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 52
   capacity = 45088768 (43.0MB)
   used     = 30615360 (29.19708251953125MB)
   free     = 14473408 (13.80291748046875MB)
   67.90019190588663% used

1727 interned Strings occupying 155576 bytes.

第53次分配的时候,可以看到G1 OLD区的regions变为了53,说明new byte[1 * 612 * 1024]的分配对于1M的regions来说就是一个Humongou Obj 占用一个独立的region (1M),此时所有regions均与分配完毕,如果在来下一次分配会怎么样呢?

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 58720256 (56.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 34603008 (33.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 56
   capacity = 58720256 (56.0MB)
   used     = 33787552 (32.222320556640625MB)
   free     = 24932704 (23.777679443359375MB)
   57.53985813685826% used
G1 Young Generation:
Eden Space:
   regions  = 1
   capacity = 11534336 (11.0MB)
   used     = 1048576 (1.0MB)
   free     = 10485760 (10.0MB)
   9.090909090909092% used
Survivor Space:
   regions  = 2
   capacity = 2097152 (2.0MB)
   used     = 2097152 (2.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 52
   capacity = 45088768 (43.0MB)
   used     = 30641824 (29.222320556640625MB)
   free     = 14446944 (13.777679443359375MB)
   67.95888501544331% used

1727 interned Strings occupying 155576 bytes.

第54次分配,heap分布情况如下,可以发现eden, suvivor的regions全变为0,这两个space的对象全部移到了old区(old区regions变为56)

Heap Configuration:
   MinHeapFreeRatio         = 40
   MaxHeapFreeRatio         = 70
   MaxHeapSize              = 58720256 (56.0MB)
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 34603008 (33.0MB)
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 56
   capacity = 58720256 (56.0MB)
   used     = 35100520 (33.474464416503906MB)
   free     = 23619736 (22.525535583496094MB)
   59.77582931518555% used
G1 Young Generation:
Eden Space:
   regions  = 0
   capacity = 3145728 (3.0MB)
   used     = 0 (0.0MB)
   free     = 3145728 (3.0MB)
   0.0% used
Survivor Space:
   regions  = 0
   capacity = 0 (0.0MB)
   used     = 0 (0.0MB)
   free     = 0 (0.0MB)
   0.0% used
G1 Old Generation:
   regions  = 56
   capacity = 55574528 (53.0MB)
   used     = 35100520 (33.474464416503906MB)
   free     = 20474008 (19.525535583496094MB)
   63.15936682359228% used

观察GC日志,发生to-space exhausted的收集,查看了https://www.infoq.com/articles/tuning-tips-G1-GC/
这一步骤是非常耗时的,所以我们需要尽量避免由于空间不足引起的to-space exhausted

For G1 GC, an evacuation failure is very expensive -
For successfully copied objects, G1 needs to update the references and the regions have to be tenured.
For unsuccessfully copied objects, G1 will self-forward them and tenure the regions in place.

2020-10-22T11:26:25.291+0800: 227.567: [GC pause (G1 Humongous Allocation) (young) (to-space exhausted), 0.0087451 secs]
   [Parallel Time: 6.8 ms, GC Workers: 4]
      [GC Worker Start (ms): Min: 227566.9, Avg: 227566.9, Max: 227567.0, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 0.2, Avg: 1.8, Max: 6.6, Diff: 6.5, Sum: 7.2]
      [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
      [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]
      [Object Copy (ms): Min: 0.0, Avg: 3.1, Max: 4.3, Diff: 4.3, Sum: 12.4]
      [Termination (ms): Min: 0.0, Avg: 1.7, Max: 2.4, Diff: 2.4, Sum: 6.8]
         [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [GC Worker Total (ms): Min: 6.6, Avg: 6.7, Max: 6.7, Diff: 0.1, Sum: 26.6]
      [GC Worker End (ms): Min: 227573.6, Avg: 227573.6, Max: 227573.6, Diff: 0.1]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.1 ms]
   [Other: 1.8 ms]
      [Evacuation Failure: 1.4 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Humongous Register: 0.0 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.0 ms]
   [Eden: 1024.0K(10.0M)->0.0B(2048.0K) Survivors: 2048.0K->0.0B Heap: 33.0M(56.0M)->32.9M(56.0M)]
 [Times: user=0.06 sys=0.00, real=0.01 secs] 

解决线上的问题

结合我们业务的场景及GC日志中的巨型对象分配触发的收集,以及发生to-space exhausted时,GC日志的空间实际上还是充足的,说明是巨型对象Humongou Obj分配过多引起的内存碎片, 最终导致to-space exhausted,所以决定调整线上的region size大小,增大为4m(原先并无配置,根据6G内存推导计算应该是2M),经过一段时间上线运行,不在出现to-space exhausted导致的GC暂停过长

-XX:G1HeapRegionSize=4M

但,对于应用程序来说,最好的做法还是避免Humongou Obj的产生,应该避免大对象的加载。

相关文章

  • G1 to-space exhausted 的问题

    问题描述 线上支付元数据系统出现告警,堆内存使用率超过95% 应用背景 应用说明:该系统存储了很多账户-银行的配置...

  • JVM调优实战:G1中的to-space exhausted问题

    最近刚刚将自己的一个应用从CMS升级到G1,在一天早上,刚刚到办公室坐下,就收到手机一阵报警,去查看了监控,发现机...

  • Centos7 编译过程中遇到 virtual memory e

    问题 在编译的时候(make),发生了错误 virtual memory exhausted: Cannot al...

  • iOS virtual memory exhausted! 问题

    报错信息 之前我在edit shceme中打开NSZombies 从Stack Overflow 上找到答案是说 ...

  • Exhausted

    20170602、working harder. Working smarter. Still the missi...

  • exhausted

    我没能力遗忘 你不用提醒我 哪怕结局就这样. 我还能怎样 要怎样 最后还不是落的朋友的立场 你从不会想 我何必这样

  • exhausted

    最近状态不大对,ddl焦虑,家里压抑,天气燥热,休息不足,睡眠差,环境吵,正确率低,进度慢,被限流,加载中…… d...

  • Exhausted

    Exhausted 意思是“极度疲倦的”。 现在的我差不多就是这种状态了。 工作8小时电子屏幕时间。 下班4小时电...

  • exhausted

    是什么让父母很焦虑?对,没错,是孩子。孩子的吃喝拉撒学习生活都让父母焦头烂额。如果你问我世界上最伟大的爱是什...

  • G1 与 CMS 两个垃圾收集器的对比

    细节方面不同 G1 在压缩空间方面有优势。 G1 通过将内存空间分成区域(Region)的方式避免内存碎片问题。 ...

网友评论

      本文标题:G1 to-space exhausted 的问题

      本文链接:https://www.haomeiwen.com/subject/rnxemktx.html