分析gc日志并进行调优

作者: nhhnhh | 来源:发表于2020-05-13 23:22 被阅读0次

    在我们正常的开发中很少会接触gc调优的事情。gc调优一般都是发生在项目开始之前。而项目开始之后如果发生oom或者full gc时间太长,导致系统卡顿等问题,这时候最好先找自己代码的问题。

    首先先指定好启动参数: 图片.png
    这段的意思就是打印gc的详情,输出gc时间戳,使用ParNew+CMS+SerialOld进行组合回收,当CMS失败时,SerialOld作为后备回收器。将gc日志输出到指定目录中。

    然后启动项目。获取gc日志的信息。截取其中的一段来进行分析:

    CommandLine flags: -XX:-BytecodeVerificationLocal -XX:-BytecodeVerificationRemote -XX:InitialHeapSize=134217728 -XX:+ManagementServer -XX:MaxHeapSize=2147483648 -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:TieredStopAtLevel=1 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC 
    0.992: [GC (Allocation Failure) [PSYoungGen: 33280K->3941K(38400K)] 33280K->4013K(125952K), 0.0046700 secs] [Times: user=0.01 sys=0.01, real=0.00 secs] 
    1.239: [GC (Allocation Failure) [PSYoungGen: 37221K->4387K(38400K)] 37293K->4467K(125952K), 0.0060482 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
    1.615: [GC (Allocation Failure) [PSYoungGen: 37667K->4719K(38400K)] 37747K->4807K(125952K), 0.0045716 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
    1.758: [GC (Allocation Failure) [PSYoungGen: 37999K->5101K(71680K)] 38087K->5452K(159232K), 0.0066052 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
    2.040: [GC (Allocation Failure) [PSYoungGen: 71661K->5106K(71680K)] 72012K->6515K(159232K), 0.0089319 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
    2.216: [GC (Allocation Failure) [PSYoungGen: 71666K->6582K(139264K)] 73075K->7999K(226816K), 0.0084832 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    2.446: [GC (Metadata GC Threshold) [PSYoungGen: 69050K->5044K(139776K)] 70468K->9525K(227328K), 0.0104451 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    2.457: [Full GC (Metadata GC Threshold) [PSYoungGen: 5044K->0K(139776K)] [ParOldGen: 4481K->7961K(52736K)] 9525K->7961K(192512K), [Metaspace: 20638K->20638K(1067008K)], 0.0291630 secs] [Times: user=0.08 sys=0.01, real=0.03 secs] 
    

    接下来挑几行日志来分析一下各个参数到底是什么意思。
    1.CommandLine flags指的是java的启动参数。

    1. 然后第一段gc的详情日志:0.992: [GC (Allocation Failure) [PSYoungGen: 33280K->3941K(38400K)] 33280K->4013K(125952K), 0.0046700 secs] [Times: user=0.01 sys=0.01, real=0.00 secs]
      Allocation Failure 指的是内存分配失败,所以进行gc。
      PSYoungGen指的是进行了新生代的gc,内存从33280K到了3941K,新生代总内存为38400K。整个堆的大小从33280K到了4013K,堆的总内存为125952K。总共耗时4毫秒。
      3.然后分析发生full gc的那一行日志:2.457: [Full GC (Metadata GC Threshold) [PSYoungGen: 5044K->0K(139776K)] [ParOldGen: 4481K->7961K(52736K)] 9525K->7961K(192512K), [Metaspace: 20638K->20638K(1067008K)], 0.0291630 secs] [Times: user=0.08 sys=0.01, real=0.03 secs]
      发生了full gc,导致full gc的原因是metadata 分配,其他参数跟上面参数一样,Metaspace这个参数表示发生gc前metadata大小为20638K,gc后metadata的大小为20638K,总大小为1067008K。

    可以将gc日志上传至https://gceasy.io这个网站上,会给你生成可视化的gc的情况。类似如下页面:

    图片.png 图片.png

    注意,红框标记的意思是吞吐量的意思,吞吐量越高,gc的性能越好。

    然后我们来进行调优一下,调优一般就是改动一些参数,如metadata的大小,新生代的大小等等。一个一个试。上述的gc日志,表明了metadata区可能太小了,因为我们如果不指定metadata的区域的大小的话,他的最小值是我们机器内存的1/64,最大值是我们机器的1/4。我们先给他指定一下大小,加如下参数 -XX:MetaspaceSize=1g
    -XX:MaxMetaspaceSize=1g
    然后再分析一下gc的日志:


    图片.png

    可以看到吞吐量提高了,并且平均gc响应跟最大gc响应都变小了。

    然后我们再指定一下用cms收集器看看:加这个参数-XX:+UseConcMarkSweepGC

    6.228: [GC (CMS Initial Mark) [1 CMS-initial-mark: 44081K(87424K)] 47207K(126720K), 0.0009259 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    6.229: [CMS-concurrent-mark-start]
    6.278: [CMS-concurrent-mark: 0.050/0.050 secs] [Times: user=0.09 sys=0.01, real=0.05 secs] 
    6.278: [GC (Allocation Failure) 6.278: [ParNew: 37354K->2663K(39296K), 0.0025064 secs] 81435K->46744K(126720K), 0.0025354 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
    6.281: [CMS-concurrent-preclean-start]
    6.282: [CMS-concurrent-preclean: 0.001/0.001 secs] [Times: user=0.00 sys=0.01, real=0.00 secs] 
    6.282: [GC (CMS Final Remark) [YG occupancy: 3362 K (39296 K)]6.282: [Rescan (parallel) , 0.0012354 secs]6.283: [weak refs processing, 0.0003721 secs]6.284: [class unloading, 0.0036852 secs]6.287: [scrub symbol table, 0.0087614 secs]6.296: [scrub string table, 0.0004252 secs][1 CMS-remark: 44081K(87424K)] 47443K(126720K), 0.0150513 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
    6.297: [CMS-concurrent-sweep-start]
    6.312: [CMS-concurrent-sweep: 0.014/0.014 secs] [Times: user=0.03 sys=0.00, real=0.02 secs] 
    6.312: [CMS-concurrent-reset-start]
    6.314: [CMS-concurrent-reset: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
    

    可以看到cms收集器的各个阶段,初始标记,并发标记,最终标记,然后清除等过程。
    然后看下当前的gc的效果图:


    图片.png

    可以看到吞吐量达到了99.208%。比之前高了1.5个百分点。
    再试试使用g1收集器是怎么样的:-XX:+UseG1GC
    g1的gc日志跟其他的日志有很大的不同:

    0.729: [GC pause (G1 Evacuation Pause) (young), 0.0044414 secs]
      [Parallel Time: 2.4 ms, GC Workers: 4]
         [GC Worker Start (ms): Min: 729.7, Avg: 729.7, Max: 729.7, Diff: 0.0]
         [Ext Root Scanning (ms): Min: 0.2, Avg: 0.4, Max: 0.8, Diff: 0.6, Sum: 1.5]
         [Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
            [Processed Buffers: Min: 0, Avg: 0.0, Max: 0, Diff: 0, Sum: 0]
         [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
         [Object Copy (ms): Min: 1.4, Avg: 1.8, Max: 2.1, Diff: 0.6, Sum: 7.2]
         [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
            [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
         [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
         [GC Worker Total (ms): Min: 2.3, Avg: 2.3, Max: 2.3, Diff: 0.0, Sum: 9.0]
         [GC Worker End (ms): Min: 732.0, Avg: 732.0, Max: 732.0, Diff: 0.0]
      [Code Root Fixup: 0.0 ms]
      [Code Root Purge: 0.0 ms]
      [Clear CT: 0.1 ms]
      [Other: 1.9 ms]
         [Choose CSet: 0.0 ms]
         [Ref Proc: 1.0 ms]
         [Ref Enq: 0.0 ms]
         [Redirty Cards: 0.0 ms]
         [Humongous Register: 0.0 ms]
         [Humongous Reclaim: 0.0 ms]
         [Free CSet: 0.0 ms]
      [Eden: 14.0M(14.0M)->0.0B(19.0M) Survivors: 0.0B->2048.0K Heap: 14.0M(128.0M)->2650.0K(128.0M)]
    [Times: user=0.01 sys=0.01, real=0.00 secs] 
    

    使用g1后发现吞吐量下降了,但是平均gc时间跟最大gc时间都降低了很多。因为g1是低延迟垃圾回收器,不是以高吞吐量为目的的。


    图片.png

    当然真正线上的参数肯定比这里例举的多很多,不过gc调优的大概思路就是这样,根据分析结果,调整各个参数。

    相关文章

      网友评论

        本文标题:分析gc日志并进行调优

        本文链接:https://www.haomeiwen.com/subject/wjqanhtx.html