问题现象
Java进程内存缓慢升高,到达一定值后居高不下:见 RES列
image-20201222110650976RES: A task's currently used share of available physical memory.
任务当前使用的可用物理内存份额
复现过程
2020/09/30 16:00 运维发现该服务内存占用比较高约2.7G
2020/09/30 16:10 运维重启该服务
2020/09/30 16:40 运维发现该服务内存又升高到1.9G
首次排查历程
启动参数
/usr/local/jdk/bin/java -Djava.util.logging.config.file=/apps/myapp/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -server -Xms4g -Xmx4g -Xss1024K -XX:+UseG1GC -XX:MaxGCPauseMillis=500 -XX:ParallelGCThreads=10 -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/logs/myapp/myapp-gc.log -Djava.endorsed.dirs=/apps/myapp/endorsed -classpath /apps/myapp/bin/bootstrap.jar:/apps/myapp/bin/tomcat-juli.jar -Dcatalina.base=/apps/myapp -Dcatalina.home=/apps/myapp -Djava.io.tmpdir=/apps/myapp/temp org.apache.catalina.startup.Bootstrap start
通过启动参数可知:
该应用最大堆内存配置为:4G
怀疑:存在内存泄露,排查堆内存占用情况
验证过程
第一步:dump堆内存快照信息
jmap -dump:format=b,file=heap.bin <PID>
第二步:mat分析内存
结果:mat分析得知,此时应用堆内存仅几百MB,距离1.9G差距很大。暂时解除是堆内存泄露的怀疑。
怀疑:存在堆外内存泄露,排查堆外内存占用情况
由于中间有别的事情,就没有立即对该怀疑进行验证
第二次排查历程
启动参数中的gc日志引起了我的注意:
-Xloggc:/logs/myapp/myapp-gc.log
我决定先分析一下内存回收的情况
cat /logs/myapp/myapp-gc.log
...
2020-12-22T10:50:59.014+0800: 2813573.254: [GC pause (G1 Evacuation Pause) (young), 0.1792911 secs]
[Parallel Time: 7.3 ms, GC Workers: 10]
[GC Worker Start (ms): Min: 2813573253.9, Avg: 2813573255.6, Max: 2813573258.6, Diff: 4.7]
[Ext Root Scanning (ms): Min: 0.0, Avg: 0.6, Max: 2.1, Diff: 2.1, Sum: 6.1]
[Update RS (ms): Min: 0.0, Avg: 1.0, Max: 3.1, Diff: 3.1, Sum: 10.4]
[Processed Buffers: Min: 0, Avg: 7.9, Max: 33, Diff: 33, Sum: 79]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 1.9]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 0.0, Avg: 0.9, Max: 3.1, Diff: 3.1, Sum: 8.7]
[Termination (ms): Min: 0.0, Avg: 1.5, Max: 2.3, Diff: 2.3, Sum: 14.8]
[Termination Attempts: Min: 1, Avg: 5.4, Max: 17, Diff: 16, Sum: 54]
[GC Worker Other (ms): Min: 0.0, Avg: 1.2, Max: 4.6, Diff: 4.5, Sum: 11.6]
[GC Worker Total (ms): Min: 2.3, Avg: 5.4, Max: 7.1, Diff: 4.7, Sum: 53.6]
[GC Worker End (ms): Min: 2813573260.9, Avg: 2813573261.0, Max: 2813573261.0, Diff: 0.1]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.7 ms]
[Other: 171.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 169.0 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.4 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 1.2 ms]
[Eden: 2452.0M(2452.0M)->0.0B(2448.0M) Survivors: 4096.0K->8192.0K Heap: 2572.1M(4096.0M)->123.7M(4096.0M)]
[Times: user=0.03 sys=0.00, real=0.18 secs]
...
日志中有一行引起了我的注意:
[Eden: 2452.0M(2452.0M)->0.0B(2448.0M) Survivors: 4096.0K->8192.0K Heap: 2572.1M(4096.0M)->123.7M(4096.0M)]
[Times: user=0.03 sys=0.00, real=0.18 secs]
Eden区内存回收从 2452M -> 0.0B 与最高内存2.9G这个数值比较相近
我决定再分析一下堆内存的分配情况:
jmap -heap 32308
Attaching to process ID 32308, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.65-b01
using thread-local object allocation.
Garbage-First (G1) GC with 10 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 4294967296 (4096.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 2575302656 (2456.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 2097152 (2.0MB)
Heap Usage:
G1 Heap:
regions = 2048
capacity = 4294967296 (4096.0MB)
used = 2111526336 (2013.7084350585938MB)
free = 2183440960 (2082.2915649414062MB)
49.1628035902977% used
G1 Young Generation:
Eden Space:
regions = 945
capacity = 2696937472 (2572.0MB)
used = 1981808640 (1890.0MB)
free = 715128832 (682.0MB)
73.48367029548989% used
Survivor Space:
regions = 4
capacity = 8388608 (8.0MB)
used = 8388608 (8.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 64
capacity = 1589641216 (1516.0MB)
used = 121329088 (115.70843505859375MB)
free = 1468312128 (1400.2915649414062MB)
7.632482523653941% used
33825 interned Strings occupying 3865872 bytes.
从中可以读取到的重要信息是:
新生代Eden区:分配内存2572MB、当前使用了1890MB
新生代Survivor区:分配内存8MB、当前使用了8MB
老年代:分配内存1516MB、当前使用了115MB
从中可大致分析出一个场景:
当Eden区内存使用率到100%,触发新生代GC。然后将存活的对象转移到 Survivor区。Survivor区进行多轮标记,如果对象还存活则转移对象到 Old Generation区。
从这里我们可以知道由于Eden分配的内存是2572MB,所以在新生代GC前JVM的内存占用是有可能大于 2572MB的。
通过上面的分析:得出了JVM内存占用是有可能达到 2.5GB及以上的可能性的结论。
新的疑问,但为何应用重启后,进程的RES列缓慢升高,之后就是久高不下呢,通过GC日志可以知道JVM的确也回收了内存。如果按这个逻辑来说,RES列应该随着GC而同步波动才对啊。
怀疑:JVM回收内存后并没有将内存归还给操作系统
为了验证我的怀疑,我开始查阅相关资料。
最终大概得出了一个结论:
在Linux操作系统下,会为每个进程分配一个虚拟内存(32位操作系统为4G)。通过页映射的方式让虚拟内存与物理内存(及硬盘的Swap)建立映射。
当进程真正需要使用内存的时候会申请虚拟内存,此时操作系统会将物理内存与进程的进程的虚拟内存建立绑定。代表该物理内存被进程所占用,但在进程的视野里只有虚拟内存。
同理,当进程主动释放内存的时候,也是释放虚拟内存,进而操作系统会将物理内存释放掉。
这里涉及到很多操作系统的原理知识。有兴趣大家可以进一步做研究。
本地实验
实验环境:
系统:Linux (发行版:ubuntu)
JDK:1.8
实验代码
每100ms创建一个大小约为 1MB的对象
public class Test {
public static void main (String[] args) throws InterruptedException {
while (true) {
TestObj testObj = new TestObj();
Thread.sleep(100L);
}
}
public static class TestObj {
private byte[][] field01 = new byte[1024][1024];
}
}
运行命令:
java -XX:+UseG1GC -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xms1g -Xmx1g -Xloggc:/work/test-gc.log Test
实验现象
Attaching to process ID 3302, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.271-b09
using thread-local object allocation.
Garbage-First (G1) GC with 2 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 643825664 (614.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)
Heap Usage:
G1 Heap:
regions = 1024
capacity = 1073741824 (1024.0MB)
used = 54341376 (51.823974609375MB)
free = 1019400448 (972.176025390625MB)
5.060935020446777% used
G1 Young Generation:
Eden Space:
regions = 51
capacity = 675282944 (644.0MB)
used = 53477376 (51.0MB)
free = 621805568 (593.0MB)
7.919254658385094% used
Survivor Space:
regions = 1
capacity = 1048576 (1.0MB)
used = 1048576 (1.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 0
capacity = 397410304 (379.0MB)
used = 0 (0.0MB)
free = 397410304 (379.0MB)
0.0% used
708 interned Strings occupying 47224 bytes.
top -p <PID>
image-20201223171327219RES会逐渐升高,当升高到664752 (约649MB)时,基本不再变换。
大致变化波动约 ±(1-2)MB左右,这里猜测是因为JVM Survivor区在做快速标记回收占用的空间与Eden区的峰值产生的叠加效应。
因此得出结论 低频业务处理的Java服务最高物理内存 约等于 堆区峰值是的使用内存。
结论
当JVM堆内存管理时,当申请要使用虚拟内存时(注意:是申请而不是分配,分配是JVM启动时已经做好了分配。新生代老年代的内存比例也定了)。操作系统会相应的分配内存给Java进程,当GC之后,站在JVM的内存管理角度,会清理掉堆内的对象,但是并不会释放内存,即并没有把曾经申请到的内存归还给操作系统。
这也就印证了实验现象:首次Eden达到峰值触发GC之前,Java进程物理内存占用逐渐升高,因为这个阶段是首次Java进程向操作系统在申请内存,当GC之后Java进程物理内存占用趋于稳定,而不是骤减。
最后总结生产环境的问题:内存占用高应该属于正常现象。因为分配的堆内存高,使 RES ≈ Eden区的内存峰值,首次排查时dump堆内存分析发现堆内存远小于 RES也是偶然,因为正好赶上GC之后不久dump的内存。
之后又再濒临GC前dump过一次堆内存,发现堆内存值约等于RES值。
参考资料:
https://cloud.tencent.com/developer/article/1420898
网友评论