JVM crash when overwriting a running jar
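Symptom
- After a game designer hot-reloaded the config tables, the JVM crashed outright (on a Linux development machine)
Crash log
- Log analysis
- According to the crash log, the config-table reload used Reflections to scan for config classes, which ended up reading the jar
- java.util.zip.ZipFile.getEntry
- followed by C code, where memcpy crashed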
Stack: [0x00007fd2ef8f9000,0x00007fd2ef9fa000], sp=0x00007fd2ef9f6328, free space=1012k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0x89997] memcpy+0x2e7
C [libzip.so+0x11f2f] ZIP_GetEntry2+0xff
C [libzip.so+0x3d30] Java_java_util_zip_ZipFile_getEntry+0xf0
J 287 java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x00007fd3733076ce [0x00007fd373307600+0xce]
J 941 C2 java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry; (22 bytes) @ 0x00007fd3734e3d38 [0x00007fd3734e39a0+0x398]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 287 java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x00007fd373307658 [0x00007fd373307600+0x58]
J 941 C2 java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry; (22 bytes) @ 0x00007fd3734e3d38 [0x00007fd3734e39a0+0x398]
J 662 C2 sun.misc.URLClassPath$JarLoader.getResource(Ljava/lang/String;Z)Lsun/misc/Resource; (85 bytes) @ 0x00007fd37341c920 [0x00007fd37341c8a0+0x80]
J 2995 C2 sun.misc.URLClassPath$JarLoader.findResource(Ljava/lang/String;Z)Ljava/net/URL; (18 bytes) @ 0x00007fd373c4eb5c [0x00007fd373c4eb20+0x3c]
J 4641 C1 sun.misc.URLClassPath$1.next()Z (63 bytes) @ 0x00007fd373e32404 [0x00007fd373e322a0+0x164]
J 3072 C1 sun.misc.URLClassPath$1.hasMoreElements()Z (5 bytes) @ 0x00007fd373ce6e44 [0x00007fd373ce6dc0+0x84]
J 2745 C1 java.net.URLClassLoader$3$1.run()Ljava/lang/Object; (5 bytes) @ 0x00007fd373b9129c [0x00007fd373b91220+0x7c]
v ~StubRoutines::call_stub
J 2500 java.security.AccessController.doPrivileged(Ljava/security/PrivilegedAction;Ljava/security/AccessControlContext;)Ljava/lang/Object; (0 bytes) @ 0x00007fd373aa4e23 [0x00007fd373aa4dc0+0x63]
J 2404 C1 java.net.URLClassLoader$3.next()Z (73 bytes) @ 0x00007fd373a71b6c [0x00007fd373a719e0+0x18c]
J 2501 C1 java.net.URLClassLoader$3.hasMoreElements()Z (5 bytes) @ 0x00007fd373aa51ec [0x00007fd373aa5180+0x6c]
J 2457 C1 sun.misc.CompoundEnumeration.next()Z (58 bytes) @ 0x00007fd373a8cb64 [0x00007fd373a8cac0+0xa4]
J 2651 C1 sun.misc.CompoundEnumeration.hasMoreElements()Z (5 bytes) @ 0x00007fd373b0dbec [0x00007fd373b0db80+0x6c]
J 2457 C1 sun.misc.CompoundEnumeration.next()Z (58 bytes) @ 0x00007fd373a8cb64 [0x00007fd373a8cac0+0xa4]
J 2651 C1 sun.misc.CompoundEnumeration.hasMoreElements()Z (5 bytes) @ 0x00007fd373b0dbec [0x00007fd373b0db80+0x6c]
j org.reflections.util.ClasspathHelper.forResource(Ljava/lang/String;[Ljava/lang/ClassLoader;)Ljava/util/Collection;+48
j org.reflections.util.ClasspathHelper.forPackage(Ljava/lang/String;[Ljava/lang/ClassLoader;)Ljava/util/Collection;+5
j org.reflections.util.ConfigurationBuilder.build([Ljava/lang/Object;)Lorg/reflections/util/ConfigurationBuilder;+327
j org.reflections.Reflections.<init>([Ljava/lang/Object;)V+2
j org.reflections.Reflections.<init>(Ljava/lang/String;[Lorg/reflections/scanners/Scanner;)V+13
j com.xx.common.dataconfig.DataConfigService.loadAllConfigsFromJSON(Ljava/lang/String;Ljava/lang/String;)V+40
j com.xx.achilles.spurs.agent.GameServerAgent.loadConfigRelated()V+21
j com.xx.achilles.spurs.gs.servlets.DevOpsHttpHandler.reload(Ljavax/servlet/http/HttpServletRequest;)Ljava/lang/String;+10
v ~StubRoutines::call_stub
J 3000 sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (0 bytes) @ 0x00007fd373c5bb77 [0x00007fd373c5bb00+0x77]
J 7233 C2 sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (104 bytes) @ 0x00007fd3736c9470 [0x00007fd3736c9400+0x70]
J 5097 C2 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007fd373bd18d4 [0x00007fd373bd1820+0xb4]
j com.xx.common.http.HttpMethodConfig.invoke(Ljavax/servlet/http/HttpServletRequest;)Ljava/lang/String;+16
j com.xx.common.http.HttpServerHandlerManager.handle(Ljava/lang/String;Lorg/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V+83
j org.eclipse.jetty.server.handler.HandlerWrapper.handle(Ljava/lang/String;Lorg/eclipse/jetty/server/Request;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V+18
j org.eclipse.jetty.server.Server.handle(Lorg/eclipse/jetty/server/HttpChannel;)V+170
j org.eclipse.jetty.server.HttpChannel.handle()Z+309
j org.eclipse.jetty.server.HttpConnection.onFillable()V+127
j org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded()V+4
j org.eclipse.jetty.io.FillInterest.fillable()V+61
j org.eclipse.jetty.io.ChannelEndPoint$2.run()V+7
j org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(Ljava/lang/Runnable;)V+1
j org.eclipse.jetty.util.thread.QueuedThreadPool$2.run()V+104
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
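To make the Java frames above more concrete: judging from the stack, ClasspathHelper.forPackage ends up enumerating resources through the class loader, and each step of that enumeration asks the jars on the classpath for an entry (JarFile.getEntry in the frames above). The sketch below performs the same kind of lookup with plain JDK calls only. It is not the Reflections implementation; the class name ClasspathLookupSketch, the method forPackage and the default package in main are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ClasspathLookupSketch {

    /** Lists every classpath location that contains the given package (hypothetical helper). */
    static List<URL> forPackage(String packageName) {
        String resourceName = packageName.replace('.', '/');
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        List<URL> found = new ArrayList<>();
        try {
            // Walking this enumeration asks each classpath element for the resource.
            Enumeration<URL> urls = loader.getResources(resourceName);
            while (urls.hasMoreElements()) {
                found.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Placeholder package taken from the stack trace; pass your own as the first argument.
        String pkg = args.length > 0 ? args[0] : "com.xx.common.dataconfig";
        for (URL url : forPackage(pkg)) {
            System.out.println(url);
        }
    }
}
```

The point is only that an innocent-looking package scan reaches into every open jar, so a jar that has been rewritten underneath the process is touched as soon as someone triggers a reload.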
Referenced causes and solutions
- we've seen similar errors. our current suspect is jar files which are re-written (by an upgrade process) while the process is running.
- If you replace a zip/jar file that a Java program currently has "open" (has cached the ZipFile/JarFile object), it will use cached table-of-contents (TOC) data it read from the original file, and will try and use that to unpack data in the replaced file. The inflation code is not robust and will outright crash when presented with bad data.
- Issue is zip/JAR file is being overwritten while in use. OpenJDK code for ZIP file format is in native C code any entry lookup, creation requires multiple round-trip of expensive jni invocations. The current native C implementation code uses mmap to map in the central directory table which is a big risk of vm crash when the underlying jar file gets overwritten with new contents while it is still being used by other ZipFile, that is what is happening. Using -Dsun.zip.disableMemoryMapping=true will solve the problem
- It appears -Dsun.zip.disableMemoryMapping=true helps. The test case obviously suggests the zip/JAR file is being overridden while in use. And the changes we made in jdk9 does solve the problem
- The zip library implementation has been improved in JDK 9. The new java.util.zip.ZipFile implementation does not use mmap to map ZIP file central directory into memory anymore. As a result, the sun.zip.disableMemoryMapping system property is no longer needed and has been removed.
- Make sure the application code does not reload a class while it's in use. In addition, make sure that a jar file is not being updated while it's being accessed by the class loader.
- Reasons
- While a class is in use it is dynamically reloaded from a jar file
- While a jar file is being accessed by the class loader, the jar file is being modified
- A Jarfile which was bigger than 4GB was accessed (applies to Java 6 and earlier only)
- None of the above are Java bugs. It's the application's responsibility to prevent this from happening. A crash is unavoidable if a Jarfile is being modified while it's in use. It's similar to modifying a shared library or dll while a program is running. This will also lead to an application crash.
- The current j.u.z.ZipFile has following major issues
(1) Its ZIP file format support code is in native C code (shared with the VM via ZipFile.c -> zip_util.c). Any entry lookup, creation requires multiple round-trip of expensive jni invocations.
(2) The current native C implementation code uses mmap to map in the central directory table appears to be a big risk of vm crash when the underlying jar file gets overwritten with new contents while it is still being used by other ZipFile.
(3) The use of "filename + lastModified()" cache (at native) appears to be broken if the timestamp is in low resolution, and/or the file is being overwritten.
The clean solution here is to bring the ZIP format support code from native to Java to remove the jni invocation cost and the mmap risk. Also to use the fileKey and lastModified from java.nio.file.attribute.BasicFileAttributes to have better cache matching key.
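The last quote proposes using fileKey and lastModified from java.nio.file.attribute.BasicFileAttributes as the cache key. A minimal sketch of what such a key could look like follows. It is not the JDK's actual implementation; the names ZipCacheKeySketch, CacheKey, keyOf and keyChangesWithTimestamp are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.Objects;

public class ZipCacheKeySketch {

    /** Hypothetical cache key: file identity plus last-modified time. */
    static final class CacheKey {
        private final Object fileKey;          // may be null where the file system has no file key
        private final long lastModifiedMillis;

        CacheKey(Object fileKey, long lastModifiedMillis) {
            this.fileKey = fileKey;
            this.lastModifiedMillis = lastModifiedMillis;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) {
                return false;
            }
            CacheKey other = (CacheKey) o;
            return lastModifiedMillis == other.lastModifiedMillis
                    && Objects.equals(fileKey, other.fileKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileKey, lastModifiedMillis);
        }
    }

    /** Builds the key from the attributes the quote mentions. */
    static CacheKey keyOf(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            return new CacheKey(attrs.fileKey(), attrs.lastModifiedTime().toMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the key is stable for an untouched file and changes when only the timestamp changes. */
    static boolean keyChangesWithTimestamp() {
        try {
            Path jar = Files.createTempFile("cache-key-sketch", ".jar");
            try {
                Files.setLastModifiedTime(jar, FileTime.fromMillis(1_000_000_000_000L));
                CacheKey before = keyOf(jar);
                boolean stable = before.equals(keyOf(jar));
                Files.setLastModifiedTime(jar, FileTime.fromMillis(1_000_000_010_000L));
                CacheKey after = keyOf(jar);
                return stable && !before.equals(after);
            } finally {
                Files.deleteIfExists(jar);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("key changes with timestamp: " + keyChangesWithTimestamp());
    }
}
```

Compared with "filename + lastModified()", a key like this also changes when the path starts pointing at a different file, at least on file systems that report a file key.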
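In plain words
- The ZIP reading code is implemented internally in JNI (native) code (on Linux), and that code uses mmap. Once the contents of the zip/jar are overwritten, the mapping may be corrupted, and the next read of the zip/jar is very likely to crash (for example with an out-of-bounds access or a null pointer)
- Because ZIP reading goes through JNI so frequently, it also costs some performance
- Java 9 attempted to fix this
- This is not really a bug in itself. In production this practice should simply be avoided, just as a program crashes if a DLL is modified while the program is running
- The sun.zip.disableMemoryMapping property mentioned in many of the proposed solutions was removed in Java 9, because mmap is no longer used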
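Since the workaround only matters before Java 9, it can help to check what a given server is actually running with. The snippet below is a small sketch using standard system properties; the class name ZipMmapCheckSketch and the helper isPreJava9 are hypothetical. It relies on java.specification.version being reported as "1.x" up to Java 8 and as "9", "11", and so on afterwards.

```java
public class ZipMmapCheckSketch {

    /** True for "1.x" style versions (Java 8 and earlier), false for "9", "11", "17", ... */
    static boolean isPreJava9(String specVersion) {
        return specVersion.startsWith("1.");
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        // Reads -Dsun.zip.disableMemoryMapping=true if it was passed on the command line.
        boolean mappingDisabled = Boolean.getBoolean("sun.zip.disableMemoryMapping");
        if (isPreJava9(spec)) {
            System.out.println("Java " + spec
                    + ": native zip code in use, sun.zip.disableMemoryMapping=" + mappingDisabled);
        } else {
            System.out.println("Java " + spec
                    + ": sun.zip.disableMemoryMapping is no longer used");
        }
    }
}
```

On a Java 8 server the flag would be passed at startup, for example java -Dsun.zip.disableMemoryMapping=true followed by the usual arguments.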
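Why our project crashed
- According to the crash log, the crash happened while loading the config tables
- A designer was reloading the config tables (which uses Reflections#scan)
- At the same time, one of our backend developers was probably replacing the jar (updating the game server)
- We are not yet following the formal update procedure; the correct order is to shut down first, then overwrite the jar, then restart
- The problem most likely arose because the jar was overwritten first and the shutdown came afterwards, and the designer happened to trigger a reload while the jar was being overwritten, which caused the crash
- In other words, a running (runnable) jar was overwritten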
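The shutdown-first procedure above remains the safe path. Still, it is worth seeing why an in-place overwrite is the dangerous variant: it changes the bytes underneath handles and mappings the JVM already holds, whereas renaming a new file over the old path leaves the old file intact for whoever has it open. The sketch below illustrates the rename approach under stated assumptions: a POSIX file system where rename replaces the target atomically, and hypothetical names throughout (AtomicJarReplaceSketch, replaceAtomically, demo, server.jar, update.jar). The "jars" in demo are just three-byte files, not real archives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicJarReplaceSketch {

    /** Stages newJar next to target, then renames the staged copy over target. */
    static void replaceAtomically(Path newJar, Path target) {
        try {
            Path dir = target.toAbsolutePath().getParent();
            Path staged = Files.createTempFile(dir, "deploy-", ".tmp");
            try {
                Files.copy(newJar, staged, StandardCopyOption.REPLACE_EXISTING);
                // Assumption: on POSIX this maps to rename(2), which replaces an existing target.
                Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                Files.deleteIfExists(staged); // only still present if the move failed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns "<bytes seen through a handle opened before the swap>|<bytes seen by a fresh open>". */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("atomic-replace-sketch");
            Path target = dir.resolve("server.jar");
            Path update = dir.resolve("update.jar");
            Files.write(target, "old".getBytes(StandardCharsets.UTF_8));
            Files.write(update, "new".getBytes(StandardCharsets.UTF_8));
            String result;
            try (InputStream alreadyOpen = Files.newInputStream(target)) {
                replaceAtomically(update, target);
                byte[] buf = new byte[3];
                int n = alreadyOpen.read(buf);
                String viaOldHandle = new String(buf, 0, Math.max(n, 0), StandardCharsets.UTF_8);
                String viaNewOpen = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
                result = viaOldHandle + "|" + viaNewOpen;
            }
            Files.deleteIfExists(target);
            Files.deleteIfExists(update);
            Files.deleteIfExists(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On Linux the handle opened before the swap is expected to keep reading the old bytes while a fresh open sees the new ones. Note that the staged temp file is created with restrictive permissions, so a real deployment script would probably need to adjust them.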
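Summary
- Do not overwrite a jar while it is in use by a running process
- I ran ZipCrashTest.java from JDK-8156179 on both Linux and Mac and did not get a crash in either case (my personal guess is that this depends on the Linux kernel version), but I did get a NullPointerException (which is itself rather odd, even without a crash)
- Create test1.zip with two entries: world.txt and hello.txt
- Create test2.zip with two entries: hello.txt and world.txt
- Open test1.zip with ZipFile and read hello.txt
- Run the Linux command cp test2.zip test1.zip, which replaces the contents of test1.zip with those of test2.zip
- Try to read world.txt from test1.zip again
- If you are interested, you can use GDB to debug the JNI code where it goes wrong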
[xx@achilles landon]$ java ZipCrashTest
Exception in thread "main" java.lang.NullPointerException: entry
at java.util.zip.ZipFile.getInputStream(ZipFile.java:347)
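The steps above can be sketched in a single Java program. This is not the original ZipCrashTest.java from JDK-8156179, only a reconstruction from the listed steps; the class and method names (OverwriteOpenZipSketch, createZip, readEntry, readFresh) are hypothetical, and each entry's content is simply its own name. On JDK 8 the last read may crash the JVM, so run it only in a throwaway environment. The NullPointerException: entry in the output above is consistent with getEntry returning null after the overwrite and that null being passed to getInputStream; the sketch checks for null instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class OverwriteOpenZipSketch {

    /** Writes a zip with the given entries in the given order; each entry's content is its own name. */
    static void createZip(Path zip, String... entryNames) {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (String name : entryNames) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one entry as text, or returns null if the lookup finds no such entry. */
    static String readEntry(ZipFile zf, String name) {
        try {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) {
                return null; // passing null to getInputStream would throw NullPointerException
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the zip, reads one entry, and closes it again (the safe, non-overwritten case). */
    static String readFresh(Path zip, String name) {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            return readEntry(zf, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("overwrite-open-zip");
        Path test1 = dir.resolve("test1.zip");
        Path test2 = dir.resolve("test2.zip");
        createZip(test1, "world.txt", "hello.txt");
        createZip(test2, "hello.txt", "world.txt");

        try (ZipFile zf = new ZipFile(test1.toFile())) {
            System.out.println("before overwrite: " + readEntry(zf, "hello.txt"));

            // Files.write truncates and rewrites the existing file in place, like `cp test2.zip test1.zip`.
            // Files.copy with REPLACE_EXISTING may delete and recreate the target, which is a different case.
            Files.write(test1, Files.readAllBytes(test2));

            try {
                System.out.println("after overwrite: " + readEntry(zf, "world.txt"));
            } catch (RuntimeException e) {
                System.out.println("after overwrite failed: " + e);
            }
        }
    }
}
```

What the second read prints depends on the JDK version and platform, which matches the observation above that the same test behaved differently from the bug report.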
References