
Notes on Analyzing a Production Crash

Author: 天地一蜉蝣_6e86 | Published: 2020-03-03 14:43

After an upgrade, one of our applications kept crashing. A Java process can crash for several reasons:

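  1. A problem in the Java program itself: an OutOfMemoryError (OOM) brings the process down.
    Troubleshooting steps:
      1. Check the JVM options -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=*/java.hprof;
      2. Check whether a dump file was produced at the path given by HeapDumpPath;
      3. If a dump file exists, analyze it with a tool such as jhat or VisualVM.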
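  2. A JVM failure: a bug in the JVM or the JDK itself causes the crash.
    When the JVM hits a fatal error it writes an error file, hs_err_pid<pid>.log, which contains the key information about what caused the crash. Analyzing this file lets you trace the crash back to its root cause and fix it to keep the system stable. By default the file is written to the process's working directory; a different path can be set with the JVM option -XX:ErrorFile.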
  3. The process is killed by the operating system's OOM killer.
    Check the system log to confirm whether the Java process was killed by the OS: sudo grep --color "java" /var/log/messages
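
The three checks above can be sketched as one small triage script. This is only an illustration: the function name `triage`, its parameters, and the kernel log phrases in `OOM_PATTERNS` are assumptions, not something taken from the incident itself, and the exact wording of OOM-killer messages varies between kernel versions.

```python
import re
from pathlib import Path

# Kernel log phrases that commonly accompany an OOM-killer event.
# Assumed patterns: the exact wording differs between kernel versions.
OOM_PATTERNS = re.compile(r"(Out of memory: Kill(ed)? process|oom-killer)", re.IGNORECASE)


def triage(heap_dump: Path, work_dir: Path, syslog_lines):
    """Return one finding per crash cause that left evidence behind."""
    findings = []

    # Cause 1: the application ran out of heap and wrote a dump to HeapDumpPath.
    if heap_dump.is_file():
        findings.append(f"heap dump found: {heap_dump} (possible OOM in the application)")

    # Cause 2: the JVM hit a fatal error and wrote hs_err_pid<pid>.log files.
    err_files = sorted(work_dir.glob("hs_err_pid*.log"))
    if err_files:
        findings.append(f"{len(err_files)} hs_err file(s) found (fatal JVM/native error)")

    # Cause 3: the kernel's OOM killer terminated the Java process.
    if any(OOM_PATTERNS.search(line) and "java" in line for line in syslog_lines):
        findings.append("kernel log mentions java and the OOM killer")

    return findings
```

In this incident only the second check would have fired, which is why the rest of the analysis concentrates on the hs_err file.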
In the production environment there were a large number of hs_err_pid<pid>.log files, for example:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f847f9bf641, pid=36367, tid=0x00007f844b3f5700
#
# JRE version: OpenJDK Runtime Environment (Zulu 8.38.0.13-CA-linux64) (8.0_212-b04) (build 1.8.0_212-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x16f641]  __strlen_sse2_pminub+0x11
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f834c001000):  JavaThread "AgentMonitor-42" [_thread_in_native, id=36538, stack(0x00007f844b2f5000,0x00007f844b3f6000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000

Registers:
RAX=0x0000000000000000, RBX=0x0000000000000016, RCX=0x0000000000000000, RDX=0x0000000000000000
RSP=0x00007f844b3f16b8, RBP=0x00007f845033a330, RSI=0x00007f844b3f1b90, RDI=0x33261d74e6c20600
R8 =0x00007f844b3f1720, R9 =0x00007f847f89d27d, R10=0x0000000000000002, R11=0x00007f847f9d3df4
R12=0x0000000000000041, R13=0x00007f845033a518, R14=0x00007f84804607e0, R15=0x0000000000000004
RIP=0x00007f847f9bf641, EFLAGS=0x0000000000010283, CSGSFS=0x0000000000000033, ERR=0x0000000000000000
  TRAPNO=0x000000000000000d

Top of Stack: (sp=0x00007f844b3f16b8)
0x00007f844b3f16b8:   00007f848025b6bd 00007f8340567120
0x00007f844b3f16c8:   00007f844b3f1f90 00007f844b3f1f90
0x00007f844b3f16d8:   00007f848025dffb 00007f844b3f1720
...
0x00007f844b3f1898:   00007f845184aba8 00007f834c001000
0x00007f844b3f18a8:   00007f834c001000 00007f844b3f1920 

Instructions: (pc=0x00007f847f9bf641)
0x00007f847f9bf621:   c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48
0x00007f847f9bf631:   31 c0 89 f9 83 e1 3f 66 0f ef c0 83 f9 30 77 1d
0x00007f847f9bf641:   f3 0f 6f 0f 66 0f 74 c1 66 0f d7 d0 85 d2 0f 85
0x00007f847f9bf651:   4e 02 00 00 48 89 f8 48 83 e0 f0 eb 24 48 89 f8 

Register to memory mapping:

RAX=0x0000000000000000 is an unknown value
RBX=0x0000000000000016 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x0000000000000000 is an unknown value
RSP=0x00007f844b3f16b8 is pointing into the stack for thread: 0x00007f834c001000
RBP=0x00007f845033a330 is pointing into the stack for thread: 0x00007f838c01a000
RSI=0x00007f844b3f1b90 is pointing into the stack for thread: 0x00007f834c001000
RDI=0x33261d74e6c20600 is an unknown value
R8 =0x00007f844b3f1720 is pointing into the stack for thread: 0x00007f834c001000
R9 =0x00007f847f89d27d: _IO_vfprintf+0x4ccd in /lib64/libc.so.6 at 0x00007f847f850000
R10=0x0000000000000002 is an unknown value
R11=0x00007f847f9d3df4: <offset 0x183df4> in /lib64/libc.so.6 at 0x00007f847f850000
R12=0x0000000000000041 is an unknown value
R13=0x00007f845033a518 is pointing into the stack for thread: 0x00007f838c01a000
R14=0x00007f84804607e0: snoopy_inputdatastorage_data+0 in /usr/lib64/libsnoopy.so at 0x00007f8480255000
R15=0x0000000000000004 is an unknown value

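The file contains the following kinds of key information:
- Log header
- Information about the thread that caused the crash
- Information about all threads
- Safepoint and lock information
- Heap information
- Native code cache
- Compilation events
- GC records
- JVM memory map
- JVM startup arguments
- Host system information
For a detailed walkthrough, see: https://my.oschina.net/xionghui/blog/498785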
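
As a rough illustration of reading the log header, the sketch below pulls the signal, PID, and problematic frame out of the first lines of an hs_err file. The function name `parse_hs_err_header` and the regular expressions are assumptions written against the header shown earlier; other JVM versions may format these lines differently.

```python
import re

# Patterns written against the header lines shown in this article (assumed format).
SIGNAL_LINE = re.compile(r"^#\s+(SIG[A-Z]+) \((0x[0-9a-f]+)\) at pc=(0x[0-9a-f]+), pid=(\d+)")
FRAME_LINE = re.compile(r"^# ([A-Za-z]{1,2})\s+\[(\S+?)\+(0x[0-9a-f]+)\]\s*(.*)$")


def parse_hs_err_header(lines):
    """Extract the signal, pid, and problematic frame from hs_err header lines."""
    info = {}
    for line in lines:
        m = SIGNAL_LINE.match(line)
        if m:
            info.update(signal=m.group(1), pc=m.group(3), pid=int(m.group(4)))
            continue
        m = FRAME_LINE.match(line)
        if m:
            info.update(
                frame_type=m.group(1),      # C = native code, V = VM code, J/j = Java
                library=m.group(2),
                offset=m.group(3),
                symbol=m.group(4).strip(),
            )
    # A frame of type C means the crash happened in native code, outside the JVM.
    info["native_crash"] = info.get("frame_type") == "C"
    return info
```

Fed the header from this incident, it reports a SIGSEGV in libc.so.6 with a native (C) problematic frame, which matches the log's own statement that the crash happened outside the Java Virtual Machine.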
In the Stack section we can see:

Stack: [0x00007f844b2f5000,0x00007f844b3f6000],  sp=0x00007f844b3f16b8,  free space=1009k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

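The problematic frame is marked C (C=native code), which means the JVM crashed while executing native code. strace can be used to trace the process's system calls (strace reference: https://blog.csdn.net/rigete/article/details/50055783):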

28100 22:03:35.247322 [00007f1a9305b710] open(0x7f1a95f56110, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247354 [00007f1a9305b710] open(0x7f1a95f571a0, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247383 [00007f1a9305b710] open(0x7f1a95f561a0, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247414 [00007f1a9305b710] open(0x7f1a95f57120, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247444 [00007f1a9305b710] open(0x7f1a95f57230, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247473 [00007f1a9305b710] open(0x7f1a95f56220, O_RDONLY) = -1 ENOENT (No such file or directory)
28100 22:03:35.247508 [00007f1a9305b9b0] write(2, 0x7ffc86a540b8, 4) = -1 EPIPE (Broken pipe)
28100 22:03:35.247540 [00007f1a9305b9b0] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28100, si_uid=1001} ---
28100 22:03:35.247622 [????????????????] +++ killed by SIGPIPE +++
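
A trace like this is easier to read once it is summarized. The sketch below counts failed system calls by errno and picks out the terminating signal. The function name `summarize_strace` and the regular expressions are assumptions; they target the line layout above (PID, microsecond timestamp, instruction pointer), which is consistent with running strace with -f -tt -i, although the original command is not given.

```python
import re
from collections import Counter

# Matches lines such as:
#   28100 22:03:35.247322 [00007f1a9305b710] open(0x..., O_RDONLY) = -1 ENOENT (...)
FAILED_CALL = re.compile(r"^\d+ \S+ \[[0-9a-f?]+\] (\w+)\(.*\) = -1 (E[A-Z0-9]+)")
# Matches the final line that names the signal which killed the process.
KILLED_BY = re.compile(r"killed by (SIG[A-Z0-9]+)")


def summarize_strace(lines):
    """Return (Counter of (syscall, errno) failures, terminating signal or None)."""
    failures = Counter()
    fatal_signal = None
    for line in lines:
        m = FAILED_CALL.match(line)
        if m:
            failures[(m.group(1), m.group(2))] += 1
            continue
        m = KILLED_BY.search(line)
        if m:
            fatal_signal = m.group(1)
    return failures, fatal_signal
```

Applied to the excerpt above, the summary is six open calls failing with ENOENT, one write failing with EPIPE, and termination by SIGPIPE.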

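This told us little more than the SIGSEGV record in the hs_err log, which only says that the process touched memory it was not allowed to access (an unmapped or read-only address). With so little to go on, I searched Google for the entries in the Register to memory mapping section and found:
https://stackoverflow.com/questions/44922588/hadoop-nodemanager-killed-by-sigsegv
The error information there was identical to ours, so we tried stopping snoopy, and the crashes went away.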
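
The clue came from a register that pointed into a shared library that is not part of the JVM or libc. A minimal sketch of spotting such libraries automatically follows. The function name `unexpected_libraries` and the `EXPECTED_PREFIXES` allow-list are hypothetical choices for illustration, and the regular expression assumes the "Register to memory mapping" line format shown earlier.

```python
import re
from pathlib import PurePosixPath

# Matches lines such as:
#   R14=0x00007f84804607e0: snoopy_inputdatastorage_data+0 in /usr/lib64/libsnoopy.so at 0x...
MAPPING = re.compile(r"^(\w+)\s*=0x[0-9a-f]+: (.+?) in (\S+\.so[\w.]*) at 0x[0-9a-f]+")

# Libraries one would normally expect in a JVM process (hypothetical allow-list).
EXPECTED_PREFIXES = ("libc.", "libjvm.", "libpthread.", "libjava.")


def unexpected_libraries(lines):
    """Map each unexpected shared library to the (register, symbol) pairs pointing into it."""
    hits = {}
    for line in lines:
        m = MAPPING.match(line)
        if not m:
            continue
        register, symbol, path = m.groups()
        if PurePosixPath(path).name.startswith(EXPECTED_PREFIXES):
            continue
        hits.setdefault(path, []).append((register, symbol))
    return hits
```

On the register dump from this crash, the only library it flags is /usr/lib64/libsnoopy.so (via R14), the same component that was eventually stopped to resolve the problem.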

