1-Linux: Saving kernel panic information to flash

Author: Creator_Ly | Published: 2023-02-22 14:44

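While a system is running, if the kernel panics, developers need the kernel error log to locate the problem. Often, however, no debug serial console is attached when the problem occurs, and the log lives only in RAM, so it is lost on reboot. A mechanism is therefore needed to save the crash info to non-volatile storage when the system crashes.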

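1. How panic capture works

The kernel provides kmsg_dump_register() for registering a dumper that is invoked on a panic or oops. The kernel now has several ways of capturing a panic; the most recent is pstore.

According to material found online, there were quite a few similar implementations before the pstore filesystem.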

  • apanic

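Android's earliest scheme for recording panic information. It can be found in Android kernels based on Linux 2.6, but it was never submitted upstream and was later abandoned. No reason for the abandonment can be found online; my own guess is that it only works with MTD NAND, whereas Android devices today almost all use eMMC. apanic is presumably short for Android Panic; when the kernel crashes, it dumps the log to MTD NAND.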

  • ramoops

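This refers to the original ramoops implementation; in current code it has been merged into pstore as the pstore/ram backend. ramoops dumps the log to RAM that stays powered across a reboot. This places one requirement on the RAM: its contents must not be lost when the system reboots.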

  • crashlog

A kernel patch provided by OpenWrt that has not been submitted upstream. It is also RAM-based and can only dump Panic/Oops logs.

  • mtdoops

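A feature of the MTD subsystem, very similar to pstore. It only dumps Panic/Oops logs and does not present them as files; users have to parse the whole MTD partition themselves. (Because the two are so similar, the author of the article cited below implemented mtdpstore as a replacement for mtdoops.)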

  • kdump

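If pstore is a lightweight scheme for dumping kernel crash logs, kdump is a heavyweight analysis tool. On a crash, kdump starts a kernel dedicated to capturing the current state, and that kernel collects all memory contents into a core dump file. After the reboot, the captured information is available in a specific file. netdump and diskdump are similar. kdump suits resource-rich machines such as servers and is very powerful, but it is a poor fit for embedded devices.

Linux pstore: automatically "capturing" kernel crash logs: https://cloud.tencent.com/developer/article/1646413?from=15425&areaSource=102001.1&traceId=OMnQ0q6ZZQELlvCmhM3dr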

2. Implementation with mtdoops

2.1 Enabling mtdoops

In menuconfig, enable the mtdoops option (CONFIG_MTD_OOPS):

Device Drivers  --->
    Memory Technology Device (MTD) support  ---> 
        Log panic/oops to an MTD buffer 

The source is in ./drivers/mtd/mtdoops.c.

Add a kpanic partition to the MTD layout to hold the log:

partition@a0000 {
    label = "kpanic";
    reg = <0xa0000 0x60000>;
};

In ./drivers/mtd/mtdoops.c, set the matching partition name:

static char mtddev[80] = "kpanic";

2.2 How the kernel writes to MTD

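Once an MTD device has been initialized successfully, mtdoops_notify_add() is called.

This function does the following:

  • Determines which partition stores the panic log and checks boundary conditions such as the partition size
  • Registers the panic callback, so that it runs when the kernel panics
  • Reads the record headers in the partition to see how much of it is in use and where the next panic should be stored
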
static void mtdoops_notify_add(struct mtd_info *mtd)
{
    struct mtdoops_context *cxt = &oops_cxt;
    u64 mtdoops_pages = div_u64(mtd->size, record_size);
    int err;

    /* Check whether this is the partition that stores the panic log.
     * The partition name can be set near the top of the file, e.g.:
     *   static char mtddev[80] = "kpanic";
     */
    if (!strcmp(mtd->name, mtddev))
        cxt->mtd_index = mtd->index;

    if (mtd->index != cxt->mtd_index || cxt->mtd_index < 0)
        return;

    if (mtd->size < mtd->erasesize * 2) {
        printk(KERN_ERR "mtdoops: MTD partition %d not big enough for mtdoops\n",
               mtd->index);
        return;
    }
    if (mtd->erasesize < record_size) {
        printk(KERN_ERR "mtdoops: eraseblock size of MTD partition %d too small\n",
               mtd->index);
        return;
    }
    if (mtd->size > MTDOOPS_MAX_MTD_SIZE) {
        printk(KERN_ERR "mtdoops: mtd%d is too large (limit is %d MiB)\n",
               mtd->index, MTDOOPS_MAX_MTD_SIZE / 1024 / 1024);
        return;
    }

    /* oops_page_used is a bit field */
    cxt->oops_page_used = vmalloc(DIV_ROUND_UP(mtdoops_pages,
            BITS_PER_LONG) * sizeof(unsigned long));
    if (!cxt->oops_page_used) {
        printk(KERN_ERR "mtdoops: could not allocate page array\n");
        return;
    }

    /* Register the panic callback */
    cxt->dump.max_reason = KMSG_DUMP_OOPS;
    cxt->dump.dump = mtdoops_do_dump;
    err = kmsg_dump_register(&cxt->dump);
    if (err) {
        printk(KERN_ERR "mtdoops: registering kmsg dumper failed, error %d\n", err);
        vfree(cxt->oops_page_used);
        cxt->oops_page_used = NULL;
        return;
    }

    /* Scan the partition to find out how much of it is in use */
    cxt->mtd = mtd;
    cxt->oops_pages = (int)mtd->size / record_size;
    find_next_position(cxt);
    printk(KERN_INFO "record_size:%d,max_count:%d,next_count:%d\n", record_size, cxt->oops_pages, cxt->nextcount);
    printk(KERN_INFO "mtdoops: Attached to MTD device %d\n", mtd->index);
}
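
The third step, locating the next write position, can be sketched in userspace as follows. This is a simplified illustration and not the kernel's find_next_position(): it assumes the per-record headers have already been read into an array, and it ignores the counter wraparound that the kernel handles. The names oops_hdr, next_slot and find_next_slot are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MTDOOPS_KERNMSG_MAGIC 0x5d005d00u  /* same value as in mtdoops.c */
#define ERASED_WORD           0xffffffffu  /* erased flash reads back as all ones */

struct oops_hdr {
    uint32_t count;  /* sequence counter written by mtdoops_write() */
    uint32_t magic;  /* MTDOOPS_KERNMSG_MAGIC for a valid record */
};

struct next_slot {
    size_t   page;   /* record index to write next */
    uint32_t count;  /* sequence counter to store in it */
};

/*
 * Simplified scan: pick the valid record with the highest counter and
 * continue right after it, wrapping at the end of the partition.
 * Counter wraparound is deliberately ignored in this sketch.
 */
struct next_slot find_next_slot(const struct oops_hdr *hdrs, size_t pages)
{
    struct next_slot next = { 0, 1 };  /* empty partition: start at page 0 */
    int found = 0;
    uint32_t maxcount = 0;
    size_t maxpos = 0;

    for (size_t i = 0; i < pages; i++) {
        if (hdrs[i].count == ERASED_WORD ||
            hdrs[i].magic != MTDOOPS_KERNMSG_MAGIC)
            continue;  /* erased, or not an mtdoops record */
        if (!found || hdrs[i].count > maxcount) {
            found = 1;
            maxcount = hdrs[i].count;
            maxpos = i;
        }
    }

    if (found) {
        next.page = (maxpos + 1) % pages;
        next.count = maxcount + 1;
    }
    return next;
}
```

With 15 records already stored at pages 0 to 14 and counters 1 to 15, this sketch returns page 15 and counter 16, which is consistent with the "mtdoops: ready 15, 16" line in the test log of section 2.3.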

When a panic is captured, the following callback runs:

static void mtdoops_do_dump(struct kmsg_dumper *dumper,
                enum kmsg_dump_reason reason)
{
    struct timeval tv;
    struct rtc_time tm;
    int skip = MTDOOPS_HEADER_SIZE;
    int len;
    struct mtdoops_context *cxt = container_of(dumper,
            struct mtdoops_context, dump);

    /* Only dump oopses if dump_oops is set */
    if (reason == KMSG_DUMP_OOPS && !dump_oops)
        return;

    /* Add some custom information, such as the panic time and firmware version */
    do_gettimeofday(&tv);
    rtc_time_to_tm(tv.tv_sec + KPANIC_CST_OFFSET_S, &tm);
    len = sprintf(cxt->oops_buf + skip, "PANIC time:(%d-%02d-%02d %02d:%02d:%02d)\n", 
                            tm.tm_year+1900, tm.tm_mon + 1, tm.tm_mday,
                            tm.tm_hour, tm.tm_min, tm.tm_sec);
    skip += len;
    len = sprintf(cxt->oops_buf + skip, "PANIC version: %s\n", zr_firmware_ver);
    skip += len;

    kmsg_dump_get_buffer(dumper, true, cxt->oops_buf + skip,
                 record_size - skip, NULL);

    /* Panics must be written immediately */
    if (reason != KMSG_DUMP_OOPS)
        mtdoops_write(cxt, 1);

    /* For other cases, schedule work to write it "nicely" */
    schedule_work(&cxt->work_write);
}
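
The callback above builds its custom prefix with sprintf and advances skip by the returned length. The same offset bookkeeping can be sketched in userspace with a bounds check, so that an unexpectedly long version string cannot run past the end of the record. This is only an illustration: append_prefix is a hypothetical helper, not part of mtdoops. Inside the kernel, scnprintf() would be the closer equivalent of snprintf here.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Append "label: value\n" to buf at offset skip without writing past
 * buf_size. Returns the new offset; if the line does not fit (or on a
 * formatting error) the offset is returned unchanged.
 */
size_t append_prefix(char *buf, size_t buf_size, size_t skip,
                     const char *label, const char *value)
{
    if (skip >= buf_size)
        return skip;

    int n = snprintf(buf + skip, buf_size - skip, "%s: %s\n", label, value);
    if (n < 0 || (size_t)n >= buf_size - skip) {
        buf[skip] = '\0';  /* drop the partial line */
        return skip;
    }
    return skip + (size_t)n;
}
```

The caller would start with skip set to the header size (8 bytes in mtdoops) and pass the resulting offset on to whatever fills the rest of the record.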

mtdoops_write() is then called to write the data obtained from kmsg_dump_get_buffer() to the MTD partition.

static void mtdoops_write(struct mtdoops_context *cxt, int panic)
{
    struct mtd_info *mtd = cxt->mtd;
    size_t retlen;
    u32 *hdr;
    int ret;

    /* Add mtdoops header to the buffer */
    hdr = cxt->oops_buf;
    hdr[0] = cxt->nextcount;
    hdr[1] = MTDOOPS_KERNMSG_MAGIC;

    if (panic) {
        ret = mtd_panic_write(mtd, cxt->nextpage * record_size,
                      record_size, &retlen, cxt->oops_buf);
        if (ret == -EOPNOTSUPP) {
            printk(KERN_ERR "mtdoops: Cannot write from panic without panic_write\n");
            return;
        }
    } else
        ret = mtd_write(mtd, cxt->nextpage * record_size,
                record_size, &retlen, cxt->oops_buf);

    if (retlen != record_size || ret < 0)
        printk(KERN_ERR "mtdoops: write failure at %ld (%td of %ld written), error %d\n",
               cxt->nextpage * record_size, retlen, record_size, ret);
    mark_page_used(cxt, cxt->nextpage);
    memset(cxt->oops_buf, 0xff, record_size);

    mtdoops_inc_counter(cxt);
}

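The point of interest here is why mtd_panic_write is used for the write instead of the ordinary mtd_write.

When the kernel is about to die, writing flash through the normal driver path (which may sleep or wait for interrupts) can fail, so storage drivers generally provide a dedicated panic write routine, such as panic_nand_write for NAND.

For NAND flash, mtd_panic_write ends up calling the panic_nand_write function in ./drivers/mtd/nand/raw/nand_base.c: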

static int panic_nand_write(struct mtd_info *mtd, loff_t to, size_t len,
                size_t *retlen, const uint8_t *buf)
{
    struct nand_chip *chip = mtd_to_nand(mtd);
    int chipnr = (int)(to >> chip->chip_shift);
    struct mtd_oob_ops ops;
    int ret;

    /* Grab the device */
    panic_nand_get_device(chip, mtd, FL_WRITING);

    chip->select_chip(mtd, chipnr);

    /* Wait for the device to get ready */
    panic_nand_wait(mtd, chip, 400);

    memset(&ops, 0, sizeof(ops));
    ops.len = len;
    ops.datbuf = (uint8_t *)buf;
    ops.mode = MTD_OPS_PLACE_OOB;

    ret = nand_do_write_ops(mtd, to, &ops);

    *retlen = ops.retlen;
    return ret;
}

2.3 mtdoops write test

This command makes the kernel crash:

 echo c > /proc/sysrq-trigger

The crash log looks like this:

root@zihome:/# echo c > /proc/sysrq-trigger
[  915.096147] sysrq: SysRq : Trigger a crash
[  915.101081] CPU 1 Unable to handle kernel paging request at virtual address 00000000, epc == 8124ac98, ra == 8124b0c4
[  915.111744] Oops[#1]:
[  915.114033] CPU: 1 PID: 671 Comm: ash Not tainted 4.4.198 #0
[  915.119694] task: 87da85e0 task.stack: 86362000
[  915.124219] $ 0   : 00000000 00000001 00000001 81910000
[  915.129462] $ 4   : 00000063 00000001 000002ad 00157000
[  915.134703] $ 8   : 00000000 00005e30 00000000 00000134
[  915.139941] $12   : 00000000 00000000 00000134 00000000
[  915.145181] $16   : 81913ae0 81910000 00000063 00000000
[  915.150419] $20   : 00000007 81913c60 77ddf518 77de0d8c
[  915.155662] $24   : 00000018 81250e74                  
[  915.160900] $28   : 86362000 86363df8 00000000 8124b0c4
[  915.166143] Hi    : 00000000
[  915.169019] Lo    : 122f2000
[  915.171907] epc   : 8124ac98 sysrq_handle_crash+0x10/0x18
[  915.177309] ra    : 8124b0c4 __handle_sysrq+0xcc/0x1b8
[  915.182443] Status: 11007c03 KERNEL EXL IE 
[  915.186639] Cause : c080000c (ExcCode 03)
[  915.190642] BadVA : 00000000
[  915.193521] PrId  : 0001992f (MIPS 1004Kc)
[  915.197612] Modules linked in: mt753x iptable_nat nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY ts_kmp ts_fsm ts_bm nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 mtfwd mapfilter iptable_mangle iptable_filter ipt_ECN ip_tables crypto_hw_eip93 crc_ccitt fuse act_ipt em_nbyte em_meta em_text sch_codel em_cmp act_connmark nf_conntrack em_u32 sch_ingress sg ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables msdos ip_gre gre ifb l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ip_tunnel vfat fat nls_utf8 nls_iso8859_1 nls_cp437 nls_base sha1_generic ecb des_generic authenc leds_gpio sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache mbcache2 crc32c_generic [last unloaded: mt753x]
[  915.294698] Process ash (pid: 671, threadinfo=86362000, task=87da85e0, tls=77de7d48)
[  915.302438] Stack : 87c29c50 8180b5f8 81e62d9c 00157000 00000002 00000002 77ddd340 86363f08
          77ddc000 77ddc000 77ddf518 8124b21c 00000000 8123eb14 00000000 86107640
          80c5f700 8113f30c 80df4018 0000000d 87d63016 00000000 00000000 854be5a0
          00000002 810ecd58 00000000 00000924 00000000 0000000b 865f8d20 00000020
          865f8d38 0000000a 00000000 87da85d8 fffffff6 81035494 865f8d38 00000000
          ...
[  915.338093] Call Trace:
[  915.340541] [<8124ac98>] sysrq_handle_crash+0x10/0x18
[  915.345601] [<8124b0c4>] __handle_sysrq+0xcc/0x1b8
[  915.350393] [<8124b21c>] write_sysrq_trigger+0x40/0x5c
[  915.355552] [<8113f30c>] proc_reg_write+0x68/0xac
[  915.360259] [<810ecd58>] __vfs_write+0x28/0x110
[  915.364795] [<810ed764>] vfs_write+0xb0/0x194
[  915.369154] [<810ee13c>] SyS_write+0x74/0xf0
[  915.373439] [<810078d8>] syscall_common+0x30/0x54
[  915.378141] 
[  915.379627] 
Code: 3c038191  ac62c6e4  0000000f <03e00008> a0020000  27bdffe0  24020007  afb10018  2491ffd0 
[  915.389901] ---[ end trace 2a7448a53c8e5e61 ]---
[  915.396246] Fatal exception: panic in 5 seconds
[  915.415478] mtdoops: ready 15, 16 (no erase)

View the MTD log after the reboot.

If the kpanic partition we defined is mtd1, view it as follows:

root@openwrt:/tmp# cat /dev/mtdblock1 > 1.txt
root@openwrt:/tmp# cat 1.txt
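
The raw dump is not plain text only: going by mtdoops_write() above, each record of record_size bytes starts with an 8-byte header (a 32-bit counter followed by a 32-bit magic), and unused records read back as 0xff. A minimal userspace sketch for checking one record follows. It assumes the header was written by a little-endian CPU, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MTDOOPS_KERNMSG_MAGIC 0x5d005d00u  /* same value as in mtdoops.c */
#define MTDOOPS_HEADER_SIZE   8u           /* two 32-bit words: count, magic */

/* Read a 32-bit little-endian word (assumption: little-endian target). */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Inspect one record. Returns 1 and stores the sequence counter in *count
 * if the record carries the mtdoops magic, 0 otherwise (erased or foreign).
 */
int mtdoops_record_valid(const uint8_t *rec, size_t rec_size, uint32_t *count)
{
    if (rec_size < MTDOOPS_HEADER_SIZE)
        return 0;
    if (read_le32(rec + 4) != MTDOOPS_KERNMSG_MAGIC)
        return 0;
    *count = read_le32(rec);
    return 1;
}

/* Return a pointer to the log text that follows the header. */
const char *mtdoops_record_text(const uint8_t *rec)
{
    return (const char *)(rec + MTDOOPS_HEADER_SIZE);
}
```

A reader tool would walk the dump in record_size steps, call mtdoops_record_valid() on each record, and print the text of the valid ones in counter order. With the modified mtdoops_do_dump() shown earlier, that text begins with the "PANIC time" and "PANIC version" lines.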

3. Implementation with pstore

On a board with eMMC, the test prints "mtdoops: Cannot write from panic without panic_write", because the eMMC driver has no panic_write function.

root@openwrt:/lib/modules/4.19.81# echo c > /proc/sysrq-trigger 
[ 1496.199280] sysrq: SysRq : Trigger a crash
[ 1496.203539] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
[ 1496.596036] Call trace:
[ 1496.598553]  sysrq_handle_crash+0x14/0x20
[ 1496.602681]  write_sysrq_trigger+0x6c/0x80
[ 1496.606903]  proc_reg_write+0x5c/0x98
[ 1496.610675]  __vfs_write+0x18/0x130
[ 1496.614266]  vfs_write+0xb4/0x178
[ 1496.617678]  ksys_write+0x50/0xa0
[ 1496.621089]  __arm64_sys_write+0x18/0x20
[ 1496.625130]  el0_svc_common+0x8c/0xf0
[ 1496.628901]  el0_svc_handler+0x68/0x70
[ 1496.632762]  el0_svc+0x8/0xc
[ 1496.635727] Code: 52800020 b908d020 d5033e9f d2800001 (39000020) 
[ 1496.642007] ---[ end trace d88c536421afe116 ]---
[ 1496.647533] Kernel panic - not syncing: Fatal exception
[ 1496.652921] SMP: stopping secondary CPUs
[ 1496.656960] Kernel Offset: disabled
[ 1496.660551] CPU features: 0x0,20002000
[ 1496.664410] Memory Limit: none
[ 1496.668301] mtdoops: Cannot write from panic without panic_write
[ 1496.674491] Rebooting in 3 seconds..

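If mtdoops is changed to call mtd_write directly instead of panic_write, the following errors appear and nothing is written. As noted above, once the system is already broken, the normal device driver write path cannot be relied on.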

[   58.047260] Call trace:
[   58.049780]  sysrq_handle_crash+0x14/0x20
[   58.053912]  write_sysrq_trigger+0x6c/0x80
[   58.058137]  proc_reg_write+0x5c/0x98
[   58.061913]  __vfs_write+0x18/0x130
[   58.065508]  vfs_write+0xb4/0x178
[   58.068922]  ksys_write+0x50/0xa0
[   58.072336]  __arm64_sys_write+0x18/0x20
[   58.076381]  el0_svc_common+0x8c/0xf0
[   58.080155]  el0_svc_handler+0x68/0x70
[   58.084018]  el0_svc+0x8/0xc
[   58.086987] Code: 52800020 b908d020 d5033e9f d2800001 (39000020) 
[   58.093271] ---[ end trace 487bc19a9b8382f3 ]---
[   58.105794] Kernel panic - not syncing: Fatal exception
[   58.111188] SMP: stopping secondary CPUs
[   58.115232] Kernel Offset: disabled
[   58.118827] CPU features: 0x0,20002000
[   58.122686] Memory Limit: none
[   63.225303] mtk-msdc 11230000.mmc: msdc_request_timeout: aborting cmd/data/mrq
[   63.232774] mtk-msdc 11230000.mmc: msdc_request_timeout: aborting mrq=00000000045fc364 cmd=18
[   63.241587] mtk-msdc 11230000.mmc: msdc_request_timeout: aborting cmd=18
[   88.164968] WARNING: CPU: 1 PID: 10 at irq_work_queue_on+0x30/0x108

I did not manage to get MTD writes to eMMC working, so I moved on to writing the log via pstore.

3.1 Enabling pstore

|-> File systems
    |-> Miscellaneous filesystems
      |-> Persistent store support
        |-> Log kernel console messages    # console frontend
        |-> Log user space messages      # pmsg frontend
        |-> Persistent function tracer      # ftrace frontend
        |-> Log panic/oops to a RAM buffer     # pstore/ram backend
        |-> Log panic/oops to a block device   # pstore/blk backend

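When a panic occurs, pstore saves the dmesg contents to RAM; after the reboot they are read back from RAM.

Because the data is kept in RAM, a fixed region of memory has to be set aside for the oops log.

First check /proc/iomem to see which memory is available: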

root@Openwrt:~# cat /proc/iomem

...

40000000-42ffffff : System RAM
43030000-4fffffff : System RAM
  44000000-44615fff : reserved
  48080000-487bffff : Kernel code
  487c0000-4880ffff : reserved
  48810000-488affff : Kernel data
  4e7f2000-4f5fffff : reserved
  4f7f2000-4f7f7fff : reserved
  4fc00000-4fdfffff : reserved
  4ff4b000-4ffabfff : reserved
  4ffac000-4ffd3fff : reserved
  4ffd6000-4ffd6fff : reserved
  4ffd7000-4ffd7fff : reserved
  4ffd8000-4ffd9fff : reserved
  4ffda000-4fff6fff : reserved
  4fff7000-4fffffff : reserved

The physical address space of RAM is 40000000-4fffffff, 256 MB. Choose 0x41000000, which lies outside Kernel code and Kernel data, as the ramoops start address, with a size of 1 MB.
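
The arithmetic behind this choice can be checked with a few helpers. This is a small sketch that uses the inclusive ranges printed by /proc/iomem; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct mem_range {
    uint64_t start;  /* first byte */
    uint64_t end;    /* last byte, inclusive, as printed by /proc/iomem */
};

/* Size in bytes of an inclusive range. */
uint64_t range_size(struct mem_range r)
{
    return r.end - r.start + 1;
}

/* 1 if inner lies completely inside outer. */
int range_contains(struct mem_range outer, struct mem_range inner)
{
    return inner.start >= outer.start && inner.end <= outer.end;
}

/* 1 if the two ranges share at least one byte. */
int range_overlaps(struct mem_range a, struct mem_range b)
{
    return a.start <= b.end && b.start <= a.end;
}
```

For the listing above, 40000000-4fffffff is 0x10000000 bytes (256 MB), and a 1 MB region starting at 0x41000000 ends at 0x410fffff. That region lies inside the first System RAM range and overlaps neither Kernel code nor Kernel data.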

The RAM layout after the change is shown below; there is one more reserved region.

40000000-42ffffff : System RAM
  41000000-410fffff : reserved
43030000-4fffffff : System RAM
  44000000-44615fff : reserved
  48080000-487bffff : Kernel code
  487c0000-4880ffff : reserved
  48810000-488affff : Kernel data
  4e7f1000-4f5fffff : reserved
  4f7f1000-4f7f6fff : reserved
  4fc00000-4fdfffff : reserved
  4ff4b000-4ffabfff : reserved
  4ffac000-4ffd3fff : reserved
  4ffd6000-4ffd7fff : reserved
  4ffd8000-4ffd9fff : reserved
  4ffda000-4fff6fff : reserved
  4fff7000-4fffffff : reserved

So add the following to the DTS:

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        ...

        ramoops@41000000 {
            compatible = "ramoops";
            reg = <0x0 0x41000000 0x0 0x00100000>;
            record-size     = <0x00020000>;
            console-size    = <0x00020000>;
            ftrace-size     = <0x00020000>;
        };

    };
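
How the 1 MB region is divided can be estimated as follows. As far as I understand the pstore/ram backend, the console, ftrace and pmsg areas are taken out of the region first, and the remainder is split into dmesg records of record-size bytes each. Treat this as a sketch under that assumption; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct ramoops_layout {
    uint32_t mem_size;      /* total size of the reserved region */
    uint32_t record_size;   /* size of one dmesg (oops/panic) record */
    uint32_t console_size;  /* console log area */
    uint32_t ftrace_size;   /* ftrace area */
    uint32_t pmsg_size;     /* pmsg area, 0 when not configured */
};

/* Bytes left for dmesg records after the other areas are carved out. */
uint32_t ramoops_dump_area(struct ramoops_layout l)
{
    uint32_t other = l.console_size + l.ftrace_size + l.pmsg_size;
    return other < l.mem_size ? l.mem_size - other : 0;
}

/* Number of whole dmesg records that fit in the remaining area. */
uint32_t ramoops_dump_records(struct ramoops_layout l)
{
    return l.record_size ? ramoops_dump_area(l) / l.record_size : 0;
}
```

With the values from the DTS above (0x100000 total, 0x20000 each for record, console and ftrace, no pmsg), 0xC0000 bytes remain for dmesg, which is room for 6 records.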

Using Ramoops on embedded Linux: http://www.gongkong.com/article/202209/101525.html

3.2 pstore implementation flow

3.3 pstore write test

Trigger a panic:

root@zihome:/# echo c > /proc/sysrq-trigger 
[766252.198643] sysrq: SysRq : Trigger a crash
[766252.202989] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[766252.212172] Mem abort info:
[766252.215150]   ESR = 0x96000046
[766252.218383]   Exception class = DABT (current EL), IL = 32 bits
[766252.224572]   SET = 0, FnV = 0
[766252.227811]   EA = 0, S1PTW = 0
[766252.231132] Data abort info:
[766252.234185]   ISV = 0, ISS = 0x00000046
[766252.238230]   CM = 0, WnR = 1
[766252.241375] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000005925b985
[766252.248287] [0000000000000000] pgd=0000000049706003, pud=0000000049706003, pmd=0000000000000000
[766252.257356] Internal error: Oops: 96000046 [#1] SMP
[766252.262470] Modules linked in: zfirewall mt753x zdlogin pppoe ppp_async pppox ppp_generic iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_owner xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_bpf xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_HL xt_FLOWOFFLOAD xt_DSCP xt_CT xt_CLASSIFY ts_kmp ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_ipv4 nf_log_ipv4 nf_flow_table_hw nf_flow_table nf_conntrack_rtcache nf_conntrack_netlink nf_conncount iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables crc_ccitt br_netfilter 8250_pci fuse sch_teql sch_sfq sch_red sch_prio sch_pie sch_multiq sch_gred sch_fq sch_dsmark
[766252.336250]  sch_codel em_text em_nbyte em_meta em_cmp act_simple act_police act_pedit act_ipt act_gact act_csum act_connmark sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred znetstat sg hid evdev input_core xt_set ip_set_list_set ip_set_hash_netportnet ip_set_hash_netport ip_set_hash_netnet ip_set_hash_netiface ip_set_hash_net ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6table_nat ip6t_NPT ip6t_MASQUERADE nf_nat_ipv6 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6
[766252.409111]  nfsv4 nfsv3 nfsd nfs msdos bonding ip6_gre ip_gre gre ifb sit sctp ipip ip6_tunnel tunnel6 tunnel4 ip_tunnel rpcsec_gss_krb5 auth_rpcgss oid_registry tun vfat fat udf crc_itu_t ntfs lockd sunrpc grace minix isofs hfsplus hfs cramfs configfs cifs autofs4 rxrpc dns_resolver fscache nls_utf8 nls_iso8859_1 nls_cp437 vxlan udp_tunnel ip6_udp_tunnel sha512_generic sha1_generic seqiv pcbc md5 md4 fcrypt des_generic cts ctr ccm cbc arc4 usb_storage sdhci_pltfm sdhci leds_gpio xhci_plat_hcd ohci_pci uhci_hcd ohci_platform ohci_hcd ehci_pci ehci_platform ehci_hcd gpio_button_hotplug xfs reiserfs jfs f2fs ext4 mbcache jbd2 exfat btrfs zstd_decompress zstd_compress xxhash xor raid6_pq libcrc32c crc32c_generic crc32_generic [last unloaded: mt753x]
[766252.477621] Process ash (pid: 1143, stack limit = 0x00000000d9570130)
[766252.484353] CPU: 0 PID: 1143 Comm: ash Tainted: G S      W         4.19.81 #0
[766252.491797] Hardware name: ZIHOME ZH-A0501 Board (DT)
[766252.497091] pstate: 80000005 (Nzcv daif -PAN -UAO)
[766252.502122] pc : sysrq_handle_crash+0x14/0x20
[766252.506701] lr : __handle_sysrq+0xd0/0x170
[766252.511008] sp : ffffff800a30bcf0
[766252.514508] x29: ffffff800a30bcf0 x28: ffffffc00bbb9f80 
[766252.520072] x27: 0000000000000000 x26: 0000000000000000 
[766252.525636] x25: 0000000056000000 x24: 0000000000000015 
[766252.531200] x23: ffffff800883ef70 x22: 0000000000000000 
[766252.536764] x21: 0000000000000008 x20: 0000000000000063 
[766252.542328] x19: ffffff8008825000 x18: 0000000000000000 
[766252.547892] x17: 0000000000000000 x16: 0000000000000000 
[766252.553456] x15: 0000000000000000 x14: 0000000000000000 
[766252.559019] x13: ffffff800886b2a0 x12: 0000000000000000 
[766252.564583] x11: ffffff8008818548 x10: 0000000000000010 
[766252.570147] x9 : 0000000000000000 x8 : 0000000000000000 
[766252.575711] x7 : 000000000000000f x6 : 000000000000028b 
[766252.581274] x5 : 0000000000000001 x4 : 0000000000000001 
[766252.586838] x3 : 0000000000000007 x2 : 0000000000000007 
[766252.592401] x1 : 0000000000000000 x0 : 0000000000000001 
[766252.597965] Call trace:
[766252.600572]  sysrq_handle_crash+0x14/0x20
[766252.604791]  write_sysrq_trigger+0x6c/0x80
[766252.609103]  proc_reg_write+0x5c/0x98
[766252.612965]  __vfs_write+0x18/0x130
[766252.616646]  vfs_write+0xb4/0x178
[766252.620148]  ksys_write+0x50/0xa0
[766252.623649]  __arm64_sys_write+0x18/0x20
[766252.627780]  el0_svc_common+0x8c/0xf0
[766252.631641]  el0_svc_handler+0x68/0x70
[766252.635591]  el0_svc+0x8/0xc
[766252.638647] Code: 52800020 b908d020 d5033e9f d2800001 (39000020)
[766252.645017] ---[ end trace f30f2fc370a0dcc8 ]---
[766252.658770] Kernel panic - not syncing: Fatal exception
[766252.664248] SMP: stopping secondary CPUs
[766252.668379] Kernel Offset: disabled
[766252.672062] CPU features: 0x0,20002000
[766252.676010] Memory Limit: none
[766252.688058] Rebooting in 3 seconds..

After a successful reboot, mount pstore manually to view the data saved in RAM.

mount -t pstore pstore /sys/fs/pstore

This mounts the pstore contents into the filesystem. If there is any content, files appear under the mount point, as follows:

root@openwrt:/# ls /sys/fs/pstore/
console-ramoops-0

We can add a boot script that, when it finds content there, writes it to the MTD partition:

mtd -q write /sys/fs/pstore/console-ramoops-0 kpanic

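This approach has a number of drawbacks, of course, for example:

  • If power is lost after the panic, before the reboot has completed and the log has been written to MTD, the log is gone
  • Unlike mtdoops, it cannot keep several panic logs with markers in a circular fashion
  • A pstore file also seems to be generated on a normal reboot, so the files need filtering; there is no need to save the ones that are not from a panic.

Add a script that writes the log automatically after reboot: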

cat /etc/init.d/zpstore 
#!/bin/sh /etc/rc.common

START=99
STOP=99

start() {
    mount -t pstore pstore /sys/fs/pstore

    local pstore_file="/tmp/pstore_file"
    local need_save=0

    # Start from an empty staging file
    rm -f "$pstore_file"
    touch "$pstore_file"

    for file in /sys/fs/pstore/*
    do
            # Skip the unexpanded pattern when the directory is empty
            [ -f "$file" ] || continue

            # A file containing "Call trace" is treated as a panic log and saved
            if grep -q "Call trace" "$file"; then
                    need_save=1
                    echo "PANIC time: $(date)" >> "$pstore_file"
                    cat "$file" >> "$pstore_file"
            fi
    done

    if [ "$need_save" -eq 1 ]; then
            mtd -q write "$pstore_file" kpanic
    fi
}

View the MTD log after the reboot.

If the kpanic partition we defined is mtd1, view it as follows:

root@openwrt:/tmp# cat /dev/mtdblock1 > 1.txt
root@openwrt:/tmp# cat 1.txt

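4. Fixing the missing automatic reboot after the panic is written

Sometimes the kernel panics and the log is written to MTD, but the system does not reboot. It looks as if the reboot that the kernel itself initiates is interrupted by the MTD write and never completes.

The kernel watchdog does not reboot the system either. Printing the watchdog's log shows that some process keeps feeding it (probably procd was not killed and still feeds the watchdog every 5 seconds).

The kernel watchdog code is in ./drivers/watchdog/mt7621_wdt.c: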

static int mt762x_wdt_ping(struct watchdog_device *w)
{
    rt_wdt_w32(TIMER_REG_TMRSTAT, TMR1CTL_RESTART);

    return 0;
}

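So a panic hook is added to the kernel watchdog driver: when a panic occurs, the watchdog timeout is changed to 3 seconds. Even if procd is still alive, its next feed 5 seconds later comes too late, and the watchdog reboots the system.

The code is as follows: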

static int mt762x_panic_notify(struct notifier_block *this, unsigned long code, void *unused)
{
    printk("mt762x_panic_notify\n");
    printk("Rebooting in 3 seconds..");
    mt762x_wdt_stop(&mt762x_wdt_dev);
    mt762x_wdt_set_timeout(&mt762x_wdt_dev, 3);
    mt762x_wdt_start(&mt762x_wdt_dev);     /* Turn on the WDT to avoid panic hangup */

    return NOTIFY_DONE;
}

static struct notifier_block mt762x_panic_notifier = {
    .notifier_call = mt762x_panic_notify,
};


static int mt762x_wdt_probe(struct platform_device *pdev)
{
    ....

    atomic_notifier_chain_register(&panic_notifier_list, &mt762x_panic_notifier);

    ....
}

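5. Sometimes the system crashes during reboot, but the panic is not written to MTD

The cause is not yet known. In the log below, the first oops occurs in rt_pci_shutdown during the reboot; the panic path then calls mtdoops_write, and a second oops occurs inside panic_nand_write, so the write does not complete.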

[  286.332508] get_wdev_by_idx: invalid idx(1)
[  286.336937] CPU 0 Unable to handle kernel paging request at virtual address 000000b0, epc == 814f6ec0, ra == 8125bd80
[  286.347597] Oops[#1]:
[  286.349875] CPU: 0 PID: 7557 Comm: reboot Not tainted 4.4.198 #0
[  286.355885] task: 87e691a0 task.stack: 8061a000
[  286.360413] $ 0   : 00000000 00000001 00000010 00000000
[  286.365656] $ 4   : 87e04c00 8180c26c 00000200 0014d000
[  286.370895] $ 8   : 00000000 00005e30 00000000 00000133
[  286.376135] $12   : 00000000 00000000 00000133 00000000
[  286.381372] $16   : 87e04c74 87e04468 81cd0000 81d40000
[  286.386613] $20   : 81d10000 8180c26c 87e04c9c 77e8fd8c
[  286.391852] $24   : 00000002 814f6e80                  
[  286.397091] $28   : 8061a000 8061bd78 00000000 8125bd80
[  286.402332] Hi    : 00000000
[  286.405207] Lo    : 122f2000
[  286.408095] epc   : 814f6ec0 rt_pci_shutdown+0x40/0x204
[  286.413329] ra    : 8125bd80 device_shutdown+0x168/0x210
[  286.418635] Status: 11008403 KERNEL EXL IE 
[  286.422830] Cause : c0800008 (ExcCode 02)
[  286.426834] BadVA : 000000b0
[  286.429710] PrId  : 0001992f (MIPS 1004Kc)
[  286.433800] Modules linked in: mt753x iptable_nat nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY ts_kmp ts_fsm ts_bm nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 mtfwd mapfilter iptable_mangle iptable_filter ipt_ECN ip_tables crypto_hw_eip93 crc_ccitt fuse act_ipt em_nbyte em_meta em_text sch_codel em_cmp act_connmark nf_conntrack em_u32 sch_ingress sg ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables msdos ip_gre gre ifb l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ip_tunnel vfat fat nls_utf8 nls_iso8859_1 nls_cp437 nls_base sha1_generic ecb des_generic authenc leds_gpio sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache mbcache2 crc32c_generic [last unloaded: mt753x]
[  286.530843] Process reboot (pid: 7557, threadinfo=8061a000, task=87e691a0, tls=77e96d48)
[  286.538927] Stack : fee1dead 77e8b000 77e8e518 77e8fd8c 87e04c74 87e04468 87e04c68 81d40000
          81d10000 8180c26c 87e04c9c 77e8fd8c 00000000 8125bd80 87e69480 00000000
          00000000 81900000 00000000 81900000 81910000 01234567 fee1dead 77e8b000
          77e8e518 8104fff4 00000000 00000001 8061be50 81900000 00000000 8105030c
          8111dfe0 00000000 fffffffb 8782e144 0000000d 810bb3e0 8782e010 00000000
          ...
[  286.574568] Call Trace:
[  286.577013] [<814f6ec0>] rt_pci_shutdown+0x40/0x204
[  286.581895] [<8125bd80>] device_shutdown+0x168/0x210
[  286.586873] [<8104fff4>] kernel_restart+0x14/0x68
[  286.591576] [<8105030c>] SyS_reboot+0x134/0x21c
[  286.596116] [<810078d8>] syscall_common+0x30/0x54
[  286.600816] 
[  286.602302] 
Code: 8c4300d0  8e422340  30420010 <1040000a> 8c7600b0  3c0281cd  8c421fd0  30420002  10400005 
[  286.612430] ---[ end trace 97f3cc1d21a76f95 ]---
[  286.618682] Fatal exception: panic in 5 seconds
[  291.631880] Kernel panic - not syncing: Fatal exception
[  291.639833] CPU 0 Unable to handle kernel paging request at virtual address 0000024c, epc == 812851e4, ra == 81278f30
[  291.650442] Oops[#2]:
[  291.652725] CPU: 0 PID: 7557 Comm: reboot Tainted: G      D         4.4.198 #0
[  291.659950] task: 87e691a0 task.stack: 8061a000
[  291.664478] $ 0   : 00000000 00000001 00000000 00000000
[  291.669726] $ 4   : 86558018 00000000 000d8000 00000000
[  291.674973] $ 8   : 86558018 00000000 c0047000 6e202d20
[  291.680221] $12   : 7320746f 00000000 00000000 69636e79
[  291.685468] $16   : 00000000 81910000 00000001 000d8000
[  291.690716] $20   : 81d30000 81d30000 81d30000 81d30000
[  291.695963] $24   : 00000018 812851c0                  
[  291.701210] $28   : 8061a000 8061ba68 00000001 81278f30
[  291.706460] Hi    : 0000003f
[  291.709337] Lo    : 2e147bde
[  291.712249] epc   : 812851e4 panic_nand_write+0x24/0xe0
[  291.717477] ra    : 81278f30 mtdoops_write+0x64/0xf4
[  291.722439] Status: 11008402 KERNEL EXL 
[  291.726374] Cause : 40800008 (ExcCode 02)
[  291.730381] BadVA : 0000024c
[  291.733261] PrId  : 0001992f (MIPS 1004Kc)
[  291.737355] Modules linked in: mt753x iptable_nat nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP xt_CLASSIFY ts_kmp ts_fsm ts_bm nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 mtfwd mapfilter iptable_mangle iptable_filter ipt_ECN ip_tables crypto_hw_eip93 crc_ccitt fuse act_ipt em_nbyte em_meta em_text sch_codel em_cmp act_connmark nf_conntrack em_u32 sch_ingress sg ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables msdos ip_gre gre ifb l2tp_netlink l2tp_core udp_tunnel ip6_udp_tunnel ip_tunnel vfat fat nls_utf8 nls_iso8859_1 nls_cp437 nls_base sha1_generic ecb des_generic authenc leds_gpio sd_mod scsi_mod gpio_button_hotplug ext4 jbd2 mbcache mbcache2 crc32c_generic [last unloaded: mt753x]
[  291.834567] Process reboot (pid: 7557, threadinfo=8061a000, task=87e691a0, tls=77e96d48)
[  291.842654] Stack : c004704d 00000001 81d3df40 81910000 00001f68 00004b08 00006b50 00000000
          000000f0 00000000 00000155 8107200c 81800000 818034f0 ff0a0000 81d3df40
          81910000 00000001 8190e134 81278f30 00000001 c004704e 00000000 81d30000
          00002000 8061badc c0047000 8190e134 81d30000 00000000 81d30000 81d3df40
          00000029 812790b0 00000000 00000003 8190d5f4 00000000 00000000 0000000e
          ...
[  291.878358] Call Trace:
[  291.880810] [<812851e4>] panic_nand_write+0x24/0xe0
[  291.885697] [<81278f30>] mtdoops_write+0x64/0xf4
[  291.890319] [<812790b0>] mtdoops_do_dump+0xf0/0x11c
[  291.895215] [<81074ed8>] kmsg_dump+0xdc/0x11c
[  291.899590] [<81032b88>] panic+0xcc/0x204
[  291.903608] [<81018aec>] die+0x128/0x130
[  291.907551] [<81021ea8>] __do_page_fault+0x360/0x48c
[  291.912522] [<81005420>] ret_from_exception+0x0/0x10
[  291.917483] 
[  291.918972] 
Code: 8c9000f0  00c09821  00071840 <8e02024c> 8e050498  00023027  00e09021  00c33004  00531806 
[  291.929226] ---[ end trace 97f3cc1d21a76f96 ]---
[  291.936505] Fatal exception: panic in 5 seconds

Saving kernel crash dump info: https://blog.csdn.net/u012385733/article/details/79257442

Introduction to how the kernel printk works: https://zhuanlan.zhihu.com/p/521094976


Link to this article: https://www.haomeiwen.com/subject/kxyakdtx.html