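When the kernel reports "Unable to handle kernel paging request at virtual address", the cause is in the vast majority of cases code that dereferences an invalid pointer.
Below is the analysis of one such exception, seen on a machine with the Chinese-made Loongson 3A4000 CPU.
1. Exception information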
A8-3
[59259.423469] CPU 3 Unable to handle kernel paging request at virtual address 0000000001c87a44, epc == ffffffff803c6a40, ra == ffffffff803c6a38
[59259.436112] Oops[#1]:
[59259.438373] CPU: 3 PID: 2217 Comm: Xorg Not tainted 4.19.0-loongson-3-desktop #1291
[59259.445984] Hardware name: THTF CQTL630 Series/THTF-LS3A4000-7A1000-1W-VB1-ML4A, BIOS V1.2.4 04/14/2020
[59259.455323] $ 0 : 0000000000000000 0000000000000001 0000000001c87a43 1617b2af0ce4f5be
[59259.463286] $ 4 : ffffffff81eaf880 ffffffff820c42a8 0000000000000000 ffffffffffffffff
[59259.471245] $ 8 : 00000018008e38e4 0017d60fb8523b6c 000035e54fa50e00 0000000000000000
[59259.479205] $12 : 0000000000000000 fffffffffffffffe 0000000000000040 0000000000000000
[59259.487164] $16 : ffffffff81eaf840 ffffffff81eaf848 0000000000000000 ffffffffe72375c8
[59259.495124] $20 : 000000003b9aca00 0000000000000000 0000000000000000 00000000005521e4
[59259.503082] $24 : 0000000000000000 ffffffff81a04980
[59259.511042] $28 : 9800000456acc000 980000045a03bce0 ffffffff820c4238 ffffffff803c6a38
[59259.519003] Hi : 00000000083d5a39
[59259.522551] Lo : 3ee5eedcc78fdcef
[59259.526115] epc : ffffffff803c6a40 update_fast_timekeeper+0x40/0x80
[59259.532516] ra : ffffffff803c6a38 update_fast_timekeeper+0x38/0x80
[59259.538916] Status: 54004ce2 KX SX UX KERNEL EXL
[59259.543594] Cause : 1080000c (ExcCode 03)
[59259.547575] BadVA : 0000000001c87a44
[59259.551124] PrId : 0014c001 (ICT Loongson-3)
[59259.555451] Modules linked in: fuse sha256_generic cfg80211 rfkill vfat fat input_leds led_class serio_raw binfmt_misc ip_tables x_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod amdgpu chash gpu_sched radeon
[59259.579866] Process Xorg (pid: 2217, threadinfo=00000000fb12c19c, task=0000000028c5cf56, tls=000000fff3febb80)
[59259.589808] Stack : 0000000000000000 ffffffff820c4238 0000000000000000 ffffffff803c6be8
[59259.597769] 0000000000000000 ffffffff820c0000 ffffffff820c0000 ffffffff803c7ea8
[59259.605728] 0000000000000002 ffffffff820c0000 ffffffff81e80000 0000000054004ce0
[59259.613687] ffffffff820c0000 ffffffff820c0000 98000000071ba5e0 9800000450eb0000
[59259.621646] c000000002539c80 0000000000001c90 9800000450eb36b4 ffffffffc03acac0
[59259.629604] 9800000450eb0000 c8ffe6836463ab34 98000000071cb4a0 000035e56f13bad0
[59259.637563] 9800000456acf9f0 0000000054004ce0 000035e56f13b5d2 ffffffff81f10000
[59259.645522] ffffffff803e2d40 ffffffff81e80000 98000000071cb4a0 ffffffff803e2aa4
[59259.653482] 98000000071cb4a0 ffffffff803e2d68 98000000071cafc0 98000000071caf80
[59259.661442] 000035e56f13b5d2 ffffffff803c349c 00000000000001d8 ffffffff81d5cc30
[59259.669403] ...
[59259.671831] Call Trace:
[59259.674257] [<ffffffff803c6a40>] update_fast_timekeeper+0x40/0x80
[59259.680318] [<ffffffff803c6be8>] +0x168/0x240
[59259.686202] [<ffffffff803c7ea8>] timekeeping_advance+0xba8/0x1180
[59259.692262] [<ffffffff803e2aa4>] tick_sched_do_timer+0x164/0x180
[59259.698232] [<ffffffff803e2d68>] tick_sched_timer+0x28/0x140
[59259.703857] [<ffffffff803c349c>] __hrtimer_run_queues+0x21c/0x380
[59259.709913] [<ffffffff803c3ad4>] hrtimer_interrupt+0x154/0x380
[59259.715715] [<ffffffff80210830>] stable_irq_handler+0xb0/0x180
[59259.721516] [<ffffffff80388098>] __handle_irq_event_percpu+0x98/0x300
[59259.727919] [<ffffffff80388328>] handle_irq_event_percpu+0x28/0x140
[59259.734148] [<ffffffff80393188>] handle_percpu_irq+0x88/0x100
2. Analysis
epc is the exception program counter, i.e. the address of the instruction that raised the exception; ra is the return address.
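The fields that matter for the analysis can be pulled out of the oops text mechanically. The following is a minimal Python sketch, not part of the original analysis; the helper name `parse_oops` is made up for illustration. It extracts the bad virtual address, epc and ra from the first oops line and decodes the ExcCode field (bits 6..2) of the MIPS Cause register.

```python
import re

# Two lines copied from the oops above.
OOPS = """\
[59259.423469] CPU 3 Unable to handle kernel paging request at virtual address 0000000001c87a44, epc == ffffffff803c6a40, ra == ffffffff803c6a38
[59259.547575] Cause : 1080000c (ExcCode 03)
"""

HEADER_RE = re.compile(
    r"virtual address ([0-9a-f]+), epc == ([0-9a-f]+), ra == ([0-9a-f]+)"
)
CAUSE_RE = re.compile(r"Cause\s*:\s*([0-9a-f]+)")


def parse_oops(text):
    """Return the bad address, epc, ra and ExcCode found in an oops dump."""
    header = HEADER_RE.search(text)
    cause = CAUSE_RE.search(text)
    if header is None or cause is None:
        raise ValueError("not a MIPS paging-request oops")
    badva, epc, ra = (int(g, 16) for g in header.groups())
    # ExcCode lives in bits 6..2 of the MIPS Cause register.
    exc_code = (int(cause.group(1), 16) >> 2) & 0x1F
    return {"badva": badva, "epc": epc, "ra": ra, "exc_code": exc_code}


info = parse_oops(OOPS)
print(hex(info["epc"]), hex(info["ra"]), hex(info["badva"]), info["exc_code"])
```

On MIPS, ExcCode 3 denotes a TLB exception on a store (TLBS), which matches the "(ExcCode 03)" printed by the kernel.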
First open System.map (on a Linux machine) and search for the leading digits of the epc address:
vi System.map
/ffffffff803c6a
[Screenshot: image.png]
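Judging from the screenshot above, the problem should be inside update_fast_timekeeper.
EPC is 0xffffffff803c6a40 and update_fast_timekeeper starts at 0xffffffff803c6a00, so the fault is at offset (0xffffffff803c6a40 - 0xffffffff803c6a00) = 0x40 inside update_fast_timekeeper. This agrees with the "update_fast_timekeeper+0x40/0x80" already printed on the epc line of the oops.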
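The manual System.map lookup and the subtraction can also be scripted. Below is a rough Python sketch under stated assumptions: the function names `load_symbols` and `resolve` are hypothetical, and the embedded excerpt only takes the update_fast_timekeeper address from the text above; its symbol type letter and the two neighbouring entries are invented placeholders. It finds the last symbol that starts at or below the address and reports the offset.

```python
import bisect

# Excerpt in System.map format ("address type name").
# Only the update_fast_timekeeper address comes from the article;
# the type letters and the two neighbouring entries are invented.
SYSTEM_MAP = """\
ffffffff803c6900 t hypothetical_prev_symbol
ffffffff803c6a00 t update_fast_timekeeper
ffffffff803c6a80 t hypothetical_next_symbol
"""


def load_symbols(text):
    """Parse System.map text into a list of (address, name) sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the last symbol starting at or below addr."""
    addresses = [a for a, _ in symbols]
    idx = bisect.bisect_right(addresses, addr) - 1
    if idx < 0:
        raise LookupError("address is below the first symbol")
    start, name = symbols[idx]
    return name, addr - start


name, offset = resolve(load_symbols(SYSTEM_MAP), 0xFFFFFFFF803C6A40)
print(f"{name}+{offset:#x}")
```

For a real run, the text of the System.map that belongs to the crashing kernel build would be passed to `load_symbols` instead of the excerpt.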
Next, disassemble vmlinux.o:
/opt/mips-loongson-gcc7.3-linux-gnu/2019.06-29/bin/mips-linux-gnu-objdump -dr vmlinux.o >> linux-dr
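Then open the linux-dr file and locate offset 0x40 inside update_fast_timekeeper. The instruction at that offset is where the exception is raised:
addiu v0,v0,1
Note that BadVA (0000000001c87a44) is exactly the dumped value of $2, i.e. v0 (0000000001c87a43), plus 1, so the bad address is derived from this register.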
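Searching the disassembly for "function start + offset" can be automated as well. This is only a sketch: `instruction_at` is a hypothetical helper, and the listing below is a made-up excerpt in the layout that `objdump -d` prints (the start address 0x1000 and the nop lines are placeholders, not real kernel disassembly). Only the `addiu v0,v0,1` instruction is taken from the analysis above.

```python
import re

# Made-up excerpt in objdump -d layout; start address and nops are placeholders.
LISTING = """\
0000000000001000 <update_fast_timekeeper>:
    103c:\t00000000 \tnop
    1040:\t24420001 \taddiu\tv0,v0,1
    1044:\t00000000 \tnop
"""

FUNC_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t([0-9a-f]+)\s+(.*)$")


def instruction_at(listing, func, offset):
    """Return the instruction text at func+offset, or None if it is not listed."""
    start = None
    for line in listing.splitlines():
        header = FUNC_RE.match(line)
        if header:
            # Remember the start address only for the requested function.
            start = int(header.group(1), 16) if header.group(2) == func else None
            continue
        insn = INSN_RE.match(line)
        if insn and start is not None and int(insn.group(1), 16) == start + offset:
            # Collapse tabs between mnemonic and operands into single spaces.
            return " ".join(insn.group(3).split())
    return None


print(instruction_at(LISTING, "update_fast_timekeeper", 0x40))
```

Because vmlinux.o is a relocatable object, the addresses in its listing are not the runtime addresses from the oops; working with the offset relative to the function start sidesteps that difference.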
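From there, continue with the C source of update_fast_timekeeper and find the statement that corresponds to this instruction.
(End)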