5-Linux Out of Memory (OOM)


Author: Creator_Ly | Published: 2023-04-01 16:46

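    To use memory more efficiently, the Linux kernel overcommits memory: it promises processes more memory than it can physically back. When physical memory then runs critically short, the OOM (out-of-memory) mechanism kicks in and kills some processes to reclaim their memory. It targets processes that hold a large amount of memory, especially ones that have consumed a lot of memory in a very short time, and kills them so that the system does not run out of memory entirely.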

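    1. OOM process

    The logic of the out_of_memory function is simple and clear. It has two steps:

    • 1. Select a process to kill.
    • 2. Kill it.

    The purpose of oom_kill_process is simple, but its implementation is somewhat involved, so it is not analyzed here; read the source if you are interested. The focus here is select_bad_process, which chooses the victim mainly by oom_score.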
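    As a rough illustration of how select_bad_process ranks candidates, the sketch below follows the badness heuristic of 4.4-era kernels (the version that produced the log at the end of this article): resident pages plus swap entries plus page-table pages, a 3% discount for root-owned tasks, then the oom_score_adj offset. The function names are invented for this example, and newer kernels differ in detail (the root discount, for instance, was dropped later), so read it as an approximation and not as the kernel's exact code.

```python
def badness_points(rss, swapents, nr_ptes, nr_pmds,
                   oom_score_adj, totalpages, is_root):
    """Approximate badness of one task, in pages (4.4-era heuristic)."""
    if oom_score_adj == -1000:
        return 0  # task is exempt from OOM killing
    points = rss + swapents + nr_ptes + nr_pmds
    if is_root:
        points -= points * 3 // 100  # root-owned tasks get a 3% discount
    points += oom_score_adj * (totalpages // 1000)
    return max(points, 1)  # an eligible task never scores 0


def normalized_score(points, totalpages):
    """Scale points to the 0..1000 value printed as 'score' in the log."""
    return points * 1000 // totalpages


# Figures taken from the log in this article:
# 32768 pages RAM minus 3689 reserved pages, no swap.
totalpages = 32768 - 3689
logd = badness_points(253, 0, 4, 0, 0, totalpages, is_root=True)
mapd = badness_points(52, 0, 4, 0, 0, totalpages, is_root=True)
print(normalized_score(logd, totalpages))
print(normalized_score(mapd, totalpages))
```

    With the rss and nr_ptes columns of logd and mapd from the log, this prints 8 and 1, the same values the kernel reported as "score 8" and "score 1" when it killed them.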

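    First, the three files related to oom_score:

    • /proc/<pid>/oom_score: the oom_score computed by the kernel. Read-only. Range 0 to 1000, where 0 means never kill and 1000 means always kill; the higher the value, the more likely the process is to be selected.

    • /proc/<pid>/oom_score_adj: the interface that lets user space adjust oom_score. Readable and writable by root. Range -1000 to 1000, default 0. At -1000, oom_score plus this value is always less than or equal to 0, so the process is never killed. The OS can set oom_score_adj to -1000 for critical system processes to protect them from the OOM killer.

    • /proc/<pid>/oom_adj: the old interface, kept for compatibility. Readable and writable by root. Range -16 to 15, mapped linearly onto oom_score_adj; the special value -17 means OOM_DISABLE. Avoid this interface where possible.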
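    The three files can be read like any other text file. A minimal sketch, assuming a Linux system with procfs mounted; the helper name read_oom_values and its proc_root parameter are made up for this example (the parameter only exists so the function can be pointed at a test directory):

```python
from pathlib import Path


def read_oom_values(pid, proc_root="/proc"):
    """Return oom_score, oom_score_adj and oom_adj for one process.

    Files that are missing or unreadable are reported as None.
    """
    values = {}
    for name in ("oom_score", "oom_score_adj", "oom_adj"):
        path = Path(proc_root) / str(pid) / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    # "self" resolves to the calling process on Linux.
    print(read_oom_values("self"))
```

    On kernels that no longer ship oom_adj, that entry simply comes back as None.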

    Linux Memory Management (21) OOM: https://www.cnblogs.com/arnoldlu/p/8567559.html

    2. OOM configuration

    2.1 /proc/sys/vm/overcommit_memory

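    The kernel parameter vm.overcommit_memory accepts three values:

    • 0 – Heuristic overcommit handling. This is the default. Overcommit is allowed, but blatant overcommit is refused, for example a single malloc that asks for more than the system's total memory. "Heuristic" means the kernel uses an algorithm to guess whether a request is reasonable and refuses it if not.
    • 1 – Always overcommit. Every memory request is granted; the kernel performs no overcommit checking. This raises the risk of running out of memory but can improve performance for memory-intensive workloads.
    • 2 – Don't overcommit. The kernel refuses any request that would push total committed memory to or beyond the sum of swap and the share of physical RAM given by overcommit_ratio. Choose this setting to minimize the risk of overcommit.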
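    In mode 2 the ceiling is the value that /proc/meminfo reports as CommitLimit. The sketch below shows the arithmetic for the simple case only: it assumes no huge pages are reserved, vm.overcommit_kbytes is unset, and it ignores the small reserves the kernel keeps for root and for the calling process. The function names and the sample figures are illustrative.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Approximate CommitLimit: swap plus overcommit_ratio percent of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def would_be_refused(committed_kb, request_kb, ram_kb, swap_kb,
                     overcommit_ratio=50):
    """True if the request would bring committed memory to or past the limit."""
    limit = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio)
    return committed_kb + request_kb >= limit


# A small device with about 113 MB of usable RAM and no swap.
print(commit_limit_kb(116316, 0))                  # limit in kB
print(would_be_refused(50000, 10000, 116316, 0))   # pushes past the limit
print(would_be_refused(50000, 4000, 116316, 0))    # still fits
```

    With the default overcommit_ratio of 50 and no swap, only half of RAM can be committed, which is why mode 2 is usually paired with swap or a higher ratio.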

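    A problem I ran into: a thread repeatedly called fork() to run a task and exit, and after a while fork() started failing.

    fork() duplicates the parent's address space. The pages are shared copy-on-write, but the kernel still has to account for the parent's writable memory and copy its page tables, so the cost grows with the parent's size. If fork() works at first and fails later, the most likely cause is a memory leak in the parent: the leak makes the process grow, each fork() has to cover a larger parent, and eventually there is not enough memory and fork() fails.

    linux - fork() fails with an out-of-memory error: https://www.coder.work/article/167298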

    2.2 /proc/sys/vm/panic_on_oom

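    Decides what the system does when an OOM occurs. It accepts three values:

    • 0 - Default. When an OOM occurs, the OOM killer is invoked.
    • 1 - If the OOM is confined by a cpuset, memory policy, or memcg, the kernel does not panic and invokes the OOM killer instead. Any other OOM triggers a kernel panic, after which the system reboots (provided kernel.panic is nonzero).
    • 2 - Any OOM triggers a kernel panic, after which the system reboots (provided kernel.panic is nonzero).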
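    The three values boil down to a small decision table. A sketch of that table follows; the function name and the string results are invented for illustration, and "constrained" stands for an OOM confined by a cpuset, memory policy, or memcg:

```python
def oom_action(panic_on_oom, constrained):
    """Return what the kernel does on OOM for a given vm.panic_on_oom value."""
    if panic_on_oom == 0:
        return "oom-killer"
    if panic_on_oom == 1:
        # Only a system-wide OOM panics; a constrained one kills a task.
        return "oom-killer" if constrained else "panic"
    if panic_on_oom == 2:
        return "panic"
    raise ValueError("vm.panic_on_oom must be 0, 1 or 2")


for value in (0, 1, 2):
    print(value, oom_action(value, constrained=False),
          oom_action(value, constrained=True))
```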

    2.3 /proc/sys/vm/min_free_kbytes

    • The lower bound on the free memory the system keeps in reserve.

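    The larger min_free_kbytes is, the higher the watermark lines sit, and the gaps between the three lines grow as well. kswapd then starts reclaiming earlier and reclaims more memory (it stops only at watermark[high]), so the system holds back too much free memory, which reduces the memory available to applications. In the extreme case, with min_free_kbytes close to total memory, too little is left for applications and OOMs can become frequent.

    If min_free_kbytes is too small, the reserve is too small. kswapd itself allocates small amounts of memory while reclaiming (it runs with the PF_MEMALLOC flag set, which lets it dip into the reserve). Likewise, a process chosen by the OOM killer may need to allocate memory while exiting, and it can use the reserve too. Letting these two cases use reserved memory keeps the system from deadlocking.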

    3. Recommended vm parameter settings

    # vm
    vm.min_free_kbytes=4096
    vm.vfs_cache_pressure=200
    vm.dirty_background_ratio=5
    vm.dirty_ratio=10
    vm.dirty_expire_centisecs=500
    vm.dirty_writeback_centisecs=200
    vm.extfrag_threshold=10
    vm.panic_on_oom=1
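
    To check whether a running system actually uses these values, the settings can be compared with the files under /proc/sys. A minimal sketch, assuming the usual mapping of a sysctl key such as vm.min_free_kbytes to /proc/sys/vm/min_free_kbytes; the helper names and the root parameter are hypothetical:

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse 'key=value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")


def diff_against_live(settings, root="/proc/sys"):
    """Return {key: (wanted, live)} for every setting that differs.

    live is None when the file cannot be read.
    """
    diffs = {}
    for key, wanted in settings.items():
        try:
            live = " ".join(sysctl_path(key, root).read_text().split())
        except OSError:
            live = None
        if live != " ".join(wanted.split()):
            diffs[key] = (wanted, live)
    return diffs


conf = """
# vm
vm.min_free_kbytes=4096
vm.panic_on_oom=1
"""
print(diff_against_live(parse_sysctl_conf(conf)))
```

    An empty dictionary means every listed value is already in effect.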
    

    /proc/sys/vm/ documentation overview: https://zhuanlan.zhihu.com/p/503579974?utm_id=0

    Fifty thousand words | Understanding Linux memory management in depth: https://mp.weixin.qq.com/s/nlMGEhuaDUYqV6r8A4cRlA

    Notes on troubleshooting a Linux OOM: https://blog.csdn.net/hu_jinghui/article/details/81740575

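    4. With memory reclaim in place, why does OOM still happen?

    The kernel recovers memory in three ways:

    • Background reclaim (kswapd)
    • Direct reclaim
    • The OOM mechanism (Out of Memory)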

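    kswapd is a kernel thread that reclaims memory in the background when memory runs low. Because it runs asynchronously in the background, it does not block processes.

    • While free memory is above pages_low, the system has enough memory and no reclaim takes place.
    • When free memory drops below pages_low, memory is under pressure; kswapd0 is woken to reclaim in the background until free memory reaches pages_high.
    • When free memory drops below pages_min, the memory available to user processes is exhausted; direct reclaim is triggered and the allocating process is blocked.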
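
    The three cases can be written as a small classifier. This is a simplified sketch of the rule as stated above, not the kernel's allocator logic; the function name is invented, and the sample figures are the free, min, and low values of the DMA and Normal zones in the first Mem-Info dump of the log at the end of this article:

```python
def reclaim_state(free_kb, min_kb, low_kb):
    """Classify a zone by comparing its free memory with the watermarks."""
    if free_kb < min_kb:
        return "direct reclaim"   # allocating process is blocked
    if free_kb < low_kb:
        return "kswapd"           # background reclaim is running
    return "none"                 # enough free memory


# DMA zone: free 2744kB, min 2304kB, low 2880kB
print(reclaim_state(2744, 2304, 2880))
# Normal zone: free 13712kB, min 14076kB, low 17592kB
print(reclaim_state(13712, 14076, 17592))
```

    For that dump the DMA zone is in the kswapd range and the Normal zone has already fallen below its min watermark, which fits a system that is about to invoke the OOM killer.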

    To change when kswapd kicks in, you have to change pages_low. Since pages_low is derived from pages_min, that means changing pages_min.

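    Related parameter tuning

    1. The kernel parameter vm.swappiness sets the balance between reclaiming cache and swapping out. Its range is 0-100:

    • The higher the value, the more aggressively swap is used, i.e. the kernel leans toward reclaiming anonymous pages.
    • The lower the value, the more reluctantly swap is used, i.e. the kernel leans toward reclaiming file pages.
    • 0 does not mean swap is never used: when free memory plus file pages falls below the high watermark, swapping still happens.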

    2. The kernel parameter vm.min_free_kbytes adjusts the memory watermarks:

    • pages_min = min_free_kbytes converted to pages
    • pages_low = pages_min*5/4
    • pages_high = pages_min*3/2
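
    These formulas can be checked against the per-zone min, low, and high values in the log at the end of this article. The sketch assumes 4 kB pages and integer arithmetic in page units; the function name is made up for the example:

```python
PAGE_KB = 4  # assumed page size in kB


def zone_watermarks_kb(min_kb, page_kb=PAGE_KB):
    """Derive the low and high watermarks (in kB) from a zone's min watermark."""
    pages_min = min_kb // page_kb
    pages_low = pages_min + pages_min // 4    # pages_min * 5/4
    pages_high = pages_min + pages_min // 2   # pages_min * 3/2
    return pages_low * page_kb, pages_high * page_kb


print(zone_watermarks_kb(2304))    # DMA zone: min:2304kB
print(zone_watermarks_kb(14076))   # Normal zone: min:14076kB
```

    The results, (2880, 3456) and (17592, 21112), agree with the low and high values printed for the DMA and Normal zones in the log.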

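    So OOM is triggered directly when memory is consumed faster than reclaim can free it, for example when the driver fails to rate-limit traffic during a DHCP flood. The other case is that memory really is all in use and nothing more can be freed; then OOM is the only option.

    [Linux kernel] Memory management: the memory reclaim mechanism: https://blog.csdn.net/weixin_45636061/article/details/127184818

    An actual log from an OOM event: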

    [940027.316884] PTK:be634a2126f098a15a84324cb8fdc92cd05288ac3566b4d14290511de834cbc682d1ff2f7b0ca76ce0e9f956265658f3a8f5e6941538e457bbd020b27733e244
    [940027.330955] peer_msg4: 33781 usec
    [940029.263407] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x4)
    [940029.270498] hw_ctrl_flow_v2_peer_update: wdev_idx=0
    [940030.898046] p1905_managerd invoked oom-killer: gfp_mask=0x2420848, order=0, oom_score_adj=0
    [940030.906882] CPU: 0 PID: 14089 Comm: p1905_managerd Not tainted 4.4.198 #0
    [940030.913781] Stack : 81d168c2 0000003d 00000000 00000000 81d168c2 00000000 00000000 00000000
    [940030.913781]       8358ecf4 819086e3 817d4110 00000000 00003709 81d13690 00000003 00000001
    [940030.913781]       819087e0 8107446c 00000000 00000004 00000006 00000000 817db220 857cb87c
    [940030.913781]       00000000 8107213c 81d168c2 0000004f 00000019 0014f000 81d168c2 007cb87c
    [940030.913781]       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [940030.913781]       ...
    [940030.949718] Call Trace:
    [940030.952288] [<810188b4>] show_stack+0x54/0x88
    [940030.956791] [<811e4bd4>] dump_stack+0x8c/0xc4
    [940030.961275] [<810b17d8>] dump_header.isra.4+0x54/0x19c
    [940030.966546] [<810b1d68>] oom_kill_process+0xf0/0x578
    [940030.971643] [<810b2504>] out_of_memory+0x314/0x3a8
    [940030.976555] [<810b67e0>] __alloc_pages_nodemask+0x7dc/0x824
    [940030.982249] [<810ae69c>] pagecache_get_page+0x1b8/0x288
    [940030.987656] [<81122114>] __getblk_slow+0x1ac/0x3e0
    [940030.992635] [<81153e80>] squashfs_bio_submit+0x1e4/0x5c8
    [940030.998132] [<81154938>] __squashfs_read_data+0x264/0x298
    [940031.003710] [<81154a40>] squashfs_read_data_async+0x28/0x34
    [940031.009470] [<81158cd0>] squashfs_readpages_block+0x330/0x378
    [940031.015404] [<811565c4>] __squashfs_readpages.isra.5+0x6d4/0x914
    [940031.021595] [<81156824>] squashfs_readpages+0x20/0x30
    [940031.026829] [<810bab20>] __do_page_cache_readahead+0x1a8/0x288
    [940031.032860] [<810b0a1c>] filemap_fault+0x1e0/0x500
    [940031.037834] [<810d04fc>] __do_fault+0x64/0xd4
    [940031.042371] [<810d3f60>] handle_mm_fault+0x544/0xdfc
    [940031.047546] [<81021c80>] __do_page_fault+0x138/0x48c
    [940031.052689] [<81005420>] ret_from_exception+0x0/0x10
    [940031.057788] 
    [940031.060888] Mem-Info:
    [940031.063598] active_anon:1071 inactive_anon:18 isolated_anon:0
    [940031.063598]  active_file:431 inactive_file:482 isolated_file:70
    [940031.063598]  unevictable:0 dirty:0 writeback:0 unstable:0
    [940031.063598]  slab_reclaimable:491 slab_unreclaimable:4198
    [940031.063598]  mapped:222 shmem:27 pagetables:110 bounce:0
    [940031.063598]  free:4122 free_pcp:0 free_cma:0
    [940031.096924] DMA free:2744kB min:2304kB low:2880kB high:3456kB active_anon:380kB inactive_anon:16kB active_file:200kB inactive_file:236kB unevictable:0kB isolated(anon):0kB isolated(file):100kB present:16384kB managed:16380kB mlocked:0kB dirty:0kB writeback:0kB mapped:40kB shmem:20kB slab_reclaimable:396kB slab_unreclaimable:2520kB kernel_stack:104kB pagetables:44kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    [940031.140948] lowmem_reserve[]: 0 97 97
    [940031.145100] Normal free:13712kB min:14076kB low:17592kB high:21112kB active_anon:3904kB inactive_anon:56kB active_file:1556kB inactive_file:1488kB unevictable:0kB isolated(anon):0kB isolated(file):352kB present:114688kB managed:99936kB mlocked:0kB dirty:0kB writeback:0kB mapped:792kB shmem:88kB slab_reclaimable:1592kB slab_unreclaimable:14292kB kernel_stack:760kB pagetables:396kB unstable:0kB bounce:0kB free_pcp:100kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:376 all_unreclaimable? no
    [940031.190330] lowmem_reserve[]: 0 0 0
    [940031.194458] DMA: 94*4kB (U) 76*8kB (UM) 49*16kB (U) 29*32kB (U) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2760kB
    [940031.207537] Normal: 345*4kB (UME) 464*8kB (UM) 269*16kB (UM) 59*32kB (M) 28*64kB (UM) 7*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13972kB
    [940031.221806] 987 total pagecache pages
    [940031.225766] 0 pages in swap cache
    [940031.229374] Swap cache stats: add 0, delete 0, find 0/0
    [940031.234878] Free swap  = 0kB
    [940031.238101] Total swap = 0kB
    [940031.241108] 32768 pages RAM
    [940031.244015] 0 pages HighMem/MovableOnly
    [940031.248030] 3689 pages reserved
    [940031.251292] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
    [940031.260252] [  654]     0   654      297       10       4       0        0             0 ubusd
    [940031.269171] [  655]     0   655      225        6       3       0        0             0 askfirst
    [940031.278300] [ 1277]     0  1277      548      253       4       0        0             0 logd
    [940031.287161] [ 1309]     0  1309      362       13       3       0        0             0 rpcd
    [940031.295852] [ 1453]     0  1453      364       41       3       0        0             0 zmsg
    [940031.304628] [ 1542]     0  1542      422       34       3       0        0             0 netifd
    [940031.313645] [ 1566]     0  1566      351       11       3       0        0             0 odhcpd
    [940031.322644] [ 1660]     0  1660      305       22       3       0        0             0 udhcpc
    [940031.331616] [ 1672]     0  1672      305       22       4       0        0             0 udhcpc
    [940031.340639] [ 1684]     0  1684      256        8       3       0        0             0 odhcp6c
    [940031.349639] [ 1788]     0  1788      267        7       3       0        0             0 dropbear
    [940031.358853] [ 1804]     0  1804      256        8       3       0        0             0 odhcp6c
    [940031.367987] [ 1891]     0  1891      381       28       3       0        0             0 uhttpd
    [940031.376973] [ 1933]     0  1933      357       12       4       0        0             0 zdisplay
    [940031.386896] [ 1979]     0  1979      635       11       4       0        0             0 zdtool
    [940031.396188] [ 2013]     0  2013      879       62       5       0        0             0 zmqttproxy
    [940031.405512] [ 2031]     0  2031      311       29       4       0        0             0 zihomescript.sh
    [940031.415776] [ 2059]     0  2059     1088       79       4       0        0             0 zapclient
    [940031.425109] [ 2160]     0  2160      401       14       4       0        0             0 zdetect
    [940031.434347] [ 2185]     0  2185      305        8       3       0        0             0 telnetd
    [940031.443588] [ 2207]     0  2207      384       36       3       0        0             0 zdetect
    [940031.452739] [ 2272]   453  2272      328       10       4       0        0             0 dnsmasq
    [940031.461824] [14087]     0 14087      845       66       5       0        0             0 wapp
    [940031.470593] [14088]     0 14088      820       52       5       0        0             0 wapp
    [940031.479403] [14089]     0 14089      523       51       4       0        0             0 p1905_managerd
    [940031.489111] [14090]     0 14090      470       47       4       0        0             0 mapd
    [940031.498003] [31857]     0 31857      228        9       4       0        0             0 htpdate
    [940031.507119] [26573]     0 26573      311       15       4       0        0             0 zihomescript.sh
    [940031.516833] [26576]     0 26576      446       12       3       0        0             0 curl
    [940031.525562] Out of memory: Kill process 1277 (logd) score 8 or sacrifice child
    [940031.533175] Killed process 1277 (logd) total-vm:2192kB, anon-rss:1012kB, file-rss:0kB
    [940031.677537] wapp invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
    [940031.685326] CPU: 0 PID: 14087 Comm: wapp Not tainted 4.4.198 #0
    [940031.691370] Stack : 81d168c2 00000033 00000000 00000000 81d168c2 00000000 00000000 00000000
    [940031.691370]       863bbdf4 819086e3 817d4110 00000000 00003707 81d13690 00000003 00000001
    [940031.691370]       819087e0 8107446c 00000000 00000004 00000006 00000000 817db220 83683b6c
    [940031.691370]       00000000 8107213c 81d168c2 00000045 83683b50 00000000 81d168c2 00683b6c
    [940031.691370]       00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    [940031.691370]       ...
    [940031.727377] Call Trace:
    [940031.729959] [<810188b4>] show_stack+0x54/0x88
    [940031.734459] [<811e4bd4>] dump_stack+0x8c/0xc4
    [940031.738967] [<810b17d8>] dump_header.isra.4+0x54/0x19c
    [940031.744237] [<810b1d68>] oom_kill_process+0xf0/0x578
    [940031.749354] [<810b2504>] out_of_memory+0x314/0x3a8
    [940031.754300] [<810b66f0>] __alloc_pages_nodemask+0x6ec/0x824
    [940031.760036] [<810b0b98>] filemap_fault+0x35c/0x500
    [940031.764980] [<810d04fc>] __do_fault+0x64/0xd4
    [940031.769475] [<810d3f60>] handle_mm_fault+0x544/0xdfc
    [940031.774577] [<81021c80>] __do_page_fault+0x138/0x48c
    [940031.779662] [<81005420>] ret_from_exception+0x0/0x10
    [940031.784735] 
    [940039.905445] Mem-Info:
    [940039.907867] active_anon:415 inactive_anon:18 isolated_anon:0
    [940039.907867]  active_file:376 inactive_file:532 isolated_file:32
    [940039.907867]  unevictable:0 dirty:0 writeback:0 unstable:0
    [940039.907867]  slab_reclaimable:478 slab_unreclaimable:4239
    [940039.907867]  mapped:144 shmem:27 pagetables:73 bounce:0
    [940039.907867]  free:3638 free_pcp:0 free_cma:0
    [940039.940722] DMA free:2568kB min:2304kB low:2880kB high:3456kB active_anon:216kB inactive_anon:16kB active_file:528kB inactive_file:784kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:16384kB managed:16380kB mlocked:0kB dirty:0kB writeback:0kB mapped:252kB shmem:20kB slab_reclaimable:404kB slab_unreclaimable:2516kB kernel_stack:96kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:156 all_unreclaimable? no
    [940039.984232] lowmem_reserve[]: 0 97 97
    [940039.988439] Normal free:11836kB min:14076kB low:17592kB high:21112kB active_anon:1444kB inactive_anon:56kB active_file:976kB inactive_file:1516kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:114688kB managed:99936kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:88kB slab_reclaimable:1508kB slab_unreclaimable:14440kB kernel_stack:696kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1840 all_unreclaimable? no
    [940040.033091] lowmem_reserve[]: 0 0 0
    [940040.036744] DMA: 143*4kB (UM) 116*8kB (U) 74*16kB (UM) 3*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2780kB
    [940040.050026] Normal: 569*4kB (UME) 439*8kB (UM) 286*16kB (UM) 8*32kB (UM) 15*64kB (UM) 2*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 11836kB
    [940040.064766] 909 total pagecache pages
    [940040.068584] 0 pages in swap cache
    [940040.072334] Swap cache stats: add 0, delete 0, find 0/0
    [940040.078136] Free swap  = 0kB
    [940040.081589] Total swap = 0kB
    [940040.084587] 32768 pages RAM
    [940040.087964] 0 pages HighMem/MovableOnly
    [940040.092312] 3689 pages reserved
    [940040.096087] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
    [940040.105230] [  654]     0   654      297       10       4       0        0             0 ubusd
    [940040.114380] [  655]     0   655      225        6       3       0        0             0 askfirst
    [940040.123882] [ 1309]     0  1309      362       13       3       0        0             0 rpcd
    [940040.132961] [ 1453]     0  1453      364       27       3       0        0             0 zmsg
    [940040.141654] [ 1542]     0  1542      422       34       3       0        0             0 netifd
    [940040.150504] [ 1566]     0  1566      351       45       3       0        0             0 odhcpd
    [940040.160038] [ 1660]     0  1660      305       22       3       0        0             0 udhcpc
    [940040.169236] [ 1672]     0  1672      305       22       4       0        0             0 udhcpc
    [940040.179054] [ 1684]     0  1684      256        8       3       0        0             0 odhcp6c
    [940040.188399] [ 1788]     0  1788      267        7       3       0        0             0 dropbear
    [940040.197855] [ 1804]     0  1804      256        8       3       0        0             0 odhcp6c
    [940040.207217] [ 1891]     0  1891      381       28       3       0        0             0 uhttpd
    [940040.216402] [ 1933]     0  1933      357       21       4       0        0             0 zdisplay
    [940040.225873] [ 1979]     0  1979      635       11       4       0        0             0 zdtool
    [940040.235113] [ 2031]     0  2031      311       29       4       0        0             0 zihomescript.sh
    [940040.245101] [ 2185]     0  2185      305        8       3       0        0             0 telnetd
    [940040.254072] [ 2272]   453  2272      328       10       4       0        0             0 dnsmasq
    [940040.263456] [14090]     0 14090      470       52       4       0        0             0 mapd
    [940040.272473] [31857]     0 31857      228        9       4       0        0             0 htpdate
    [940040.281790] [26573]     0 26573      311       15       4       0        0             0 zihomescript.sh
    [940040.291889] Out of memory: Kill process 14090 (mapd) score 1 or sacrifice child
    [940040.299716] Killed process 14090 (mapd) total-vm:1880kB, anon-rss:208kB, file-rss:0kB
    [940047.227817] MacTableDeleteEntry(): wcid 8 =====
    [940047.238415] hw_ctrl_flow_v2_disconnt_act: wdev_idx=3
    [940085.994632] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x6)
    [940086.001068] hw_ctrl_flow_v2_peer_update: wdev_idx=0
    [940224.529243] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x4)
    [940224.536168] hw_ctrl_flow_v2_peer_update: wdev_idx=0
    [940276.725992] peer_auth_req: 44 usec
    [940276.732087] IE_WLAN_EXTENSION: no handler for extension_id:35
    [940276.738337] @@@ ap_cmm_peer_assoc_req_action(): (wcid=24), HTC_ICVErrCnt(0), HTC_AAD_OM_Freeze(0), HTC_AAD_OM_CountDown(0),  HTC_AAD_OM_Freeze(1) is in Asso. stage!
    [940276.753598] add he assoc_rsp, len=59
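
    When a log like this is long, it helps to pull out just the victims. A minimal sketch, assuming the "Out of memory: Kill process ..." wording of the 4.4 kernel shown above (other kernel versions phrase the line differently, so the pattern would need adjusting); the names KILL_RE and find_oom_victims are invented for the example:

```python
import re

# Matches lines such as:
# Out of memory: Kill process 1277 (logd) score 8 or sacrifice child
KILL_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) "
    r"\((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def find_oom_victims(log_text):
    """Return (pid, name, score) for every OOM kill found in the log text."""
    return [
        (int(m.group("pid")), m.group("name"), int(m.group("score")))
        for m in KILL_RE.finditer(log_text)
    ]


sample = """\
[940031.525562] Out of memory: Kill process 1277 (logd) score 8 or sacrifice child
[940031.533175] Killed process 1277 (logd) total-vm:2192kB, anon-rss:1012kB, file-rss:0kB
[940040.291889] Out of memory: Kill process 14090 (mapd) score 1 or sacrifice child
"""
print(find_oom_victims(sample))
```

    On a live system the input could come from the output of dmesg saved to a file.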
    
