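To make better use of memory, the Linux kernel over-commits memory (it grants more memory than is physically available). When physical memory then becomes too tight, the OOM (Out of Memory) mechanism is triggered and kills some processes to reclaim memory. To keep memory from running out completely, it targets processes that occupy a lot of memory, so a process that suddenly consumes a large amount of memory is a likely victim.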
1. The OOM process
The logic of out_of_memory() is simple and clear. It takes two steps:
- 1. Choose a process to kill.
- 2. Kill it.
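The goal of oom_kill_process() is simple, but its implementation is somewhat involved, so it is not analyzed here; feel free to read the code yourself. We focus instead on select_bad_process(), which chooses a process mainly by its oom_score.
First, let's look at the three files related to oom_score.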
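- /proc/<pid>/oom_score: the oom_score computed by the kernel. Read-only, range 0 to 1000: 0 means never kill and 1000 means always kill, and the larger the value, the more likely the process is to be chosen. (A very small process can also display 0 because of integer rounding and still be killable.)
- /proc/<pid>/oom_score_adj: the interface that lets user space adjust oom_score. Root can read and write it; range -1000 to 1000, default 0. At -1000, oom_score plus this value is always less than or equal to 0, so the process is never killed. The OS can set oom_score_adj to -1000 for critical system processes to protect them from the OOM killer.
- /proc/<pid>/oom_adj: the old interface, kept only for compatibility. Root can read and write it; range -16 to 15, mapped linearly onto oom_score_adj, and the special value -17 means OOM_DISABLE. Avoid using this interface.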
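To make the selection concrete, here is a small Python model of how a 4.4-era kernel turns a task's memory usage and oom_score_adj into "badness" points, how select_bad_process() picks the maximum, and how a legacy oom_adj value is mapped onto oom_score_adj. It is a sketch assuming the formula in mm/oom_kill.c of roughly Linux 4.4 (rss + swap entries + page-table pages, a 3% bonus for CAP_SYS_ADMIN tasks, which was removed in later kernels); the `Task` class and the helper names are hypothetical, and cgroups, nodemasks and exiting tasks are ignored.

```python
# Simplified model of 4.4-era oom_badness() / select_bad_process().
# Assumption: ignores memcg/cpuset constraints and tasks that are already exiting.
from dataclasses import dataclass

OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000
OOM_DISABLE = -17
OOM_ADJUST_MAX = 15


def oom_adj_to_score_adj(oom_adj: int) -> int:
    """Map a legacy /proc/<pid>/oom_adj value onto oom_score_adj."""
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    n = oom_adj * OOM_SCORE_ADJ_MAX
    q = abs(n) // -OOM_DISABLE          # truncate toward zero, like C division
    return q if n >= 0 else -q


@dataclass
class Task:                             # hypothetical container for one process
    name: str
    rss: int                            # resident pages
    nr_ptes: int                        # page-table pages
    nr_pmds: int
    swapents: int                       # swapped-out pages
    oom_score_adj: int = 0
    is_root: bool = False               # stands in for CAP_SYS_ADMIN


def oom_badness(t: Task, totalpages: int) -> int:
    """Badness points of a task; 0 means it is never selected."""
    if t.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0
    points = t.rss + t.swapents + t.nr_ptes + t.nr_pmds
    if t.is_root:
        points -= points * 3 // 100     # 3% bonus for privileged tasks (4.4 only)
    points += t.oom_score_adj * (totalpages // 1000)
    return points if points > 0 else 1


def select_bad_process(tasks, totalpages):
    """Return (task, points) for the task with the highest badness."""
    best, best_points = None, 0
    for t in tasks:
        p = oom_badness(t, totalpages)
        if p > best_points:             # ties keep the first task found
            best, best_points = t, p
    return best, best_points


def oom_score(points: int, totalpages: int) -> int:
    """Value shown in /proc/<pid>/oom_score, scaled to 0..1000."""
    return points * 1000 // totalpages
```

Each unit of oom_score_adj is worth totalpages/1000 pages, so on a machine with 100000 usable pages, oom_score_adj=500 adds 50000 points, enough to outrank almost any real process.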
Linux Memory Management (21) OOM: https://www.cnblogs.com/arnoldlu/p/8567559.html
2. OOM configuration
2.1 /proc/sys/vm/overcommit_memory
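The kernel parameter vm.overcommit_memory accepts three values:
- 0 - Heuristic overcommit handling. This is the default. Overcommit is allowed, but blatantly excessive requests are refused, for example a single malloc larger than the system's total memory. "Heuristic" means the kernel uses an algorithm to guess whether a request is reasonable and refuses the overcommit if it judges that it is not.
- 1 - Always overcommit. Overcommit is allowed and every allocation request is granted; the kernel does no overcommit handling at all. This raises the risk of memory overload but can improve performance for memory-intensive workloads.
- 2 - Don't overcommit. Overcommit is forbidden: the kernel refuses any request that would bring the total committed memory to or above the swap size plus the share of physical RAM given by overcommit_ratio (default 50%). This is the best setting if you want to minimize the risk of memory overcommit.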
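The mode-2 rule can be written out as arithmetic. The sketch below assumes no hugetlb pages, vm.overcommit_kbytes=0 (so vm.overcommit_ratio applies), and ignores vm.admin_reserve_kbytes and vm.user_reserve_kbytes, which take a little more off the limit for non-root processes. The field names come from /proc/meminfo (`CommitLimit`, `Committed_AS`); the function names are hypothetical.

```python
# Sketch of the vm.overcommit_memory=2 check (assumptions stated above).
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50):
    """CommitLimit = swap + RAM * overcommit_ratio / 100, in kB."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def would_fail_in_mode2(request_kb, meminfo):
    """True if committing request_kb more would reach or exceed CommitLimit."""
    return meminfo["Committed_AS"] + request_kb >= meminfo["CommitLimit"]
```

On a live system, `parse_meminfo(open("/proc/meminfo").read())` supplies the current `CommitLimit` and `Committed_AS`; comparing the two shows how close a mode-2 machine is to refusing allocations.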
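A problem I ran into: a thread repeatedly calls fork(), the child runs some work and exits, and after a while fork() starts failing.
fork() does not physically copy the parent's memory; the pages are shared copy-on-write. It does, however, duplicate the parent's page tables and, with overcommit_memory set to 0 or 2, charge the child for every private writable mapping of the parent under the commit accounting described above. So if fork() succeeds at first and fails later, the process very likely has a memory leak: the leak keeps growing the parent, each fork() has to account for a larger copy, memory runs short, and fork() fails with ENOMEM.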
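A minimal reproduction of that pattern, as a sketch: the parent forks a short-lived child in a loop, reaps it, and stops at the first ENOMEM (or EAGAIN, which usually means a process-count limit instead). `fork_worker_loop` and `read_status_kb` are hypothetical helpers; the latter reads `VmSize`/`VmRSS` from /proc/self/status so the parent's growth can be logged between iterations.

```python
import errno
import os


def read_status_kb(field, status_text=None):
    """Return a kB field such as 'VmSize' or 'VmRSS' from /proc/self/status."""
    if status_text is None:
        with open("/proc/self/status") as f:
            status_text = f.read()
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def fork_worker_loop(iterations):
    """Fork a short-lived child again and again; return how many succeeded."""
    done = 0
    for _ in range(iterations):
        try:
            pid = os.fork()
        except OSError as e:
            if e.errno in (errno.ENOMEM, errno.EAGAIN):
                # ENOMEM: no memory for the child (page tables or commit
                # accounting). EAGAIN: usually a process-count limit instead.
                print(f"fork failed after {done} children: {os.strerror(e.errno)}")
                break
            raise
        if pid == 0:
            os._exit(0)          # child: do its work here, then exit at once
        os.waitpid(pid, 0)       # parent: reap the child so no zombies pile up
        done += 1
    return done
```

Logging `read_status_kb("VmSize")` every few hundred iterations shows whether the parent is growing; a value that rises steadily until the first ENOMEM points to the leak rather than to the fork() pattern itself.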
linux - fork() fails with an out-of-memory error: https://www.coder.work/article/167298
2.2 /proc/sys/vm/panic_on_oom
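Determines what the system does when an OOM occurs. It accepts three values:
- 0 - Default. On OOM, the OOM killer is invoked.
- 1 - If the OOM is constrained by a cpuset, a memory policy (mempolicy) or a memcg, the kernel does not panic and starts the OOM killer instead. Any other OOM triggers a kernel panic.
- 2 - Every OOM, constrained or not, triggers a kernel panic.
A panic reboots the system only if kernel.panic is set to a non-zero timeout; with the upstream default kernel.panic=0 the machine simply hangs.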
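The decision table above can be summarized in a few lines. This is a simplified model of the kernel's check_panic_on_oom(), assuming that special cases such as sysrq-triggered OOMs are ignored; `oom_action` and `after_panic` are hypothetical names, and `after_panic` encodes the kernel.panic behavior (0 hangs, a positive value reboots after that many seconds, a negative value reboots immediately).

```python
# Simplified model of panic_on_oom handling (see assumptions above).
def oom_action(panic_on_oom: int, constrained: bool) -> str:
    """Return 'panic' or 'oom_killer'.

    constrained: True when the OOM is limited to a cpuset, mempolicy or memcg.
    """
    if panic_on_oom == 0:
        return "oom_killer"
    if panic_on_oom == 1 and constrained:
        return "oom_killer"     # only part of the system is out of memory
    return "panic"              # 1 with a system-wide OOM, or 2


def after_panic(kernel_panic_seconds: int) -> str:
    """What follows a panic, depending on the kernel.panic sysctl."""
    if kernel_panic_seconds == 0:
        return "hang"
    if kernel_panic_seconds < 0:
        return "reboot immediately"
    return f"reboot after {kernel_panic_seconds}s"
```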
2.3 /proc/sys/vm/min_free_kbytes
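- The minimum amount of free memory the system keeps in reserve.
The larger min_free_kbytes is, the higher the watermarks are, and the wider the gaps between the three watermarks (min, low, high) become. kswapd therefore starts reclaiming earlier and reclaims more memory (it stops only at watermark[high]), so the system keeps too much memory idle in reserve and applications get correspondingly less. In the extreme case, setting min_free_kbytes close to the total memory leaves applications so little that OOM may occur frequently.
If min_free_kbytes is too small, the reserve is too small. kswapd itself makes a few allocations while reclaiming, with the PF_MEMALLOC flag set, and that flag allows it to use the reserve. Likewise, a process chosen by the OOM killer may need to allocate memory while exiting and is also allowed to use the reserve. Letting these two cases dip into reserved memory is what keeps the system out of deadlock; with too small a reserve, those allocations can fail and the system can deadlock.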
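For reference, when min_free_kbytes has not been set by hand, 4.x kernels compute a boot-time default of roughly sqrt(16 * low-memory kB), clamped to 128..65536 kB (newer kernels allow a higher cap). The sketch below reproduces that formula under those assumptions; `lowmem_kbytes` stands in for the kernel's own estimate of free low memory, so real values can differ slightly, and the function name is hypothetical.

```python
import math


def default_min_free_kbytes(lowmem_kbytes: int) -> int:
    """Approximate boot-time default: sqrt(lowmem_kbytes * 16), clamped.

    Assumption: clamp bounds of 128..65536 kB as in 4.x kernels.
    """
    value = math.isqrt(lowmem_kbytes * 16)
    return max(128, min(value, 65536))
```

Because of the square root, the default grows slowly: a 128 MB machine gets about 1448 kB and a 1 GB machine about 4096 kB.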
3. Recommended vm parameter settings
# vm
vm.min_free_kbytes=4096
vm.vfs_cache_pressure=200
vm.dirty_background_ratio=5
vm.dirty_ratio=10
vm.dirty_expire_centisecs=500
vm.dirty_writeback_centisecs=200
vm.extfrag_threshold=10
vm.panic_on_oom=1
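These lines use sysctl.conf syntax, and in practice `sysctl -p <file>` applies them. To show what that amounts to, here is a sketch that parses such a fragment and maps each key to its /proc/sys file (dots become path separators). The helper names are hypothetical; writing requires root, so `apply_settings` defaults to a dry run.

```python
import os


def parse_sysctl_conf(text):
    """Parse 'key=value' lines into a dict; lines starting with '#' or ';' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key):
    """vm.min_free_kbytes -> /proc/sys/vm/min_free_kbytes"""
    return os.path.join("/proc/sys", *key.split("."))


def apply_settings(settings, dry_run=True):
    """Write each value to its /proc/sys file (needs root); dry_run only prints."""
    for key, value in settings.items():
        path = sysctl_path(key)
        if dry_run:
            print(f"would write {value} to {path}")
        else:
            with open(path, "w") as f:
                f.write(value)
```

Values such as min_free_kbytes depend heavily on total RAM, so a fixed recommendation is worth checking against the machine's watermarks before adopting it.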
Overview of the /proc/sys/vm/ documentation: https://zhuanlan.zhihu.com/p/503579974?utm_id=0
50,000 Words | Understanding Linux Memory Management in Depth: https://mp.weixin.qq.com/s/nlMGEhuaDUYqV6r8A4cRlA
Troubleshooting a Linux OOM incident: https://blog.csdn.net/hu_jinghui/article/details/81740575
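4. If memory reclaim exists, why does OOM still happen?
The kernel has three ways of getting memory back:
- Background reclaim (kswapd)
- Direct reclaim
- The OOM mechanism (Out of Memory)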
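kswapd is a kernel thread that reclaims memory in the background when memory runs low. Because the work happens in the background, it is asynchronous and does not block processes.
- When free memory is above pages_low, the system has enough memory and no reclaim takes place.
- When free memory drops below pages_low, memory is under pressure; kswapd0 is woken to reclaim in the background until free memory reaches pages_high.
- When free memory drops below pages_min, the memory available to ordinary allocations is exhausted; direct reclaim is triggered and the allocating process blocks.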
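The three thresholds above amount to a small state machine per zone. The sketch below classifies a zone from its free memory and watermarks; it assumes a simplified check that ignores lowmem_reserve, allocation order and per-CPU page lists, which the real watermark check also considers. `reclaim_action` is a hypothetical name. As input, the Normal and DMA zone lines of the OOM log later in this article can be used (free 13712 kB against min 14076 kB, and free 2744 kB against low 2880 kB).

```python
def reclaim_action(free_kb, min_kb, low_kb, high_kb, kswapd_active=False):
    """Classify a zone's free memory against its watermarks (simplified)."""
    if free_kb < min_kb:
        return "direct_reclaim"     # the allocating process reclaims and blocks
    if free_kb < low_kb:
        return "wake_kswapd"        # background reclaim starts
    if kswapd_active and free_kb < high_kb:
        return "kswapd_continues"   # kswapd keeps going until the high watermark
    return "ok"
```

The `kswapd_active` flag captures the hysteresis: kswapd is woken at the low watermark but only goes back to sleep at the high one.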
To change when kswapd is triggered, adjust pages_low; since pages_low is derived from pages_min, that means adjusting pages_min (through min_free_kbytes).
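Related parameter tuning
1. The kernel parameter vm.swappiness sets how strongly reclaim leans toward swapping versus dropping cache. Its range is 0-100 (0-200 on kernels 5.8 and later):
- Higher values use swap more aggressively, i.e. reclaim prefers anonymous pages;
- Lower values use swap more reluctantly, i.e. reclaim prefers file pages;
- 0 does not mean swap is never used: when free memory plus file pages falls below the high watermark, swapping still happens.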
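The swappiness rules above can be sketched as a choice of relative scan weights for the anonymous and file LRU lists. This is a heavy simplification of the 4.x get_scan_count(), assuming global (non-memcg) reclaim and omitting the inactive-file check and the scaling by recently rotated pages; `scan_balance` is a hypothetical name. On a machine without swap, such as the one in the OOM log below (Total swap = 0kB), only the first branch ever applies, so swappiness has no effect there.

```python
def scan_balance(swappiness, free_pages, file_pages, high_wmark, nr_swap_pages):
    """Return (anon_weight, file_weight) for reclaim scanning (simplified)."""
    if nr_swap_pages <= 0:
        return (0, 1)        # no swap space: anonymous pages cannot be reclaimed
    if free_pages + file_pages <= high_wmark:
        return (1, 0)        # page cache nearly gone: swap even with swappiness 0
    # Otherwise both lists are scanned in proportion to these priorities.
    return (swappiness, 200 - swappiness)
```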
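2. The kernel parameter vm.min_free_kbytes sets the memory watermarks:
- pages_min = min_free_kbytes converted to pages (split across the memory zones in proportion to their size)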
- pages_low = pages_min*5/4
- pages_high = pages_min*3/2
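The per-zone split can be checked against real numbers. The sketch below follows the 4.4-style calculation (before vm.watermark_scale_factor appeared in 4.6): each low-memory zone gets a share of pages_min proportional to its managed pages, then low = min + min/4 and high = min + min/2, assuming 4 KiB pages and no highmem zone. With the managed sizes from the OOM log later in this article (DMA managed:16380kB = 4095 pages, Normal managed:99936kB = 24984 pages), min_free_kbytes=16384 reproduces exactly the min/low/high values that log prints, so that board was most likely running with min_free_kbytes=16384. `zone_watermarks` is a hypothetical name.

```python
PAGE_KB = 4  # assumption: 4 KiB pages


def zone_watermarks(min_free_kbytes, managed_pages_by_zone):
    """Split min_free_kbytes across lowmem zones; return {zone: (min, low, high)} in pages."""
    pages_min = min_free_kbytes // PAGE_KB
    lowmem_pages = sum(managed_pages_by_zone.values())
    marks = {}
    for zone, managed in managed_pages_by_zone.items():
        tmp = pages_min * managed // lowmem_pages   # this zone's share of pages_min
        marks[zone] = (tmp, tmp + tmp // 4, tmp + tmp // 2)
    return marks


marks = zone_watermarks(16384, {"DMA": 4095, "Normal": 24984})
for zone, pages in marks.items():
    print(zone, [p * PAGE_KB for p in pages], "kB")
```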
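So when memory is consumed faster than reclaim can free it, OOM is triggered directly; one example is a driver that fails to rate-limit a DHCP flood. Alternatively, all memory is genuinely in use and nothing more can be freed, and then OOM is the only way out.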
[Linux Kernel] Memory Management: Memory Reclaim Mechanisms: https://blog.csdn.net/weixin_45636061/article/details/127184818
An actual OOM log:
[940027.316884] PTK:be634a2126f098a15a84324cb8fdc92cd05288ac3566b4d14290511de834cbc682d1ff2f7b0ca76ce0e9f956265658f3a8f5e6941538e457bbd020b27733e244
[940027.330955] peer_msg4: 33781 usec
[940029.263407] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x4)
[940029.270498] hw_ctrl_flow_v2_peer_update: wdev_idx=0
[940030.898046] p1905_managerd invoked oom-killer: gfp_mask=0x2420848, order=0, oom_score_adj=0
[940030.906882] CPU: 0 PID: 14089 Comm: p1905_managerd Not tainted 4.4.198 #0
[940030.913781] Stack : 81d168c2 0000003d 00000000 00000000 81d168c2 00000000 00000000 00000000
[940030.913781] 8358ecf4 819086e3 817d4110 00000000 00003709 81d13690 00000003 00000001
[940030.913781] 819087e0 8107446c 00000000 00000004 00000006 00000000 817db220 857cb87c
[940030.913781] 00000000 8107213c 81d168c2 0000004f 00000019 0014f000 81d168c2 007cb87c
[940030.913781] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[940030.913781] ...
[940030.949718] Call Trace:
[940030.952288] [<810188b4>] show_stack+0x54/0x88
[940030.956791] [<811e4bd4>] dump_stack+0x8c/0xc4
[940030.961275] [<810b17d8>] dump_header.isra.4+0x54/0x19c
[940030.966546] [<810b1d68>] oom_kill_process+0xf0/0x578
[940030.971643] [<810b2504>] out_of_memory+0x314/0x3a8
[940030.976555] [<810b67e0>] __alloc_pages_nodemask+0x7dc/0x824
[940030.982249] [<810ae69c>] pagecache_get_page+0x1b8/0x288
[940030.987656] [<81122114>] __getblk_slow+0x1ac/0x3e0
[940030.992635] [<81153e80>] squashfs_bio_submit+0x1e4/0x5c8
[940030.998132] [<81154938>] __squashfs_read_data+0x264/0x298
[940031.003710] [<81154a40>] squashfs_read_data_async+0x28/0x34
[940031.009470] [<81158cd0>] squashfs_readpages_block+0x330/0x378
[940031.015404] [<811565c4>] __squashfs_readpages.isra.5+0x6d4/0x914
[940031.021595] [<81156824>] squashfs_readpages+0x20/0x30
[940031.026829] [<810bab20>] __do_page_cache_readahead+0x1a8/0x288
[940031.032860] [<810b0a1c>] filemap_fault+0x1e0/0x500
[940031.037834] [<810d04fc>] __do_fault+0x64/0xd4
[940031.042371] [<810d3f60>] handle_mm_fault+0x544/0xdfc
[940031.047546] [<81021c80>] __do_page_fault+0x138/0x48c
[940031.052689] [<81005420>] ret_from_exception+0x0/0x10
[940031.057788]
[940031.060888] Mem-Info:
[940031.063598] active_anon:1071 inactive_anon:18 isolated_anon:0
[940031.063598] active_file:431 inactive_file:482 isolated_file:70
[940031.063598] unevictable:0 dirty:0 writeback:0 unstable:0
[940031.063598] slab_reclaimable:491 slab_unreclaimable:4198
[940031.063598] mapped:222 shmem:27 pagetables:110 bounce:0
[940031.063598] free:4122 free_pcp:0 free_cma:0
[940031.096924] DMA free:2744kB min:2304kB low:2880kB high:3456kB active_anon:380kB inactive_anon:16kB active_file:200kB inactive_file:236kB unevictable:0kB isolated(anon):0kB isolated(file):100kB present:16384kB managed:16380kB mlocked:0kB dirty:0kB writeback:0kB mapped:40kB shmem:20kB slab_reclaimable:396kB slab_unreclaimable:2520kB kernel_stack:104kB pagetables:44kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[940031.140948] lowmem_reserve[]: 0 97 97
[940031.145100] Normal free:13712kB min:14076kB low:17592kB high:21112kB active_anon:3904kB inactive_anon:56kB active_file:1556kB inactive_file:1488kB unevictable:0kB isolated(anon):0kB isolated(file):352kB present:114688kB managed:99936kB mlocked:0kB dirty:0kB writeback:0kB mapped:792kB shmem:88kB slab_reclaimable:1592kB slab_unreclaimable:14292kB kernel_stack:760kB pagetables:396kB unstable:0kB bounce:0kB free_pcp:100kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:376 all_unreclaimable? no
[940031.190330] lowmem_reserve[]: 0 0 0
[940031.194458] DMA: 94*4kB (U) 76*8kB (UM) 49*16kB (U) 29*32kB (U) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2760kB
[940031.207537] Normal: 345*4kB (UME) 464*8kB (UM) 269*16kB (UM) 59*32kB (M) 28*64kB (UM) 7*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 13972kB
[940031.221806] 987 total pagecache pages
[940031.225766] 0 pages in swap cache
[940031.229374] Swap cache stats: add 0, delete 0, find 0/0
[940031.234878] Free swap = 0kB
[940031.238101] Total swap = 0kB
[940031.241108] 32768 pages RAM
[940031.244015] 0 pages HighMem/MovableOnly
[940031.248030] 3689 pages reserved
[940031.251292] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[940031.260252] [ 654] 0 654 297 10 4 0 0 0 ubusd
[940031.269171] [ 655] 0 655 225 6 3 0 0 0 askfirst
[940031.278300] [ 1277] 0 1277 548 253 4 0 0 0 logd
[940031.287161] [ 1309] 0 1309 362 13 3 0 0 0 rpcd
[940031.295852] [ 1453] 0 1453 364 41 3 0 0 0 zmsg
[940031.304628] [ 1542] 0 1542 422 34 3 0 0 0 netifd
[940031.313645] [ 1566] 0 1566 351 11 3 0 0 0 odhcpd
[940031.322644] [ 1660] 0 1660 305 22 3 0 0 0 udhcpc
[940031.331616] [ 1672] 0 1672 305 22 4 0 0 0 udhcpc
[940031.340639] [ 1684] 0 1684 256 8 3 0 0 0 odhcp6c
[940031.349639] [ 1788] 0 1788 267 7 3 0 0 0 dropbear
[940031.358853] [ 1804] 0 1804 256 8 3 0 0 0 odhcp6c
[940031.367987] [ 1891] 0 1891 381 28 3 0 0 0 uhttpd
[940031.376973] [ 1933] 0 1933 357 12 4 0 0 0 zdisplay
[940031.386896] [ 1979] 0 1979 635 11 4 0 0 0 zdtool
[940031.396188] [ 2013] 0 2013 879 62 5 0 0 0 zmqttproxy
[940031.405512] [ 2031] 0 2031 311 29 4 0 0 0 zihomescript.sh
[940031.415776] [ 2059] 0 2059 1088 79 4 0 0 0 zapclient
[940031.425109] [ 2160] 0 2160 401 14 4 0 0 0 zdetect
[940031.434347] [ 2185] 0 2185 305 8 3 0 0 0 telnetd
[940031.443588] [ 2207] 0 2207 384 36 3 0 0 0 zdetect
[940031.452739] [ 2272] 453 2272 328 10 4 0 0 0 dnsmasq
[940031.461824] [14087] 0 14087 845 66 5 0 0 0 wapp
[940031.470593] [14088] 0 14088 820 52 5 0 0 0 wapp
[940031.479403] [14089] 0 14089 523 51 4 0 0 0 p1905_managerd
[940031.489111] [14090] 0 14090 470 47 4 0 0 0 mapd
[940031.498003] [31857] 0 31857 228 9 4 0 0 0 htpdate
[940031.507119] [26573] 0 26573 311 15 4 0 0 0 zihomescript.sh
[940031.516833] [26576] 0 26576 446 12 3 0 0 0 curl
[940031.525562] Out of memory: Kill process 1277 (logd) score 8 or sacrifice child
[940031.533175] Killed process 1277 (logd) total-vm:2192kB, anon-rss:1012kB, file-rss:0kB
[940031.677537] wapp invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[940031.685326] CPU: 0 PID: 14087 Comm: wapp Not tainted 4.4.198 #0
[940031.691370] Stack : 81d168c2 00000033 00000000 00000000 81d168c2 00000000 00000000 00000000
[940031.691370] 863bbdf4 819086e3 817d4110 00000000 00003707 81d13690 00000003 00000001
[940031.691370] 819087e0 8107446c 00000000 00000004 00000006 00000000 817db220 83683b6c
[940031.691370] 00000000 8107213c 81d168c2 00000045 83683b50 00000000 81d168c2 00683b6c
[940031.691370] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[940031.691370] ...
[940031.727377] Call Trace:
[940031.729959] [<810188b4>] show_stack+0x54/0x88
[940031.734459] [<811e4bd4>] dump_stack+0x8c/0xc4
[940031.738967] [<810b17d8>] dump_header.isra.4+0x54/0x19c
[940031.744237] [<810b1d68>] oom_kill_process+0xf0/0x578
[940031.749354] [<810b2504>] out_of_memory+0x314/0x3a8
[940031.754300] [<810b66f0>] __alloc_pages_nodemask+0x6ec/0x824
[940031.760036] [<810b0b98>] filemap_fault+0x35c/0x500
[940031.764980] [<810d04fc>] __do_fault+0x64/0xd4
[940031.769475] [<810d3f60>] handle_mm_fault+0x544/0xdfc
[940031.774577] [<81021c80>] __do_page_fault+0x138/0x48c
[940031.779662] [<81005420>] ret_from_exception+0x0/0x10
[940031.784735]
[940039.905445] Mem-Info:
[940039.907867] active_anon:415 inactive_anon:18 isolated_anon:0
[940039.907867] active_file:376 inactive_file:532 isolated_file:32
[940039.907867] unevictable:0 dirty:0 writeback:0 unstable:0
[940039.907867] slab_reclaimable:478 slab_unreclaimable:4239
[940039.907867] mapped:144 shmem:27 pagetables:73 bounce:0
[940039.907867] free:3638 free_pcp:0 free_cma:0
[940039.940722] DMA free:2568kB min:2304kB low:2880kB high:3456kB active_anon:216kB inactive_anon:16kB active_file:528kB inactive_file:784kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:16384kB managed:16380kB mlocked:0kB dirty:0kB writeback:0kB mapped:252kB shmem:20kB slab_reclaimable:404kB slab_unreclaimable:2516kB kernel_stack:96kB pagetables:32kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:156 all_unreclaimable? no
[940039.984232] lowmem_reserve[]: 0 97 97
[940039.988439] Normal free:11836kB min:14076kB low:17592kB high:21112kB active_anon:1444kB inactive_anon:56kB active_file:976kB inactive_file:1516kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:114688kB managed:99936kB mlocked:0kB dirty:0kB writeback:0kB mapped:324kB shmem:88kB slab_reclaimable:1508kB slab_unreclaimable:14440kB kernel_stack:696kB pagetables:260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1840 all_unreclaimable? no
[940040.033091] lowmem_reserve[]: 0 0 0
[940040.036744] DMA: 143*4kB (UM) 116*8kB (U) 74*16kB (UM) 3*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2780kB
[940040.050026] Normal: 569*4kB (UME) 439*8kB (UM) 286*16kB (UM) 8*32kB (UM) 15*64kB (UM) 2*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 11836kB
[940040.064766] 909 total pagecache pages
[940040.068584] 0 pages in swap cache
[940040.072334] Swap cache stats: add 0, delete 0, find 0/0
[940040.078136] Free swap = 0kB
[940040.081589] Total swap = 0kB
[940040.084587] 32768 pages RAM
[940040.087964] 0 pages HighMem/MovableOnly
[940040.092312] 3689 pages reserved
[940040.096087] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[940040.105230] [ 654] 0 654 297 10 4 0 0 0 ubusd
[940040.114380] [ 655] 0 655 225 6 3 0 0 0 askfirst
[940040.123882] [ 1309] 0 1309 362 13 3 0 0 0 rpcd
[940040.132961] [ 1453] 0 1453 364 27 3 0 0 0 zmsg
[940040.141654] [ 1542] 0 1542 422 34 3 0 0 0 netifd
[940040.150504] [ 1566] 0 1566 351 45 3 0 0 0 odhcpd
[940040.160038] [ 1660] 0 1660 305 22 3 0 0 0 udhcpc
[940040.169236] [ 1672] 0 1672 305 22 4 0 0 0 udhcpc
[940040.179054] [ 1684] 0 1684 256 8 3 0 0 0 odhcp6c
[940040.188399] [ 1788] 0 1788 267 7 3 0 0 0 dropbear
[940040.197855] [ 1804] 0 1804 256 8 3 0 0 0 odhcp6c
[940040.207217] [ 1891] 0 1891 381 28 3 0 0 0 uhttpd
[940040.216402] [ 1933] 0 1933 357 21 4 0 0 0 zdisplay
[940040.225873] [ 1979] 0 1979 635 11 4 0 0 0 zdtool
[940040.235113] [ 2031] 0 2031 311 29 4 0 0 0 zihomescript.sh
[940040.245101] [ 2185] 0 2185 305 8 3 0 0 0 telnetd
[940040.254072] [ 2272] 453 2272 328 10 4 0 0 0 dnsmasq
[940040.263456] [14090] 0 14090 470 52 4 0 0 0 mapd
[940040.272473] [31857] 0 31857 228 9 4 0 0 0 htpdate
[940040.281790] [26573] 0 26573 311 15 4 0 0 0 zihomescript.sh
[940040.291889] Out of memory: Kill process 14090 (mapd) score 1 or sacrifice child
[940040.299716] Killed process 14090 (mapd) total-vm:1880kB, anon-rss:208kB, file-rss:0kB
[940047.227817] MacTableDeleteEntry(): wcid 8 =====
[940047.238415] hw_ctrl_flow_v2_disconnt_act: wdev_idx=3
[940085.994632] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x6)
[940086.001068] hw_ctrl_flow_v2_peer_update: wdev_idx=0
[940224.529243] ACT - SendBSS2040CoexistMgmtAction(BSSCoexist2040=0x4)
[940224.536168] hw_ctrl_flow_v2_peer_update: wdev_idx=0
[940276.725992] peer_auth_req: 44 usec
[940276.732087] IE_WLAN_EXTENSION: no handler for extension_id:35
[940276.738337] @@@ ap_cmm_peer_assoc_req_action(): (wcid=24), HTC_ICVErrCnt(0), HTC_AAD_OM_Freeze(0), HTC_AAD_OM_CountDown(0), HTC_AAD_OM_Freeze(1) is in Asso. stage!
[940276.753598] add he assoc_rsp, len=59
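To see why logd was chosen in the first round and mapd in the second, the task table in the log can be re-scored. The sketch below parses the `[ pid ] uid tgid total_vm rss ...` rows and applies the 4.4-era badness formula (rss + nr_ptes + nr_pmds + swapents, a 3% bonus for privileged tasks, plus oom_score_adj scaled by totalpages/1000). Two assumptions: uid 0 stands in for CAP_SYS_ADMIN, and totalpages = pages RAM - pages reserved + swap = 32768 - 3689 + 0 = 29079. The helper names are hypothetical. Under these assumptions the printed scores ("score 8" for logd, "score 1" for mapd) come out exactly.

```python
import re

# Matches task-dump rows such as:
# [940031.278300] [ 1277] 0 1277 548 253 4 0 0 0 logd
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def parse_task_dump(lines):
    """Return one dict per task row; header and other log lines are skipped."""
    tasks = []
    for line in lines:
        m = ROW.search(line)
        if m:
            d = m.groupdict()
            name = d.pop("name")
            tasks.append({"name": name, **{k: int(v) for k, v in d.items()}})
    return tasks


def pick_victim(tasks, totalpages):
    """Recompute badness and return (name, pid, score shown in the log)."""
    best = None
    for t in tasks:
        if t["adj"] == -1000:
            continue                        # never-kill tasks
        points = t["rss"] + t["nr_ptes"] + t["nr_pmds"] + t["swapents"]
        if t["uid"] == 0:
            points -= points * 3 // 100     # root bonus (assumes uid 0 == CAP_SYS_ADMIN)
        points += t["adj"] * (totalpages // 1000)
        points = max(points, 1)
        if best is None or points > best[0]:
            best = (points, t["name"], t["pid"])
    points, name, pid = best
    return name, pid, points * 1000 // totalpages
```

Both victims are tiny (logd has 253 resident pages, about 1 MB), which fits the picture in the Mem-Info section: most of the memory is not in user processes at all, so killing them frees very little.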