64-bit architecture - Kernel memory management (mm - memory


Author: 偷油考拉 | Published 2021-08-22 00:58

    https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt
    https://www.kernel.org/doc/html/latest/x86/x86_64/5level-paging.html

    1. Virtual memory map with 4-level page tables


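    The original x86-64 architecture was limited by 4-level paging to 256 TiB of virtual address space and 64 TiB of physical address space. We are already bumping into this limit: some vendors offer servers with 64 TiB of memory.
    To overcome the limitation, upcoming hardware will introduce support for 5-level paging. It is a straightforward extension of the current page table structure that adds one more layer of translation.
    It raises the limits to 128 PiB of virtual address space and 4 PiB of physical address space.
    QEMU 2.9 and later support 5-level paging.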
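
    The limits quoted above follow from simple arithmetic: each page table level resolves 9 bits of the virtual address (512 entries per table), and the low 12 bits are the offset inside a 4 KiB page. The sketch below reproduces the figures. The physical widths of 46 and 52 bits are not stated in the text; they are derived here from the 64 TiB and 4 PiB limits, and the helper names are made up for illustration.

```python
PAGE_SHIFT = 12      # 4 KiB pages: the low 12 bits are the offset within a page
BITS_PER_LEVEL = 9   # each table holds 512 entries, so each level resolves 9 bits


def virtual_address_bits(levels: int) -> int:
    """Number of virtual address bits translated by `levels` page table levels."""
    return levels * BITS_PER_LEVEL + PAGE_SHIFT


def size_in_tib(bits: int) -> int:
    """Size of a 2**bits byte address space, expressed in TiB."""
    return (1 << bits) >> 40


# Physical widths (46 and 52 bits) are derived from the 64 TiB / 4 PiB limits.
for levels, phys_bits in ((4, 46), (5, 52)):
    vbits = virtual_address_bits(levels)
    print(f"{levels}-level: {vbits}-bit virtual = {size_in_tib(vbits)} TiB, "
          f"{phys_bits}-bit physical = {size_in_tib(phys_bits)} TiB")
```

    Since 1 PiB is 1024 TiB, the 5-level line corresponds to 128 PiB of virtual and 4 PiB of physical address space.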


    ========================================================================================================================
        Start addr    |   Offset   |     End addr     |  Size   | VM area description
    ========================================================================================================================
                      |            |                  |         |
     0000000000000000 |    0       | 00007fffffffffff |  128 TB | user-space virtual memory, different per mm
    __________________|____________|__________________|_________|___________________________________________________________
                      |            |                  |         |
     0000800000000000 | +128    TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical
                      |            |                  |         |     virtual memory addresses up to the -128 TB
                      |            |                  |         |     starting offset of kernel mappings.
    __________________|____________|__________________|_________|___________________________________________________________
                                                                |
                                                                | Kernel-space virtual memory, shared between all processes:
    ____________________________________________________________|___________________________________________________________
                      |            |                  |         |
     ffff800000000000 | -128    TB | ffff87ffffffffff |    8 TB | ... guard hole, also reserved for hypervisor
     ffff880000000000 | -120    TB | ffff887fffffffff |  0.5 TB | LDT remap for PTI
     ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
     ffffc88000000000 |  -55.5  TB | ffffc8ffffffffff |  0.5 TB | ... unused hole
     ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)
     ffffe90000000000 |  -23    TB | ffffe9ffffffffff |    1 TB | ... unused hole
     ffffea0000000000 |  -22    TB | ffffeaffffffffff |    1 TB | virtual memory map (vmemmap_base)
     ffffeb0000000000 |  -21    TB | ffffebffffffffff |    1 TB | ... unused hole
     ffffec0000000000 |  -20    TB | fffffbffffffffff |   16 TB | KASAN shadow memory
    __________________|____________|__________________|_________|____________________________________________________________
                                                                |
                                                                | Identical layout to the 56-bit one from here on:
    ____________________________________________________________|____________________________________________________________
                      |            |                  |         |
     fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                      |            |                  |         | vaddr_end for KASLR
     fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
     fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
     ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
     ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
     ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
     ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
     ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
     ffffffff80000000 |-2048    MB |                  |         |
     ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
     ffffffffff000000 |  -16    MB |                  |         |
        FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
     ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
     ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
    __________________|____________|__________________|_________|___________________________________________________________
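
    The Offset and Size columns of the table can be recomputed from the start and end addresses. The Offset column is the start address read as a signed 64-bit (two's-complement) number, which is why kernel regions show negative values. This is a small sketch for checking a few rows; the function names are illustrative only.

```python
TB = 1 << 40  # the table's "TB" is treated as 2**40 bytes


def signed_offset(start_hex: str) -> int:
    """Interpret a 64-bit address as a two's-complement signed integer."""
    value = int(start_hex, 16)
    return value - (1 << 64) if value >= (1 << 63) else value


def region_size(start_hex: str, end_hex: str) -> int:
    """Number of bytes covered by the inclusive range [start, end]."""
    return int(end_hex, 16) - int(start_hex, 16) + 1


rows = [
    ("ffff888000000000", "ffffc87fffffffff"),  # direct mapping of physical memory
    ("ffffc90000000000", "ffffe8ffffffffff"),  # vmalloc/ioremap space
    ("ffffea0000000000", "ffffeaffffffffff"),  # virtual memory map
]
for start, end in rows:
    print(start, signed_offset(start) / TB, "TB offset,",
          region_size(start, end) / TB, "TB size")
```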
    

    2. Virtual memory map with 5-level page tables


    CONFIG_X86_5LEVEL=y enables the feature.
    A kernel built with CONFIG_X86_5LEVEL=y can still boot on 4-level hardware. In that case the additional page table level (p4d) is folded at runtime.
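
    To make the extra level concrete, the sketch below splits a virtual address into per-level table indices. It assumes 4 KiB pages and 9 index bits per level, and it models folding simply by leaving the p4d index out when only four levels are in use. The function is illustrative and is not kernel code.

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
INDEX_MASK = (1 << BITS_PER_LEVEL) - 1


def split_address(vaddr: int, levels: int) -> dict:
    """Return the page table index used at each level, lowest level first."""
    if levels == 5:
        names = ["pte", "pmd", "pud", "p4d", "pgd"]
    else:
        names = ["pte", "pmd", "pud", "pgd"]  # p4d is folded away
    indices = {}
    for depth, name in enumerate(names):
        shift = PAGE_SHIFT + depth * BITS_PER_LEVEL
        indices[name] = (vaddr >> shift) & INDEX_MASK
    indices["offset"] = vaddr & ((1 << PAGE_SHIFT) - 1)
    return indices


# Start of the 4-level direct mapping, taken from the table in section 1.
print(split_address(0xffff888000000000, levels=4))
print(split_address(0xffff888000000000, levels=5))
```

    With four levels the top-level (pgd) index covers bits 39 to 47; with five levels those bits select the p4d entry and bits 48 to 56 select the pgd entry.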

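    On x86, 5-level paging enables a 56-bit userspace virtual address space. Not all user space is ready to handle wide addresses. At least some JIT compilers are known to use the higher bits of pointers to encode their own information. This collides with valid pointers under 5-level paging and leads to crashes. To mitigate this, the kernel does not allocate virtual address space above 47 bits by default.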
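
    The collision is easy to reproduce with a toy tagging scheme. The layout below (a 16-bit tag in bits 48 to 63, pointer in the low 48 bits) is a hypothetical example and is not taken from any particular JIT; both sample addresses are made up.

```python
TAG_SHIFT = 48                        # hypothetical: tag kept in bits 48..63
PAYLOAD_MASK = (1 << TAG_SHIFT) - 1   # low 48 bits are assumed to hold the pointer


def pack(pointer: int, tag: int) -> int:
    """Store a 16-bit tag in the high bits of a 64-bit word."""
    return ((tag & 0xFFFF) << TAG_SHIFT) | (pointer & PAYLOAD_MASK)


def unpack(word: int) -> tuple:
    """Recover (pointer, tag) under the assumption that pointers fit in 48 bits."""
    return word & PAYLOAD_MASK, word >> TAG_SHIFT


low_pointer = 0x0000_7F12_3456_7000    # fits in 47 bits: survives the round trip
high_pointer = 0x00AB_CDEF_1234_5000   # above 47 bits: high bits are overwritten

for pointer in (low_pointer, high_pointer):
    recovered, tag = unpack(pack(pointer, 7))
    print(hex(pointer), "->", hex(recovered), "intact" if recovered == pointer else "corrupted")
```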

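    However, user space can ask for allocation from the full address space by specifying a hint address above 47 bits (with or without MAP_FIXED).
    If the hint address is above 47 bits but MAP_FIXED is not specified, the kernel tries to find an unmapped area at the specified address. If that area is already occupied, it looks for an unmapped area in the full address space rather than in the 47-bit window.
    A high hint address only affects the allocation in question, not any future mmap() calls.
    Specifying a high hint address on an older kernel, or on a machine without 5-level paging support, is safe. The hint is ignored and the kernel falls back to allocating from the 47-bit address space.
    This approach makes it easy for an application's memory allocator to become aware of the large address space without manually tracking the allocated virtual address space.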
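
    A minimal sketch of the opt-in, calling the C library's mmap() through ctypes because Python's own mmap module takes no address hint. It assumes a 64-bit Unix-like system; HIGH_HINT and map_anonymous are names invented for this example. On a machine without 5-level paging the returned address should simply land inside the 47-bit window, as described above.

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.restype = ctypes.c_int
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

MAP_FAILED = ctypes.c_void_p(-1).value
HIGH_HINT = 1 << 47   # hypothetical hint: the first address above the 47-bit window


def map_anonymous(length: int, hint: int = 0) -> int:
    """Map anonymous private memory, passing `hint` as the address (no MAP_FIXED)."""
    addr = libc.mmap(hint, length, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    if addr is None or addr == MAP_FAILED:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return addr


addr = map_anonymous(4096, HIGH_HINT)
where = "above 47 bits" if addr >= (1 << 47) else "inside the 47-bit window"
print(hex(addr), where)
libc.munmap(addr, 4096)
```

    Because MAP_FIXED is not set, the hint is only a request; a later call without a hint goes back to the default 47-bit window.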

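    One important case to handle is the interaction with MPX. MPX (without the MAWA extension) cannot handle addresses above 47 bits, so the kernel must make sure that MPX cannot be enabled if a VMA already exists above the boundary, and must forbid creating such VMAs once MPX is enabled.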

    ========================================================================================================================
        Start addr    |   Offset   |     End addr     |  Size   | VM area description
    ========================================================================================================================
                      |            |                  |         |
     0000000000000000 |    0       | 00ffffffffffffff |   64 PB | user-space virtual memory, different per mm
    __________________|____________|__________________|_________|___________________________________________________________
                      |            |                  |         |
     0100000000000000 |  +64    PB | feffffffffffffff | ~16K PB | ... huge, still almost 64 bits wide hole of non-canonical
                      |            |                  |         |     virtual memory addresses up to the -64 PB
                      |            |                  |         |     starting offset of kernel mappings.
    __________________|____________|__________________|_________|___________________________________________________________
                                                                |
                                                                | Kernel-space virtual memory, shared between all processes:
    ____________________________________________________________|___________________________________________________________
                      |            |                  |         |
     ff00000000000000 |  -64    PB | ff0fffffffffffff |    4 PB | ... guard hole, also reserved for hypervisor
     ff10000000000000 |  -60    PB | ff10ffffffffffff | 0.25 PB | LDT remap for PTI
     ff11000000000000 |  -59.75 PB | ff90ffffffffffff |   32 PB | direct mapping of all physical memory (page_offset_base)
     ff91000000000000 |  -27.75 PB | ff9fffffffffffff | 3.75 PB | ... unused hole
     ffa0000000000000 |  -24    PB | ffd1ffffffffffff | 12.5 PB | vmalloc/ioremap space (vmalloc_base)
     ffd2000000000000 |  -11.5  PB | ffd3ffffffffffff |  0.5 PB | ... unused hole
     ffd4000000000000 |  -11    PB | ffd5ffffffffffff |  0.5 PB | virtual memory map (vmemmap_base)
     ffd6000000000000 |  -10.5  PB | ffdeffffffffffff | 2.25 PB | ... unused hole
     ffdf000000000000 |   -8.25 PB | fffffbffffffffff |   ~8 PB | KASAN shadow memory
    __________________|____________|__________________|_________|____________________________________________________________
                                                                |
                                                                | Identical layout to the 47-bit one from here on:
    ____________________________________________________________|____________________________________________________________
                      |            |                  |         |
     fffffc0000000000 |   -4    TB | fffffdffffffffff |    2 TB | ... unused hole
                      |            |                  |         | vaddr_end for KASLR
     fffffe0000000000 |   -2    TB | fffffe7fffffffff |  0.5 TB | cpu_entry_area mapping
     fffffe8000000000 |   -1.5  TB | fffffeffffffffff |  0.5 TB | ... unused hole
     ffffff0000000000 |   -1    TB | ffffff7fffffffff |  0.5 TB | %esp fixup stacks
     ffffff8000000000 | -512    GB | ffffffeeffffffff |  444 GB | ... unused hole
     ffffffef00000000 |  -68    GB | fffffffeffffffff |   64 GB | EFI region mapping space
     ffffffff00000000 |   -4    GB | ffffffff7fffffff |    2 GB | ... unused hole
     ffffffff80000000 |   -2    GB | ffffffff9fffffff |  512 MB | kernel text mapping, mapped to physical address 0
     ffffffff80000000 |-2048    MB |                  |         |
     ffffffffa0000000 |-1536    MB | fffffffffeffffff | 1520 MB | module mapping space
     ffffffffff000000 |  -16    MB |                  |         |
        FIXADDR_START | ~-11    MB | ffffffffff5fffff | ~0.5 MB | kernel-internal fixmap range, variable size and offset
     ffffffffff600000 |  -10    MB | ffffffffff600fff |    4 kB | legacy vsyscall ABI
     ffffffffffe00000 |   -2    MB | ffffffffffffffff |    2 MB | ... unused hole
    __________________|____________|__________________|_________|___________________________________________________________
    

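    The architecture defines a 64-bit virtual address. Implementations can support fewer bits. Currently supported are 48-bit and 57-bit virtual addresses.

    Bits 63 down to the most significant implemented bit are sign extended. If addresses are interpreted as unsigned, this leaves a hole between user-space and kernel addresses.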
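
    The sign-extension rule can be written as a short check for whether an address is canonical. This is a sketch in plain integer arithmetic, with the implemented width (48 or 57 bits) passed in as a parameter; the function names are illustrative.

```python
def sign_extend(vaddr: int, implemented_bits: int) -> int:
    """Copy bit (implemented_bits - 1) into every higher bit up to bit 63."""
    low_mask = (1 << implemented_bits) - 1
    low = vaddr & low_mask
    if low >> (implemented_bits - 1):            # top implemented bit is set
        low |= ((1 << 64) - 1) & ~low_mask       # fill bits above it with ones
    return low


def is_canonical(vaddr: int, implemented_bits: int) -> bool:
    """An address is canonical when sign extension leaves it unchanged."""
    return sign_extend(vaddr, implemented_bits) == vaddr


for vaddr in (0x00007fffffffffff, 0x0000800000000000,
              0xffff7fffffffffff, 0xffff800000000000):
    print(hex(vaddr), is_canonical(vaddr, 48))
```

    With 48 implemented bits, the two middle addresses fall inside the hole and are rejected, while the last user-space address and the first kernel address are accepted.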

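    The direct mapping covers all memory in the system up to the highest memory address (which means that in some cases it can also include PCI memory holes).

    The vmalloc space is lazily synchronized into the different PML4/PML5 pages of the processes by the page fault handler, using init_top_pgt as the reference.

    EFI runtime services are mapped in the efi_pgd PGD, in a 64 GB virtual memory window (this size is arbitrary and can be raised later if needed). These mappings are not part of any other kernel PGD and are only available during EFI runtime calls.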

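    Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all physical memory, the vmalloc/ioremap space and the virtual memory map are randomized.
    Their order is preserved, but their base addresses are offset early at boot time.

    Be very careful with respect to KASLR when changing anything here. The KASLR address range must not overlap with anything except the KASAN shadow area, which is fine because KASAN disables KASLR.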

    For both the 4-level and 5-level layouts, the STACKLEAK_POISON value lies in the last 2 MB hole.
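
    As a quick sanity check, the sketch below assumes the poison constant is -0xBEEF (recalled from the kernel's stackleak header, not stated in this article) and verifies that its 64-bit representation falls inside the final unused hole listed in both tables.

```python
STACKLEAK_POISON = -0xBEEF            # assumption: value recalled from the kernel headers
HOLE_START = 0xffffffffffe00000       # last "unused hole" row in both tables
HOLE_END = 0xffffffffffffffff

# Reinterpret the negative constant as an unsigned 64-bit address.
poison_address = STACKLEAK_POISON & ((1 << 64) - 1)

print(hex(poison_address))                        # 0xffffffffffff4111
print(HOLE_START <= poison_address <= HOLE_END)   # True
```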
