大师兄的Python源码学习笔记(四十七）: Python的内存

作者: superkmi | 来源:发表于2021-12-24 20:17 被阅读0次

大师兄的Python源码学习笔记(四十八）: Python的内存
大师兄的Python源码学习笔记(五十一）: Python的内存
大师兄的Python源码学习笔记(五十九）: Python的内存
大师兄的Python源码学习笔记(五十六）: Python的内存
大师兄的Python源码学习笔记(五十七）: Python的内存
大师兄的Python源码学习笔记(五十八）: Python的内存
大师兄的Python源码学习笔记(五十三）: Python的内存
大师兄的Python源码学习笔记(五十二）: Python的内存
大师兄的Python源码学习笔记(五十五）: Python的内存
大师兄的Python源码学习笔记(五十四）: Python的内存

大师兄的Python源码学习笔记(四十六）: Python的内存管理机制(一）
大师兄的Python源码学习笔记(四十八）: Python的内存管理机制(三）

二、小块空间的内存池

在Python中，经常需要申请小块内存，这些小块内存在申请后，很快会被释放。
由于这些小块内存的申请并不是为了创建对象，所以没有对象一级的内存池机制，这意味着需要执行大量的malloc和free操作。
为了提高效率，Python引入了一个内存池机制，用于管理对小块内存的申请和释放，也就是Pymalloc机制，通过PyObject_Malloc、PyObject_Realloc和PyObject_Free三个接口显示给Python。
这个小块内存的内存池可以视为一个层次结构，从下至上分别是block、pool、arena和内存池。
其中block、pool、arena在代码中可以找到实体，而内存池只是一个概念，表示Python对整个小块内存分配和释放行为的内存管理机制。

1. block

在最底层，block是一个确定大小的内存块。
block有很多种，不同种类的block有不同的内存大小，这个值为size class。
为了在当前32位和64位平台获得最佳性能，所有block的size class都是8字节对齐的。

Objects\obmalloc.c

 * Allocation strategy abstract:
 *
 * For small requests, the allocator sub-allocates <Big> blocks of memory.
 * Requests greater than SMALL_REQUEST_THRESHOLD bytes are routed to the
 * system's allocator.
 *
 * Small requests are grouped in size classes spaced 8 bytes apart, due
 * to the required valid alignment of the returned address. Requests of
 * a particular size are serviced from memory pools of 4K (one VMM page).
 * Pools are fragmented on demand and contain free lists of blocks of one
 * particular size class. In other words, there is a fixed-size allocator
 * for each size class. Free pools are shared by the different allocators
 * thus minimizing the space reserved for a particular size class.
 *
 * This allocation strategy is a variant of what is known as "simple
 * segregated storage based on array of free lists". The main drawback of
 * simple segregated storage is that we might end up with lot of reserved
 * memory for the different free lists, which degenerate in time. To avoid
 * this, we partition each free list in pools and we share dynamically the
 * reserved space between all free lists. This technique is quite efficient
 * for memory intensive programs which allocate mainly small-sized blocks.
 *
 * For small requests we have the following table:
 *
 * Request in bytes     Size of allocated block      Size class idx
 * ----------------------------------------------------------------
 *        1-8                     8                       0
 *        9-16                   16                       1
 *       17-24                   24                       2
 *       25-32                   32                       3
 *       33-40                   40                       4
 *       41-48                   48                       5
 *       49-56                   56                       6
 *       57-64                   64                       7
 *       65-72                   72                       8
 *        ...                   ...                     ...
 *      497-504                 504                      62
 *      505-512                 512                      63
 *
 *      0, SMALL_REQUEST_THRESHOLD + 1 and up: routed to the underlying
 *      allocator.
 */

/*==========================================================================*/

/*
 * -- Main tunable settings section --
 */

/*
 * Alignment of addresses returned to the user. 8-bytes alignment works
 * on most current architectures (with 32-bit or 64-bit address busses).
 * The alignment value is also used for grouping small requests in size
 * classes spaced ALIGNMENT bytes apart.
 *
 * You shouldn't change this unless you know what you are doing.
 */
#define ALIGNMENT               8               /* must be 2^N */
#define ALIGNMENT_SHIFT         3

/* Return the number of bytes in size class I, as a uint. */
#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)

此外，block的上限被设定为512，当申请内存小于这个上限时，交给不同种类的block处理；当申请的内存超过上限，则交给第一层的内存管理机制，即PyMem函数来处理。

Objects\obmalloc.c

#define SMALL_REQUEST_THRESHOLD 512
#define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)

根据SMALL_REQUEST_THRESHOLD和ALIGNMENT的限定，可以看到不同种类block的size class对应一个Size class idx，下面两个公式用来转换size class和size class idx：

Objects\obmalloc.c

#define INDEX2SIZE(I) (((uint)(I) + 1) << ALIGNMENT_SHIFT)

size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;

但是在Python中，block只是一个概念，源码中并没有与之对应的实体存在，我们知道他是具有一定大小的内存，但不与Python中的某个实体对应。
然后Python提供了一个用来管理block的实体，就是pool。

2. pool

一组block的集合称为一个pool，一个pool管理着一堆固定大小的内存块。
事实上，pool管理着一大块内存，并通过策略将这块大内存划分为多个小内存块。
一个pool的大小通常为一个系统内存页，由于当前大多数Python支持的系统内存页都是4KB，所以pool的大小通常定义为4KB。

Objects\obmalloc.c

/*
 * The system's VMM page size can be obtained on most unices with a
 * getpagesize() call or deduced from various header files. To make
 * things simpler, we assume that it is 4K, which is OK for most systems.
 * It is probably better if this is the native page size, but it doesn't
 * have to be.  In theory, if SYSTEM_PAGE_SIZE is larger than the native page
 * size, then `POOL_ADDR(p)->arenaindex' could rarely cause a segmentation
 * violation fault.  4K is apparently OK for all the platforms that python
 * currently targets.
 */
#define SYSTEM_PAGE_SIZE        (4 * 1024)
#define SYSTEM_PAGE_SIZE_MASK   (SYSTEM_PAGE_SIZE - 1)
... ...
/*
 * Size of the pools used for small blocks. Should be a power of 2,
 * between 1K and SYSTEM_PAGE_SIZE, that is: 1k, 2k, 4k.
 */
#define POOL_SIZE               SYSTEM_PAGE_SIZE        /* must be 2^N */
#define POOL_SIZE_MASK          SYSTEM_PAGE_SIZE_MASK

pool在Python对应的结构体是pool_header:

Objects\obmalloc.c

/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};

从名字可以看出，pool_header是一个pool的头部。
一个pool管理的所有block大小都是一样的，也就是他们的size class idx是一样的，由pool_header中的szidx定义。
假如有一块4KB的内存，看看Python如何将其改造成pool：

Objects\obmalloc.c
static int
pymalloc_alloc(void *ctx, void **ptr_p, size_t nbytes)
{
    block *bp;
    poolp pool;
    poolp next;
    uint size;
 ... ...
    init_pool:
        /* Frontlink to used pools. */
        next = usedpools[size + size]; /* == prev */
        pool->nextpool = next;
        pool->prevpool = next;
        next->nextpool = pool;
        next->prevpool = pool;
        pool->ref.count = 1;
        if (pool->szidx == size) {
            /* Luckily, this pool last contained blocks
             * of the same size class, so its header
             * and free list are already initialized.
             */
            bp = pool->freeblock;
            assert(bp != NULL);
            pool->freeblock = *(block **)bp;
            goto success;
        }
        /*
         * Initialize the pool header, set up the free list to
         * contain just the second block, and return the first
         * block.
         */
        pool->szidx = size;
        size = INDEX2SIZE(size);
        bp = (block *)pool + POOL_OVERHEAD;
        pool->nextoffset = POOL_OVERHEAD + (size << 1);
        pool->maxnextoffset = POOL_SIZE - size;
        pool->freeblock = bp + size;
        *(block **)(pool->freeblock) = NULL;
        goto success;
    }
... ...
success:
    UNLOCK();
    assert(bp != NULL);
    *ptr_p = (void *)bp;
    return 1;
}

最后传出的*ptr_p(bp)也就是从pool中取出的第一块block的指针。
由于第一块block已经分配了，也就是已经分配了1块block，所以pool->ref.count = 1。
此外，*ptr_p(bp)指向的地址后有将近4KB内存都是可用的，但申请内存的函数只会使用[bp,bp+size]这个区间的内存，这是由size class idx保证的，所以经过改造后的4KB内存如下：
图中的实线箭头是指针，虚线箭头是偏移位置，在nextoffset和maxnextoffset中存储的是相对于pool头部的偏移位置
现在假设申请5块28字节的内存，由于28个字节对应的size class idx为3，所以实际会申请5块32字节的内存：

Objects\obmalloc.c

static int
pymalloc_alloc(void *ctx, void **ptr_p, size_t nbytes)
{
    block *bp;
    poolp pool;
    poolp next;
    uint size;

... ...
    if (pool != pool->nextpool) {
        /*
         * There is a used pool for this size class.
         * Pick up the head block of its free list.
         */
        ++pool->ref.count;
        bp = pool->freeblock;
        assert(bp != NULL);
        if ((pool->freeblock = *(block **)bp) != NULL) {
            goto success;
        }

        /*
         * Reached the end of the free list, try to extend it.
         */
        if (pool->nextoffset <= pool->maxnextoffset) {
            /* There is room for another block. */
            pool->freeblock = (block*)pool +
                              pool->nextoffset;
            pool->nextoffset += INDEX2SIZE(size);
            *(block **)(pool->freeblock) = NULL;
            goto success;
        }

        ... ...
    }
}

可以看出freeblock是下一个可用block的起始地址，当再次申请32字节的block时，只需要返回freeblock只想的地址就可以。
nextoffset和maxoffset是两个用于对pool中的block集合进行迭代的变量，它所指的偏移位置正好指向了freeblock之后的下一个可用的block地址，在分配block之后，freeblock和nextoffset都会向前移动一个block的距离，如此反复进行遍历。
maxoffset指向pool中的最后一个可用block，它界定了pool的边界。
除了freeblock之外，Python还设置了*freeblock：

Objects\obmalloc.c

static int
pymalloc_free(void *ctx, void *p)
{
    poolp pool;
    block *lastfree;
    poolp next, prev;
    uint size;

... ...

    /* Link p to the start of the pool's freeblock list.  Since
     * the pool had at least the p block outstanding, the pool
     * wasn't empty (so it's already in a usedpools[] list, or
     * was full and is in no list -- it's not in the freeblocks
     * list in any case).
     */
    assert(pool->ref.count > 0);            /* else it was empty */
    *(block **)p = lastfree = pool->freeblock;
    pool->freeblock = (block *)p;
}

当一个block被释放时，freeblock虽然指向了一个有效的pool内地址，但*freeblock为NULL。
假设Python释放的是blockA，在代码清单1之后，A中的第一个字节的值被设置成为当前的freeblock值，而在代码清单2之后，freeblock的值被更新了，指向block A的首地址。
这样，一个block被插入到了离散自由的block链表中。
当第2块和第4块block都被释放后，可以看到一个初具规模的离散自由block链表：
对于这个离散自由block链表，可以从freeblock开始很容易的以freeblock = *freeblock的方式遍历这条链表，当*freeblock为NULL时，表明到达了链表尾部。

Objects\obmalloc.c

static int
pymalloc_alloc(void *ctx, void **ptr_p, size_t nbytes)
{
    block *bp;
    poolp pool;
    poolp next;
    uint size;

... ...
    if (pool != pool->nextpool) {
        /*
         * There is a used pool for this size class.
         * Pick up the head block of its free list.
         */
        ++pool->ref.count;
        bp = pool->freeblock;
        assert(bp != NULL);
        if ((pool->freeblock = *(block **)bp) != NULL) {
            goto success;
        }
... ...
}

在前面出现过的这段代码中，可以看到判断pool->freeblock = *(block **)bp，如果为真，则表示离散链表已经不存在了。
如果可能则继续分配pool的nextoffset指定的下一块block，否则就提供另一个pool。
这意味着也存在一个pool的集合

大师兄的Python源码学习笔记(四十八）: Python的内存
大师兄的Python源码学习笔记(四十七）: Python的内存管理机制(二）[https://www.jians...
大师兄的Python源码学习笔记(五十一）: Python的内存
大师兄的Python源码学习笔记(五十）: Python的内存管理机制(五）[https://www.jiansh...
大师兄的Python源码学习笔记(五十九）: Python的内存
大师兄的Python源码学习笔记(五十八）: Python的内存管理机制(十三）[https://www.jian...
大师兄的Python源码学习笔记(五十六）: Python的内存
大师兄的Python源码学习笔记(五十五）: Python的内存管理机制(十）[https://www.jians...
大师兄的Python源码学习笔记(五十七）: Python的内存
大师兄的Python源码学习笔记(五十六）: Python的内存管理机制(十一）[https://www.jian...
大师兄的Python源码学习笔记(五十八）: Python的内存
大师兄的Python源码学习笔记(五十七）: Python的内存管理机制(十二）[https://www.jian...
大师兄的Python源码学习笔记(五十三）: Python的内存
大师兄的Python源码学习笔记(五十二）: Python的内存管理机制(七）[https://www.jians...
大师兄的Python源码学习笔记(五十二）: Python的内存
大师兄的Python源码学习笔记(五十一）: Python的内存管理机制(六）[https://www.jians...
大师兄的Python源码学习笔记(五十五）: Python的内存
大师兄的Python源码学习笔记(五十四）: Python的内存管理机制(九）[https://www.jians...
大师兄的Python源码学习笔记(五十四）: Python的内存
大师兄的Python源码学习笔记(五十三）: Python的内存管理机制(八）[https://www.jians...