美文网首页
大师兄的Python源码学习笔记(五): List对象

大师兄的Python源码学习笔记(五): List对象

作者: superkmi | 来源:发表于2021-02-23 10:23 被阅读0次

    大师兄的Python源码学习笔记(四): 字符串对象
    大师兄的Python源码学习笔记(六): Dict对象

    一、List对象

    • Python中的List对象为PyListObject,类似C++中的vector。
    Objects\listobject.h
    
    typedef struct {
        PyObject_VAR_HEAD
        /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
        PyObject **ob_item;
    
        /* ob_item contains space for 'allocated' elements.  The number
         * currently in use is ob_size.
         * Invariants:
         *     0 <= ob_size <= allocated
         *     len(list) == ob_size
         *     ob_item == NULL implies ob_size == allocated == 0
         * list.sort() temporarily sets allocated to -1 to detect mutations.
         *
         * Items must normally not be NULL, except during construction when
         * the list is not yet visible outside the function that builds it.
         */
        Py_ssize_t allocated;
    } PyListObject;
    
    • 观察代码,其结构如下:
    PyListObject
    PyObject_VAR_HEAD
    **ob_item
    allocated
    • 头部的PyObject_VAR_HEAD说明PyListObject是变长对象。
    • **ob_item指向包含元素的指针。
    • 为了避免频繁申请内存,PyListObject会先申请一大片内存,allocated记录了申请的内存,而PyObject_VAR_HEAD中的ob_size记录了实际使用的内存。
    • 所以PyListObject一定存在以下关系:
    0 <= ob_size <= allocated
    len(list) = ob_size
    ob_item == NULL 意味 ob_size = allocated == 0
    

    二、类型对象

    • PyListObject对应的类型对象为PyList_Type
    Objects\listobject.c
    
    PyTypeObject PyList_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "list",
        sizeof(PyListObject),
        0,
        (destructor)list_dealloc,                   /* tp_dealloc */
        0,                                          /* tp_print */
        0,                                          /* tp_getattr */
        0,                                          /* tp_setattr */
        0,                                          /* tp_reserved */
        (reprfunc)list_repr,                        /* tp_repr */
        0,                                          /* tp_as_number */
        &list_as_sequence,                          /* tp_as_sequence */
        &list_as_mapping,                           /* tp_as_mapping */
        PyObject_HashNotImplemented,                /* tp_hash */
        0,                                          /* tp_call */
        0,                                          /* tp_str */
        PyObject_GenericGetAttr,                    /* tp_getattro */
        0,                                          /* tp_setattro */
        0,                                          /* tp_as_buffer */
        Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
            Py_TPFLAGS_BASETYPE | Py_TPFLAGS_LIST_SUBCLASS, /* tp_flags */
        list___init____doc__,                       /* tp_doc */
        (traverseproc)list_traverse,                /* tp_traverse */
        (inquiry)_list_clear,                       /* tp_clear */
        list_richcompare,                           /* tp_richcompare */
        0,                                          /* tp_weaklistoffset */
        list_iter,                                  /* tp_iter */
        0,                                          /* tp_iternext */
        list_methods,                               /* tp_methods */
        0,                                          /* tp_members */
        0,                                          /* tp_getset */
        0,                                          /* tp_base */
        0,                                          /* tp_dict */
        0,                                          /* tp_descr_get */
        0,                                          /* tp_descr_set */
        0,                                          /* tp_dictoffset */
        (initproc)list___init__,                    /* tp_init */
        PyType_GenericAlloc,                        /* tp_alloc */
        PyType_GenericNew,                          /* tp_new */
        PyObject_GC_Del,                            /* tp_free */
    };
    

    三、创建对象

    • 创建列表的唯一途径是PyList_New
    Objects\listobject.c
    
    PyObject *
    PyList_New(Py_ssize_t size)
    {
        PyListObject *op;
    #ifdef SHOW_ALLOC_COUNT
        static int initialized = 0;
        if (!initialized) {
            Py_AtExit(show_alloc);
            initialized = 1;
        }
    #endif
    
        if (size < 0) {
            PyErr_BadInternalCall();
            return NULL;
        }
        if (numfree) {
            numfree--;
            op = free_list[numfree];
            _Py_NewReference((PyObject *)op);
    #ifdef SHOW_ALLOC_COUNT
            count_reuse++;
    #endif
        } else {
            op = PyObject_GC_New(PyListObject, &PyList_Type);
            if (op == NULL)
                return NULL;
    #ifdef SHOW_ALLOC_COUNT
            count_alloc++;
    #endif
        }
        if (size <= 0)
            op->ob_item = NULL;
        else {
            op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
            if (op->ob_item == NULL) {
                Py_DECREF(op);
                return PyErr_NoMemory();
            }
        }
        Py_SIZE(op) = size;
        op->allocated = size;
        _PyObject_GC_TRACK(op);
        return (PyObject *) op;
    }
    
    • PyList_New函数接受一个参数:size,用来在创建时指定元素的个数,并在创建对象后首先检查size的合法性。
    if (size < 0) {
            PyErr_BadInternalCall();
            return NULL;
        }
    
    • 之后查看PyListObject对象缓冲池是否可用。
     if (numfree) {
            numfree--;
            op = free_list[numfree];
            _Py_NewReference((PyObject *)op);
    #ifdef SHOW_ALLOC_COUNT
            count_reuse++;
    #endif
        } 
    
    • 如果不可用则向系统申请内存并创建新的PyListObject对象
    else {
            op = PyObject_GC_New(PyListObject, &PyList_Type);
            if (op == NULL)
                return NULL;
    #ifdef SHOW_ALLOC_COUNT
            count_alloc++;
    #endif
        }
    
    • 最后计算对象的内存总数是否溢出。
    if (size <= 0)
            op->ob_item = NULL;
        else {
            op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
            if (op->ob_item == NULL) {
                Py_DECREF(op);
                return PyErr_NoMemory();
            }
        }
    
    • PyListObject缓冲池free_list默认大小为80。
    Objects\listobject.c
    
    /* Empty list reuse scheme to save calls to malloc and free */
    #ifndef PyList_MAXFREELIST
    #define PyList_MAXFREELIST 80
    #endif
    static PyListObject *free_list[PyList_MAXFREELIST];
    static int numfree = 0;
    
    • Python中有一个称为0代(Generation Zero)的链表来追踪活跃对象,_PyObject_GC_TRACK(op)将对象链接到了第0代对象集合中,与垃圾回收机制相关。

    四、设置元素

    • 为PyListObject元素设置值的方法,如:list[0]=5PyList_SetItem
    Objects\listobject.c
    
    int
    PyList_SetItem(PyObject *op, Py_ssize_t i,
                   PyObject *newitem)
    {
        PyObject **p;
        if (!PyList_Check(op)) {
            Py_XDECREF(newitem);
            PyErr_BadInternalCall();
            return -1;
        }
        if (i < 0 || i >= Py_SIZE(op)) {
            Py_XDECREF(newitem);
            PyErr_SetString(PyExc_IndexError,
                            "list assignment index out of range");
            return -1;
        }
        p = ((PyListObject *)op) -> ob_item + i;
        Py_XSETREF(*p, newitem);
        return 0;
    }
    
    • PyList_SetItem方法在对PyListObject进行有效性检查和内存空间检查后,将新元素添加到列表的索引位置。

    五、获取元素

    • 获取PyListObject元素值的方法为PyList_GetItem
    Objects\listobject.c
    
    PyObject *
    PyList_GetItem(PyObject *op, Py_ssize_t i)
    {
        if (!PyList_Check(op)) {
            PyErr_BadInternalCall();
            return NULL;
        }
        if (i < 0 || i >= Py_SIZE(op)) {
            if (indexerr == NULL) {
                indexerr = PyUnicode_FromString(
                    "list index out of range");
                if (indexerr == NULL)
                    return NULL;
            }
            PyErr_SetObject(PyExc_IndexError, indexerr);
            return NULL;
        }
        return ((PyListObject *)op) -> ob_item[i];
    }
    

    六、追加元素

    • 追加元素就是Python中的list.append(),在源码中调用PyList_Append方法。
    Objects\listobject.c
    
    int
    PyList_Append(PyObject *op, PyObject *newitem)
    {
        if (PyList_Check(op) && (newitem != NULL))
            return app1((PyListObject *)op, newitem);
        PyErr_BadInternalCall();
        return -1;
    }
    
    static int
    app1(PyListObject *self, PyObject *v)
    {
        Py_ssize_t n = PyList_GET_SIZE(self);
    
        assert (v != NULL);
        if (n == PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_OverflowError,
                "cannot add more objects to list");
            return -1;
        }
    
        if (list_resize(self, n+1) < 0)
            return -1;
    
        Py_INCREF(v);
        PyList_SET_ITEM(self, n, v);
        return 0;
    }
    
    • 追加操作会改变PyListObject的尺寸,在确认空间满足后,将列表size+1。
    if (list_resize(self, n+1) < 0)
            return -1;
    
    • 最后调用PyList_SET_ITEM将元素添加到列表末尾。
        Py_INCREF(v);
        PyList_SET_ITEM(self, n, v);
    

    七、插入元素

    • 插入元素就是Python中的list.insert(),在源码中调用PyList_Insert方法。
    Objects\listobject.c
    
    int
    PyList_Insert(PyObject *op, Py_ssize_t where, PyObject *newitem)
    {
        if (!PyList_Check(op)) {
            PyErr_BadInternalCall();
            return -1;
        }
        return ins1((PyListObject *)op, where, newitem);
    }
    
    static int
    ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
    {
        Py_ssize_t i, n = Py_SIZE(self);
        PyObject **items;
        if (v == NULL) {
            PyErr_BadInternalCall();
            return -1;
        }
        if (n == PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_OverflowError,
                "cannot add more objects to list");
            return -1;
        }
    
        if (list_resize(self, n+1) < 0)
            return -1;
    
        if (where < 0) {
            where += n;
            if (where < 0)
                where = 0;
        }
        if (where > n)
            where = n;
        items = self->ob_item;
        for (i = n; --i >= where; )
            items[i+1] = items[i];
        Py_INCREF(v);
        items[where] = v;
        return 0;
    }
    
    • 与append类似,insert首先检查内存后改变对象尺寸。
        if (list_resize(self, n+1) < 0)
            return -1;
    
    • 区别是将改变尺寸后将指定位置后的元素往后移动了一个位置,并在指定位置设置元素。
    if (where < 0) {
            where += n;
            if (where < 0)
                where = 0;
        }
        if (where > n)
            where = n;
        items = self->ob_item;
        for (i = n; --i >= where; )
            items[i+1] = items[i];
        Py_INCREF(v);
        items[where] = v;
    

    八、删除元素

    • 删除元素就是Python中的list.remove(),在源码中调用list_remove方法。
    Objects\listobject.c
    
    static PyObject *
    list_remove(PyListObject *self, PyObject *value)
    /*[clinic end generated code: output=f087e1951a5e30d1 input=2dc2ba5bb2fb1f82]*/
    {
        Py_ssize_t i;
    
        for (i = 0; i < Py_SIZE(self); i++) {
            int cmp = PyObject_RichCompareBool(self->ob_item[i], value, Py_EQ);
            if (cmp > 0) {
                if (list_ass_slice(self, i, i+1,
                                   (PyObject *)NULL) == 0)
                    Py_RETURN_NONE;
                return NULL;
            }
            else if (cmp < 0)
                return NULL;
        }
        PyErr_SetString(PyExc_ValueError, "list.remove(x): x not in list");
        return NULL;
    }
    
    • list_remove方法首先在列表中查找第一个匹配的元素。
     int cmp = PyObject_RichCompareBool(self->ob_item[i], value, Py_EQ);
    
    • 如果找到元素,则将指定位置元素删除,并将列表尺寸-1。
    if (list_ass_slice(self, i, i+1,
                                   (PyObject *)NULL) == 0)
    

    九、改变列表尺寸

    • 在上面的代码中,经常出现方法list_resize,该方法用于调节list存储空间大小。
    Objects\listobject.c
    
    /* Ensure ob_item has room for at least newsize elements, and set
     * ob_size to newsize.  If newsize > ob_size on entry, the content
     * of the new slots at exit is undefined heap trash; it's the caller's
     * responsibility to overwrite them with sane values.
     * The number of allocated elements may grow, shrink, or stay the same.
     * Failure is impossible if newsize <= self.allocated on entry, although
     * that partly relies on an assumption that the system realloc() never
     * fails when passed a number of bytes <= the number of bytes last
     * allocated (the C standard doesn't guarantee this, but it's hard to
     * imagine a realloc implementation where it wouldn't be true).
     * Note that self->ob_item may change, and even if newsize is less
     * than ob_size on entry.
     */
    static int
    list_resize(PyListObject *self, Py_ssize_t newsize)
    {
        PyObject **items;
        size_t new_allocated, num_allocated_bytes;
        Py_ssize_t allocated = self->allocated;
    
        /* Bypass realloc() when a previous overallocation is large enough
           to accommodate the newsize.  If the newsize falls lower than half
           the allocated size, then proceed with the realloc() to shrink the list.
        */
        if (allocated >= newsize && newsize >= (allocated >> 1)) {
            assert(self->ob_item != NULL || newsize == 0);
            Py_SIZE(self) = newsize;
            return 0;
        }
    
        /* This over-allocates proportional to the list size, making room
         * for additional growth.  The over-allocation is mild, but is
         * enough to give linear-time amortized behavior over a long
         * sequence of appends() in the presence of a poorly-performing
         * system realloc().
         * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
         * Note: new_allocated won't overflow because the largest possible value
         *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
         */
        new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
        if (new_allocated > (size_t)PY_SSIZE_T_MAX / sizeof(PyObject *)) {
            PyErr_NoMemory();
            return -1;
        }
    
        if (newsize == 0)
            new_allocated = 0;
        num_allocated_bytes = new_allocated * sizeof(PyObject *);
        items = (PyObject **)PyMem_Realloc(self->ob_item, num_allocated_bytes);
        if (items == NULL) {
            PyErr_NoMemory();
            return -1;
        }
        self->ob_item = items;
        Py_SIZE(self) = newsize;
        self->allocated = new_allocated;
        return 0;
    }
    
    • allocated/2 <= newsize <= allocated时,只改变ob_sizeallocated不动。
    • 否则将调用PyMem_Realloc分配新的空间存储列表元素。

    十、PyListObject缓冲池

    • 缓冲池free_list中的对象在PyListObject对象被销毁过程中获得。
    Objects\listobject.c
    
    static void
    list_dealloc(PyListObject *op)
    {
        Py_ssize_t i;
        PyObject_GC_UnTrack(op);
        Py_TRASHCAN_SAFE_BEGIN(op)
        if (op->ob_item != NULL) {
            /* Do it backwards, for Christian Tismer.
               There's a simple test case where somehow this reduces
               thrashing when a *very* large list is created and
               immediately deleted. */
            i = Py_SIZE(op);
            while (--i >= 0) {
                Py_XDECREF(op->ob_item[i]);
            }
            PyMem_FREE(op->ob_item);
        }
        if (numfree < PyList_MAXFREELIST && PyList_CheckExact(op))
            free_list[numfree++] = op;
        else
            Py_TYPE(op)->tp_free((PyObject *)op);
        Py_TRASHCAN_SAFE_END(op)
    }
    
    • 在删除对象时,Python会检查缓冲池free_list是否已满,如果未满,则将PyListObject放入备用。
       if (numfree < PyList_MAXFREELIST && PyList_CheckExact(op))
            free_list[numfree++] = op;
    
    • 这样未来在创建新的PyListObject对象时,会首先唤醒缓冲池中的对象。
    • 但这些被唤醒的对象只是PyListObject对象,而不包含曾经的PyObject*元素列表。

    相关文章

      网友评论

          本文标题:大师兄的Python源码学习笔记(五): List对象

          本文链接:https://www.haomeiwen.com/subject/tvhhfltx.html