Dashixiong's Python Source Code Study Notes (23): The Class Mechanism in the Virtual Machine (


Author: superkmi | Published: 2021-07-09 10:13

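    Dashixiong's Python Source Code Study Notes (22): The Class Mechanism in the Virtual Machine (I)
    Dashixiong's Python Source Code Study Notes (24): The Class Mechanism in the Virtual Machine (III)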

    II. From type objects to class objects

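    1. Handling the base class and the type information
    • When Python starts up, it fills in some important fields of the PyTypeObject for each built-in type. This process begins with PyType_Ready: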
    Objects\typeobject.c
    
    int
    PyType_Ready(PyTypeObject *type)
    {
        PyObject *dict, *bases;
        PyTypeObject *base;
        Py_ssize_t i, n;
    
        if (type->tp_flags & Py_TPFLAGS_READY) {
            assert(_PyType_CheckConsistency(type));
            return 0;
        }
        assert((type->tp_flags & Py_TPFLAGS_READYING) == 0);
    
        type->tp_flags |= Py_TPFLAGS_READYING;
    
    #ifdef Py_TRACE_REFS
        /* PyType_Ready is the closest thing we have to a choke point
         * for type objects, so is the best place I can think of to try
         * to get type objects into the doubly-linked list of all objects.
         * Still, not all type objects go through PyType_Ready.
         */
        _Py_AddToAllObjects((PyObject *)type, 0);
    #endif
    
        if (type->tp_name == NULL) {
            PyErr_Format(PyExc_SystemError,
                         "Type does not define the tp_name field.");
            goto error;
        }
    
        /* Initialize tp_base (defaults to BaseObject unless that's us) */
        base = type->tp_base;
        if (base == NULL && type != &PyBaseObject_Type) {
            base = type->tp_base = &PyBaseObject_Type;
            Py_INCREF(base);
        }
    
        /* Now the only way base can still be NULL is if type is
         * &PyBaseObject_Type.
         */
    
        /* Initialize the base class */
        if (base != NULL && base->tp_dict == NULL) {
            if (PyType_Ready(base) < 0)
                goto error;
        }
    
        /* Initialize ob_type if NULL.      This means extensions that want to be
           compilable separately on Windows can call PyType_Ready() instead of
           initializing the ob_type field of their type objects. */
        /* The test for base != NULL is really unnecessary, since base is only
           NULL when type is &PyBaseObject_Type, and we know its ob_type is
           not NULL (it's initialized to &PyType_Type).      But coverity doesn't
           know that. */
        if (Py_TYPE(type) == NULL && base != NULL)
            Py_TYPE(type) = Py_TYPE(base);
    ... ...
    }
    
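    • PyType_Ready first tries to obtain the base class specified in type's tp_base field:
    • If tp_base is specified, that base class is used.
    • If tp_base is not specified, a default base class is assigned: PyBaseObject_Type, i.e. <class 'object'>. The only type left without a base is PyBaseObject_Type itself.
    • Once the base class is known, PyType_Ready checks whether it has already been initialized (its tp_dict is not NULL). If not, it first initializes the base class with a recursive call to PyType_Ready.
    • Finally, the type information is set: if type's ob_type is NULL, it is copied from the base class's ob_type using the Py_TYPE macro: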
    #define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)
    
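    • The ob_type here is the metaclass.
    • The table below lists the statically defined base class (tp_base) of some built-in class objects. PyType_Ready replaces a NULL entry with &PyBaseObject_Type:
    class object    tp_base
    PyType_Type     NULL
    PyLong_Type     NULL
    PyBool_Type     &PyLong_Type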
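The tp_base values in the table come from the static definitions in the C source. The quick check below, which assumes CPython 3, shows the state after PyType_Ready has run. `__base__` reads tp_base, and `type(cls)` reads ob_type, the metaclass.

```python
# __base__ reads tp_base; type(cls) reads ob_type, i.e. the metaclass.
classes = (object, type, int, bool)
bases = {cls.__name__: cls.__base__ for cls in classes}
metaclasses = {cls.__name__: type(cls) for cls in classes}

print(bases)        # object has no base; the NULL tp_base of type and int became object
print(metaclasses)  # all four are instances of type (type is its own metaclass)
```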
    2. Handling the list of base classes
    • Python supports multiple inheritance, so every Python class object has a list of base classes. PyType_Ready processes this list next:
    Objects\typeobject.c
    
    int
    PyType_Ready(PyTypeObject *type)
    {
        PyObject *dict, *bases;
        PyTypeObject *base;
        Py_ssize_t i, n;
    
        ... ...
        /* Initialize tp_bases */
        bases = type->tp_bases;
        if (bases == NULL) {
            if (base == NULL)
                bases = PyTuple_New(0);
            else
                bases = PyTuple_Pack(1, base);
            if (bases == NULL)
                goto error;
            type->tp_bases = bases;
        }
    
        ... ...
    }
    
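    • If tp_bases is NULL and base is also NULL (which only happens for PyBaseObject_Type), tp_bases is set to an empty tuple.
    • If tp_bases is NULL but base is not, tp_bases is set to a one-element tuple containing base.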
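The following short check assumes CPython 3, which exposes tp_bases as `__bases__`. The classes A, B and C are hypothetical examples. For classes created by a class statement, type_new fills tp_bases from the listed bases before it calls PyType_Ready. The default branch above therefore only matters for static types such as object and bool.

```python
# tp_bases is exposed to Python as __bases__.
class A:
    pass

class B:
    pass

class C(A, B):        # multiple inheritance: every listed base ends up in tp_bases
    pass

print(object.__bases__)   # base is NULL, so an empty tuple
print(bool.__bases__)     # tp_bases was NULL, so a one-element tuple holding tp_base
print(C.__bases__)        # filled in by type_new from the class statement
```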
    3. Populating tp_dict
    • Populating tp_dict is a complex process:
    Objects\typeobject.c
    
    int
    PyType_Ready(PyTypeObject *type)
    {
        PyObject *dict, *bases;
        PyTypeObject *base;
        Py_ssize_t i, n;
    
       ... ...
    
        /* Initialize tp_dict */
        dict = type->tp_dict;
        if (dict == NULL) {
            dict = PyDict_New();
            if (dict == NULL)
                goto error;
            type->tp_dict = dict;
        }
    
        /* Add type-specific descriptors to tp_dict */
        if (add_operators(type) < 0)
            goto error;
        if (type->tp_methods != NULL) {
            if (add_methods(type, type->tp_methods) < 0)
                goto error;
        }
        if (type->tp_members != NULL) {
            if (add_members(type, type->tp_members) < 0)
                goto error;
        }
        if (type->tp_getset != NULL) {
            if (add_getset(type, type->tp_getset) < 0)
                goto error;
        }
    
       ... ...
    }
    
    • This is the stage in which associations such as the one between __add__ and nb_add are added to tp_dict.
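You can observe the outcome of this stage from Python, assuming CPython 3. A built-in class's `__dict__` is a read-only view of its tp_dict, and its `__add__` entry is a descriptor that wraps the nb_add slot.

```python
# Entries created from slots are "slot wrapper" descriptors (wrapper_descriptor).
add_descr = int.__dict__["__add__"]

print(type(add_descr).__name__)   # the descriptor type
print(add_descr.__objclass__)     # the class whose slot it wraps
print(add_descr(2, 3))            # the call ends up in the C function behind nb_add
```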
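    3.1 Slots and the ordering of operations
    • Inside Python, a slot can be regarded as the representation of an operation defined in PyTypeObject. Each operation corresponds to one slot.
    • A slot contains not only a function pointer but also some other information.
    • Each slot is described by a slotdef structure, and all slots live in a global array of slotdefs.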
    Objects\typeobject.c
    
    /*
    Table mapping __foo__ names to tp_foo offsets and slot_tp_foo wrapper functions.
    
    The table is ordered by offsets relative to the 'PyHeapTypeObject' structure,
    which incorporates the additional structures used for numbers, sequences and
    mappings.  Note that multiple names may map to the same slot (e.g. __eq__,
    __ne__ etc. all map to tp_richcompare) and one name may map to multiple slots
    (e.g. __str__ affects tp_str as well as tp_repr). The table is terminated with
    an all-zero entry.  (This table is further initialized in init_slotdefs().)
    */
    
    typedef struct wrapperbase slotdef;
    
    Include\descrobject.h
    
    struct wrapperbase {
        const char *name;
        int offset;
        void *function;
        wrapperfunc wrapper;
        const char *doc;
        int flags;
        PyObject *name_strobj;
    };
    
    • A slot stores the information that corresponds to one operation in PyTypeObject.
    • Python provides several macros for defining a slot. The most basic ones are TPSLOT, FLSLOT and ETSLOT:
    Objects\typeobject.c
    
    #define TPSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
        {NAME, offsetof(PyTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
         PyDoc_STR(DOC)}
    #define FLSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC, FLAGS) \
        {NAME, offsetof(PyTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
         PyDoc_STR(DOC), FLAGS}
    #define ETSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
        {NAME, offsetof(PyHeapTypeObject, SLOT), (void *)(FUNCTION), WRAPPER, \
         PyDoc_STR(DOC)}
    
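    • TPSLOT computes the offset of the operation's function pointer within PyTypeObject.
    • ETSLOT computes the offset of the function pointer within PyHeapTypeObject.
    • FLSLOT differs from TPSLOT only in taking an extra FLAGS argument.
    • Now look at PyHeapTypeObject: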
    Include\object.h
    
    typedef struct _heaptypeobject {
        /* Note: there's a dependency on the order of these members
           in slotptr() in typeobject.c . */
        PyTypeObject ht_type;
        PyAsyncMethods as_async;
        PyNumberMethods as_number;
        PyMappingMethods as_mapping;
        PySequenceMethods as_sequence; /* as_sequence comes after as_mapping,
                                          so that the mapping wins when both
                                          the mapping and the sequence define
                                          a given operator (e.g. __getitem__).
                                          see add_operators() in typeobject.c . */
        PyBufferProcs as_buffer;
        PyObject *ht_name, *ht_slots, *ht_qualname;
        struct _dictkeysobject *ht_cached_keys;
        /* here are optional user slots, followed by the members. */
    } PyHeapTypeObject;
    
    • The first member of PyHeapTypeObject is a PyTypeObject. The offsets computed by TPSLOT and FLSLOT are therefore also valid as offsets relative to PyHeapTypeObject.
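The sketch below models the structures with ctypes to make the offset arithmetic concrete. The layouts are heavily reduced, hypothetical stand-ins; the real PyTypeObject and PyHeapTypeObject have many more members. `tpslot_offset` and `etslot_offset` are illustrative helpers, not CPython functions. The sketch shows two things:
- `ht_type` sits at offset 0, so a TPSLOT offset is valid relative to either structure.
- An ETSLOT offset reaches into the tables embedded after it.

```python
import ctypes

Ptr = ctypes.c_void_p  # every function-pointer slot is modelled as a void*

# Reduced, hypothetical stand-ins for the real CPython structures.
class PyNumberMethods(ctypes.Structure):
    _fields_ = [("nb_add", Ptr), ("nb_subtract", Ptr)]

class PyMappingMethods(ctypes.Structure):
    _fields_ = [("mp_length", Ptr), ("mp_subscript", Ptr)]

class PyTypeObject(ctypes.Structure):
    _fields_ = [("tp_name", ctypes.c_char_p),
                ("tp_repr", Ptr),
                ("tp_as_number", ctypes.POINTER(PyNumberMethods)),
                ("tp_as_mapping", ctypes.POINTER(PyMappingMethods))]

class PyHeapTypeObject(ctypes.Structure):
    _fields_ = [("ht_type", PyTypeObject),        # must be the first member
                ("as_number", PyNumberMethods),   # tables embedded by value
                ("as_mapping", PyMappingMethods)]

TABLES = {"as_number": PyNumberMethods, "as_mapping": PyMappingMethods}

def tpslot_offset(slot):
    """Like TPSLOT: offsetof(PyTypeObject, slot)."""
    return getattr(PyTypeObject, slot).offset

def etslot_offset(table, slot):
    """Like ETSLOT: offsetof(PyHeapTypeObject, table.slot)."""
    return getattr(PyHeapTypeObject, table).offset + getattr(TABLES[table], slot).offset

print(PyHeapTypeObject.ht_type.offset)           # ht_type comes first
print(tpslot_offset("tp_repr"))                  # same number relative to either struct
print(etslot_offset("as_mapping", "mp_length"))  # lands inside the embedded table
```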
    • In fact, Python predefines the whole set of slots in an array, slotdefs:
    Objects\typeobject.c
    
    static slotdef slotdefs[] = {
        ... ...
        BINSLOT("__matmul__", nb_matrix_multiply, slot_nb_matrix_multiply,
                "@"),
        RBINSLOT("__rmatmul__", nb_matrix_multiply, slot_nb_matrix_multiply,
                 "@"),
        IBSLOT("__imatmul__", nb_inplace_matrix_multiply, slot_nb_inplace_matrix_multiply,
               wrap_binaryfunc, "@="),
        MPSLOT("__len__", mp_length, slot_mp_length, wrap_lenfunc,
               "__len__($self, /)\n--\n\nReturn len(self)."),
        ... ...
    };
    
    • Macros such as BINSLOT and MPSLOT are simple wrappers around ETSLOT:
    Objects\typeobject.c
    
    ... ...
    #define AMSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
        ETSLOT(NAME, as_async.SLOT, FUNCTION, WRAPPER, DOC)
    #define SQSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
        ETSLOT(NAME, as_sequence.SLOT, FUNCTION, WRAPPER, DOC)
    #define MPSLOT(NAME, SLOT, FUNCTION, WRAPPER, DOC) \
        ETSLOT(NAME, as_mapping.SLOT, FUNCTION, WRAPPER, DOC)
    ... ... 
    
    • slotdefs中可以发现,操作名和操作并不是一一对应的,对于同操作名对应不同操作的情况,在填充tp_dict时可能会出现问题。
    • 为此,需要利用slot中的offset信息对slot进行排序,而这个排序的过程是在init_slotdefs中完成的:
    Objects\typeobject.c
    
    static int slotdefs_initialized = 0;
    /* Initialize the slotdefs table by adding interned string objects for the
       names. */
    static void
    init_slotdefs(void)
    {
        slotdef *p;
    
        if (slotdefs_initialized)
            return;
        for (p = slotdefs; p->name; p++) {
            /* Slots must be ordered by their offset in the PyHeapTypeObject. */
            assert(!p[1].name || p->offset <= p[1].offset);
            p->name_strobj = PyUnicode_InternFromString(p->name);
            if (!p->name_strobj || !PyUnicode_CHECK_INTERNED(p->name_strobj))
                Py_FatalError("Out of memory interning slotdef names");
        }
        slotdefs_initialized = 1;
    }
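The following Python sketch shows what init_slotdefs does, applied to a tiny hypothetical slot table. The offsets are invented, and `sys.intern` plays the role of PyUnicode_InternFromString. Like the assert in the C code, the sketch checks the offset order rather than sorting.

```python
import sys

# Hypothetical, heavily reduced slot table: (name, offset relative to
# PyHeapTypeObject). The offsets are invented for illustration.
SLOTDEFS = [
    ("__repr__", 8),        # a PyTypeObject slot
    ("__add__", 40),        # as_number.nb_add
    ("__len__", 56),        # as_mapping.mp_length
    ("__getitem__", 64),    # as_mapping.mp_subscript
    ("__len__", 80),        # as_sequence.sq_length: same name, later slot
    ("__getitem__", 88),    # as_sequence.sq_item
]

def init_slotdefs(table):
    """Mimic init_slotdefs(): check the offset order and intern every name."""
    result = []
    for i, (name, offset) in enumerate(table):
        # Slots must be ordered by their offset in the PyHeapTypeObject.
        assert i + 1 == len(table) or offset <= table[i + 1][1], name
        result.append((sys.intern(name), offset))
    return result

SLOTS = init_slotdefs(SLOTDEFS)
```

Because the order is checked rather than established, a table edited out of order would fail the assertion instead of being silently re-sorted.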
    
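    3.2 Establishing the associations
    • The virtual machine walks slotdefs, which is already in offset order, from beginning to end. For each slot it creates a descriptor and then associates the operation name with that descriptor in tp_dict. This is done in add_operators: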
    Objects\typeobject.c
    
    static int
    add_operators(PyTypeObject *type)
    {
        PyObject *dict = type->tp_dict;
        slotdef *p;
        PyObject *descr;
        void **ptr;
    
        init_slotdefs();
        for (p = slotdefs; p->name; p++) {
            if (p->wrapper == NULL)
                continue;
            ptr = slotptr(type, p->offset);
            if (!ptr || !*ptr)
                continue;
            if (PyDict_GetItem(dict, p->name_strobj))
                continue;
            if (*ptr == (void *)PyObject_HashNotImplemented) {
                /* Classes may prevent the inheritance of the tp_hash
                   slot by storing PyObject_HashNotImplemented in it. Make it
                   visible as a None value for the __hash__ attribute. */
                if (PyDict_SetItem(dict, p->name_strobj, Py_None) < 0)
                    return -1;
            }
            else {
                descr = PyDescr_NewWrapper(type, p, *ptr);
                if (descr == NULL)
                    return -1;
                if (PyDict_SetItem(dict, p->name_strobj, descr) < 0) {
                    Py_DECREF(descr);
                    return -1;
                }
                Py_DECREF(descr);
            }
        }
        if (type->tp_new != NULL) {
            if (add_tp_new_wrapper(type) < 0)
                return -1;
        }
        return 0;
    }
    
    • add_operators中,首先会调用init_slotdefs对操作进行排序。
    • 然后遍历排序完成后的slotdefs结构体数组,通过slotptr获得每一个slot对应的操作在PyTypeObject中的函数指针。
    • 在这里,虚拟机会检查在tp_dict中操作名是否已经存在,如果已经存在则不会再次建立从操作名到操作的关联。
    • 接着创建descriptor,并在tp_dict中建立从操作名(slotdef.name_strobj)到操作(descriptor)的关联。
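The loop in add_operators can be modelled in a few lines of Python. In this hypothetical sketch:
- a type is reduced to a dict from invented offsets to function names, where a missing key means NULL;
- a tuple stands in for the PyWrapperDescrObject;
- `HASH_NOT_IMPLEMENTED` is a sentinel for PyObject_HashNotImplemented.

The `p->wrapper == NULL` check is omitted.

```python
# Hypothetical model of add_operators(). Offsets and function names are made up.
HASH_NOT_IMPLEMENTED = object()   # stands in for PyObject_HashNotImplemented

SLOTDEFS = [                      # (name, offset), already in offset order
    ("__repr__", 8),
    ("__hash__", 16),
    ("__add__", 40),
    ("__len__", 56),              # the mapping version comes first,
    ("__len__", 80),              # so this sequence version is skipped
]

def add_operators(slots):
    """slots maps an offset to a function name; a missing key means NULL."""
    tp_dict = {}
    for name, offset in SLOTDEFS:
        func = slots.get(offset)
        if func is None:                  # slot not filled in: no entry
            continue
        if name in tp_dict:               # an earlier slot already owns the name
            continue
        if func is HASH_NOT_IMPLEMENTED:  # explicitly unhashable: __hash__ = None
            tp_dict[name] = None
        else:                             # the C code wraps func in a descriptor
            tp_dict[name] = ("slot wrapper", name, func)
    return tp_dict

# A list-like type: has repr, is unhashable, fills both length slots, no __add__.
list_like = {8: "repr_impl", 16: HASH_NOT_IMPLEMENTED,
             56: "mp_length_impl", 80: "sq_length_impl"}
result_dict = add_operators(list_like)
print(result_dict)
```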
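    • The offset stored in a slot is relative to PyHeapTypeObject, but the real function pointer of an operation is reached through PyTypeObject, and the two structures are laid out differently. PyHeapTypeObject embeds PyNumberMethods, PyMappingMethods and the other tables directly, whereas PyTypeObject only holds pointers to them (tp_as_number, tp_as_mapping and so on). The slotptr function converts a slot's offset into the address of the operation's real function pointer: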
    Objects\typeobject.c
    
    static void **
    slotptr(PyTypeObject *type, int ioffset)
    {
        char *ptr;
        long offset = ioffset;
    
        /* Note: this depends on the order of the members of PyHeapTypeObject! */
        assert(offset >= 0);
        assert((size_t)offset < offsetof(PyHeapTypeObject, as_buffer));
        if ((size_t)offset >= offsetof(PyHeapTypeObject, as_sequence)) {
            ptr = (char *)type->tp_as_sequence;
            offset -= offsetof(PyHeapTypeObject, as_sequence);
        }
        else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_mapping)) {
            ptr = (char *)type->tp_as_mapping;
            offset -= offsetof(PyHeapTypeObject, as_mapping);
        }
        else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_number)) {
            ptr = (char *)type->tp_as_number;
            offset -= offsetof(PyHeapTypeObject, as_number);
        }
        else if ((size_t)offset >= offsetof(PyHeapTypeObject, as_async)) {
            ptr = (char *)type->tp_as_async;
            offset -= offsetof(PyHeapTypeObject, as_async);
        }
        else {
            ptr = (char *)type;
        }
        if (ptr != NULL)
            ptr += offset;
        return (void **)ptr;
    }
    
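    • The checks start from PySequenceMethods, the table placed last in PyHeapTypeObject, and move toward the front. Because the largest offset is tested first, each >= comparison selects exactly one table.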
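slotptr can also be modelled with ctypes. The structures below are reduced, hypothetical layouts. As in CPython, PyTypeObject stores tp_as_sequence before tp_as_mapping, while PyHeapTypeObject embeds as_mapping before as_sequence. The sketch leaves out the as_async branch and the as_buffer assertion of the C version.

```python
import ctypes

Ptr = ctypes.c_void_p  # every function-pointer slot is modelled as a void*

# Reduced, hypothetical stand-ins for the real structures.
class PyNumberMethods(ctypes.Structure):
    _fields_ = [("nb_add", Ptr), ("nb_multiply", Ptr)]

class PyMappingMethods(ctypes.Structure):
    _fields_ = [("mp_length", Ptr), ("mp_subscript", Ptr)]

class PySequenceMethods(ctypes.Structure):
    _fields_ = [("sq_length", Ptr), ("sq_item", Ptr)]

class PyTypeObject(ctypes.Structure):
    _fields_ = [("tp_name", ctypes.c_char_p), ("tp_repr", Ptr),
                ("tp_as_number", ctypes.POINTER(PyNumberMethods)),
                ("tp_as_sequence", ctypes.POINTER(PySequenceMethods)),
                ("tp_as_mapping", ctypes.POINTER(PyMappingMethods))]

class PyHeapTypeObject(ctypes.Structure):
    _fields_ = [("ht_type", PyTypeObject), ("as_number", PyNumberMethods),
                ("as_mapping", PyMappingMethods), ("as_sequence", PySequenceMethods)]

def slotptr(tp, offset):
    """Map a PyHeapTypeObject-relative offset to the slot's address in tp."""
    # Same order as the C code: start with the table placed last in
    # PyHeapTypeObject, so each >= test picks out exactly one table.
    for table, pointer_field in (("as_sequence", "tp_as_sequence"),
                                 ("as_mapping", "tp_as_mapping"),
                                 ("as_number", "tp_as_number")):
        start = getattr(PyHeapTypeObject, table).offset
        if offset >= start:
            table_ptr = getattr(tp, pointer_field)
            if not table_ptr:             # NULL table pointer: slot is absent
                return None
            return ctypes.addressof(table_ptr.contents) + (offset - start)
    return ctypes.addressof(tp) + offset  # the slot lives directly in PyTypeObject

# A type with a number table but no sequence or mapping table.
num = PyNumberMethods()
tp = PyTypeObject(tp_as_number=ctypes.pointer(num))
mul_offset = PyHeapTypeObject.as_number.offset + PyNumberMethods.nb_multiply.offset
print(slotptr(tp, mul_offset))
```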
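    • After add_operators completes, PyList_Type is laid out as follows:
    [Figure: PyList_Type after add_operators]
    • The parts reached from PyList_Type.tp_as_mapping are already fixed at compile time.
    • The parts reached from tp_dict are created only when the Python runtime is initialized.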
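Part of the effect of add_operators on PyList_Type can be inspected from Python, assuming CPython 3. list.__dict__ exposes the tp_dict entries that add_operators created, and only slots that PyList_Type actually fills in produce an entry.

```python
# Slot-derived entries in list.__dict__ depend on which slots PyList_Type fills in.
print(type(list.__dict__["__len__"]).__name__)   # a slot wrapper (wrapper_descriptor)
print("__sub__" in list.__dict__)                # no nb_subtract, so no entry
print(list.__dict__["__hash__"])                 # tp_hash is PyObject_HashNotImplemented
```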
    • After add_operators has added the operators defined in PyTypeObject, PyType_Ready also calls add_methods, add_members and add_getset. These add the tp_methods, tp_members and tp_getset tables defined in PyTypeObject:
    Objects\typeobject.c
    
    int
    PyType_Ready(PyTypeObject *type)
    {
        PyObject *dict, *bases;
        PyTypeObject *base;
        Py_ssize_t i, n;
        ... ...
        /* Add type-specific descriptors to tp_dict */
        if (add_operators(type) < 0)
            goto error;
        if (type->tp_methods != NULL) {
            if (add_methods(type, type->tp_methods) < 0)
                goto error;
        }
        if (type->tp_members != NULL) {
            if (add_members(type, type->tp_members) < 0)
                goto error;
        }
        if (type->tp_getset != NULL) {
            if (add_getset(type, type->tp_getset) < 0)
                goto error;
        }
    
       ... ...
    }
    
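    • These add functions work much like add_operators. The difference is the kind of descriptor they put into tp_dict: add_methods, add_members and add_getset create a PyMethodDescrObject, a PyMemberDescrObject and a PyGetSetDescrObject respectively, instead of a PyWrapperDescrObject.
    3.2.1 A class that overrides a special operation of list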
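Each of the four add functions leaves a different descriptor type in tp_dict. The examples below are CPython 3 built-ins chosen for illustration, one of each:
- list.__len__ comes from a slot;
- list.append comes from tp_methods;
- complex.real comes from tp_members;
- int.real comes from tp_getset.

```python
# One example of each descriptor type that PyType_Ready puts into tp_dict.
examples = {
    "add_operators": list.__dict__["__len__"],   # from a slot
    "add_methods":   list.__dict__["append"],    # from tp_methods
    "add_members":   complex.__dict__["real"],   # from tp_members
    "add_getset":    int.__dict__["real"],       # from tp_getset
}
for source, descr in examples.items():
    print(f"{source:13} -> {type(descr).__name__}")
```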
    demo.py
    
    class A(list):
        def __repr__(self):
            return "Hello!"
    
    if __name__ == '__main__':
        print(f"{A()}")
    
    Output:
    Hello!
    
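    • When Python needs an object's repr, it ultimately calls the type's tp_repr. The f-string here falls back to repr because neither A nor list defines __str__.
    • With the ordinary layout inherited from list, demo.py would end up in list_repr. Instead, it actually calls A.__repr__().
    • This is because slotdefs contains the following slot: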
    Objects\typeobject.c
    
    static slotdef slotdefs[] = {
        ... ...
        TPSLOT("__repr__", tp_repr, slot_tp_repr, wrap_unaryfunc,
               "__repr__($self, /)\n--\n\nReturn repr(self)."),
        ... ...
    };
    
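    • When the virtual machine creates a class, it checks whether __repr__ appears in the class's tp_dict. Because <class A> defines __repr__, the tp_repr slot of A is overwritten with slot_tp_repr.
    • So when the virtual machine calls tp_repr, what actually runs is slot_tp_repr: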
    Objects\typeobject.c
    
    static PyObject *
    slot_tp_repr(PyObject *self)
    {
        PyObject *func, *res;
        _Py_IDENTIFIER(__repr__);
        int unbound;
    
        func = lookup_maybe_method(self, &PyId___repr__, &unbound);
        if (func != NULL) {
            res = call_unbound_noarg(unbound, func, self);
            Py_DECREF(func);
            return res;
        }
        PyErr_Clear();
        return PyUnicode_FromFormat("<%s object at %p>",
                                   Py_TYPE(self)->tp_name, self);
    }
    
    • slot_tp_repr中会寻找__repr__属性对应的对象,也就是A的定义中重写的__repr__()函数,它实际上是一个PyFunctionObject对象。
    • 对于A来说,其初始化结束后的内存布局如下:
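The behaviour described in this section can be observed directly, assuming CPython 3. The check below shows four things:
- A's `__repr__` is an ordinary function in A's tp_dict.
- repr() reaches that function through tp_repr.
- Calling list's own slot wrapper still runs list_repr.
- slot_tp_repr looks up `__repr__` on every call, so reassigning the attribute on the class takes effect immediately.

```python
class A(list):
    def __repr__(self):
        return "Hello!"

a = A()
print(type(A.__dict__["__repr__"]).__name__)  # a plain Python function
print(repr(a))                                 # tp_repr -> slot_tp_repr -> A.__repr__
print(list.__repr__(a))                        # list's own slot wrapper: list_repr

# slot_tp_repr looks __repr__ up on the type at call time, so replacing the
# attribute on the class changes the result without re-creating A.
A.__repr__ = lambda self: "Bye!"
print(repr(a))
```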



          Article link: https://www.haomeiwen.com/subject/ghalultx.html