Da Shixiong's Python Source Code Study Notes (56): Python's Memory

Author: superkmi | Published: 2022-02-25 08:58

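    Da Shixiong's Python Source Code Study Notes (55): Python's Memory Management Mechanism (10)
    Da Shixiong's Python Source Code Study Notes (57): Python's Memory Management Mechanism (12)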

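    V. Garbage Collection in Python

    3. The Mark-and-Sweep Method
    3.2 Marking Garbage
    • Once the root object set has been found, the collector can start from the root objects and follow the reference chains, marking one by one the memory that must not be reclaimed.
    • Objects in the root object set cannot be reclaimed, so neither can any object they reference directly or indirectly.
    • Before setting out from the root objects, the existing memory list is first split in two:
    • one list holds the root object set and is called the root list;
    • the other list holds the remaining objects and is called the unreachable list.
    • Python performs this split of the original list in move_unreachable: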
    Modules/gcmodule.c
    
    /* Move the unreachable objects from young to unreachable.  After this,
     * all objects in young have gc_refs = GC_REACHABLE, and all objects in
     * unreachable have gc_refs = GC_TENTATIVELY_UNREACHABLE.  All tracked
     * gc objects not in young or unreachable still have gc_refs = GC_REACHABLE.
     * All objects in young after this are directly or indirectly reachable
     * from outside the original young; and all objects in unreachable are
     * not.
     */
    static void
    move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
    {
        PyGC_Head *gc = young->gc.gc_next;
    
        /* Invariants:  all objects "to the left" of us in young have gc_refs
         * = GC_REACHABLE, and are indeed reachable (directly or indirectly)
         * from outside the young list as it was at entry.  All other objects
         * from the original young "to the left" of us are in unreachable now,
         * and have gc_refs = GC_TENTATIVELY_UNREACHABLE.  All objects to the
         * left of us in 'young' now have been scanned, and no objects here
         * or to the right have been scanned yet.
         */
    
        while (gc != young) {
            PyGC_Head *next;
    
            if (_PyGCHead_REFS(gc)) {
                /* gc is definitely reachable from outside the
                 * original 'young'.  Mark it as such, and traverse
                 * its pointers to find any other objects that may
                 * be directly reachable from it.  Note that the
                 * call to tp_traverse may append objects to young,
                 * so we have to wait until it returns to determine
                 * the next object to visit.
                 */
                PyObject *op = FROM_GC(gc);
                traverseproc traverse = Py_TYPE(op)->tp_traverse;
                assert(_PyGCHead_REFS(gc) > 0);
                _PyGCHead_SET_REFS(gc, GC_REACHABLE);
                (void) traverse(op,
                                (visitproc)visit_reachable,
                                (void *)young);
                next = gc->gc.gc_next;
                if (PyTuple_CheckExact(op)) {
                    _PyTuple_MaybeUntrack(op);
                }
            }
            else {
                /* This *may* be unreachable.  To make progress,
                 * assume it is.  gc isn't directly reachable from
                 * any object we've already traversed, but may be
                 * reachable from an object we haven't gotten to yet.
                 * visit_reachable will eventually move gc back into
                 * young if that's so, and we'll see it again.
                 */
                next = gc->gc.gc_next;
                gc_list_move(gc, unreachable);
                _PyGCHead_SET_REFS(gc, GC_TENTATIVELY_UNREACHABLE);
            }
            gc = next;
        }
    }
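
The outcome promised by the comment above (everything left in young is reachable from outside, everything in unreachable is not) can be observed from Python itself. The following is a minimal sketch, assuming CPython; the Node class, the make_cycle helper and the probe names are hypothetical and exist only for this illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical container object that the cyclic GC tracks."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle(prefix):
    """Build two Node objects that reference each other."""
    a, b = Node(prefix + "-a"), Node(prefix + "-b")
    a.partner, b.partner = b, a
    return a, b


gc.disable()  # keep automatic collections out of the demonstration

# Cycle 1 stays referenced from a module-level name, i.e. from outside the cycle.
kept_a, kept_b = make_cycle("kept")
kept_probe = weakref.ref(kept_b)
del kept_b  # still reachable through kept_a.partner

# Cycle 2 loses every outside reference and keeps only its internal ones.
lost_a, lost_b = make_cycle("lost")
lost_probe = weakref.ref(lost_a)
del lost_a, lost_b

# Reference counting alone cannot free the second cycle.
still_alive_before_collect = lost_probe() is not None

gc.collect()

survived = kept_probe() is not None
reclaimed = lost_probe() is None
gc.enable()

print("kept cycle survived:", survived)
print("lost cycle reclaimed:", reclaimed)
```

The weak references serve only as probes: they do not keep their targets alive, so they show whether the collector reclaimed an object.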
    
    Modules/gcmodule.c
    
    /* A traversal callback for move_unreachable. */
    static int
    visit_reachable(PyObject *op, PyGC_Head *reachable)
    {
        if (PyObject_IS_GC(op)) {
            PyGC_Head *gc = AS_GC(op);
            const Py_ssize_t gc_refs = _PyGCHead_REFS(gc);
    
            if (gc_refs == 0) {
                /* This is in move_unreachable's 'young' list, but
                 * the traversal hasn't yet gotten to it.  All
                 * we need to do is tell move_unreachable that it's
                 * reachable.
                 */
                _PyGCHead_SET_REFS(gc, 1);
            }
            else if (gc_refs == GC_TENTATIVELY_UNREACHABLE) {
                /* This had gc_refs = 0 when move_unreachable got
                 * to it, but turns out it's reachable after all.
                 * Move it back to move_unreachable's 'young' list,
                 * and move_unreachable will eventually get to it
                 * again.
                 */
                gc_list_move(gc, reachable);
                _PyGCHead_SET_REFS(gc, 1);
            }
            /* Else there's nothing to do.
             * If gc_refs > 0, it must be in move_unreachable's 'young'
             * list, and move_unreachable will eventually get to it.
             * If gc_refs == GC_REACHABLE, it's either in some other
             * generation so we don't care about it, or move_unreachable
             * already dealt with it.
             * If gc_refs == GC_UNTRACKED, it must be ignored.
             */
             else {
                assert(gc_refs > 0
                       || gc_refs == GC_REACHABLE
                       || gc_refs == GC_UNTRACKED);
             }
        }
        return 0;
    }
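
How the two functions cooperate is easier to follow in a toy model. The sketch below is not CPython's implementation: ToyObject, the plain Python lists standing in for the doubly linked lists, and the sentinel values are assumptions made for illustration. Each gc_refs starts at the value it would have once references from inside the list have been subtracted.

```python
# Arbitrary negative sentinels for the toy model; a real count is never negative.
GC_REACHABLE = -3
GC_TENTATIVELY_UNREACHABLE = -4


class ToyObject:
    """Hypothetical stand-in for a GC-tracked container object."""

    def __init__(self, name, gc_refs):
        self.name = name
        self.gc_refs = gc_refs  # references coming from outside the list
        self.refs = []          # objects this one references (what traverse visits)


def visit_reachable(obj, young, unreachable):
    if obj.gc_refs == 0:
        # Still waiting in young: flag it so the scan keeps it there.
        obj.gc_refs = 1
    elif obj.gc_refs == GC_TENTATIVELY_UNREACHABLE:
        # Judged too early: move it back to the tail of young for a rescan.
        unreachable.remove(obj)
        young.append(obj)
        obj.gc_refs = 1
    # Otherwise it is already known to be reachable; nothing to do.


def move_unreachable(young):
    unreachable = []
    i = 0
    while i < len(young):
        obj = young[i]
        if obj.gc_refs > 0:
            # Definitely reachable: mark it and visit everything it references.
            obj.gc_refs = GC_REACHABLE
            for child in obj.refs:
                visit_reachable(child, young, unreachable)
            i += 1
        else:
            # Possibly unreachable: park it in the other list for now.
            young.pop(i)
            unreachable.append(obj)
            obj.gc_refs = GC_TENTATIVELY_UNREACHABLE
    return unreachable


x = ToyObject("x", 0)        # only referenced by root, but listed before it
c1 = ToyObject("c1", 0)      # c1 and c2 form an isolated cycle
c2 = ToyObject("c2", 0)
root = ToyObject("root", 1)  # one reference from outside the list
y = ToyObject("y", 0)        # referenced by root, listed after it
root.refs = [x, y]
c1.refs, c2.refs = [c2], [c1]

young = [x, c1, c2, root, y]
unreachable = move_unreachable(young)
young_names = [o.name for o in young]
unreachable_names = [o.name for o in unreachable]
print("young:", young_names)
print("unreachable:", unreachable_names)
```

Here x is first parked in unreachable and later pulled back when root is scanned, while y is flagged before the scan reaches it and never leaves young.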
    
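    • move_unreachable walks forward along the list of collectable objects and checks each object's PyGC_Head.gc.gc_refs value.
    • Note that what happens here is a traversal of the linked list, not a traversal of the reference chains starting from the root object set.
    • As a consequence, when an object with gc_refs equal to 0 is encountered, it cannot be declared garbage right away, because a root object that references it may still appear further along the list.
    • The object is therefore only marked GC_TENTATIVELY_UNREACHABLE for now, although gc_list_move still moves it to the unreachable list.
    • When move_unreachable meets an object A whose gc_refs is not 0, A is either a root object or an object reachable from some root object, and every object A references is likewise not reclaimable.
    • It therefore calls the traverse operation, which invokes visit_reachable on each object that A references: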
    Modules/gcmodule.c
    
    static void
    move_unreachable(PyGC_Head *young, PyGC_Head *unreachable)
    {
        PyGC_Head *gc = young->gc.gc_next;
    
        while (gc != young) {
            PyGC_Head *next;
    
            if (_PyGCHead_REFS(gc)) {
                /* gc is definitely reachable from outside the
                 * original 'young'.  Mark it as such, and traverse
                 * its pointers to find any other objects that may
                 * be directly reachable from it.  Note that the
                 * call to tp_traverse may append objects to young,
                 * so we have to wait until it returns to determine
                 * the next object to visit.
                 */
                PyObject *op = FROM_GC(gc);
                traverseproc traverse = Py_TYPE(op)->tp_traverse;
                assert(_PyGCHead_REFS(gc) > 0);
                _PyGCHead_SET_REFS(gc, GC_REACHABLE);
                (void) traverse(op,
                                (visitproc)visit_reachable,
                                (void *)young);
                next = gc->gc.gc_next;
                if (PyTuple_CheckExact(op)) {
                    _PyTuple_MaybeUntrack(op);
                }
            }
    ... ...
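
The traverse call hands every object that A references directly to the callback. From Python, the same per-type traversal can be observed through gc.get_referents, which the standard library documents as returning the objects visited by an argument's C-level tp_traverse. A small sketch, with inner and outer as hypothetical example objects:

```python
import gc

inner = ["inner list"]
mapping = {"k": inner}
outer = [inner, mapping]

# For a list, the traversal visits each element.
outer_referents = gc.get_referents(outer)
list_visits_inner = any(obj is inner for obj in outer_referents)
list_visits_mapping = any(obj is mapping for obj in outer_referents)

# For a dict, the traversal reaches the stored values.
dict_visits_inner = any(obj is inner for obj in gc.get_referents(mapping))

# An int is not a container, so there is nothing to traverse.
int_referents = gc.get_referents(42)

print(list_visits_inner, list_visits_mapping, dict_visits_inner, int_referents)
```

Only direct referents are returned. Following the chain further is the collector's job, and move_unreachable does that by rescanning whatever visit_reachable leaves in young.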
    
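    • If an object referenced by A was earlier marked GC_TENTATIVELY_UNREACHABLE but turns out to be reachable through A, it too is an object that must not be reclaimed.
    • Python therefore moves it from the unreachable list back to the original list.
    • The reachable parameter here is the young of move_unreachable, that is, the root object list.
    • As the code shows, Python also sets this object's gc_refs to 1, marking it as not reclaimable.
    • Likewise, visit_reachable sets gc_refs to 1 for any object referenced by A whose gc_refs is still 0. For objects in the list that move_unreachable has not reached yet, this removes the condition that would otherwise send them to the unreachable list.
    • When move_unreachable finishes, the original list has been split into two lists:
    • young keeps the reachable objects, and the unreachable list holds the garbage objects that were found, which are the target of collection.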
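    • Not all of this garbage can be reclaimed safely, however. The problem lies with one special kind of container object: instances created from class objects.
    • When a class is defined in Python, it can be given a special method, __del__, which acts as its finalizer.
    • When an instance that has a finalizer is destroyed, the finalizer is called first, since it is the hook developers use to release resources at destruction time.
    • The problem is that the objects that end up in the unreachable list are kept alive only by cyclic references, and all of them need to be destroyed.
    • Suppose unreachable contains two objects, and object B's finalizer calls some operation on object A. Safe collection must then guarantee that A is reclaimed after B.
    • But Python cannot guarantee any order when reclaiming garbage, so A may be destroyed first, and B may then access the no-longer-existing A while it is being destroyed.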
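    • Python therefore takes a fairly conservative approach: every PyInstanceObject in the unreachable list that has a finalizer is moved into a PyListObject named garbage. (This describes interpreters before Python 3.4; following PEP 442, objects with a __del__ method no longer end up in gc.garbage.)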
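
The finalizer scenario can be tried out directly. The sketch below assumes CPython 3.4 or later, where PEP 442 applies: the finalizers of a garbage cycle run before any object in it is torn down, so each __del__ can still reach its partner and nothing is left in gc.garbage. On the older interpreters described above, the same pair would instead be parked in gc.garbage. The Resource class and the finalized list are hypothetical names used only here.

```python
import gc

finalized = []  # records (own name, partner's name) for each finalizer call


class Resource:
    """Hypothetical class whose finalizer touches another object."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        # Accessing the partner is exactly what makes ordering a concern.
        finalized.append((self.name, self.other.name))


gc.disable()  # keep automatic collections out of the demonstration

a, b = Resource("A"), Resource("B")
a.other, b.other = b, a  # A and B now form a reference cycle
del a, b                 # only the cycle keeps them alive

gc.collect()
gc.enable()

# Any Resource found here would be one the collector refused to free.
stuck = [obj for obj in gc.garbage if isinstance(obj, Resource)]
finalized_pairs = sorted(finalized)
print("finalized:", finalized_pairs)
print("left in gc.garbage:", stuck)
```

The result is sorted before printing because the order in which the two finalizers run is not something the collector promises.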

        Article link: https://www.haomeiwen.com/subject/fjsplrtx.html