Dashixiong's Python Source Code Study Notes (42): Python's Multi…


Author: superkmi | Published: 2021-11-19 08:55

    Dashixiong's Python Source Code Study Notes (41): Python's Multithreading Mechanism (3)
    Dashixiong's Python Source Code Study Notes (43): Python's Multithreading Mechanism (5)

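    IV. Creating a Thread

    2. The Thread State Protection Mechanism
    • As we already know, every thread in Python has a PyThreadState object associated with it, which stores that thread's state and its thread-private information.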
    Include\pystate.h
    
    typedef struct _ts {
        /* See Python/ceval.c for comments explaining most fields */
    
        struct _ts *prev;
        struct _ts *next;
        PyInterpreterState *interp;
    
        struct _frame *frame;
        int recursion_depth;
        char overflowed; /* The stack has overflowed. Allow 50 more calls
                            to handle the runtime error. */
        char recursion_critical; /* The current calls must not cause
                                    a stack overflow. */
        int stackcheck_counter;
    
        /* 'tracing' keeps track of the execution depth when tracing/profiling.
           This is to prevent the actual trace/profile code from being recorded in
           the trace/profile. */
        int tracing;
        int use_tracing;
    
        Py_tracefunc c_profilefunc;
        Py_tracefunc c_tracefunc;
        PyObject *c_profileobj;
        PyObject *c_traceobj;
    
        /* The exception currently being raised */
        PyObject *curexc_type;
        PyObject *curexc_value;
        PyObject *curexc_traceback;
    
        /* The exception currently being handled, if no coroutines/generators
         * are present. Always last element on the stack referred to be exc_info.
         */
        _PyErr_StackItem exc_state;
    
        /* Pointer to the top of the stack of the exceptions currently
         * being handled */
        _PyErr_StackItem *exc_info;
    
        PyObject *dict;  /* Stores per-thread state */
    
        int gilstate_counter;
    
        PyObject *async_exc; /* Asynchronous exception to raise */
        unsigned long thread_id; /* Thread id where this tstate was created */
    
        int trash_delete_nesting;
        PyObject *trash_delete_later;
    
        /* Called when a thread state is deleted normally, but not when it
         * is destroyed after fork().
         * Pain:  to prevent rare but fatal shutdown errors (issue 18808),
         * Thread.join() must wait for the join'ed thread's tstate to be unlinked
         * from the tstate chain.  That happens at the end of a thread's life,
         * in pystate.c.
         * The obvious way doesn't quite work:  create a lock which the tstate
         * unlinking code releases, and have Thread.join() wait to acquire that
         * lock.  The problem is that we _are_ at the end of the thread's life:
         * if the thread holds the last reference to the lock, decref'ing the
         * lock will delete the lock, and that may trigger arbitrary Python code
         * if there's a weakref, with a callback, to the lock.  But by this time
         * _PyThreadState_Current is already NULL, so only the simplest of C code
         * can be allowed to run (in particular it must not be possible to
         * release the GIL).
         * So instead of holding the lock directly, the tstate holds a weakref to
         * the lock:  that's the value of on_delete_data below.  Decref'ing a
         * weakref is harmless.
         * on_delete points to _threadmodule.c's static release_sentinel() function.
         * After the tstate is unlinked, release_sentinel is called with the
         * weakref-to-lock (on_delete_data) argument, and release_sentinel releases
         * the indirectly held lock.
         */
        void (*on_delete)(void *);
        void *on_delete_data;
    
        int coroutine_origin_tracking_depth;
    
        PyObject *coroutine_wrapper;
        int in_coroutine_wrapper;
    
        PyObject *async_gen_firstiter;
        PyObject *async_gen_finalizer;
    
        PyObject *context;
        uint64_t context_ver;
    
        /* Unique thread state id. */
        uint64_t id;
    
        /* XXX signal handlers should also be here */
    
    } PyThreadState;
    
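    • As the struct shows, a PyThreadState object stores the thread's current PyFrameObject (frame), the thread id (thread_id), and other per-thread information.
    • Python has an internal mechanism that ensures each thread always runs in its own context, and that mechanism needs access to the information in PyThreadState.
    • Looking at the head of the struct again, each PyThreadState is a node in a doubly linked list that chains all PyThreadState objects together.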
        struct _ts *prev;
        struct _ts *next;
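
    • A minimal sketch of how such prev/next links can be maintained. ToyThreadState, ToyInterpreter, and the toy_* functions are hypothetical names for illustration only. Linking new states at the head of a list kept in a tstate_head field is an assumption about the elided part of new_threadstate, not something the excerpt above shows.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for PyThreadState: only the linkage fields
   and a thread id are modeled. */
typedef struct ToyThreadState {
    struct ToyThreadState *prev;
    struct ToyThreadState *next;
    unsigned long thread_id;
} ToyThreadState;

/* Simplified stand-in for the interpreter that owns the list. */
typedef struct {
    ToyThreadState *tstate_head;   /* most recently linked state */
} ToyInterpreter;

/* Link a state at the head of the interpreter's list. */
void toy_link(ToyInterpreter *interp, ToyThreadState *ts)
{
    assert(interp != NULL && ts != NULL);
    ts->prev = NULL;
    ts->next = interp->tstate_head;
    if (ts->next != NULL)
        ts->next->prev = ts;
    interp->tstate_head = ts;
}

/* Unlink a state in O(1) by using its own prev/next pointers. */
void toy_unlink(ToyInterpreter *interp, ToyThreadState *ts)
{
    assert(interp != NULL && ts != NULL);
    if (ts->prev != NULL)
        ts->prev->next = ts->next;
    else
        interp->tstate_head = ts->next;
    if (ts->next != NULL)
        ts->next->prev = ts->prev;
    ts->prev = NULL;
    ts->next = NULL;
}

/* Count the states currently linked by walking the next pointers. */
int toy_count(const ToyInterpreter *interp)
{
    int n = 0;
    for (const ToyThreadState *p = interp->tstate_head; p != NULL; p = p->next)
        n++;
    return n;
}
```

    • The prev pointer is what makes removal cheap: a thread that exits can unlink its own state without walking the list from the head.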
    
    • Python uses a TSS (Thread Specific Storage) mechanism to manage per-thread information.
    Include\pythread.h
    
    typedef struct _Py_tss_t Py_tss_t;  /* opaque */
    
    struct _Py_tss_t {
        int _is_initialized;
        NATIVE_TSS_KEY_T _key;
    };
    
    Python\thread.c
    
    Py_tss_t *
    PyThread_tss_alloc(void)
    {
        Py_tss_t *new_key = (Py_tss_t *)PyMem_RawMalloc(sizeof(Py_tss_t));
        if (new_key == NULL) {
            return NULL;
        }
        new_key->_is_initialized = 0;
        return new_key;
    }
    
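    • Runtime initialization also allocates a separate lock (interpreters.mutex) and sets the TSS key (gilstate.autoTSSkey) to its initial, not-yet-created value: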
    Python\pystate.c
    
    static _PyInitError
    _PyRuntimeState_Init_impl(_PyRuntimeState *runtime)
    {
        memset(runtime, 0, sizeof(*runtime));
    
        _PyGC_Initialize(&runtime->gc);
        _PyEval_Initialize(&runtime->ceval);
    
        runtime->gilstate.check_enabled = 1;
    
        /* A TSS key must be initialized with Py_tss_NEEDS_INIT
           in accordance with the specification. */
        Py_tss_t initial = Py_tss_NEEDS_INIT;
        runtime->gilstate.autoTSSkey = initial;
    
        runtime->interpreters.mutex = PyThread_allocate_lock();
        if (runtime->interpreters.mutex == NULL) {
            return _Py_INIT_ERR("Can't initialize threads for interpreter");
        }
        runtime->interpreters.next_id = -1;
    
        return _Py_INIT_OK();
    }
    
    • TSS provides an API for storing and retrieving per-thread information.
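
    • In CPython this API includes PyThread_tss_create, PyThread_tss_set, PyThread_tss_get, and PyThread_tss_delete. The sketch below is only a single-threaded toy model of the idea: one key, one private slot per thread. ToyTssKey, toy_current_thread, and the toy_tss_* functions are hypothetical. A real key wraps a native key and the operating system selects the slot of the calling thread, whereas here the "current thread" is just an index set by hand.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_THREADS 8

/* Toy key: one slot per simulated thread. */
typedef struct {
    int is_initialized;
    void *slots[TOY_MAX_THREADS];
} ToyTssKey;

/* Hypothetical stand-in for "which thread is running right now". */
int toy_current_thread = 0;

/* Create the key; calling it again on a created key does nothing. */
int toy_tss_create(ToyTssKey *key)
{
    if (key->is_initialized)
        return 0;
    for (int i = 0; i < TOY_MAX_THREADS; i++)
        key->slots[i] = NULL;
    key->is_initialized = 1;
    return 0;
}

/* Store a value in the slot that belongs to the current thread. */
int toy_tss_set(ToyTssKey *key, void *value)
{
    assert(key->is_initialized);
    key->slots[toy_current_thread] = value;
    return 0;
}

/* Read the current thread's slot; NULL means nothing was stored yet. */
void *toy_tss_get(ToyTssKey *key)
{
    assert(key->is_initialized);
    return key->slots[toy_current_thread];
}
```

    • The point of the model is isolation: the same key yields a different value depending on which thread asks, which is how a thread can look up its own PyThreadState without any argument being passed around.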
    3. From the GIL to the Bytecode Interpreter
    • Recall how a thread is created:
    Python\pystate.c
    
    static PyThreadState *
    new_threadstate(PyInterpreterState *interp, int init)
    {
        PyThreadState *tstate = (PyThreadState *)PyMem_RawMalloc(sizeof(PyThreadState));
    
        if (_PyThreadState_GetFrame == NULL)
            _PyThreadState_GetFrame = threadstate_getframe;
    
        if (tstate != NULL) {
           ... ...
    
            if (init)
                _PyThreadState_Init(tstate);
    
            ... ...
        }
    
        return tstate;
    }
    
    • Now look at the _PyThreadState_Init function:
    Python\pystate.c
    
    void
    _PyThreadState_Init(PyThreadState *tstate)
    {
        _PyGILState_NoteThreadState(tstate);
    }
    
    Python\pystate.c
    
    static void
    _PyGILState_NoteThreadState(PyThreadState* tstate)
    {
        /* If autoTSSkey isn't initialized, this must be the very first
           threadstate created in Py_Initialize().  Don't do anything for now
           (we'll be back here when _PyGILState_Init is called). */
        if (!_PyRuntime.gilstate.autoInterpreterState)
            return;
    
        /* Stick the thread state for this thread in thread specific storage.
    
           The only situation where you can legitimately have more than one
           thread state for an OS level thread is when there are multiple
           interpreters.
    
           You shouldn't really be using the PyGILState_ APIs anyway (see issues
           #10915 and #15751).
    
           The first thread state created for that given OS level thread will
           "win", which seems reasonable behaviour.
        */
        if (PyThread_tss_get(&_PyRuntime.gilstate.autoTSSkey) == NULL) {
            if ((PyThread_tss_set(&_PyRuntime.gilstate.autoTSSkey, (void *)tstate)
                 ) != 0)
            {
                Py_FatalError("Couldn't create autoTSSkey mapping");
            }
        }
    
        /* PyGILState_Release must not try to delete this thread state. */
        tstate->gilstate_counter = 1;
    }
    
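    • _PyGILState_NoteThreadState stores the thread state object in thread-specific storage under the autoTSSkey key.
    • Note that the currently active thread has not necessarily acquired the GIL.
    • The main thread and the child threads each correspond to a native operating system thread, and OS-level thread scheduling differs from Python-level thread scheduling, so the operating system may well switch between the main thread and a child thread.
    • Only after all threads have finished their initialization do the operating system's scheduling and Python's scheduling come into line.
    • From then on, Python's thread scheduling forces the currently active thread to release the GIL. Releasing it signals the OS kernel object used to manage thread scheduling, which in turn causes the operating system to schedule another thread.
    • Returning to the previous chapter, the child thread now starts competing with the main thread for the GIL: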
    Modules\_threadmodule.c
    
    static void
    t_bootstrap(void *boot_raw)
    {
        struct bootstate *boot = (struct bootstate *) boot_raw;
        PyThreadState *tstate;
        PyObject *res;
    
        tstate = boot->tstate;
        tstate->thread_id = PyThread_get_thread_ident();
        _PyThreadState_Init(tstate);
        PyEval_AcquireThread(tstate);
        tstate->interp->num_threads++;
        res = PyObject_Call(boot->func, boot->args, boot->keyw);
        ... ...
    }
    
    • The main thread and the child thread compete for the GIL through PyEval_AcquireThread:
    Python\ceval.c
    
    void
    PyEval_AcquireThread(PyThreadState *tstate)
    {
        if (tstate == NULL)
            Py_FatalError("PyEval_AcquireThread: NULL new thread state");
        /* Check someone has called PyEval_InitThreads() to create the lock */
        assert(gil_created());
        take_gil(tstate);
        if (PyThreadState_Swap(tstate) != NULL)
            Py_FatalError(
                "PyEval_AcquireThread: non-NULL old thread state");
    }
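
    • A rough sketch of what competing for a GIL-like lock involves, assuming POSIX threads. ToyGil and the toy_* functions are hypothetical and much simpler than CPython's take_gil: there is no timeout and no drop request, only a flag protected by a mutex and a condition variable on which waiting threads sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy GIL: a flag guarded by a mutex, plus a condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    int locked;
} ToyGil;

/* Block until the flag is free, then claim it. */
void toy_take_gil(ToyGil *gil)
{
    pthread_mutex_lock(&gil->mutex);
    while (gil->locked)
        pthread_cond_wait(&gil->cond, &gil->mutex);  /* sleep until signaled */
    gil->locked = 1;
    pthread_mutex_unlock(&gil->mutex);
}

/* Free the flag and wake one waiter; the OS decides when it runs. */
void toy_drop_gil(ToyGil *gil)
{
    pthread_mutex_lock(&gil->mutex);
    assert(gil->locked);
    gil->locked = 0;
    pthread_cond_signal(&gil->cond);
    pthread_mutex_unlock(&gil->mutex);
}

typedef struct {
    ToyGil *gil;
    long *counter;
    int iterations;
} ToyWorkerArgs;

/* Each pass takes the toy GIL, bumps the shared counter, and drops it. */
void *toy_worker(void *raw)
{
    ToyWorkerArgs *args = raw;
    for (int i = 0; i < args->iterations; i++) {
        toy_take_gil(args->gil);
        (*args->counter)++;   /* only the current holder touches the counter */
        toy_drop_gil(args->gil);
    }
    return NULL;
}

/* Run the worker in a child thread and in the calling thread at once.
   Returns the final counter, or -1 if the child could not be started. */
long toy_run_two_threads(int iterations)
{
    ToyGil gil;
    long counter = 0;
    pthread_t child;
    ToyWorkerArgs args = { &gil, &counter, iterations };
    long result = -1;

    pthread_mutex_init(&gil.mutex, NULL);
    pthread_cond_init(&gil.cond, NULL);
    gil.locked = 0;

    if (pthread_create(&child, NULL, toy_worker, &args) == 0) {
        toy_worker(&args);    /* the calling thread competes as well */
        pthread_join(child, NULL);
        result = counter;
    }
    pthread_cond_destroy(&gil.cond);
    pthread_mutex_destroy(&gil.mutex);
    return result;
}
```

    • Calling toy_run_two_threads(n) from a main function should always return 2 * n, because every increment happens while holding the toy GIL. On older toolchains the program may need to be built with -pthread.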
    
    • One key function here, PyThreadState_Swap, has not been covered before:
    Python\pystate.c
    
    PyThreadState *
    PyThreadState_Swap(PyThreadState *newts)
    {
        PyThreadState *oldts = GET_TSTATE();
    
        SET_TSTATE(newts);
        /* It should not be possible for more than one thread state
           to be used for a thread.  Check this the best we can in debug
           builds.
        */
    #if defined(Py_DEBUG)
        if (newts) {
            /* This can be called from PyEval_RestoreThread(). Similar
               to it, we need to ensure errno doesn't change.
            */
            int err = errno;
            PyThreadState *check = PyGILState_GetThisThreadState();
            if (check && check->interp == newts->interp && check != newts)
                Py_FatalError("Invalid thread state for this thread");
            errno = err;
        }
    #endif
        return oldts;
    }
    
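    • Once the child thread has been woken by Python's thread scheduling mechanism, the first thing it does is call PyThreadState_Swap to set the current thread state object maintained by Python to its own state object.
    • The child thread then finishes its initialization and finally enters the interpreter, where it is controlled by Python's thread scheduling mechanism.
    • It is worth stressing again that thread_PyThread_start_new_thread runs in the main thread, whereas everything from t_bootstrap onward runs in the child thread.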
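
    • The swap itself amounts to exchanging one pointer and returning the old value. The sketch below models that with a plain global variable. ToyTState, toy_current, and the toy_* functions are hypothetical, and toy_release_thread is an assumed counterpart modeled on PyEval_ReleaseThread, which this article does not show. Where CPython calls Py_FatalError, the toy functions return -1 instead.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for PyThreadState. */
typedef struct {
    unsigned long thread_id;
} ToyTState;

/* Stand-in for the runtime's "current thread state" slot. */
static ToyTState *toy_current = NULL;

/* Install newts as the current state and hand back the previous one. */
ToyTState *toy_swap(ToyTState *newts)
{
    ToyTState *oldts = toy_current;
    toy_current = newts;
    return oldts;
}

/* Same check order as PyEval_AcquireThread: after the GIL is taken,
   no other state may still be current. Returns 0 on success, -1 otherwise. */
int toy_acquire_thread(ToyTState *ts)
{
    assert(ts != NULL);
    /* take_gil(ts) would block here until the GIL is free */
    if (toy_swap(ts) != NULL)
        return -1;
    return 0;
}

/* Clear the current state; fails if ts was not the current one. */
int toy_release_thread(ToyTState *ts)
{
    if (toy_swap(NULL) != ts)
        return -1;
    /* drop_gil(ts) would release the GIL here */
    return 0;
}
```

    • Seen this way, the non-NULL check in PyEval_AcquireThread is a consistency check: the thread that gave up the GIL is expected to have cleared the current state first, so a newly scheduled thread should always find it empty.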


          Article link: https://www.haomeiwen.com/subject/esfizltx.html