Dashixiong's Python Source Code Study Notes (11): The Python Virtual Machine


Author: superkmi | Published 2021-04-11 11:11

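    Previous: Dashixiong's Python Source Code Study Notes (10): Python's Compilation Process
    Next: Dashixiong's Python Source Code Study Notes (12): General Expressions in the Python Virtual Machine (1)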

    I. About the Bytecode Virtual Machine

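    • The bytecode virtual machine is the core of Python.
    • Once the source code has been compiled into a sequence of bytecode instructions, the bytecode virtual machine takes over all remaining work.
    • The bytecode virtual machine reads the bytecode instructions one by one from the PyCodeObject produced by compilation and executes each of them.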
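As a quick illustration of what the VM consumes, the standard dis module can decode the instructions stored in a function's code object. This is a minimal sketch: the function add is a made-up example, and the exact opcode names and offsets differ between CPython versions.

```python
import dis

def add(x, y):
    return x + y

code = add.__code__              # the code object (PyCodeObject) behind add()
print(type(code).__name__)       # code
print(len(code.co_code), "bytes of bytecode")

# Each Instruction is one step the VM will fetch and execute in turn.
for ins in dis.get_instructions(add):
    print(ins.offset, ins.opname, ins.argrepr)
```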

    II. How an Executable Runs

    1. Roughly how an executable runs on an x86 machine
    #include <stdio.h>

    void a(int n)
    {
        printf("%d\n", n);
    }

    void b(void)
    {
        a(1);
    }

    int main(void)
    {
        b();
        return 0;
    }
    
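    • While the program is inside a(), the caller's frame is b()'s stack frame and the current frame is a()'s stack frame.
    • All of a function's local-variable operations happen inside its own stack frame; calls between functions work by creating new stack frames.
    • The run-time stack grows from high addresses toward low addresses. When b() calls a(), the system creates a()'s stack frame right after (below) b()'s frame and saves there what is needed to get back to b(): the return address and b()'s frame pointer ebp, from which b()'s stack pointer esp is restored.
    • When a() finishes, the system restores esp and ebp to the values they had before a()'s frame was created, so control returns to b() and the program works in b()'s stack frame again.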
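The call and return sequence above can be modelled in a few lines. This is a toy sketch, not real x86: memory is a Python list with one slot per value, and call/return follow a simplified cdecl-style pattern (push the return address, push the old ebp, set ebp = esp; undo it on return). The class ToyStack and its methods are hypothetical names.

```python
class ToyStack:
    """Toy downward-growing stack with esp/ebp 'registers' (not real x86)."""

    def __init__(self, size=64):
        self.mem = [None] * size
        self.esp = size          # the stack grows toward lower addresses
        self.ebp = size

    def push(self, value):
        self.esp -= 1
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 1
        return value

    def call(self, ret_addr):
        self.push(ret_addr)      # caller: where to continue afterwards
        self.push(self.ebp)      # callee prologue: save the caller's ebp
        self.ebp = self.esp      # the new frame starts here

    def ret(self):
        self.esp = self.ebp      # callee epilogue: drop the callee's locals
        self.ebp = self.pop()    # restore the caller's frame pointer
        return self.pop()        # return address: back into the caller


# main() -> b() -> a(1), as in the C program above
stack = ToyStack()
stack.call("return into startup code")   # enter main()
stack.call("return into main")           # enter b()
b_esp, b_ebp = stack.esp, stack.ebp      # b()'s registers before calling a()
stack.push(1)                            # b() pushes the argument n = 1
stack.call("return into b")              # enter a()
print(stack.esp < b_esp)                 # True: a()'s frame sits at lower addresses
print(stack.ret())                       # return into b
stack.pop()                              # b() discards the argument
print((stack.esp, stack.ebp) == (b_esp, b_ebp))   # True
```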
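    2. The execution environment
    • In Python, a PyCodeObject stores all the bytecode instructions and static information, but none of the dynamic information needed while the program runs, i.e. the execution environment.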
    >>> i = 1
    >>> def a():
    ...     i = 2
    ...     print(i)
    ...
    >>> a()
    2
    >>> print(i)
    1
    
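    • The two prints show different values of i because the two names i live in different namespaces, and namespaces are part of the execution environment.
    • When a .py program starts, Python creates an execution environment A; when a function is called, it creates a new execution environment B, and B is in fact a new stack frame.
    • So what Python actually executes is not a bare PyCodeObject but an execution environment, i.e. a PyFrameObject.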
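Both points can be observed from Python: the two names i live in the namespaces of different frames, and every call of the same function gets a fresh frame while sharing one code object. A minimal sketch using the CPython-specific sys._getframe(); countdown is a made-up helper.

```python
import sys

i = 1

def a():
    i = 2                       # lives in this call's local namespace
    frame = sys._getframe()     # CPython-specific: the frame now running
    return frame.f_locals["i"], frame.f_globals["i"]

def countdown(n, frames):
    frames.append(sys._getframe())   # keep each call's frame object alive
    if n > 0:
        countdown(n - 1, frames)

print(a())   # (2, 1)

frames = []
countdown(2, frames)
print(len({id(f) for f in frames}))                          # 3 distinct frames
print(all(f.f_code is countdown.__code__ for f in frames))   # True: one code object
```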

    III. PyFrameObject

    1. The PyFrameObject source
    Include\frameobject.h
    
    typedef struct _frame {
        PyObject_VAR_HEAD
        struct _frame *f_back;      /* previous frame, or NULL */
        PyCodeObject *f_code;       /* code segment */
        PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
        PyObject *f_globals;        /* global symbol table (PyDictObject) */
        PyObject *f_locals;         /* local symbol table (any mapping) */
        PyObject **f_valuestack;    /* points after the last local */
        /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
           Frame evaluation usually NULLs it, but a frame that yields sets it
           to the current stack top. */
        PyObject **f_stacktop;
        PyObject *f_trace;          /* Trace function */
        char f_trace_lines;         /* Emit per-line trace events? */
        char f_trace_opcodes;       /* Emit per-opcode trace events? */
    
        /* Borrowed reference to a generator, or NULL */
        PyObject *f_gen;
    
        int f_lasti;                /* Last instruction if called */
        /* Call PyFrame_GetLineNumber() instead of reading this field
           directly.  As of 2.3 f_lineno is only valid when tracing is
           active (i.e. when f_trace is set).  At other times we use
           PyCode_Addr2Line to calculate the line from the current
           bytecode index. */
        int f_lineno;               /* Current line number */
        int f_iblock;               /* index in f_blockstack */
        char f_executing;           /* whether the frame is still executing */
        PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
        PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
    } PyFrameObject;
    
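    Field              Meaning
    f_back             previous frame in the chain of execution environments
    f_code             the PyCodeObject being executed
    f_builtins         builtin namespace
    f_globals          global namespace
    f_locals           local namespace
    f_valuestack       bottom of the value stack (points just after the last local)
    f_stacktop         top of the value stack, saved while the frame is not running (e.g. after a yield)
    f_lasti            byte offset in f_code of the last bytecode instruction executed
    f_lineno           source line of the current instruction (only valid while tracing; use PyFrame_GetLineNumber())
    f_executing        whether the frame is still executing
    f_localsplus[1]    dynamically sized area for locals and the value stack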
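    • The struct begins with PyObject_VAR_HEAD, which marks it as a variable-size object.

    • The f_back field shows that PyFrameObjects are linked into a chain, which mimics the caller/callee relationship that esp and ebp maintain between native stack frames.

    • f_code holds the PyCodeObject to be executed.

    • f_builtins, f_globals and f_locals hold the name-to-object mappings of the builtin, global and local namespaces respectively.

    • The corresponding type object is: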

    Objects\frameobject.c
    
    PyTypeObject PyFrame_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "frame",
        sizeof(PyFrameObject),
        sizeof(PyObject *),
        (destructor)frame_dealloc,                  /* tp_dealloc */
        0,                                          /* tp_print */
        0,                                          /* tp_getattr */
        0,                                          /* tp_setattr */
        0,                                          /* tp_reserved */
        (reprfunc)frame_repr,                       /* tp_repr */
        0,                                          /* tp_as_number */
        0,                                          /* tp_as_sequence */
        0,                                          /* tp_as_mapping */
        0,                                          /* tp_hash */
        0,                                          /* tp_call */
        0,                                          /* tp_str */
        PyObject_GenericGetAttr,                    /* tp_getattro */
        PyObject_GenericSetAttr,                    /* tp_setattro */
        0,                                          /* tp_as_buffer */
        Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC,/* tp_flags */
        0,                                          /* tp_doc */
        (traverseproc)frame_traverse,               /* tp_traverse */
        (inquiry)frame_tp_clear,                    /* tp_clear */
        0,                                          /* tp_richcompare */
        0,                                          /* tp_weaklistoffset */
        0,                                          /* tp_iter */
        0,                                          /* tp_iternext */
        frame_methods,                              /* tp_methods */
        frame_memberlist,                           /* tp_members */
        frame_getsetlist,                           /* tp_getset */
        0,                                          /* tp_base */
        0,                                          /* tp_dict */
    };
    
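    2. The dynamically sized part of PyFrameObject
    • In the PyFrameObject definition, f_localsplus[1] marks the start of a dynamically sized memory area.
    • The frame-creation code shows that this area is used for more than just the value stack.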
    Objects\frameobject.c
    
    PyFrameObject*
    PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
                PyObject *globals, PyObject *locals)
    {
        PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
        if (f)
            _PyObject_GC_TRACK(f);
        return f;
    }
    
    PyFrameObject* _Py_HOT_FUNCTION
    _PyFrame_New_NoTrack(PyThreadState *tstate, PyCodeObject *code,
                         PyObject *globals, PyObject *locals)
    {
        PyFrameObject *back = tstate->frame;
        PyFrameObject *f;
    ... ...
            Py_ssize_t extras, ncells, nfrees;
            ncells = PyTuple_GET_SIZE(code->co_cellvars);
            nfrees = PyTuple_GET_SIZE(code->co_freevars);
            extras = code->co_stacksize + code->co_nlocals + ncells +
                nfrees;
            if (free_list == NULL) {
                f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type,
                extras);
                if (f == NULL) {
                    Py_DECREF(builtins);
                    return NULL;
                }
            }
    ... ...
        f->f_stacktop = f->f_valuestack;
    ... ...
    }
    
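    • The area holds extras slots in total, made up of four parts: code->co_nlocals slots for local variables, ncells slots for cell variables and nfrees slots for free variables (these two parts support closures), and code->co_stacksize slots for the run-time value stack. Only that last part is used as the value stack.
    • So the bottom of a frame's value stack is f_valuestack, and f_stacktop records its top whenever the frame is not running (at creation, or after a generator yields); while the frame runs, the top is tracked by the evaluation loop's local variable stack_pointer.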
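The size computation can be reproduced from a code object's attributes. This sketch assumes the 3.7 layout quoted above: locals first (fastlocals), then cells followed by free variables (the freevars pointer starts at co_nlocals), then co_stacksize slots of value stack. Newer CPython versions allocate frames differently, so there the number is only illustrative; outer, inner and frame_extras are made-up names.

```python
def outer():
    x = 1            # captured by inner(): a cell variable of outer()
    def inner():
        return x     # a free variable of inner()
    return inner

def frame_extras(code):
    # Mirrors: extras = co_stacksize + co_nlocals + ncells + nfrees (CPython 3.7)
    ncells = len(code.co_cellvars)
    nfrees = len(code.co_freevars)
    return code.co_stacksize + code.co_nlocals + ncells + nfrees

for fn in (outer, outer()):
    c = fn.__code__
    print(c.co_name, "nlocals:", c.co_nlocals, "cells:", c.co_cellvars,
          "frees:", c.co_freevars, "stack:", c.co_stacksize,
          "extras:", frame_extras(c))
```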
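    3. Accessing a PyFrameObject from Python
    • sys._getframe() returns the frame object of the function that is currently executing, so the fields above can be inspected from Python code (sys._getframe() is a CPython implementation detail).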
    import sys

    def sample():
        f = sys._getframe()
        for a in dir(f):
            print(a, ':', eval(f"f.{a}"))

    if __name__ == '__main__':
        sample()
    __class__ : <class 'frame'>
    __delattr__ : <method-wrapper '__delattr__' of frame object at 0x000002B851031278>
    __dir__ : <built-in method __dir__ of frame object at 0x000002B851031278>
    __doc__ : None
    __eq__ : <method-wrapper '__eq__' of frame object at 0x000002B851031278>
    __format__ : <built-in method __format__ of frame object at 0x000002B851031278>
    __ge__ : <method-wrapper '__ge__' of frame object at 0x000002B851031278>
    __getattribute__ : <method-wrapper '__getattribute__' of frame object at 0x000002B851031278>
    __gt__ : <method-wrapper '__gt__' of frame object at 0x000002B851031278>
    __hash__ : <method-wrapper '__hash__' of frame object at 0x000002B851031278>
    __init__ : <method-wrapper '__init__' of frame object at 0x000002B851031278>
    __init_subclass__ : <built-in method __init_subclass__ of type object at 0x00007FF858A15D90>
    __le__ : <method-wrapper '__le__' of frame object at 0x000002B851031278>
    __lt__ : <method-wrapper '__lt__' of frame object at 0x000002B851031278>
    __ne__ : <method-wrapper '__ne__' of frame object at 0x000002B851031278>
    __new__ : <built-in method __new__ of type object at 0x00007FF858A19EC0>
    __reduce__ : <built-in method __reduce__ of frame object at 0x000002B851031278>
    __reduce_ex__ : <built-in method __reduce_ex__ of frame object at 0x000002B851031278>
    __repr__ : <method-wrapper '__repr__' of frame object at 0x000002B851031278>
    __setattr__ : <method-wrapper '__setattr__' of frame object at 0x000002B851031278>
    __sizeof__ : <built-in method __sizeof__ of frame object at 0x000002B851031278>
    __str__ : <method-wrapper '__str__' of frame object at 0x000002B851031278>
    __subclasshook__ : <built-in method __subclasshook__ of type object at 0x00007FF858A15D90>
    clear : <built-in method clear of frame object at 0x000002B851031278>
    f_back : <frame at 0x000002B852D4E9F8, file 'temp.py', line 21, code <module>>
    f_builtins : {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, ... (the rest of the builtins dictionary: built-in functions, types, exceptions, copyright, credits and license, omitted here) ..., 'help': Type help() for interactive help, or help(object) for help about object.}
    f_code : <code object sample at 0x000002B852E91660, file "xx/temp.py", line 15>
    f_globals : {'__name__': '__main__', '__doc__': '\n@File    :   temp.py    \n@Contact :   xxx@xxx.com\n\n@Modify Time      @Author           @Version    @Desciption\n------------      ---------------   --------    -----------\n2021/x/x 16:54   大师兄(superkmi)      1.0         None\n', '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x000002B852DD9518>, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': 'xx/temp.py', '__cached__': None, 'sys': <module 'sys' (built-in)>, 'sample': <function sample at 0x000002B852D8C268>}
    f_lasti : 38
    f_lineno : 18
    f_locals : {'f': <frame at 0x000002B851031278, file 'xx/temp.py', line 18, code sample>, 'a': 'f_locals'}
    f_trace : None
    f_trace_lines : True
    f_trace_opcodes : False
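f_lineno and f_lasti change as the frame runs. A small sketch: the hypothetical helper where() inspects its caller's frame through sys._getframe(1), so two consecutive calls show the line number advancing by one and the instruction offset growing.

```python
import sys

def where():
    caller = sys._getframe(1)    # the frame that called where()
    return caller.f_code.co_name, caller.f_lineno, caller.f_lasti

def demo():
    first = where()
    second = where()             # one source line later, further into the bytecode
    return first, second

first, second = demo()
print(first[0])                  # demo
print(second[1] - first[1])      # 1
print(second[2] > first[2])      # True
```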
    

    IV. The Virtual Machine's Execution Framework

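    • The entry point of the Python virtual machine is PyEval_EvalFrame. It calls PyEval_EvalFrameEx, which dispatches through the interpreter's eval_frame function pointer; PyInterpreterState_New sets that pointer to the huge function _PyEval_EvalFrameDefault.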
    ceval.c
    
    /* Interpreter main loop */
    
    PyObject *
    PyEval_EvalFrame(PyFrameObject *f) {
        /* This is for backward compatibility with extension modules that
           used this API; core interpreter code should call
           PyEval_EvalFrameEx() */
        return PyEval_EvalFrameEx(f, 0);
    }
    
    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        PyThreadState *tstate = PyThreadState_GET();
        return tstate->interp->eval_frame(f, throwflag);
    }
    
    
    pystate.c
    
    PyInterpreterState *
    PyInterpreterState_New(void)
    {
        PyInterpreterState *interp = (PyInterpreterState *)
                                     PyMem_RawMalloc(sizeof(PyInterpreterState));
    
        if (interp == NULL) {
            return NULL;
        }
    
    ... ...
        interp->eval_frame = _PyEval_EvalFrameDefault;
    ... ...
    }
    
    ceval.c
    
    PyObject* _Py_HOT_FUNCTION
    _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
    {
    ... ...
        co = f->f_code;
        names = co->co_names;
        consts = co->co_consts;
        fastlocals = f->f_localsplus;
        freevars = f->f_localsplus + co->co_nlocals;
        first_instr = (_Py_CODEUNIT *) PyBytes_AS_STRING(co->co_code);
    ... ...
        next_instr = first_instr;
        if (f->f_lasti >= 0) {
            next_instr += f->f_lasti / sizeof(_Py_CODEUNIT) + 1;
        }
        stack_pointer = f->f_stacktop;
        f->f_stacktop = NULL;       /* remains NULL unless yield suspends frame */
    ... ...
    }
    
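    • _PyEval_EvalFrameDefault first initializes a set of local variables from key fields of the PyFrameObject and its PyCodeObject: names, consts, fastlocals, freevars and first_instr.
    • It also initializes the stack pointer stack_pointer from f->f_stacktop, and then sets f->f_stacktop to NULL while the frame runs.
    • Executing a bytecode sequence means walking through co_code, the byte string in the PyCodeObject that stores the bytecode instructions and their arguments; apart from jumps, the walk goes from start to finish, executing one instruction after another.
    • The traversal uses three variables: first_instr points to the start of the instruction sequence, next_instr points to the next instruction to execute, and f->f_lasti holds the byte offset of the most recently executed instruction. first_instr and next_instr are _Py_CODEUNIT pointers (one code unit is 2 bytes: an opcode and its argument) and f_lasti is an int; when a frame resumes with f_lasti >= 0, next_instr starts just after that instruction.
    • The overall structure the VM uses to execute bytecode is an infinite for loop wrapped around a huge switch/case statement: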
    ceval.c
    
    PyObject* _Py_HOT_FUNCTION
    _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
    {
    ... ...
        why = WHY_NOT;
    
        if (throwflag) /* support for generator.throw() */
            goto error;
    
    #ifdef Py_DEBUG
        /* PyEval_EvalFrameEx() must not be called with an exception set,
           because it can clear it (directly or indirectly) and so the
           caller loses its exception */
        assert(!PyErr_Occurred());
    #endif
    
        for (;;) {
            assert(stack_pointer >= f->f_valuestack); /* else underflow */
            assert(STACK_LEVEL() <= co->co_stacksize);  /* else overflow */
            assert(!PyErr_Occurred());
    
            /* Do periodic things.  Doing this every time through
               the loop would add too much overhead, so we do it
               only every Nth instruction.  We also do it if
               ``pendingcalls_to_do'' is set, i.e. when an asynchronous
               event needs attention (e.g. a signal handler or
               async I/O handler); see Py_AddPendingCall() and
               Py_MakePendingCalls() above. */
    
            if (_Py_atomic_load_relaxed(&_PyRuntime.ceval.eval_breaker)) {
                opcode = _Py_OPCODE(*next_instr);
                if (opcode == SETUP_FINALLY ||
                    opcode == SETUP_WITH ||
                    opcode == BEFORE_ASYNC_WITH ||
                    opcode == YIELD_FROM) {
                    /* Few cases where we skip running signal handlers and other
                       pending calls:
                       - If we're about to enter the 'with:'. It will prevent
                         emitting a resource warning in the common idiom
                         'with open(path) as file:'.
                       - If we're about to enter the 'async with:'.
                       - If we're about to enter the 'try:' of a try/finally (not
                         *very* useful, but might help in some cases and it's
                         traditional)
                       - If we're resuming a chain of nested 'yield from' or
                         'await' calls, then each frame is parked with YIELD_FROM
                         as its next opcode. If the user hit control-C we want to
                         wait until we've reached the innermost frame before
                         running the signal handler and raising KeyboardInterrupt
                         (see bpo-30039).
                    */
                    goto fast_next_opcode;
                }
                if (_Py_atomic_load_relaxed(
                            &_PyRuntime.ceval.pending.calls_to_do))
                {
                    if (Py_MakePendingCalls() < 0)
                        goto error;
                }
                if (_Py_atomic_load_relaxed(
                            &_PyRuntime.ceval.gil_drop_request))
                {
                    /* Give another thread a chance */
                    if (PyThreadState_Swap(NULL) != tstate)
                        Py_FatalError("ceval: tstate mix-up");
                    drop_gil(tstate);
    
                    /* Other threads may run now */
    
                    take_gil(tstate);
    
                    /* Check if we should make a quick exit. */
                    if (_Py_IsFinalizing() &&
                        !_Py_CURRENTLY_FINALIZING(tstate))
                    {
                        drop_gil(tstate);
                        PyThread_exit_thread();
                    }
    
                    if (PyThreadState_Swap(tstate) != NULL)
                        Py_FatalError("ceval: orphan tstate");
                }
                /* Check for asynchronous exceptions. */
                if (tstate->async_exc != NULL) {
                    PyObject *exc = tstate->async_exc;
                    tstate->async_exc = NULL;
                    UNSIGNAL_ASYNC_EXC();
                    PyErr_SetNone(exc);
                    Py_DECREF(exc);
                    goto error;
                }
            }
    
        fast_next_opcode:
            f->f_lasti = INSTR_OFFSET();
    
            if (PyDTrace_LINE_ENABLED())
                maybe_dtrace_line(f, &instr_lb, &instr_ub, &instr_prev);
    
            /* line-by-line tracing support */
    
            if (_Py_TracingPossible &&
                tstate->c_tracefunc != NULL && !tstate->tracing) {
                int err;
                /* see maybe_call_line_trace
                   for expository comments */
                f->f_stacktop = stack_pointer;
    
                err = maybe_call_line_trace(tstate->c_tracefunc,
                                            tstate->c_traceobj,
                                            tstate, f,
                                            &instr_lb, &instr_ub, &instr_prev);
                /* Reload possibly changed frame fields */
                JUMPTO(f->f_lasti);
                if (f->f_stacktop != NULL) {
                    stack_pointer = f->f_stacktop;
                    f->f_stacktop = NULL;
                }
                if (err)
                    /* trace function raised an exception */
                    goto error;
            }
    
            /* Extract opcode and argument */
    
            NEXTOPARG();
        dispatch_opcode:
    #ifdef DYNAMIC_EXECUTION_PROFILE
    #ifdef DXPAIRS
            dxpairs[lastopcode][opcode]++;
            lastopcode = opcode;
    #endif
            dxp[opcode]++;
    #endif
    
    #ifdef LLTRACE
            /* Instruction tracing */
    
            if (lltrace) {
                if (HAS_ARG(opcode)) {
                    printf("%d: %d, %d\n",
                           f->f_lasti, opcode, oparg);
                }
                else {
                    printf("%d: %d\n",
                           f->f_lasti, opcode);
                }
            }
    #endif
    
            switch (opcode) {
          ...  ...        
        }
    ... ...
    }
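The for (;;) plus switch (opcode) shape can be imitated with a tiny stack machine. This is a toy sketch: the opcode numbers, the program format (a list of (opcode, oparg) pairs) and the run() function are all hypothetical, chosen only to mirror the fetch, dispatch and execute cycle.

```python
# Hypothetical opcode numbers for a toy VM (not CPython's numbering).
LOAD_CONST, BINARY_ADD, BINARY_MULTIPLY, RETURN_VALUE = range(4)

def run(code, consts):
    """A toy version of the 'for (;;) { switch (opcode) ... }' loop."""
    stack = []                           # the value stack
    next_instr = 0
    while True:                          # for (;;)
        opcode, oparg = code[next_instr] # fetch opcode and argument
        next_instr += 1
        if opcode == LOAD_CONST:         # case LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == BINARY_ADD:       # case BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == BINARY_MULTIPLY:  # case BINARY_MULTIPLY:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == RETURN_VALUE:     # leaves the loop with a result
            return stack.pop()
        else:
            raise SystemError(f"unknown opcode {opcode}")

# (2 + 3) * 4
program = [(LOAD_CONST, 0), (LOAD_CONST, 1), (BINARY_ADD, 0),
           (LOAD_CONST, 2), (BINARY_MULTIPLY, 0), (RETURN_VALUE, 0)]
print(run(program, consts=(2, 3, 4)))   # 20
```

A Python if/elif chain stands in for the C switch here; the structure (fetch, then branch on the opcode, then loop) is the point, not the dispatch speed.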
    
    • In this framework, walking through the bytecode is done with a few macros.
    ceval.c
    
    /* The integer overflow is checked by an assertion below. */
    #define INSTR_OFFSET()  \
        (sizeof(_Py_CODEUNIT) * (int)(next_instr - first_instr))
    #define NEXTOPARG()  do { \
            _Py_CODEUNIT word = *next_instr; \
            opcode = _Py_OPCODE(word); \
            oparg = _Py_OPARG(word); \
            next_instr++; \
        } while (0)
    #define JUMPTO(x)       (next_instr = first_instr + (x) / sizeof(_Py_CODEUNIT))
    #define JUMPBY(x)       (next_instr += (x) / sizeof(_Py_CODEUNIT))
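To make the arithmetic of these macros concrete, here is a Python re-creation. It is an illustration only: Cursor is a hypothetical class, next_instr is kept as an index into 2-byte code units (first_instr is index 0), and resume() mirrors the f_lasti >= 0 branch of the initialization code quoted earlier. The sample bytes use the 3.7 opcode numbers from the Include\opcode.h listing in this section (LOAD_CONST = 100, BINARY_ADD = 23, RETURN_VALUE = 83).

```python
CODEUNIT_SIZE = 2   # sizeof(_Py_CODEUNIT): one opcode byte plus one oparg byte

class Cursor:
    """Python re-creation of the 3.7 traversal macros (illustration only)."""

    def __init__(self, raw):
        self.raw = raw
        self.next_instr = 0                  # index of the next code unit

    def instr_offset(self):                  # INSTR_OFFSET()
        return CODEUNIT_SIZE * self.next_instr

    def nextoparg(self):                     # NEXTOPARG()
        i = CODEUNIT_SIZE * self.next_instr
        opcode, oparg = self.raw[i], self.raw[i + 1]
        self.next_instr += 1
        return opcode, oparg

    def jumpto(self, x):                     # JUMPTO(x): x is an absolute byte offset
        self.next_instr = x // CODEUNIT_SIZE

    def jumpby(self, x):                     # JUMPBY(x): x is a relative byte offset
        self.next_instr += x // CODEUNIT_SIZE

    def resume(self, f_lasti):               # next_instr set-up when a frame (re)enters
        self.next_instr = 0
        if f_lasti >= 0:
            self.next_instr += f_lasti // CODEUNIT_SIZE + 1

cur = Cursor(bytes([100, 0, 100, 1, 23, 0, 83, 0]))   # 4 hypothetical instructions
print(cur.nextoparg(), cur.instr_offset())   # (100, 0) 2
cur.jumpby(2)
print(cur.instr_offset())                    # 4
cur.jumpto(0)
print(cur.nextoparg())                       # (100, 0)
cur.resume(f_lasti=4)
print(cur.nextoparg())                       # (83, 0): the instruction after offset 4
```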
    
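    • Whether an instruction takes an argument is checked with the HAS_ARG macro: opcodes numbered HAVE_ARGUMENT (90) or higher use their argument.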
    Include\opcode.h
    
    #define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
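The dis module exposes the same threshold as dis.HAVE_ARGUMENT, so the HAS_ARG test can be tried on the running interpreter. A sketch; has_arg is a made-up helper, and the printed numbers depend on the CPython version (the header in this section is from 3.7, where HAVE_ARGUMENT is 90).

```python
import dis

def has_arg(op):
    # Same comparison as the HAS_ARG macro in Include\opcode.h
    return op >= dis.HAVE_ARGUMENT

for name in ("POP_TOP", "RETURN_VALUE", "LOAD_CONST", "STORE_NAME"):
    op = dis.opmap[name]
    print(name, op, has_arg(op))
```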
    
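    • Once the VM has an opcode and its argument, it dispatches on the opcode with switch. Each bytecode instruction has its own case (written with the TARGET macro), and the body of that case is Python's implementation of the instruction.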
    ceval.c
    
    #define TARGET(op) \
        case op:
    
    ... ...
    PyObject* _Py_HOT_FUNCTION
    _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
    {
    ... ...
            TARGET(STORE_SUBSCR) {
                PyObject *sub = TOP();
                PyObject *container = SECOND();
                PyObject *v = THIRD();
                int err;
                STACKADJ(-3);
                /* container[sub] = v */
                err = PyObject_SetItem(container, sub, v);
                Py_DECREF(v);
                Py_DECREF(container);
                Py_DECREF(sub);
                if (err != 0)
                    goto error;
                DISPATCH();
            }
    
            TARGET(DELETE_SUBSCR) {
                PyObject *sub = TOP();
                PyObject *container = SECOND();
                int err;
                STACKADJ(-2);
                /* del container[sub] */
                err = PyObject_DelItem(container, sub);
                Py_DECREF(container);
                Py_DECREF(sub);
                if (err != 0)
                    goto error;
                DISPATCH();
            }
    ... ...
    
    Include\opcode.h
    
        /* Instruction opcodes for compiled code */
    #define POP_TOP                   1
    #define ROT_TWO                   2
    #define ROT_THREE                 3
    #define DUP_TOP                   4
    #define DUP_TOP_TWO               5
    #define NOP                       9
    #define UNARY_POSITIVE           10
    #define UNARY_NEGATIVE           11
    #define UNARY_NOT                12
    #define UNARY_INVERT             15
    #define BINARY_MATRIX_MULTIPLY   16
    #define INPLACE_MATRIX_MULTIPLY  17
    #define BINARY_POWER             19
    #define BINARY_MULTIPLY          20
    #define BINARY_MODULO            22
    #define BINARY_ADD               23
    #define BINARY_SUBTRACT          24
    #define BINARY_SUBSCR            25
    #define BINARY_FLOOR_DIVIDE      26
    #define BINARY_TRUE_DIVIDE       27
    #define INPLACE_FLOOR_DIVIDE     28
    #define INPLACE_TRUE_DIVIDE      29
    #define GET_AITER                50
    #define GET_ANEXT                51
    #define BEFORE_ASYNC_WITH        52
    #define INPLACE_ADD              55
    #define INPLACE_SUBTRACT         56
    #define INPLACE_MULTIPLY         57
    #define INPLACE_MODULO           59
    #define STORE_SUBSCR             60
    #define DELETE_SUBSCR            61
    #define BINARY_LSHIFT            62
    #define BINARY_RSHIFT            63
    #define BINARY_AND               64
    #define BINARY_XOR               65
    #define BINARY_OR                66
    #define INPLACE_POWER            67
    #define GET_ITER                 68
    #define GET_YIELD_FROM_ITER      69
    #define PRINT_EXPR               70
    #define LOAD_BUILD_CLASS         71
    #define YIELD_FROM               72
    #define GET_AWAITABLE            73
    #define INPLACE_LSHIFT           75
    #define INPLACE_RSHIFT           76
    #define INPLACE_AND              77
    #define INPLACE_XOR              78
    #define INPLACE_OR               79
    #define BREAK_LOOP               80
    #define WITH_CLEANUP_START       81
    #define WITH_CLEANUP_FINISH      82
    #define RETURN_VALUE             83
    #define IMPORT_STAR              84
    #define SETUP_ANNOTATIONS        85
    #define YIELD_VALUE              86
    #define POP_BLOCK                87
    #define END_FINALLY              88
    #define POP_EXCEPT               89
    #define HAVE_ARGUMENT            90
    #define STORE_NAME               90
    #define DELETE_NAME              91
    #define UNPACK_SEQUENCE          92
    #define FOR_ITER                 93
    #define UNPACK_EX                94
    #define STORE_ATTR               95
    #define DELETE_ATTR              96
    #define STORE_GLOBAL             97
    #define DELETE_GLOBAL            98
    #define LOAD_CONST              100
    #define LOAD_NAME               101
    #define BUILD_TUPLE             102
    #define BUILD_LIST              103
    #define BUILD_SET               104
    #define BUILD_MAP               105
    #define LOAD_ATTR               106
    #define COMPARE_OP              107
    #define IMPORT_NAME             108
    #define IMPORT_FROM             109
    #define JUMP_FORWARD            110
    #define JUMP_IF_FALSE_OR_POP    111
    #define JUMP_IF_TRUE_OR_POP     112
    #define JUMP_ABSOLUTE           113
    #define POP_JUMP_IF_FALSE       114
    #define POP_JUMP_IF_TRUE        115
    #define LOAD_GLOBAL             116
    #define CONTINUE_LOOP           119
    #define SETUP_LOOP              120
    #define SETUP_EXCEPT            121
    #define SETUP_FINALLY           122
    #define LOAD_FAST               124
    #define STORE_FAST              125
    #define DELETE_FAST             126
    #define RAISE_VARARGS           130
    #define CALL_FUNCTION           131
    #define MAKE_FUNCTION           132
    #define BUILD_SLICE             133
    #define LOAD_CLOSURE            135
    #define LOAD_DEREF              136
    #define STORE_DEREF             137
    #define DELETE_DEREF            138
    #define CALL_FUNCTION_KW        141
    #define CALL_FUNCTION_EX        142
    #define SETUP_WITH              143
    #define EXTENDED_ARG            144
    #define LIST_APPEND             145
    #define SET_ADD                 146
    #define MAP_ADD                 147
    #define LOAD_CLASSDEREF         148
    #define BUILD_LIST_UNPACK       149
    #define BUILD_MAP_UNPACK        150
    #define BUILD_MAP_UNPACK_WITH_CALL 151
    #define BUILD_TUPLE_UNPACK      152
    #define BUILD_SET_UNPACK        153
    #define SETUP_ASYNC_WITH        154
    #define FORMAT_VALUE            155
    #define BUILD_CONST_KEY_MAP     156
    #define BUILD_STRING            157
    #define BUILD_TUPLE_UNPACK_WITH_CALL 158
    #define LOAD_METHOD             160
    #define CALL_METHOD             161
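The stack discipline of the STORE_SUBSCR and DELETE_SUBSCR cases quoted above can be mimicked with a Python list as the value stack. A sketch with hypothetical helper names store_subscr() and delete_subscr(); the dis call at the end checks that an ordinary assignment and del statement compile to these two opcodes (their numeric values differ across versions).

```python
import dis

def store_subscr(stack):
    """Mimic TARGET(STORE_SUBSCR): TOP = sub, SECOND = container, THIRD = value."""
    sub, container, v = stack[-1], stack[-2], stack[-3]
    del stack[-3:]                 # STACKADJ(-3)
    container[sub] = v             # PyObject_SetItem(container, sub, v)

def delete_subscr(stack):
    """Mimic TARGET(DELETE_SUBSCR): TOP = sub, SECOND = container."""
    sub, container = stack[-1], stack[-2]
    del stack[-2:]                 # STACKADJ(-2)
    del container[sub]             # PyObject_DelItem(container, sub)

d = {}
stack = ["value", d, "key"]        # pushed in order: value, container, sub
store_subscr(stack)
print(d, stack)                    # {'key': 'value'} []

stack = [d, "key"]
delete_subscr(stack)
print(d, stack)                    # {} []

def f(container, key, value):
    container[key] = value
    del container[key]

names = [i.opname for i in dis.get_instructions(f)]
print([n for n in names if n.endswith("_SUBSCR")])
```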
    
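    • After an instruction finishes successfully (for example at DISPATCH() in the cases above), execution jumps back either to the top of the for loop or to the fast_next_opcode label.
    • Either way, the next step is to fetch the next instruction and its argument and execute it.
    • Walking through co_code instruction by instruction in this way eventually executes the whole Python program.
    • The why variable in _PyEval_EvalFrameDefault records the state in which the for loop is left, i.e. the reason for unwinding the stack, such as an exception or a return statement.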
    ceval.c
    
    /* Status code for main loop (reason for stack unwind) */
    enum why_code {
            WHY_NOT =       0x0001, /* No error */
            WHY_EXCEPTION = 0x0002, /* Exception occurred */
            WHY_RETURN =    0x0008, /* 'return' statement */
            WHY_BREAK =     0x0010, /* 'break' statement */
            WHY_CONTINUE =  0x0020, /* 'continue' statement */
            WHY_YIELD =     0x0040, /* 'yield' operator */
            WHY_SILENCED =  0x0080  /* Exception silenced by 'with' */
    };
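The toy stack machine below makes the fetch, dispatch, repeat cycle and the role of `why` concrete. It is a simplified model, not CPython's implementation. The instruction set (`PUSH`, `ADD`, `RAISE`, `RETURN`) and the function `run` are hypothetical, and only three `why_code` values are mirrored.

```python
from enum import Enum

class Why(Enum):
    # Mirrors a subset of CPython 3.7's enum why_code
    NOT = 0x0001        # no error, keep looping
    EXCEPTION = 0x0002  # an exception occurred
    RETURN = 0x0008     # 'return' statement

def run(code):
    """Execute a list of (opname, arg) pairs on a value stack.

    Returns (why, value), where value is the return value or the exception object.
    """
    stack = []
    pc = 0
    why = Why.NOT
    retval = None
    while why is Why.NOT:                 # loop until something sets a non-NOT status
        if pc >= len(code):
            # Running off the end counts as an error in this toy model.
            why, retval = Why.EXCEPTION, RuntimeError("no RETURN executed")
            break
        opname, arg = code[pc]            # fetch the next instruction and its argument
        pc += 1
        if opname == "PUSH":
            stack.append(arg)
        elif opname == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RAISE":
            why, retval = Why.EXCEPTION, ValueError(arg)
        elif opname == "RETURN":
            why, retval = Why.RETURN, stack.pop()
        else:
            why, retval = Why.EXCEPTION, NotImplementedError(opname)
    return why, retval

if __name__ == "__main__":
    print(run([("PUSH", 1), ("PUSH", 2), ("ADD", None), ("RETURN", None)]))
```

In this example, the first program leaves the loop with `Why.RETURN` and the value 3. The real loop follows the same pattern but also has to unwind blocks before it leaves.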
    

    V. The Python Runtime Environment

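    1. Processes and threads in the operating system
    • A native Win32 executable runs inside a process.
    • However, the entity that actually executes the sequence of machine instructions is the thread. The process is the environment in which its threads run.
    • For a single-threaded executable, the operating system creates a process at launch, and that process contains one main thread.
    • For a multithreaded executable, the operating system creates one process with several threads. The CPU keeps switching between these threads. At each switch it must save the outgoing thread's context (registers, stack pointer and so on) so that the thread can later resume where it left off.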
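You can observe the relationship between one process and its threads from Python itself. The sketch below uses only the standard `os` and `threading` modules: every worker reports the same process id but a different thread identifier. The function name `collect_ids` is hypothetical.

```python
import os
import threading

def collect_ids(n_threads=3):
    """Start n_threads workers and return a list of (pid, thread_ident) pairs."""
    results = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def worker():
        with lock:
            results.append((os.getpid(), threading.get_ident()))
        # Keep every thread alive until all have reported, so identifiers are not reused.
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    for pid, tid in collect_ids():
        print(pid, tid)
```

The barrier matters because the operating system may recycle a thread identifier once its thread has exited.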

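    2. Processes and threads in Python

    2.1 The thread environment in Python
    • Python supports multithreading, and each Python thread corresponds to a native operating-system thread.
    • The virtual machine is Python's abstraction of the CPU, and it does the computational work for every thread.
    • Task switching in Python is the mechanism by which different threads take turns using this virtual machine.
    • Python stores per-thread information in a PyThreadState. Each thread owns exactly one PyThreadState object, so PyThreadState can be viewed as the abstraction of a thread's state.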
    Include\pystate.h
    
    typedef struct _ts {
        /* See Python/ceval.c for comments explaining most fields */
    
        struct _ts *prev;
        struct _ts *next;
        PyInterpreterState *interp;
    
        struct _frame *frame;
        int recursion_depth;
        char overflowed; /* The stack has overflowed. Allow 50 more calls
                            to handle the runtime error. */
        char recursion_critical; /* The current calls must not cause
                                    a stack overflow. */
        int stackcheck_counter;
    
        /* 'tracing' keeps track of the execution depth when tracing/profiling.
           This is to prevent the actual trace/profile code from being recorded in
           the trace/profile. */
        int tracing;
        int use_tracing;
    
        Py_tracefunc c_profilefunc;
        Py_tracefunc c_tracefunc;
        PyObject *c_profileobj;
        PyObject *c_traceobj;
    
        /* The exception currently being raised */
        PyObject *curexc_type;
        PyObject *curexc_value;
        PyObject *curexc_traceback;
    
        /* The exception currently being handled, if no coroutines/generators
         * are present. Always last element on the stack referred to be exc_info.
         */
        _PyErr_StackItem exc_state;
    
        /* Pointer to the top of the stack of the exceptions currently
         * being handled */
        _PyErr_StackItem *exc_info;
    
        PyObject *dict;  /* Stores per-thread state */
    
        int gilstate_counter;
    
        PyObject *async_exc; /* Asynchronous exception to raise */
        unsigned long thread_id; /* Thread id where this tstate was created */
    
        int trash_delete_nesting;
        PyObject *trash_delete_later;
    
        /* Called when a thread state is deleted normally, but not when it
         * is destroyed after fork().
         * Pain:  to prevent rare but fatal shutdown errors (issue 18808),
         * Thread.join() must wait for the join'ed thread's tstate to be unlinked
         * from the tstate chain.  That happens at the end of a thread's life,
         * in pystate.c.
         * The obvious way doesn't quite work:  create a lock which the tstate
         * unlinking code releases, and have Thread.join() wait to acquire that
         * lock.  The problem is that we _are_ at the end of the thread's life:
         * if the thread holds the last reference to the lock, decref'ing the
         * lock will delete the lock, and that may trigger arbitrary Python code
         * if there's a weakref, with a callback, to the lock.  But by this time
         * _PyThreadState_Current is already NULL, so only the simplest of C code
         * can be allowed to run (in particular it must not be possible to
         * release the GIL).
         * So instead of holding the lock directly, the tstate holds a weakref to
         * the lock:  that's the value of on_delete_data below.  Decref'ing a
         * weakref is harmless.
         * on_delete points to _threadmodule.c's static release_sentinel() function.
         * After the tstate is unlinked, release_sentinel is called with the
         * weakref-to-lock (on_delete_data) argument, and release_sentinel releases
         * the indirectly held lock.
         */
        void (*on_delete)(void *);
        void *on_delete_data;
    
        int coroutine_origin_tracking_depth;
    
        PyObject *coroutine_wrapper;
        int in_coroutine_wrapper;
    
        PyObject *async_gen_firstiter;
        PyObject *async_gen_finalizer;
    
        PyObject *context;
        uint64_t context_ver;
    
        /* Unique thread state id. */
        uint64_t id;
    
        /* XXX signal handlers should also be here */
    
    } PyThreadState;
    
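    • The structure contains a frame pointer to the frame that the thread is currently executing. Every PyFrameObject links to its caller through f_back, so each PyThreadState effectively maintains its own stack of frames.
    • The virtual machine source shows the same link. PyEval_EvalFrameEx first fetches the current thread state, then evaluates the frame through that thread's interpreter (tstate->interp->eval_frame).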
    ceval.c
    
    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        PyThreadState *tstate = PyThreadState_GET();
        return tstate->interp->eval_frame(f, throwflag);
    }
    
    pystate.c
    
    PyThreadState *
    PyThreadState_Get(void)
    {
        PyThreadState *tstate = GET_TSTATE();
        if (tstate == NULL)
            Py_FatalError("PyThreadState_Get: no current thread");
    
        return tstate;
    }
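From Python, you can see the frame chain that hangs off a thread state through `sys._getframe()` and each frame's `f_back`. This sketch is CPython-specific, because `sys._getframe` is an implementation detail. It mirrors the C example `main -> b -> a` from the beginning of the article. The helper `frame_chain` is a hypothetical name.

```python
import sys

def frame_chain():
    """Return the function names on the current thread's frame stack, innermost first."""
    names = []
    frame = sys._getframe(1)        # skip frame_chain's own frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back        # previous frame, or None at the bottom
    return names

def a(n):
    return frame_chain()

def b():
    return a(1)

def main():
    return b()

if __name__ == "__main__":
    print(main())
```

The chain starts with `a`, `b`, `main`, followed by whatever frames called `main`.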
    
    • Creating a PyFrameObject also involves the current thread state: PyFrame_New takes it as its first argument.
    Objects\frameobject.c
    
    PyFrameObject*
    PyFrame_New(PyThreadState *tstate, PyCodeObject *code,
                PyObject *globals, PyObject *locals)
    {
        PyFrameObject *f = _PyFrame_New_NoTrack(tstate, code, globals, locals);
        if (f)
            _PyObject_GC_TRACK(f);
        return f;
    }
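Every call goes through frame creation, so each activation gets its own frame object and its own locals while sharing the same code object. This is why the two i variables in the earlier example do not interfere with each other. The CPython-specific sketch below uses `sys._getframe` to record the frames of a recursive function. The helper `capture_frames` is a hypothetical name.

```python
import sys

def capture_frames(depth, frames=None):
    """Recurse depth times and collect each activation's frame object."""
    if frames is None:
        frames = []
    frames.append(sys._getframe())    # the frame of this particular call
    if depth > 0:
        capture_frames(depth - 1, frames)
    return frames

if __name__ == "__main__":
    fs = capture_frames(3)
    print(len(fs), len({id(f) for f in fs}))          # one distinct frame per call
    print([f.f_locals["depth"] for f in fs])          # each frame kept its own 'depth'
    print(all(f.f_code is capture_frames.__code__ for f in fs))
```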
    
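    2.2 The process environment in Python
    • Python models the concept of a process with the PyInterpreterState object.
    • Python can host several logical PyInterpreterState objects (sub-interpreters). They play a role analogous to multiple processes, although they all live inside the same operating-system process.
    • In the usual case, however, there is only one interpreter. It maintains several PyThreadState objects (linked from tstate_head), and these threads take turns using a single bytecode execution engine.
    • Python synchronizes all of these threads with the Global Interpreter Lock (GIL): only the thread that holds the GIL may execute bytecode.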
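In CPython 3.2 and later, the switch interval determines how often the running thread is asked to give up the GIL so that another thread can take its turn. It is exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The `check_interval` field in the struct below serves the older `sys.setcheckinterval()`, which has had no effect since 3.2. The sketch below assumes the default build with a GIL. It changes the interval temporarily and shows that a user-level lock is still needed, because `counter += 1` spans several bytecodes and a switch can occur between them. The function name `count_in_threads` is hypothetical.

```python
import sys
import threading

def count_in_threads(n_threads=4, per_thread=10_000, interval=None):
    """Increment a shared counter from several threads, optionally with a custom switch interval."""
    old = sys.getswitchinterval()
    if interval is not None:
        sys.setswitchinterval(interval)   # seconds between requested GIL hand-offs
    try:
        counter = 0
        lock = threading.Lock()

        def worker():
            nonlocal counter
            for _ in range(per_thread):
                with lock:                # the GIL alone does not make += atomic
                    counter += 1

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return counter
    finally:
        sys.setswitchinterval(old)        # restore the interpreter-wide setting

if __name__ == "__main__":
    print(sys.getswitchinterval(), count_in_threads(interval=0.0005))
```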
    Include\pystate.h
    
    typedef struct _is {
    
        struct _is *next;
        struct _ts *tstate_head;
    
        int64_t id;
        int64_t id_refcount;
        PyThread_type_lock id_mutex;
    
        PyObject *modules;
        PyObject *modules_by_index;
        PyObject *sysdict;
        PyObject *builtins;
        PyObject *importlib;
    
        /* Used in Python/sysmodule.c. */
        int check_interval;
    
        /* Used in Modules/_threadmodule.c. */
        long num_threads;
        /* Support for runtime thread stack size tuning.
           A value of 0 means using the platform's default stack size
           or the size specified by the THREAD_STACK_SIZE macro. */
        /* Used in Python/thread.c. */
        size_t pythread_stacksize;
    
        PyObject *codec_search_path;
        PyObject *codec_search_cache;
        PyObject *codec_error_registry;
        int codecs_initialized;
        int fscodec_initialized;
    
        _PyCoreConfig core_config;
        _PyMainInterpreterConfig config;
    #ifdef HAVE_DLOPEN
        int dlopenflags;
    #endif
    
        PyObject *builtins_copy;
        PyObject *import_func;
        /* Initialized to PyEval_EvalFrameDefault(). */
        _PyFrameEvalFunction eval_frame;
    
        Py_ssize_t co_extra_user_count;
        freefunc co_extra_freefuncs[MAX_CO_EXTRA_USERS];
    
    #ifdef HAVE_FORK
        PyObject *before_forkers;
        PyObject *after_forkers_parent;
        PyObject *after_forkers_child;
    #endif
        /* AtExit module */
        void (*pyexitfunc)(PyObject *);
        PyObject *pyexitmodule;
    
        uint64_t tstate_next_unique_id;
    } PyInterpreterState;
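The split between PyInterpreterState, which all threads share, and PyThreadState, which exists once per thread, has a rough Python-level counterpart. `sys.modules` is a single interpreter-wide dictionary (the `modules` field above), whereas `threading.local()` gives each thread its own attribute namespace. Comparing the latter to the per-thread `dict` field of PyThreadState is an analogy made for illustration. The helper `compare_state` is a hypothetical name.

```python
import sys
import threading

def compare_state(n_threads=3):
    """Return (modules_shared, per_thread_values, main_thread_sees_value)."""
    local = threading.local()
    seen_modules = []
    values = []
    lock = threading.Lock()

    def worker(i):
        local.value = i                        # visible only to this thread
        with lock:
            seen_modules.append(sys.modules)   # the same dict object in every thread
            values.append(local.value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shared = all(m is sys.modules for m in seen_modules)
    return shared, sorted(values), hasattr(local, "value")

if __name__ == "__main__":
    print(compare_state())
```

The main thread never assigned `local.value`, so it does not see any of the workers' values.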
    
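    • In summary, the Python runtime environment can be pictured as a hierarchy. A PyInterpreterState (the "process") heads a linked list of PyThreadState objects through tstate_head. Each PyThreadState points through frame to the top of its own chain of PyFrameObjects, which are linked by f_back.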
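The CPython-specific sketch below shows this hierarchy from the Python side. It relies on `sys._current_frames()`, which maps each thread identifier to that thread's current top frame, and for every live thread it walks `f_back` to list the thread's frame stack. The names `snapshot` and `sleeper` are hypothetical.

```python
import sys
import threading

def snapshot():
    """Map thread ident -> function names on that thread's frame stack (innermost first)."""
    result = {}
    for ident, frame in sys._current_frames().items():
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        result[ident] = names
    return result

def sleeper(started, done):
    started.set()
    done.wait()   # park this thread so its frames stay on its stack

if __name__ == "__main__":
    started, done = threading.Event(), threading.Event()
    t = threading.Thread(target=sleeper, args=(started, done))
    t.start()
    started.wait()
    for ident, names in snapshot().items():
        print(ident, names)
    done.set()
    t.join()
```

In this single interpreter, each key corresponds to one PyThreadState. Each list of names is that thread's frame chain.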


          Article link: https://www.haomeiwen.com/subject/zvhahltx.html