美文网首页
venv, module and pkg in python 2

venv, module and pkg in python 2

作者: 9_SooHyun | 来源:发表于2023-03-06 16:19 被阅读0次

    venv

    .venv/bin里面的python是软链,但是.venv/bin下面的其他二进制文件(如pip)以及site-packages里面安装的第三方库都是实实在在的源码文件

    (.venv) [root@VM-165-116-centos /]# ls -l .venv/bin/python
    lrwxrwxrwx 1 root root 15 Mar  6 10:35 .venv/bin/python -> /usr/bin/python
    

    python不像go。python通过虚拟环境管理包的方式类似小仓模式,做的是物理隔离;而go通过go.mod进行包管理更像是大仓模式做逻辑隔离

    • 每个python的虚拟环境都存储了第三方pkg的源码,同一个pkg可能在不同的虚拟环境中被fork了多份;
    • 而go的第三方pkg源码全局一份,不同的module通过go.mod对源码进行引用。当然,go vendor 也提供了依赖包物理隔离的管理方式

    个人感觉,逻辑隔离相对优于物理隔离,因为不同工程可以复用同一份第三方pkg,包管理更加轻量和灵活

    module and pkg in python

    先说结论:

    1. python import的单位是从来都是module or items defined in module,而不是pkg。为什么?因为python是脚本语言,它是一个一个的.py文件。就像在shell脚本中引用其他shell脚本用的是source /path/to/import.sh
    2. pkg是特殊的module,import pkg其实是import了一个__init__.py文件。module的核心作用是提供和隔离namespace

    module

    A module is a single file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module’s name (as a string) is available as the value of the global variable __name__.

    >>> import math
    >>> math.__name__
    'math'
    

    Modules can import other modules. The imported module names, if placed at the top level of a module (outside any functions or classes), are added to the module’s global namespace.——
    import其实是把被import的命名空间attach到当前module的命名空间上

    (Note: For efficiency reasons, each module is only imported once per interpreter session.)

    refers to https://docs.python.org/3/tutorial/modules.html

    pkg

    什么是package

    Packages are a way of structuring Python’s module namespace with attribute __path__. Packages have __path__ set to the directory or directories they were created from.

    >>> import urllib
    >>> urllib.__name__
    'urllib'
    >>> urllib.__path__
    ['/usr/lib64/python3.9/urllib']
    >>> 
    

    __path__ = pkg的绝对路径
    __path__,顾名思义就是它所在包在当前系统的绝对路径,也就是说不同路径下即使同名的包它们的__path__值也是不同的
    包的 __path__ 属性会在导入其子包期间被使用

    如果一个module具有 __path__ 属性(module.__path__),它就被视为包,pkg也只是特殊的module

    __path__ is initialized to be a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed(__path__是预先初始化好的用于找__init__.py的,在init.py被执行前__path__就已经被赋值). This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.

    如何成为package

    regular package

    A regular package is typically implemented as a directory containing an __init__.py file.

    When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. 当一个常规包被导入时,这个 __init__.py 文件会隐式地被执行;在 __init__.py 中定义的对象,会被绑定到import 指定的命名空间里面来。如果__init__.py啥也没有,那就只是把这pkg一个空的namespace attach到当前module的namespace,这意味着导入包会创建命名空间但没有其他效果

    namespace package

    一些定义

    "portion" refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package.
    "legacy portion" refers to a portion that uses __path__ manipulation in order to implement namespace packages.

    A namespace package is a virtual package whose portions can be distributed in various places along Python's lookup path.
    它将物理上的分布式包构成一个逻辑包,each portion contributes a subpackage to the parent namespace package。例如,这种使用场景经常出现在大的应用框架中,框架开发者希望鼓励用户发布插件或附加包

    how to packaging a namespace pkg

    native-namespace-package
    Package directory without __init__.py is "native namespace package". refer to https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

    resource_plan_backend/
    ├ tests/               # the name of this dir doesn't matter
    │  └ ns1/              # namespace package name
    │      └ test1.py      # a = 1
    │      
    .
    .
    .
    └ ns1/            # namespace package name
    │   └ test2.py    # b = 2
    
    
    >>> import sys
    >>> sys.path.extend(['tests']) # must extend sys.path to find all subpackages
    >>>
    >>> import ns1 # ns1 is a namespace pkg
    >>> ns1.__path__
    _NamespacePath(['/root/workspace/resource_plan_backend/ns1', '/root/workspace/resource_plan_backend/tests/ns1'])
    >>> ns1
    <module 'ns1' (namespace)>
    >>> from ns1 import test1
    >>> test1.a
    1
    >>> from ns1 import test2
    >>> test2.b
    2
    

    pkgutil-style-namespace-package
    refer to https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages

    • 所有subpkg保持相同namespace
    • 所有subpkg的__init__.py仅写入__path__ = __import__('pkgutil').extend_path(__path__, __name__). the legacy behavior is to add any other regular packages on the searched path to its __path__. But in Python 3.3 and later, it also adds namespace packages.
    resource_plan_backend/
    ├ tests/               # the name of this dir doesn't matter
    │  └ ns/              # namespace package name
    │      └ ns1.py      # a = 1
    │      ├ __init__.py  # Sub-package __init__.py
    .
    .
    .
    └ ns/            # namespace package name
    │   └ ns2.py    # b = 2
    │   ├ __init__.py  # Sub-package __init__.py
    .
    
    
    
    >>> import sys
    >>> sys.path.extend(['tests'])  # must extend sys.path to find all subpackages
    >>>
    >>> import ns
    >>> ns.__path__
    ['/root/workspace/resource_plan_backend/ns', '/root/workspace/resource_plan_backend/./ns', '/root/workspace/resource_plan_backend/tests/ns'] # attention: ns is not a native namespace pkg here
    >>> ns
    <module 'ns' from '/root/workspace/resource_plan_backend/ns/__init__.py'>
    >>> from ns import ns1, ns2
    >>> ns1.a
    1
    >>> ns2.b
    2
    

    我们发现,pkgutil-style-namespace-package并不是纯正原生的namespace-package,因为__init__.py仍然存在,它仍然是regular package。 只是这种方式通过extend_path修改了顶层的公共namespace的search path,支持在多个路径下查找subpkg,从而实现了namespace-package的效果

    importing module/pkg

    With import <module_name>, a module is imported as an object of the module type. You can check which file is imported with print()

    >>> import sys
    >>> print(sys)
    <module 'sys' (built-in)>
    >>> print(type(sys))
    <class 'module'>
    >>>
    >>> from math import * # 将math内的items attach到当前namespace,而math这个namespace本身并没有被attached
    >>> print(math)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'math' is not defined
    >>> print(pi)
    3.141592653589793
    

    python如何查询module/pkg

    分2步

    • first, search sys.modules
    • second, search from sys.meta_path
    >>> import sys
    >>> sys.meta_path
    [<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
    

    sys.meta_path contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_spec() which takes three arguments: a name, an import path, and (optionally) a target module(所以你当然可以自己实现finder插入到meta_path中去).

    The meta path may be traversed multiple times for a single import request.

    For example, assuming none of the modules involved has already been cached.

    importing foo.bar.baz will first perform a top level import(top level is in one of the paths on your sys.path), calling mpf.find_spec("foo", None, None) on each meta path finder (mpf).

    After foo has been imported, foo.bar will be imported by traversing the meta path a second time, calling mpf.find_spec("foo.bar", foo.__path__, None).

    Once foo.bar has been imported, the final traversal will call mpf.find_spec("foo.bar.baz", foo.bar.__path__, None).

    总结来说,通过多个mpf进行广度搜索,找到顶层包;然后再把顶层包的__path__作为参数,再进行一轮广度搜索。整个过程就是递归执行多次广度遍历

    import pkg,就是import pkg的__init__.py

    当python import一个pkg时,实际上只是import了pkg的__init__.py这一个文件,而不像go那样import了整个包。例如:

    Files (modules) are stored in the urllib directory as follows. __init__.py is empty.

    urllib/
    ├── __init__.py
    ├── error.py
    ├── parse.py
    ├── request.py
    ├── response.py
    └── robotparser.py
    

    If you write import urllib, you cannot use the modules under it. For example, urllib.error raises an error AttributeError.

    import urllib
    
    print(type(urllib))
    # <class 'module'>
    
    print(urllib)
    # <module 'urllib' from '/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/__init__.py'>
    
    # print(urllib.error)
    # AttributeError: module 'urllib' has no attribute 'error'
    

    import urllib引入的仅仅是xxx/urllib/__init__.py

    绝对import和相对import

    绝对导入的格式为 import A.Bfrom A import B
    relative import (使用. ..的import,如from . import moduleA
    相对导入不是相对于目录,而是相对于package
    开放给其他开发者使用的pkg,包内module互相import显然需要使用相对引入,这是内聚

    python import的单位是从来都是module or items defined in module,而不是pkg

    Python的“包”其实不是包。它只是简单地把文件放在一起,提供了一个统一的namespace,仅此而已。进入“包”内我们发现,甚至包内引用都要import,这算哪门子的包?包内还是一个个独立的single-file module
    所以,python的能力暴露单元本质上是module;而作为对比,go 则是将pkg作为一个整体能力对外暴露(go pkg 不存在包内互相import)

    相关文章

      网友评论

          本文标题:venv, module and pkg in python 2

          本文链接:https://www.haomeiwen.com/subject/fmumldtx.html