美文网首页
venv, module and pkg in python 2

venv, module and pkg in python 2

作者: 9_SooHyun | 来源:发表于2023-03-06 16:19 被阅读0次

venv

.venv/bin里面的python是软链,但是.venv/bin下面的其他二进制文件(如pip)以及site-packages里面安装的第三方库都是实实在在的源码文件

(.venv) [root@VM-165-116-centos /]# ls -l .venv/bin/python
lrwxrwxrwx 1 root root 15 Mar  6 10:35 .venv/bin/python -> /usr/bin/python

python不像go。python通过虚拟环境管理包的方式类似小仓模式,做的是物理隔离;而go通过go.mod进行包管理更像是大仓模式做逻辑隔离

  • 每个python的虚拟环境都存储了第三方pkg的源码,同一个pkg可能在不同的虚拟环境中被fork了多份;
  • 而go的第三方pkg源码全局一份,不同的module通过go.mod对源码进行引用。当然,go vendor 也提供了依赖包物理隔离的管理方式

个人感觉,逻辑隔离相对优于物理隔离,因为不同工程可以复用同一份第三方pkg,包管理更加轻量和灵活

module and pkg in python

先说结论:

  1. python import的单位是从来都是module or items defined in module,而不是pkg。为什么?因为python是脚本语言,它是一个一个的.py文件。就像在shell脚本中引用其他shell脚本用的是source /path/to/import.sh
  2. pkg是特殊的module,import pkg其实是import了一个__init__.py文件。module的核心作用是提供和隔离namespace

module

A module is a single file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module’s name (as a string) is available as the value of the global variable __name__.

>>> import math
>>> math.__name__
'math'

Modules can import other modules. The imported module names, if placed at the top level of a module (outside any functions or classes), are added to the module’s global namespace.——
import其实是把被import的命名空间attach到当前module的命名空间上

(Note: For efficiency reasons, each module is only imported once per interpreter session.)

refers to https://docs.python.org/3/tutorial/modules.html

pkg

什么是package

Packages are a way of structuring Python’s module namespace with attribute __path__. Packages have __path__ set to the directory or directories they were created from.

>>> import urllib
>>> urllib.__name__
'urllib'
>>> urllib.__path__
['/usr/lib64/python3.9/urllib']
>>> 

__path__ = pkg的绝对路径
__path__,顾名思义就是它所在包在当前系统的绝对路径,也就是说不同路径下即使同名的包它们的__path__值也是不同的
包的 __path__ 属性会在导入其子包期间被使用

如果一个module具有 __path__ 属性(module.__path__),它就被视为包,pkg也只是特殊的module

__path__ is initialized to be a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed(__path__是预先初始化好的用于找__init__.py的,在init.py被执行前__path__就已经被赋值). This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.

如何成为package

regular package

A regular package is typically implemented as a directory containing an __init__.py file.

When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. 当一个常规包被导入时,这个 __init__.py 文件会隐式地被执行;在 __init__.py 中定义的对象,会被绑定到import 指定的命名空间里面来。如果__init__.py啥也没有,那就只是把这pkg一个空的namespace attach到当前module的namespace,这意味着导入包会创建命名空间但没有其他效果

namespace package

一些定义

"portion" refers to a set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package.
"legacy portion" refers to a portion that uses __path__ manipulation in order to implement namespace packages.

A namespace package is a virtual package whose portions can be distributed in various places along Python's lookup path.
它将物理上的分布式包构成一个逻辑包,each portion contributes a subpackage to the parent namespace package。例如,这种使用场景经常出现在大的应用框架中,框架开发者希望鼓励用户发布插件或附加包

how to packaging a namespace pkg

native-namespace-package
Package directory without __init__.py is "native namespace package". refer to https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages

resource_plan_backend/
├ tests/               # the name of this dir doesn't matter
│  └ ns1/              # namespace package name
│      └ test1.py      # a = 1
│      
.
.
.
└ ns1/            # namespace package name
│   └ test2.py    # b = 2


>>> import sys
>>> sys.path.extend(['tests']) # must extend sys.path to find all subpackages
>>>
>>> import ns1 # ns1 is a namespace pkg
>>> ns1.__path__
_NamespacePath(['/root/workspace/resource_plan_backend/ns1', '/root/workspace/resource_plan_backend/tests/ns1'])
>>> ns1
<module 'ns1' (namespace)>
>>> from ns1 import test1
>>> test1.a
1
>>> from ns1 import test2
>>> test2.b
2

pkgutil-style-namespace-package
refer to https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#pkgutil-style-namespace-packages

  • 所有subpkg保持相同namespace
  • 所有subpkg的__init__.py仅写入__path__ = __import__('pkgutil').extend_path(__path__, __name__). the legacy behavior is to add any other regular packages on the searched path to its __path__. But in Python 3.3 and later, it also adds namespace packages.
resource_plan_backend/
├ tests/               # the name of this dir doesn't matter
│  └ ns/              # namespace package name
│      └ ns1.py      # a = 1
│      ├ __init__.py  # Sub-package __init__.py
.
.
.
└ ns/            # namespace package name
│   └ ns2.py    # b = 2
│   ├ __init__.py  # Sub-package __init__.py
.



>>> import sys
>>> sys.path.extend(['tests'])  # must extend sys.path to find all subpackages
>>>
>>> import ns
>>> ns.__path__
['/root/workspace/resource_plan_backend/ns', '/root/workspace/resource_plan_backend/./ns', '/root/workspace/resource_plan_backend/tests/ns'] # attention: ns is not a native namespace pkg here
>>> ns
<module 'ns' from '/root/workspace/resource_plan_backend/ns/__init__.py'>
>>> from ns import ns1, ns2
>>> ns1.a
1
>>> ns2.b
2

我们发现,pkgutil-style-namespace-package并不是纯正原生的namespace-package,因为__init__.py仍然存在,它仍然是regular package。 只是这种方式通过extend_path修改了顶层的公共namespace的search path,支持在多个路径下查找subpkg,从而实现了namespace-package的效果

importing module/pkg

With import <module_name>, a module is imported as an object of the module type. You can check which file is imported with print()

>>> import sys
>>> print(sys)
<module 'sys' (built-in)>
>>> print(type(sys))
<class 'module'>
>>>
>>> from math import * # 将math内的items attach到当前namespace,而math这个namespace本身并没有被attached
>>> print(math)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'math' is not defined
>>> print(pi)
3.141592653589793

python如何查询module/pkg

分2步

  • first, search sys.modules
  • second, search from sys.meta_path
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]

sys.meta_path contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called find_spec() which takes three arguments: a name, an import path, and (optionally) a target module(所以你当然可以自己实现finder插入到meta_path中去).

The meta path may be traversed multiple times for a single import request.

For example, assuming none of the modules involved has already been cached.

importing foo.bar.baz will first perform a top level import(top level is in one of the paths on your sys.path), calling mpf.find_spec("foo", None, None) on each meta path finder (mpf).

After foo has been imported, foo.bar will be imported by traversing the meta path a second time, calling mpf.find_spec("foo.bar", foo.__path__, None).

Once foo.bar has been imported, the final traversal will call mpf.find_spec("foo.bar.baz", foo.bar.__path__, None).

总结来说,通过多个mpf进行广度搜索,找到顶层包;然后再把顶层包的__path__作为参数,再进行一轮广度搜索。整个过程就是递归执行多次广度遍历

import pkg,就是import pkg的__init__.py

当python import一个pkg时,实际上只是import了pkg的__init__.py这一个文件,而不像go那样import了整个包。例如:

Files (modules) are stored in the urllib directory as follows. __init__.py is empty.

urllib/
├── __init__.py
├── error.py
├── parse.py
├── request.py
├── response.py
└── robotparser.py

If you write import urllib, you cannot use the modules under it. For example, urllib.error raises an error AttributeError.

import urllib

print(type(urllib))
# <class 'module'>

print(urllib)
# <module 'urllib' from '/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/__init__.py'>

# print(urllib.error)
# AttributeError: module 'urllib' has no attribute 'error'

import urllib引入的仅仅是xxx/urllib/__init__.py

绝对import和相对import

绝对导入的格式为 import A.Bfrom A import B
relative import (使用. ..的import,如from . import moduleA
相对导入不是相对于目录,而是相对于package
开放给其他开发者使用的pkg,包内module互相import显然需要使用相对引入,这是内聚

python import的单位是从来都是module or items defined in module,而不是pkg

Python的“包”其实不是包。它只是简单地把文件放在一起,提供了一个统一的namespace,仅此而已。进入“包”内我们发现,甚至包内引用都要import,这算哪门子的包?包内还是一个个独立的single-file module
所以,python的能力暴露单元本质上是module;而作为对比,go 则是将pkg作为一个整体能力对外暴露(go pkg 不存在包内互相import)

相关文章

网友评论

      本文标题:venv, module and pkg in python 2

      本文链接:https://www.haomeiwen.com/subject/fmumldtx.html