Python

Author: 宇宙湾 | Published: 2019-05-13 11:51

    What is Python?

    Python is a programming language that lets you work quickly and integrate systems more effectively.

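    Why Python?

    Glue language

    As a glue language, Python makes it easy to join modules written in other languages (especially C/C++).

    Scripting language

    A successor to the ABC language

    It shortens the traditional edit-compile-link-run cycle.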
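
    To make the "glue" idea concrete, here is a minimal sketch that calls the C function strlen from Python through the standard ctypes module. It assumes a Unix-like system where the C library can be located; the input string is only an illustration.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. If find_library returns None,
# CDLL(None) on Linux exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"yuzhouwan")
print(length)
```

    Declaring argtypes and restype up front lets ctypes check the arguments instead of guessing how to convert them.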

    Environment setup

    Installing Python

    Base Linux environment

    $ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y
    

    Building Python from source

    # Download the desired Python version from the python.org FTP server
    $ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
    
    # Unpack and build
    $ tar -zxvf Python-3.6.8.tgz
    $ cd Python-3.6.8
    $ ./configure --prefix=/usr/local/python36
    $ make
    $ make install
    
    $ ls /usr/local/python36/ -al
      total 24
      drwxr-xr-x 6 root root 4096 Jan 30 11:10 .
      drwxr-xr-x 1 root root 4096 Jan 30 11:09 ..
      drwxr-xr-x 2 root root 4096 Jan 30 11:10 bin
      drwxr-xr-x 3 root root 4096 Jan 30 11:10 include
      drwxr-xr-x 4 root root 4096 Jan 30 11:10 lib
      drwxr-xr-x 3 root root 4096 Jan 30 11:10 share
    

    Replacing the old Python

    # Replace the original Python 2.6
    $ which python
      /usr/bin/python
    $ /usr/local/python36/bin/python3.6 -V
      Python 3.6.8
    $ mv /usr/bin/python /usr/bin/python_old
    $ ln -s /usr/local/python36/bin/python3.6 /usr/bin/python
    $ python -V
      Python 3.6.8
    

    Pointing yum back at the old Python

    # yum must keep running on the old Python 2.6
    $ vim /usr/bin/yum
      # change the first line to python2.6
      #!/usr/bin/python2.6
    
    $ yum --version | sed '2,$d'
      3.2.29
    

    Pip

    Installation

    Online
    $ pip --version
      pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
    
    # upgrade setup tools and pip
    $ pip install --upgrade setuptools pip
    
    Offline
    # Download setuptools-40.7.1.zip from https://pypi.org/project/setuptools/#files
    $ unzip setuptools-40.7.1.zip
    $ cd setuptools-40.7.1
    $ python setup.py install
    
    # Download pip-19.0.1.tar.gz from https://pypi.org/project/pip/#files
    $ tar zxvf pip-19.0.1.tar.gz
    $ cd pip-19.0.1
    $ python setup.py install
    
    $ python -m pip -V
      pip 18.1 from /usr/local/python36/lib/python3.6/site-packages/pip (python 3.6)
    
    # Environment variable
    $ vim ~/.bashrc
      export PATH=$PATH:/usr/local/python36/bin
    $ source ~/.bashrc
    $ pip -V
      pip 19.0.1 from /usr/local/python36/lib/python3.6/site-packages/pip-19.0.1-py3.6.egg/pip (python 3.6)
    

    VirtualEnv

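    Here we use Apache Superset as the example; for more details, see my other blog post 《Apache Superset 二次开发》.

    Installation

    $ pip install virtualenv
    
    # Python 3.3+ also ships a built-in equivalent, the venv module (python -m venv); the old pyvenv script is deprecated since Python 3.6
    $ virtualenv venv
    $ source venv/bin/activate
    # To let the isolated environment see the system-wide site-packages directory, add the `--system-site-packages` option
    # virtualenv -p /usr/local/bin/python --system-site-packages venv
    # To make the environment easier to copy around, add `--always-copy` so that the files it depends on are copied in rather than symlinked
    # virtualenv -p /usr/local/bin/python --always-copy venv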
    
    ## [Offline] Installing virtualenv
    # Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
    $ tar zxvf virtualenv-15.1.0.tar.gz
    $ cd virtualenv-15.1.0
    $ python setup.py install
    
    $ virtualenv --version
      15.1.0
    

    Deploying to production

    Copying
    # Use rsync instead of scp so that symlinks are copied as symlinks
    $ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4
    
      //...
      sent 142935894 bytes  received 180102 bytes  3920986.19 bytes/sec
      total size is 359739823  speedup is 2.51
    
    # Verify the number of files in the Superset directory on both the local and the target machine
    $ find | wc -l
      10113
    
    # Repeat the steps above to rsync from the jump host to the production machine
    $ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4
    
    # The Python that the virtualenv was created from
    $ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
    $ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root
    
    $ cd /root/software
    $ tar zxvf Python-2.7.12.tgz
    $ cd Python-2.7.12
    
    $ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
    $ make && make install
    $ /sbin/ldconfig -v | grep /      # necessary!!
    $ python -V
      Python 2.7.12
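
    The file-count check done earlier with find | wc -l can also be scripted, which helps when the same comparison has to run on several machines. This is a minimal sketch; the demo tree is made up for illustration.

```python
import os
import tempfile


def count_entries(root):
    """Count root itself plus every file and directory beneath it."""
    total = 1  # `find` also prints the starting directory
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


# Tiny demo tree: root/, root/a/, root/a/x.txt, root/y.txt
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "a"))
for rel in (os.path.join("a", "x.txt"), "y.txt"):
    open(os.path.join(demo_root, rel), "w").close()

print(count_entries(demo_root))
```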
    
    Dynamic libraries
    # The symlinks were copied by rsync, but the Python dynamic libraries they point to do not exist on the target machine
    $ file /root/superset/lib/python2.7/lib-dynload
    
      /root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`
    
    # The global Python must match the one used when the virtualenv was created in the online environment
    $ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
    $ make && make install
    $ /sbin/ldconfig -v | grep /
    
    $ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
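
    A copied virtualenv can contain more than one dangling link, so it may be worth scanning the whole tree. The sketch below lists every symlink whose target is missing; the demo directory and link name are placeholders, and it assumes a platform that supports symlinks.

```python
import os
import tempfile


def find_broken_symlinks(root):
    """Return paths under root that are symlinks whose target does not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return broken


demo_root = tempfile.mkdtemp()
os.symlink(os.path.join(demo_root, "missing-target"),
           os.path.join(demo_root, "lib-dynload"))
print(find_broken_symlinks(demo_root))
```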
    

    VirtualEnvWrapper

    # VirtualEnv Wrapper is an extension of virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
    $ pip install virtualenvwrapper
    $ mkdir ~/virtualenv
    $ vim ~/.bashrc
      # add
      export WORKON_HOME=~/virtualenv
      source /usr/local/bin/virtualenvwrapper.sh
    
    $ mkvirtualenv --python=/usr/bin/python superset
      Running virtualenv with interpreter /usr/bin/python
      New python executable in /root/virtualenv/superset/bin/python
      Installing setuptools, pip, wheel...done.
      virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
      virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
      virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
      virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
      virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
    (superset) [root@superset01 virtualenv]# deactivate
    
    $ workon superset
    (superset) [root@superset01 virtualenv]# lsvirtualenv -b
      superset
    

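    Basic syntax

    Basic data types

    int

    sys.maxsize (Python 3 ints themselves have no upper bound; this is the largest container size / index)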
    >>> import sys
    >>> sys.maxsize
      9223372036854775807
    
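    # This value depends on whether the Python build is 32-bit or 64-bit
    >>> pow(2, 63) - 1
      9223372036854775807
    # `-` binds tighter than `<<`, so this is 1 << 63, not (1 << 64) - 1
    >>> 1 << 64 - 1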
      9223372036854775808
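
    The following sketch checks two points from this transcript: Python 3 integers do not overflow past sys.maxsize, and the shift expression parses as 1 << (64 - 1). The variable names are illustrative.

```python
import sys

# No overflow: Python 3 ints grow as needed
big = sys.maxsize + 1
print(big > sys.maxsize)

# `-` binds tighter than `<<`
shift_without_parens = 1 << 64 - 1
shift_with_parens = (1 << 64) - 1
print(shift_without_parens == 2 ** 63)
print(shift_with_parens == 2 ** 64 - 1)

# sys.maxsize is 2**63 - 1 on a 64-bit build and 2**31 - 1 on a 32-bit build
print(sys.maxsize in (2 ** 63 - 1, 2 ** 31 - 1))
```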
    

    float

    inf (infinity)
    >>> float('inf')
      inf
    >>> float('Inf')
      inf
    >>> float('inf') > 0
      True
    >>> float('inf') < 0
      False
    >>> float('inf') > 9999999999
      True
    >>> float('inf') > 9999999999999999999999
      True
    >>> float('-inf') < -9999999999999999999999
      True
    # inf, Inf, and INF all denote infinity; the string is case-insensitive
    # inf is positive infinity and -inf is negative infinity
    >>> float('Inf') == float('inf') == -float('-inf') == -float('-Inf')
      True
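
    A common use of inf is as the starting value of a running minimum. The sketch below also shows how NaN arises from inf arithmetic; the sample costs are made up.

```python
import math

inf = float("inf")

# Any finite number is smaller than inf, so it works as an initial minimum
smallest = inf
for cost in [3, 1, 2]:
    smallest = min(smallest, cost)

is_infinite = math.isinf(inf)
absorbs_addition = (inf + 1 == inf)

# inf - inf is undefined, so the result is NaN, which never equals itself
nan = inf - inf
nan_equals_itself = (nan == nan)

print(smallest, is_infinite, absorbs_addition, math.isnan(nan), nan_equals_itself)
```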
    

    string

    split
    >>> 'a b c'.split(' ')
      ['a', 'b', 'c']
    
    >>> 'a b c'.split(' ', 1)
      ['a', 'b c']
    
    >>> 'a b c'.split(' ', 2)
      ['a', 'b', 'c']
    
    Type conversion
    >>> int(1)
      1
    
    >>> float(1.0)
      1.0
    
    Placeholders
    >>> "speed: %skm/h" % 16.8
      'speed: 16.8km/h'
    
    >>> "(%s, %s)" % ("percent", 99.97)
      '(percent, 99.97)'
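
    The same placeholders can be written with str.format or, on Python 3.6+, with f-strings. This sketch reuses the values from the transcript above; the variable names are illustrative.

```python
speed = 16.8
label, value = "percent", 99.97

old_style = "speed: %skm/h" % speed
new_style = "speed: {}km/h".format(speed)
f_string = f"speed: {speed}km/h"  # requires Python 3.6+

# A format spec controls precision, here one decimal place
rounded = "({}, {:.1f})".format(label, value)

print(old_style, new_style, f_string, rounded)
```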
    

    Printing

    Without a trailing newline

    >>> print("[]", end="")
    []>>>
    

    OS

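    Operating-system information

    # Path separator of the current OS (Windows: '\\'; Linux/Unix: '/')
    os.sep
    # Name of the platform in use (Windows: 'nt'; Linux/Unix: 'posix')
    os.name
    # Line terminator of the current platform (Windows: '\r\n'; Linux and macOS: '\n'; classic Mac OS: '\r')
    os.linesep
    # Run a shell command
    os.system(shell)
    
    # Get the current working directory
    os.getcwd()
    # Get / set an environment variable (assigning to os.environ[key] is generally preferred over os.putenv)
    os.getenv(key) / os.putenv(key, value)
    # Get the PID of the current process
    os.getpid()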
    

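    Getting file / path information

    # List the names of all files and directories under path (Python 3.5 added os.scandir as a faster alternative; os.listdir is still available)
    os.listdir(path)
    # Split path into (directory name, file name)
    os.path.split(path)
    # Check whether path is a file or a directory
    os.path.isfile(path) / os.path.isdir(path)
    # Check whether path is a symbolic link
    os.path.islink(path)
    # Check whether the file or directory exists
    os.path.exists(path)
    # Get the file size in bytes (for a directory the value is platform-dependent, not the size of its contents)
    os.path.getsize(path)
    # Get the absolute path
    os.path.abspath(path)
    # Normalize the path string
    os.path.normpath(path)
    # Split the extension from the rest of the path
    os.path.splitext(path)
    # Join a directory with a file or directory name
    os.path.join(path, file)
    # Return the file name
    os.path.basename(path)
    # Return the directory part of the path
    os.path.dirname(path)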
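
    A short worked example makes the return values of these helpers easier to compare. It uses posixpath, the module that os.path resolves to on Linux/Unix, so the results do not depend on the machine running it; the path itself is hypothetical.

```python
import posixpath  # os.path is this module on Linux/Unix

path = "/root/superset/bin/../superset.tar.gz"  # hypothetical path

normalized = posixpath.normpath(path)          # collapses "bin/.."
directory, filename = posixpath.split(normalized)
stem, extension = posixpath.splitext(normalized)
joined = posixpath.join(directory, "bin", "python")

print(normalized)
print(directory, filename)
print(stem, extension)
print(joined)
```

    Note that splitext only strips the last extension, so "superset.tar.gz" keeps its ".tar" part in the stem.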
    

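    Operating on files / paths

    # String constant that refers to the current directory ('.'); use os.getcwd() for the actual path
    os.curdir
    # Change the working directory to path
    os.chdir(path)
    # Delete a file
    os.remove(path)
    # Delete an empty directory
    os.rmdir(path)
    # Delete the leaf directory, then each parent that has become empty: for 'foo/bar/baz' this removes 'foo/bar/baz', then 'foo/bar', then 'foo'
    os.removedirs(path)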
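
    os.removedirs stops silently at the first parent that is not empty. The sketch below demonstrates this in a temporary directory; the directory and file names are illustrative.

```python
import os
import tempfile

base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "foo", "bar", "baz"))
# A file in foo/ keeps that directory non-empty
open(os.path.join(base, "foo", "keep.txt"), "w").close()

# Removes baz, then bar; foo survives because it still contains keep.txt
os.removedirs(os.path.join(base, "foo", "bar", "baz"))

remaining = sorted(os.listdir(os.path.join(base, "foo")))
print(remaining)
```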
    

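    Reading a file

    import os
    
    
    def open_file(f=""):
        # Return the file's lines, or None if the path does not exist
        if not os.path.exists(f):
            print("File does not exist, path is %s!" % f)
            return None
        # Read-only mode is enough here; "r+" would also require write permission
        with open(f, "r", encoding="utf8") as of:
            return of.readlines()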
    

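    Running shell commands

    >>> import os
    # os.system runs the command in a child shell, so `source` cannot change the environment of the Python process itself
    >>> exit_code = os.system("source ~/.bashrc")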
    >>> exit_code
    0
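
    When the output of the command is needed as well as its exit code, the subprocess module is the usual choice. In this sketch the child command is a second Python interpreter, chosen only so that the example does not depend on a particular shell.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode bytes to str (spelled text=True on Python 3.7+)
    check=True,               # raise CalledProcessError on a non-zero exit code
)

exit_code = result.returncode
output = result.stdout.strip()
print(exit_code, output)
```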
    

    JSON

    Loading and extracting

    >>> import json
    >>> user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')
    >>> user['name']
      'benedict'
    
    >>> user['infos']['blog']
      'yuzhouwan.com'
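
    The reverse direction uses json.dumps. This sketch serializes the same structure, reads it back, and shows dict.get for keys that may be absent; the "email" key is a made-up example of a missing field.

```python
import json

user = {"name": "benedict", "infos": {"age": 0, "blog": "yuzhouwan.com"}}

# sort_keys gives a stable key order; ensure_ascii=False keeps non-ASCII text readable
text = json.dumps(user, ensure_ascii=False, sort_keys=True)
restored = json.loads(text)

# .get avoids a KeyError when a key is missing
email = restored.get("email", "n/a")

print(text)
print(email)
```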
    

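    Converting to and from YAML

    import json
    import sys
    
    import yaml
    
    # Each statement below consumes all of stdin, so run only one of them per invocation
    # json2yaml
    sys.stdout.write(yaml.dump(json.load(sys.stdin)))
    # yaml2json (safe_load avoids constructing arbitrary Python objects)
    sys.stdout.write(json.dumps(yaml.safe_load(sys.stdin)))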
    

    Collections

    map

    Assignment / lookup
    >>> kv_map = {}
    >>> kv_map["k"] = "v"
    >>> kv_map
      {'k': 'v'}
    
    >>> kv_map["k"]
      'v'
    
    Sorting
    >>> costs = {"b": 2, "a": 1, "c": 3}
    >>> costs
      {'b': 2, 'a': 1, 'c': 3}
    
    # Sort by key
    >>> sorted(costs)
      ['a', 'b', 'c']
    >>> sorted(costs.keys())
      ['a', 'b', 'c']
    
    # Sort by value
    >>> sorted(costs.values())
      [1, 2, 3]
    >>> [ (k, costs[k]) for k in sorted(costs, key=costs.get, reverse=False) ]
      [('a', 1), ('b', 2), ('c', 3)]
    >>> sorted(costs.items(), key=lambda item: item[1], reverse=True)
      [('c', 3), ('b', 2), ('a', 1)]
    
    Iterating
    >>> costs_sorted = sorted(costs.items(), key=lambda item: item[1])
    >>> for k, v in costs_sorted:
    ...     print(k, v)
    ...
      a 1
      b 2
      c 3
    
    Summing
    >>> sum({"b": 2, "a": 1, "c": 3}.values())
      6
    

    list

    # range(start, stop, step)
    # A negative step iterates in reverse
    # Note that the interval is half-open: [start, stop)
    >>> [ _ for _ in range(3, 0, -1)]
      [3, 2, 1]
    

    Flow control

    if-else

    >>> -1 if True else 0
      -1
    >>> -1 if False else 0
      0
    

    Arithmetic operators

    Floor division (returns the integer part of the quotient)

    >>> 1 // 1
    1
    >>> 2 // 1
    2
    >>> 3 // 1
    3
    
    >>> 1 // 2
    0
    >>> 2 // 2
    1
    >>> 3 // 2
    1
    >>> 4 // 2
    2
    >>> 5 // 2
    2
    >>> 6 // 2
    3
    

    Logical operators

    & vs. and

    >>> True & False
      False
    
    >>> True and False
      False
    
    >>> 10 > 1 & 10 < 1
      True
    
    >>> 10 > 1 and 10 < 1
      False
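
    The surprising True above comes from operator precedence: & binds tighter than the comparison operators, so the expression becomes a chained comparison around (1 & 10). The sketch below spells this out and also shows that `and` short-circuits while & evaluates both operands; the helper function is illustrative.

```python
# 10 > 1 & 10 < 1 parses as 10 > (1 & 10) < 1, and 1 & 10 == 0
unparenthesized = 10 > 1 & 10 < 1
chained = 10 > (1 & 10) and (1 & 10) < 1
parenthesized = (10 > 1) & (10 < 1)

calls = []


def check(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value


check("a", False) and check("b", True)  # "b" is never evaluated
check("c", False) & check("d", True)    # both operands are evaluated

print(unparenthesized, chained, parenthesized, calls)
```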
    

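    Bitwise operators

    Operation   Operator   Rule
    AND         &          The result bit is 1 only when both A and B are 1; otherwise 0
    OR          |          The result bit is 1 when A or B (or both) is 1; otherwise 0
    XOR         ^          The result bit is 1 only when A and B differ; otherwise 0
    NOT         ~          Inverts every bit: 0 becomes 1 and 1 becomes 0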
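
    Applied to two 4-bit values, the rules in the table look like this. The operands are arbitrary; note that Python ints have no fixed width, so ~ needs a mask to show a 4-bit result.

```python
a, b = 0b1100, 0b1010

and_bits = format(a & b, "04b")
or_bits = format(a | b, "04b")
xor_bits = format(a ^ b, "04b")

# For Python ints, ~x == -x - 1 because there is no fixed bit width
inverted = ~a
# Masking to 4 bits shows the flipped pattern
inverted_4bit = format(~a & 0b1111, "04b")

print(and_bits, or_bits, xor_bits, inverted, inverted_4bit)
```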

    Slicing

    Getting part of a list

    >>> [1, 2, 3][:1]
      [1]
    >>> [1, 2, 3][:2]
      [1, 2]
    >>> [1, 2, 3][:3]
      [1, 2, 3]
    >>> [1, 2, 3][1::]
      [2, 3]
    

    Getting the whole list

    >>> [1, 2, 3][:]
      [1, 2, 3]
    

    Reversing

    # Reverse a list
    >>> [1, 2, 3][::-1]
      [3, 2, 1]
    # Reverse a string
    >>> 'nawuohzuy'[::-1]
      'yuzhouwan'
    

    Assigning to a list slice

    >>> l = list(range(10))
    >>> l
      [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> l[0:3] = [0, -1, -2]
    >>> l
      [0, -1, -2, 3, 4, 5, 6, 7, 8, 9]
    >>> l[2::3] = [0, 0, 0]
    >>> l
      [0, -1, 0, 3, 4, 0, 6, 7, 0, 9]
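
    One detail worth knowing: a plain slice can be replaced by a sequence of any length, but a slice with a step must receive exactly as many items as it selects. A small sketch with made-up values:

```python
l = list(range(10))

# A plain slice can shrink or grow the list
l[0:3] = ["x"]

# An extended slice (with a step) needs a sequence of matching length
try:
    l[::3] = [0, 0]  # selects 3 positions but only 2 items are supplied
    error = None
except ValueError as exc:
    error = str(exc)

print(l)
print(error)
```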
    

    Python standard library

    argparse

    datetime

    import datetime
    # Get the current time of day
    datetime.datetime.now().time()
    

    ftplib

    gettext

    Creating the PO file

    # Generate the template
    $ python D:\apps\Python\Python35\Tools\i18n\pygettext.py
    $ cat messages.pot
      # SOME DESCRIPTIVE TITLE.
      # Copyright (C) YEAR ORGANIZATION
      # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
      #
      msgid ""
      msgstr ""
      "Project-Id-Version: PACKAGE VERSION\n"
      "POT-Creation-Date: 2017-12-28 11:24+0800\n"
      "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
      "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
      "Language-Team: LANGUAGE <LL@li.org>\n"
      "MIME-Version: 1.0\n"
      "Content-Type: text/plain; charset=cp936\n"
      "Content-Transfer-Encoding: 8bit\n"
      "Generated-By: pygettext.py 1.5\n"
    
    # Change the charset to UTF-8 and fill in the other basic fields
    $ vim messages.pot
      # SOME DESCRIPTIVE TITLE.
      # Copyright (C) 2017 yuzhouwan.com
      # Benedict Jin <benedictjin2016@gmail.com>, 2017.
      #
      msgid ""
      msgstr ""
      "Project-Id-Version: Yuzhouwan v1.0.2\n"
      "POT-Creation-Date: 2017-12-28 11:24+0800\n"
      "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
      "Last-Translator: Benedict Jin <benedictjin2016@gmail.com>\n"
      "Language-Team: LANGUAGE <LL@li.org>\n"
      "MIME-Version: 1.0\n"
      "Content-Type: text/plain; charset=UTF-8\n"
      "Content-Transfer-Encoding: 8bit\n"
      "Generated-By: pygettext.py 1.5\n"
    
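    # Open the template with PoEdit and save it as a .po file (messages.pot -> messages.po)
    # Move it into the locale directory; the language folder has to match a code the program requests (zh_CN below), and gettext loads the compiled messages.mo from the same folder
    $ mv messages.po locale/zh_CN/LC_MESSAGES
    
    # Add two translations
    $ vim messages.po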
      # SOME DESCRIPTIVE TITLE.
      # Copyright (C) 2017 yuzhouwan.com
      # Benedict Jin <benedictjin2016@gmail.com>, 2017.
      #
      msgid ""
      msgstr ""
      "Project-Id-Version: Yuzhouwan v1.0.2\n"
      "POT-Creation-Date: 2017-12-28 11:39+0800\n"
      "PO-Revision-Date: 2017-12-28 11:43+0800\n"
      "Language-Team: \n"
      "MIME-Version: 1.0\n"
      "Content-Type: text/plain; charset=UTF-8\n"
      "Content-Transfer-Encoding: 8bit\n"
      "Generated-By: pygettext.py 1.5\n"
      "X-Generator: Poedit 2.0.1\n"
      "Last-Translator: \n"
      "Plural-Forms: nplurals=2; plural=(n != 1);\n"
      "Language: zh\n"
    
      msgid "Hello, world!"
      msgstr "世界,你好!"
    
      msgid "yuzhouwan.com"
      msgstr "宇宙湾"
    

    Writing the program that uses the translations

    import gettext
    import os
    
    
    def getLocStrings():
        current_dir = os.path.dirname(os.path.realpath(__file__))
        locale_dir = os.path.join(current_dir, "locale")
        print("Locale directory:", locale_dir)
        return gettext.translation('messages', locale_dir, ["zh_CN", "en-US"]).gettext
    
    
    _ = getLocStrings()
    
    print(_("Hello, world!"))
    print(_("yuzhouwan.com"))
    
    Locale directory: E:\Core Code\leetcode\i18n\locale
    世界,你好!
    宇宙湾
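
    gettext.translation raises an error when no catalog is found. Passing fallback=True returns a pass-through translator instead, which keeps the program usable before any translation exists. A minimal sketch, with an empty temporary directory standing in for a locale folder that has no compiled catalogs:

```python
import gettext
import tempfile

# An empty directory: no messages.mo will be found here
empty_locale_dir = tempfile.mkdtemp()

translation = gettext.translation(
    "messages", empty_locale_dir, languages=["zh_CN"], fallback=True
)
_ = translation.gettext

# With no catalog available, the original string is returned unchanged
greeting = _("Hello, world!")
print(greeting)
```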
    

    json

    time

    import time
    
    # Current time as a formatted string
    time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
    
    # Epoch timestamp in seconds for a given local time
    int(time.mktime(time.strptime("2016-3-1 0:0:0", "%Y-%m-%d %H:%M:%S")))
    
    # Current epoch timestamp in seconds
    int(time.time())
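
    The time and datetime modules can be combined for round trips between timestamps, datetime objects, and strings. This is a sketch using the same format string as above; the variable names are illustrative.

```python
import datetime
import time

now_ts = time.time()  # float seconds since the epoch
now_dt = datetime.datetime.fromtimestamp(now_ts)

# datetime -> string -> datetime (sub-second precision is dropped)
text = now_dt.strftime("%Y-%m-%d %H:%M:%S")
parsed = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# datetime -> epoch seconds, interpreted as local time
ts_from_datetime = int(parsed.timestamp())

print(text, ts_from_datetime)
```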
    

    urllib

    Third-party Python libraries

    Core data-analysis libraries

    Pandas

    SciPy

    NumPy

    import numpy as np
    arr = [2, 4, 6, 8, 10]
    print(np.mean(arr))     # mean
    print(np.median(arr))   # median
    print(np.std(arr))      # standard deviation
    
      6.0
      6.0
      2.8284271247461903
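
    These numbers can be cross-checked with the standard library alone. Note that np.std defaults to the population standard deviation, which corresponds to statistics.pstdev rather than statistics.stdev.

```python
import statistics

arr = [2, 4, 6, 8, 10]

mean = statistics.mean(arr)
median = statistics.median(arr)
# Population standard deviation: sqrt(mean of squared deviations) = sqrt(8)
std = statistics.pstdev(arr)

print(mean, median, std)
```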
    

    Tips: Full code is here.

    Statistics

    Scrapy

    StatsModels

    NLP

    NLTK

    Gensim

    Machine learning

    Scikit-learn

    Artificial intelligence

    TensorFlow

    Theano

    Keras

    Visualization

    Matplotlib

    import numpy as np
    import matplotlib.pyplot as plt
    
    plt.figure(1)
    plt.figure(2)
    plt.figure(3)
    
    x = np.linspace(0, 6, 100)
    for i in range(3):
      plt.figure(1)
      plt.plot(x, np.sin(i * x))
      plt.figure(2)
      plt.plot(x, np.cos(i * x))
      plt.figure(3)
      plt.plot(x, np.tan(i * x))
    
    plt.show()
    plt.close()
    

    https://picture.yuzhouwan.com/python_matplotlib_sin.png?imageslim

    https://picture.yuzhouwan.com/python_matplotlib_cos.png?imageslim

    https://picture.yuzhouwan.com/python_matplotlib_tan.png?imageslim

    Seaborn

    Bokeh

    Plotly

    Maps

    GeoplotLib

    MapBox

    Image processing

    PIL

    Crawlers

    lxml

    from lxml import etree
    import requests
    
    
    def get_ide_id(job_id, tag_name):
        # view-source:http://historyserver-yuzhouwan:19888/jobhistory/conf/job_1010101010101_0101010
        url = "http://historyserver-yuzhouwan:19888/jobhistory/conf/" + job_id
        page = requests.get(url)
        html = page.text
        selector = etree.HTML(html)
        tds = selector.xpath("//*[@id='conf']//tbody//tr//td//text()")
        exist = False
        for td in tds:
            if tag_name in td:
                exist = True
                continue
            if exist:
                return td.strip()
    
    
    print(get_ide_id("job_1010101010101_0101010", "hive.ide.job.id"))
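
    The scanning loop in get_ide_id can be tried without a network connection. The sketch below isolates that logic: it returns the text of the first cell that follows a cell containing the tag name. The helper name and the cell values are hypothetical, not real History Server output.

```python
def value_after(cells, tag_name):
    """Return the stripped text of the cell following the one that contains tag_name."""
    found = False
    for cell in cells:
        if tag_name in cell:
            found = True
            continue
        if found:
            return cell.strip()
    return None  # tag_name is absent, or nothing follows it


# Hypothetical cell texts in name / value / source order
cells = ["mapreduce.job.name", " demo ", "job.xml",
         "hive.ide.job.id", " 12345 ", "job.xml"]

print(value_after(cells, "hive.ide.job.id"))
print(value_after(cells, "missing.key"))
```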
    

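    Scientific analysis tools

    IPython Notebook

    Installation

    # Before installing, make sure pip is recent enough and that %PYTHON_HOME%/Scripts is on the PATH
    $ python -m pip install --upgrade pip
    
    # Download the Enthought Canopy suite (https://www.enthought.com/canopy-subscriptions/)
    # After installing it, configure the environment variable
    $ PATH=D:\apps\Enthought\Canopy\App;%PATH%
    # Install
    $ pip install "ipython[all]"
    # Start (since IPython 4.0 the notebook lives in the Jupyter project, where the command is `jupyter notebook`)
    $ mkdir ipython
    $ cd ipython
    $ ipython notebook
    $ ipython notebook --pylab             # pylab mode
    $ ipython notebook --pylab inline      # embed Matplotlib figures in the page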
    

    Configuration

    # Create the default configuration file
    $ jupyter notebook --generate-config
      Writing default config to: C:\Users\BenedictJin\.jupyter\jupyter_notebook_config.py
    
    # Change the default working directory (a raw string keeps the backslashes literal)
    $ vim ~/.jupyter/jupyter_notebook_config.py
      c.NotebookApp.notebook_dir = r'F:\Github\_draft\ipython'
    
    # Restart and verify
    $ ipython notebook
    

    Format conversion

    $ ipython nbconvert --to markdown --execute Basic.ipynb
    # Alternatively, convert with notedown (https://github.com/aaren/notedown)
    $ pip install notedown
    

    Practical tips

    Embedding in Markdown

    Once IPython has created the .ipynb file, it can be embedded in a Markdown page with an <iframe> tag:

    <iframe src="https://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>
    

    This way the figures drawn by matplotlib are displayed in the page, rather than just a Python script.

    Tips: If your blog is served entirely over HTTPS, the resources loaded inside the iframe must also use https; otherwise Chrome blocks the mixed content

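    Help

    A single question mark ? shows the documentation of a function, class, or variable, while a double question mark ?? shows the source code when it is available

    In [1]: a = 1
    In [2]: a?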
      Type:        int
      String form: 1
      Docstring:  
      int(x=0) -> int or long
      int(x, base=10) -> int or long
    
      Convert a number or string to an integer, or return 0 if no arguments
      are given.  If x is floating point, the conversion truncates towards zero.
      If x is outside the integer range, the function returns a long instead.
    
      If x is not a number or if base is given, then x must be a string or
      Unicode object representing an integer literal in the given base.  The
      literal can be preceded by '+' or '-' and be surrounded by whitespace.
      The base defaults to 10.  Valid bases are 0 and 2-36.  Base 0 means to
      interpret the base from the string as an integer literal.
      >>> int('0b100', base=0)
      4
    
    In [3]: a??
      Type:        int
      String form: 1
    
    # Also, "shift + tab" is recommended for quickly showing a method's detailed description
    
    Making IPython Notebook support Python 3
    # Install Python 3
    $ which python
      /d/apps/Python/Python35/python
    
    # Install the IPython kernel
    $ python -m pip install ipykernel
    $ python -m ipykernel install --user
    
    # Install notebook
    $ which pip
      /d/apps/Python/Python35/Scripts/pip
    $ pip install notebook
    

    Python engineering tools

    Tox

    VirtualEnv

    Field notes

    Setting a proxy

    $ export http_proxy="http://127.0.0.1:1080"
    $ export https_proxy="https://127.0.0.1:1080"
    $ export socks5_proxy="socks5://127.0.0.1:1080"
    # pip install --upgrade pip
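
    Python's own urllib reads the same environment variables, which gives a quick way to confirm that a proxy setting is visible to a Python process. The sketch sets the variable for the current process only, reusing the address from the export lines above.

```python
import os
import urllib.request

# Equivalent to `export http_proxy=...`, but limited to this process
os.environ["http_proxy"] = "http://127.0.0.1:1080"

# getproxies() returns a mapping from scheme to proxy URL
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```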
    

    Remote Debug

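    The goal is to debug the Python code locally with breakpoints, have every change uploaded to the remote server over SFTP on Ctrl+S, and, once all changes are deployed, have Flask reload the latest code and restart the remote Python process automatically, so that the effect of the change on the server can be seen from the local machine right away. (The walkthrough below is based on Airbnb's Superset project.)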

    PyCharm

    Windows development machine
    ## local
    # shut down the local firewall first
    $ cd ".\JetBrains\PyCharm 2016.2.3\debug-eggs"
    $ easy_install pycharm-debug.egg
    # For Python 3, use pycharm-debug-py3k.egg instead
    
    # Run/Debug Configuration - SuperSet Remote Debug - 192.168.3.10(local ip) - 12345(port > 10000), will generate..
    import pydevd
    pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    
    # Path mappings
    E:/Core Code/superset=/root/superset
    
    # SFTP
    # copy a project to a local directory.
    # configure: tools - deployment, to upload this local copy to remote server
    # config remote host
    
    192.168.1.10 SFTP 192.168.1.10 22 /root/superset-0.15.4 root/****** UTF-8       # redacted
    # Tools - Deployment - Options - Upload changed files automatically to the default server (On explicit save action (Ctrl+S))
    
    # make deployment automatic: tools - deployment - "automatic upload"
    # add remote interpreter: file - settings - python interpreters - "+" - "Remote.."
    
    # Start Debug
    Starting debug server at port 12345
    Use the following code to connect to the debugger:
    import pydevd
    pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    Waiting for process connection...
    Connected to pydev debugger (build 162.1967.10)
    Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:9097 --limit-request-line 0 --limit-request-field_size 0 superset:app
    
    Remote Linux runtime environment
    ## remote
    $ cd /root/superset
    $ source bin/activate
    $ cd /root/superset/lib
    # copy \JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg into the lib directory
    $ easy_install pycharm-debug.egg
    
    # trouble shooting
    >>> import pydevd
    
    # restart
    $ vim /root/superset/bin/superset
    
      import pydevd
      pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    
    # After local debug, then start superset
    $ mkdir logs
    $ nohup superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &
    
    
    # Flask - Werkzeug debugger
    2017-02-07 15:47:03,905:WARNING:werkzeug: * Debugger is active!
    2017-02-07 15:47:03,905:INFO:werkzeug: * Debugger pin code: 330-765-812
    
    $ pip install django-debug-toolbar
    
    $ vim lib/python2.7/site-packages/pycharm-debug.egg/tests_pydevd_python/my_django_proj_17/my_django_proj_17/settings.py
    
      INSTALLED_APPS = (
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'debug_toolbar',                       # add
        'my_app',
      )
    
    # enable django
    Setting - Language & Frameworks - Django - "Enable Django Support"
    
    E:\Core Code\superset-0.15.4\bin\superset runserver -a '0.0.0.0' -p 9097
    
    ############################# PyDevd is so stiff! Let's Try Remote Python. #############################
    
    
    # Configure SFTP (same as above)
    # Configure Remote Python
    
      File - Settings - Project: superset-0.15.4 - Project Interpreter - show all(+) - 
    
        name:                        Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
        SSH Credentials
        Host:                        192.168.1.10 Port: 22
        User name:                   root
        Auth type:                   Password      # redacted
        Python interpreter path:     /root/superset-0.15.4/bin/python
        PyCharm helpers path:        /root/superset-0.15.4/.pycharm_helpers
    
    # If the interpreter is not recognized, python may be missing execute permission
    $ cd /root/superset-0.15.4/bin && chmod +x *
    
    PyCharm configuration
    # Configure a Python run configuration for the project
    
      Run - Run/Debug Configurations(+) - Python - 
    
        Name:                     superset
        Script:                   E:\Core Code\superset-0.15.4\bin\superset
        Script parameters:        runserver -d -p 9097
        Environment Variables:    VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
        Python interpreter:       Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)     # the remote python configured above
        Working directory:        E:\Core Code\superset-0.15.4\bin
        Path mapping:             E:/Core Code/superset-0.15.4=/root/superset-0.15.4
    
    
    # Enter the virtualenv before starting a remote debug session with the remote python
    # The activate file may not be found at this point; in that case add it directly
    
      File - Settings - Tools - Terminal - Shell path
    
        /bin/bash --rcfile ~/.pycharmrc
    
    
    $ vim '/e/Core Code/superset-0.15.4/.pycharmrc'     # add .pycharmrc to the local project
    
      VIRTUAL_ENV="/root/superset-0.15.4"               # the virtualenv directory on the remote server (the contents of bin/activate can be copied here directly)
      export VIRTUAL_ENV
    
    
    # Extra processes now show up on the remote server
    $ ps -ef | grep superset | grep -v grep
    
      root      8638 10912  0 15:24 pts/1    00:00:00 bash -c cd /root/superset-0.15.4/bin; env "IDE_PROJECT_ROOTS"="/root/superset-0.15.4" "IPYTHONENABLE"="True" "PYTHONPATH"="/root/superset-0.15.4:/root/superset-0.15.4/.pycharm_helpers/pydev" "PYTHONUNBUFFERED"="1" "PYCHARM_HOSTED"="1" "VIRTUALENVWRAPPER_PYTHON"="E:\Core Code\superset-0.15.4\bin\python" "LIBRARY_ROOTS"="C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/544046706;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/964856790;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1532312494;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/python_stubs/250609560;D:/apps/JetBrains/PyCharm 2016.3.2/helpers/python-skeletons" "PYTHONDONTWRITEBYTECODE"="1" "JETBRAINS_REMOTE_RUN"="1" "PYTHONIOENCODING"="UTF-8" /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
      root      8660  8638 11 15:24 pts/1    00:00:17 /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
      root      8715  8660 28 15:24 pts/1    00:00:38 /root/superset-0.15.4/bin/python /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
    
    Done
    # Open from the local Windows machine
    http://192.168.1.10:9097/login/
    

    Visual Studio Code

    Not good for me! You can still try it if you are interested.

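    Pitfalls

    Gunicorn pre-forks several worker processes, so Remote Debug does not work
    Description

    When connecting from the local Windows development machine to the superset running inside a virtualenv on Linux, debugging works at first, but the gunicorn inside superset uses the prefork model and starts many worker processes

    Solution

    a) Handle it as a regular remote debug session --not ok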

    Connected to pydev debugger (build 162.1967.10)
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Starting gunicorn 19.6.0
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Listening at: http://0.0.0.0:9097 (13609)
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Using worker: sync
    [2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13624)
    [2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13623)
    

    b) Debug with "Django server" instead of "Python Remote Debug" --not ok

    The configured Remote Python is /root/superset/bin/python, yet the error message shows /usr/local/bin/python being used

    c) ipdb --not good

    Bring the gunicorn process to the foreground and debug with ipdb on the command line

    d) Add the -w option to control the number of workers --not ok

    @manager.option(
    '-w', '--workers', default=config.get("SUPERSET_WORKERS", 2),    # default: 2
    help="Number of gunicorn web server workers to fire up")
    
    $ superset runserver -a 0.0.0.0 -p 9097 -w 0
    

    e) Turn gunicorn off --ok

    gunicorn is only needed during load testing
    $ superset runserver -d -p 9097

    Trying to add breakpoint to file that does not exist
    Description
    pydev debugger: warning: trying to add breakpoint to file that does not exist: /root/superset/d:/apps/python27/lib/site-packages/gunicorn/arbiter.py
    
    Solution

    a) Add a path mapping for Python's site-packages --not good

    E:/Core Code/superset=/root/superset;D:/apps/Python27=/root/superset/lib/python2.7
    

    b) Use the python from the superset project instead of the local python --ok

    The python synced to the local machine is not a python.exe --no
    Use remote python --ok

    Couldn't obtain remote socket
    Description
    Error running superset
    Can't run remote python interpreter: Couldn't obtain remote socket from output ('0.0.0.0', 52703), stderr /usr/local/bin/python: No module named virtualenvwrapper virtualenvwrapper.sh: There was a problem running the initialization hooks. 
    If Python could not import the module virtualenvwrapper.hook_loader, check that virtualenvwrapper has been installed for VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is set properly.
    
    Solution
    # Check whether PATH contains the virtualenvwrapper location
    $ echo $PATH
    
    # If it does not, comment out the following lines in ~/.bashrc
    # Source global definitions
    # export WORKON_HOME=~/virtualenv
    # source /usr/local/bin/virtualenvwrapper.sh
    

    Vagrant

    Vagrant is a tool that automates the installation and configuration of virtual machines

    Download

    # Vagrant
      https://www.vagrantup.com/downloads.html
    
    # VirtualBox
      https://www.virtualbox.org/wiki/Downloads
      http://download.virtualbox.org/virtualbox/5.1.12/      # better
      https://hashicorp-files.hashicorp.com/lucid32.box      # not good
      https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box      # best
    
    # Box images
      https://atlas.hashicorp.com/boxes/search
      http://chef.github.io/bento/
    
    # After installation, restart cmd / PyCharm / Git Bash and so on; rebooting the computer is best
    

    Usage

    $ vagrant box add superset /f/软件库/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
    
      ==> box: Box file was not detected as metadata. Adding it directly...
      ==> box: Adding box 'superset' (v0) for provider:
          box: Unpacking necessary files from: file:///F:/%C8%ED%BC%FE%BF%E2/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
          box:
      ==> box: Successfully added box 'superset' (v0) for 'virtualbox'!
    
    $ vagrant box list
      superset (virtualbox, 0)
    
    $ vagrant init
    
      A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.
    
    $ vim /e/vagrant/superset-0.15.4/Vagrantfile
    
      # -*- mode: ruby -*-
      # vi: set ft=ruby :
    
      # Vagrant.configure("2") do |config|
      #   config.vm.box = "superset"
      #   config.vm.box_check_update = false
      #   config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
      #   config.vm.synced_folder "./", "/root/superset-0.15.4"
      # 
      #   config.vm.network "public_network"
      #   config.vm.provider "virtualbox" do |vb|
      #     vb.gui = true
      #     vb.memory = "1024"
      #   end
      #   config.vm.provision "shell", inline: <<-SHELL
      #     apt-get update
      #   SHELL
      # end
    
    $ vagrant up --provider virtualbox
    
      Bringing machine 'default' up with 'virtualbox' provider...
      ==> default: Importing base box 'superset'...
      ==> default: Matching MAC address for NAT networking...
      ==> default: Setting the name of the VM: superset-0154_default_1486969836220_44233
      ==> default: Clearing any previously set forwarded ports...
      ==> default: Clearing any previously set network interfaces...
      ==> default: Preparing network interfaces based on configuration...
          default: Adapter 1: nat
          default: Adapter 2: hostonly
      ==> default: Forwarding ports...
          default: 22 (guest) => 2122 (host) (adapter 1)
          default: 80 (guest) => 6080 (host) (adapter 1)
          default: 6079 (guest) => 6079 (host) (adapter 1)
          default: 22 (guest) => 2222 (host) (adapter 1)
      ==> default: Running 'pre-boot' VM customizations...
      ==> default: Booting VM...
      ==> default: Waiting for machine to boot. This may take a few minutes...
          default: SSH address: 127.0.0.1:2222
          default: SSH username: vagrant
          default: SSH auth method: private key
    

    Pitfalls

    Provider 'virtualbox' not found
    Description
    $ vagrant up
      ==>  Provider 'virtualbox' not found. We'll automatically install it now...
      The installation process will start below. Human interaction may be required at some points. If you're uncomfortable with automatically installing this provider, you can safely Ctrl-C this process and install it manually.
      ==>  Downloading VirtualBox 5.0.10...
      This may not be the latest version of VirtualBox, but it is a version that is known to work well. Over time, we'll update the version that is installed.
    
    Solution

    vagrant up --provider=virtualbox

    Timed out while waiting for the machine to boot
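    Description
    A subdirectory or file -p already exists.
    Error occurred while processing: -p.
    A subdirectory or file charms already exists.
    Error occurred while processing: charms.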
    Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.
    
    If you look above, you should be able to see the error(s) that Vagrant had when attempting to connect to the machine. These errors are usually good hints as to what may be wrong.
    
    If you're using a custom box, make sure that networking is properly working and you're able to connect to the machine. It is a common problem that networking isn't setup properly in these boxes. Verify that authentication configurations are also setup properly, as well.
    
    If the box appears to be booting properly, you may want to increase the timeout ("config.vm.boot_timeout") value.
    
    Solution

    Upgrade VirtualBox to 5.1.12
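
    The timeout message also suggests verifying that you can connect to the machine. A rough sketch of such a check in Python is shown here; the helper name port_open is hypothetical, and 127.0.0.1:2222 is the SSH address taken from the `vagrant up` output earlier. A reachable port only shows that the forwarded port accepts TCP connections. It does not prove that SSH authentication works.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    conn.close()
    return True


if __name__ == "__main__":
    # 127.0.0.1:2222 is the SSH address printed by `vagrant up`.
    print("SSH port reachable:", port_open("127.0.0.1", 2222))
```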

    default: stdin: is not a tty
    Description

    default: stdin: is not a tty

    Solution
    config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
    

    Unittest

    The -t option changes the top-level package path

    The discover sub-command has the following options:
    
      -v, --verbose                         Verbose output
      -s, --start-directory directory       Directory to start discovery (. default)
      -p, --pattern pattern                 Pattern to match test files (test*.py default)
      -t, --top-level-directory directory   Top level directory of project (defaults to start directory)
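
    The same options are available from Python through unittest.TestLoader.discover, whose start_dir, pattern and top_level_dir arguments correspond to -s, -p and -t. The following is a small sketch under stated assumptions: the sample test module and the helper name run_discovery are made up for the demo, and it does not use the Superset test suite.

```python
import io
import os
import tempfile
import unittest

# Hypothetical test module, written to a temporary directory for the demo.
SAMPLE_TEST = (
    "import unittest\n"
    "\n"
    "class SampleTest(unittest.TestCase):\n"
    "    def test_addition(self):\n"
    "        self.assertEqual(1 + 1, 2)\n"
)


def run_discovery(start_dir, pattern="test*.py", top_level_dir=None):
    """Discover and run tests; the arguments mirror -s, -p and -t."""
    suite = unittest.TestLoader().discover(
        start_dir, pattern=pattern, top_level_dir=top_level_dir
    )
    # Collect the runner's report in memory instead of writing to stderr.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(suite)


with tempfile.TemporaryDirectory() as project_dir:
    test_file = os.path.join(project_dir, "test_sample.py")
    with open(test_file, "w", encoding="utf8") as f:
        f.write(SAMPLE_TEST)
    result = run_discovery(project_dir, top_level_dir=project_dir)
    print(result.testsRun, result.wasSuccessful())
```

    Passing top_level_dir matters when the start directory is a sub-package: test modules are then imported relative to the top-level directory rather than the start directory.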
    
    Name                          druid_tests
    Script                        E:\Core Code\superset-0.15.4\code\tests\druid_tests.py
    Environment variables         VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
    Python interpreter            Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
    Interpreter options           -m tests.druid_tests
    Working directory             E:\Core Code\superset-0.15.4\code\
    Path mappings                 E:/Core Code/superset-0.15.4=/root/superset-0.15.4
    
    $ export SUPERSET_CONFIG=tests.superset_test_config
    $ python -m tests.druid_tests discover . "druid_tests.py"
    
    # After the tests finish, unset SUPERSET_CONFIG
    $ unset SUPERSET_CONFIG
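
    When the tests are started from a Python script instead of the shell, the export / unset pair can be wrapped in a context manager so that the variable is always cleaned up, even if a test run raises. This is only a sketch; the helper name temporary_env is hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(name, value):
    """Set environment variable `name` to `value`, then restore the old state."""
    missing = object()
    previous = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is missing:
            os.environ.pop(name, None)  # same effect as `unset`
        else:
            os.environ[name] = previous


with temporary_env("SUPERSET_CONFIG", "tests.superset_test_config"):
    # Run the tests here; the variable is visible only inside this block.
    inside = os.environ["SUPERSET_CONFIG"]
```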
    

    Pitfalls

    UnicodeDecodeError: 'gbk' codec can't decode byte 0x87 in position illegal multibyte sequence

    Solution

    # Declare the source encoding at the top of the program, and pass the encoding argument when opening files
    
    # -*- coding:utf8 -*-
    open(fname, "r", encoding="utf8")
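
    Without an explicit encoding, Python 3's open() falls back to the locale's preferred encoding, which is typically gbk on a Chinese-language Windows system, so a UTF-8 file can fail to decode. A minimal sketch of writing and reading with the same explicit encoding follows; the file name and sample text are made up for the demo.

```python
import os
import tempfile

text = "编码 encoding"  # sample text containing non-ASCII characters

with tempfile.TemporaryDirectory() as tmp_dir:
    fname = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    # Write and read with the same explicit encoding, so the result
    # does not depend on the platform's default encoding.
    with open(fname, "w", encoding="utf8") as f:
        f.write(text)
    with open(fname, "r", encoding="utf8") as f:
        round_trip = f.read()

print(round_trip == text)
```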
    

    connection broken by SSLError

    Solution

    $ python -m pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip
    

    ModuleNotFoundError: No module named 'yaml'

    Solution

    $ pip install pyyaml
    

    For a better reading experience, visit my personal blog directly: https://yuzhouwan.com/posts/43687/


        Article link: https://www.haomeiwen.com/subject/ykdlaqtx.html