What is Python?
Python is a programming language that lets you work quickly and integrate systems more effectively.
Why Python?
Glue language
A glue language: it can easily tie together modules written in other languages (especially C/C++)
Scripting language
A descendant of the ABC language
Shortens the traditional edit-compile-link-run cycle
Environment Setup
Installing Python
Linux base packages
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y
Building Python from source
# Download the desired Python version from the python.org FTP server
$ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
# 编译
$ tar -zxvf Python-3.6.8.tgz
$ cd /usr/local/Python-3.6.8
$ ./configure --prefix=/usr/local/python36
$ make
$ make install
$ ls /usr/local/python36/ -al
total 24
drwxr-xr-x 6 root root 4096 Jan 30 11:10 .
drwxr-xr-x 1 root root 4096 Jan 30 11:09 ..
drwxr-xr-x 2 root root 4096 Jan 30 11:10 bin
drwxr-xr-x 3 root root 4096 Jan 30 11:10 include
drwxr-xr-x 4 root root 4096 Jan 30 11:10 lib
drwxr-xr-x 3 root root 4096 Jan 30 11:10 share
Replacing the old Python
# Replace the original system python (Python 2.6)
$ which python
/usr/bin/python
$ /usr/local/python36/bin/python3.6 -V
Python 3.6.8
$ mv /usr/bin/python /usr/bin/python_old
$ ln -s /usr/local/python36/bin/python3.6 /usr/bin/python
$ python -V
Python 3.6.8
Pointing yum back at the old Python
# yum depends on the old Python 2.6, so switch the interpreter it uses back to python2.6
$ vim /usr/bin/yum
# Change the first line (the shebang) to python2.6
#!/usr/bin/python2.6
$ yum --version | sed '2,$d'
3.2.29
Pip
Installation
Online
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
# upgrade setup tools and pip
$ pip install --upgrade setuptools pip
Offline
# Download setuptools-40.7.1.zip from https://pypi.org/project/setuptools/#files
$ unzip setuptools-40.7.1.zip
$ cd setuptools-40.7.1
$ python setup.py install
# Download pip-19.0.1.tar.gz from https://pypi.org/project/pip/#files
$ tar zxvf pip-19.0.1.tar.gz
$ cd pip-19.0.1
$ python setup.py install
$ python -m pip -V
pip 18.1 from /usr/local/python36/lib/python3.6/site-packages/pip (python 3.6)
# Environment variables
$ vim ~/.bashrc
export PATH=$PATH:/usr/local/python36/bin
$ source ~/.bashrc
$ pip -V
pip 19.0.1 from /usr/local/python36/lib/python3.6/site-packages/pip-19.0.1-py3.6.egg/pip (python 3.6)
VirtualEnv
Here we use Apache Superset as the example; for more on it, see my other post, "Apache Superset Secondary Development".
Installation
$ pip install virtualenv
# Python 3.3+ ships a built-in equivalent, the venv module (python3 -m venv; the older pyvenv script is deprecated)
$ virtualenv venv
$ source venv/bin/activate
# To let the isolated environment also access the system-wide site-packages directory, add the `--system-site-packages` flag
# virtualenv -p /usr/local/bin/python --system-site-packages venv
# To make the environment easy to copy, add `--always-copy` so that its dependency files are copied in rather than symlinked
# virtualenv -p /usr/local/bin/python --always-copy venv
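For comparison, Python 3.3+ includes the venv module, which covers the basic virtualenv use case. This is a minimal sketch, assuming a standard CPython 3 install where venv is available (some Linux distributions package it separately); the temporary target directory is illustrative only:

```python
import os
import subprocess
import tempfile
import venv

target = os.path.join(tempfile.mkdtemp(), "venv")

# with_pip=False keeps creation fast and offline; symlinks mirrors the POSIX default of `python3 -m venv`
venv.create(target, system_site_packages=False, symlinks=(os.name != "nt"), with_pip=False)

bindir = "Scripts" if os.name == "nt" else "bin"
venv_python = os.path.join(target, bindir, "python")

# Inside a virtual environment, sys.prefix differs from sys.base_prefix
check = subprocess.run(
    [venv_python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
print(check.stdout.strip())  # True
```

Unlike virtualenv, venv cannot create environments for a different Python version than the one running it, which is why the text above sticks with virtualenv for Python 2.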
## [Offline] Installing virtualenv
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install
$ virtualenv --version
15.1.0
Deploying to production
Copying
# Use rsync instead of scp so that symlinks are copied as well
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4
//...
sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
total size is 359739823 speedup is 2.51
# Verify that the file count in the Superset directory matches on the local and target machines
$ find | wc -l
10113
# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4
# Install the Python that the virtualenv depends on
$ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root
$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12
$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /  # necessary!
$ python -V
Python 2.7.12
Shared libraries
# Although the symlinks were rsynced over, the target machine has no matching Python shared libraries in the directory they point to
$ file /root/superset/lib/python2.7/lib-dynload
/root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`
# Build Python with the same prefix as the global Python used to create the virtualenv on the internet-connected machine
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
VirtualEnvWrapper
# virtualenvwrapper is an extension to virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/virtualenv
$ vim ~/.bashrc
# Add
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh
$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]# deactivate
$ workon superset
(superset) [root@superset01 virtualenv]# lsvirtualenv -b
superset
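Basic Syntax
Basic data types
int
sys.maxsize (Python 3 ints are arbitrary-precision; this is the largest Py_ssize_t, which bounds container sizes and indices)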
>>> import sys
>>> sys.maxsize
9223372036854775807
# This value depends on whether the platform is 32-bit or 64-bit
>>> pow(2, 63) - 1
9223372036854775807
>>> 1 << 64 - 1  # '-' binds tighter than '<<', so this is 1 << 63
9223372036854775808
>>> (1 << 63) - 1
9223372036854775807
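A quick check of the points above. This sketch assumes a CPython build where Py_ssize_t has the same width as a pointer, which holds on mainstream 32- and 64-bit platforms:

```python
import struct
import sys

# Pointer width in bits: 64 on a 64-bit build, 32 on a 32-bit build
bits = struct.calcsize("P") * 8

# sys.maxsize is the largest Py_ssize_t value, 2**(bits - 1) - 1
print(sys.maxsize == 2 ** (bits - 1) - 1)  # True

# Python 3 ints are arbitrary-precision, so going past sys.maxsize does not overflow
big = sys.maxsize * sys.maxsize
print(big > sys.maxsize, type(big).__name__)  # True int
```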
float
inf (infinity)
>>> float('inf')
inf
>>> float('Inf')
inf
>>> float('inf') > 0
True
>>> float('inf') < 0
False
>>> float('inf') > 9999999999
True
>>> float('inf') > 9999999999999999999999
True
>>> float('-inf') < -9999999999999999999999
True
# inf, Inf and INF all denote infinity; the string is case-insensitive
# inf is positive infinity and -inf is negative infinity
>>> float('Inf') == float('inf') == -float('-inf') == -float('-Inf')
True
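Infinity also has some arithmetic corner cases worth knowing. A short sketch using the math module (math.inf requires Python 3.5+):

```python
import math

pos_inf = float("inf")

# math.inf is the same value as float("inf")
print(pos_inf == math.inf)          # True
# Adding a finite number to infinity leaves it unchanged
print(pos_inf + 1 == pos_inf)       # True
# inf - inf is undefined and yields NaN, which never equals itself
nan = pos_inf - pos_inf
print(math.isnan(nan), nan == nan)  # True False
# math.isinf recognizes both signs
print(math.isinf(-pos_inf))         # True
```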
string
split
>>> 'a b c'.split(' ')
['a', 'b', 'c']
>>> 'a b c'.split(' ', 1)
['a', 'b c']
>>> 'a b c'.split(' ', 2)
['a', 'b', 'c']
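The separator argument changes how repeated whitespace is handled, which is a common surprise. A small sketch on a made-up string with a double space:

```python
s = "a  b c"  # two spaces between a and b

# With an explicit separator, consecutive separators yield empty strings
print(s.split(" "))            # ['a', '', 'b', 'c']
# With no argument, any run of whitespace counts as one separator
print(s.split())               # ['a', 'b', 'c']
# rsplit with maxsplit splits from the right-hand end
print("a b c".rsplit(" ", 1))  # ['a b', 'c']
```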
Type conversion
>>> int('1')
1
>>> float('1')
1.0
>>> str(1)
'1'
Placeholders (%-formatting)
>>> "speed: %skm/h" % 16.8
'speed: 16.8km/h'
>>> "(%s, %s)" % ("percent", 99.97)
'(percent, 99.97)'
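The same result can be produced with str.format or, on Python 3.6+, f-strings. A sketch comparing the three styles and showing a format spec; the values reuse the examples above:

```python
speed = 16.8

# The same string via %-formatting, str.format and an f-string (Python 3.6+)
old = "speed: %skm/h" % speed
new = "speed: {}km/h".format(speed)
fstr = f"speed: {speed}km/h"
print(old == new == fstr)  # True

# Format specs control precision: .1f rounds 99.97 to one decimal place
print("(%s, %.1f)" % ("percent", 99.97))  # (percent, 100.0)
print(f"{99.97:.1f}")                     # 100.0

# A tuple as the single argument must be wrapped in another tuple
print("%s" % ((1, 2),))                   # (1, 2)
```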
Printing
Without a trailing newline
>>> print("[]", end="")
[]>>>
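end works together with sep, which controls what goes between the arguments. A sketch that captures the output in a string with contextlib.redirect_stdout so the exact characters are visible:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):  # capture print() output in a string
    print("[", end="")                 # no newline after "["
    print(1, 2, 3, sep=",", end="")    # sep goes between the arguments
    print("]")                         # default end="\n"

print(repr(buf.getvalue()))  # '[1,2,3]\n'
```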
OS
Operating system
# Path separator used by the OS (Windows: '\\'; Linux/Unix: '/')
os.sep
# Name of the OS-dependent module in use (Windows: 'nt'; Linux/Unix: 'posix')
os.name
# Line terminator for the current platform (Windows: '\r\n'; Linux and macOS: '\n'; classic Mac OS: '\r')
os.linesep
# Run a shell command and return its exit status
os.system(shell)
# Get the current working directory
os.getcwd()
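# Get / set an environment variable (os.environ[key] = value is preferred, since os.putenv does not update os.environ)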
os.getenv(key) / os.putenv(key, value)
# Get the PID of the current process
os.getpid()
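The difference between os.environ and os.putenv matters in practice. A sketch with hypothetical variable names DEMO_VAR, DEMO_MISSING and DEMO_PUTENV (assumed not to be set beforehand):

```python
import os

# Assigning to os.environ updates the mapping and calls putenv() for child processes
os.environ["DEMO_VAR"] = "hello"
print(os.getenv("DEMO_VAR"))             # hello

# getenv returns the default when the key is missing
print(os.getenv("DEMO_MISSING", "n/a"))  # n/a

# os.putenv alone does not update os.environ, so getenv does not see the value
os.putenv("DEMO_PUTENV", "x")
print(os.getenv("DEMO_PUTENV"))          # None
```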
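File / path information
# List the names of all files and directories in path (since 3.5, os.scandir() is a faster alternative that also exposes file attributes)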
os.listdir(path)
# Split path into a (head, tail) pair: directory and file name
os.path.split(path)
# Check whether path is a file / a directory
os.path.isfile(path) / os.path.isdir(path)
# Check whether path is a symbolic link
os.path.islink(path)
# Check whether a file or directory exists
os.path.exists(path)
# Get the file size in bytes (for a directory the result is platform-dependent, not the size of its contents)
os.path.getsize(path)
# Get the absolute path
os.path.abspath(path)
# Normalize the path string (collapse redundant separators and '..' references)
os.path.normpath(path)
# Split the path into root and extension
os.path.splitext(path)
# Join a directory with a file or directory name
os.path.join(path, file)
# Return the final component (the file name)
os.path.basename(path)
# Return the directory part of the path
os.path.dirname(path)
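The path helpers are easiest to understand on a concrete string. This sketch uses posixpath, the implementation behind os.path on Linux/macOS, so the output is the same on every OS; the path itself is made up:

```python
import posixpath  # the POSIX implementation that os.path uses on Linux/macOS

p = "/home/user/../user/data/report.tar.gz"

norm = posixpath.normpath(p)
print(norm)                      # /home/user/data/report.tar.gz
print(posixpath.split(norm))     # ('/home/user/data', 'report.tar.gz')
print(posixpath.splitext(norm))  # ('/home/user/data/report.tar', '.gz'), only the last extension
print(posixpath.basename(norm))  # report.tar.gz
print(posixpath.dirname(norm))   # /home/user/data
print(posixpath.join("/home/user", "data", "a.txt"))  # /home/user/data/a.txt
```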
Manipulating files / paths
# Constant string for the current directory ('.')
os.curdir
# Change the working directory to path
os.chdir(path)
# Delete a file
os.remove(path)
# Delete an empty directory
os.rmdir(path)
# Remove directories recursively: removing 'foo/bar/baz' removes 'foo/bar/baz', then 'foo/bar', then 'foo' (stopping at the first directory that is not empty)
os.removedirs(path)
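To see os.removedirs walking up the tree, and shutil.rmtree handling a non-empty directory, here is a sketch that works inside a throwaway temporary directory. A relative path is used on purpose, so removedirs cannot climb above that directory:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
old_cwd = os.getcwd()
os.chdir(base)
try:
    os.makedirs(os.path.join("foo", "bar", "baz"))  # creates all intermediate directories
    # Relative path: removes baz, then bar, then foo (each only while empty)
    os.removedirs(os.path.join("foo", "bar", "baz"))
    print(os.path.exists("foo"))  # False

    # os.rmdir/os.removedirs refuse non-empty directories; shutil.rmtree removes a whole tree
    os.makedirs(os.path.join("x", "y"))
    with open(os.path.join("x", "y", "f.txt"), "w") as fh:
        fh.write("data")
    shutil.rmtree("x")
    print(os.listdir("."))  # []
finally:
    os.chdir(old_cwd)  # leave the temp dir before deleting it
    os.rmdir(base)
```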
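Reading a file
import os

def open_file(f=""):
    # Return the file's lines, or None if it does not exist
    if not os.path.exists(f):
        print("File not exists, path is %s!" % f)
        return None
    # "r" is enough for reading ("r+" would also require write permission)
    with open(f, "r", encoding="utf8") as of:
        return of.readlines()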
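The same helper can also be written with pathlib (Python 3.4+). A sketch with a hypothetical name read_lines, plus a usage example on a temporary file:

```python
import os
import tempfile
from pathlib import Path

def read_lines(path):
    """Return the file's lines (newlines kept), or None if it is not a regular file."""
    p = Path(path)
    if not p.is_file():
        print("File not exists, path is %s!" % p)
        return None
    with p.open("r", encoding="utf8") as fh:
        return fh.readlines()

# Usage: write a temporary file, read it back, then clean up
with tempfile.NamedTemporaryFile("w", encoding="utf8", suffix=".txt", delete=False) as tmp:
    tmp.write("line1\nline2\n")
lines = read_lines(tmp.name)
print(lines)  # ['line1\n', 'line2\n']
os.remove(tmp.name)
print(read_lines(tmp.name))  # prints the warning, then None
```

is_file() also rejects directories, which os.path.exists would let through to open().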
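Running shell commands
>>> import os
>>> exit_code = os.system("source ~/.bashrc")  # runs in a child shell (/bin/sh on Unix); environment changes do not propagate back to Python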
>>> exit_code
0
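When the output is needed too, the subprocess module is the usual replacement for os.system. A sketch that runs the current interpreter (sys.executable) so it works on any OS:

```python
import subprocess
import sys

# Run a command without a shell, capture its output, and check the exit code
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode bytes to str (text=True is an alias on 3.7+)
)
print(result.returncode)      # 0
print(result.stdout.strip())  # hello
```

Passing a list instead of a string avoids shell quoting problems; subprocess.run needs Python 3.5+.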
JSON
Loading and accessing fields
>>> import json
>>> user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')
>>> user['name']
'benedict'
>>> user['infos']['blog']
'yuzhouwan.com'
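Going the other way, json.dumps escapes non-ASCII characters by default. A sketch showing ensure_ascii and indent; the extra "title" field is hypothetical:

```python
import json

user = {"name": "benedict", "infos": {"age": 0, "blog": "yuzhouwan.com", "title": "宇宙湾"}}

# By default non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(user, sort_keys=True)
# ensure_ascii=False keeps them as-is; indent pretty-prints nested objects
readable = json.dumps(user, ensure_ascii=False, sort_keys=True, indent=2)

print("\\u" in escaped, "宇宙湾" in readable)  # True True
# Both forms decode back to the same dict
print(json.loads(escaped) == json.loads(readable) == user)  # True
# dict.get avoids a KeyError for optional fields
print(user["infos"].get("email", "unknown"))  # unknown
```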
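Converting to and from YAML
import json
import sys
import yaml  # provided by the pyyaml package

# json2yaml: read JSON from stdin, write YAML to stdout
sys.stdout.write(yaml.dump(json.load(sys.stdin)))
# yaml2json (use one direction per script, since each consumes stdin);
# safe_load builds only plain Python objects, and bare yaml.load requires a Loader in PyYAML 6
sys.stdout.write(json.dumps(yaml.safe_load(sys.stdin)))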
Collections
map (dict)
Set / get
>>> kv_map = {}
>>> kv_map["k"] = "v"
>>> kv_map
{'k': 'v'}
>>> kv_map["k"]
'v'
Sorting
>>> costs = {"b": 2, "a": 1, "c": 3}
>>> costs  # dicts keep insertion order in Python 3.7+; older versions may print another order
{'b': 2, 'a': 1, 'c': 3}
# Sort by key
>>> sorted(costs)
['a', 'b', 'c']
>>> sorted(costs.keys())
['a', 'b', 'c']
# Sort by value
>>> sorted(costs.values())
[1, 2, 3]
>>> [ (k, costs[k]) for k in sorted(costs, key=costs.get, reverse=False) ]
[('a', 1), ('b', 2), ('c', 3)]
>>> sorted(costs.items(), key=lambda item: item[1], reverse=True)
[('c', 3), ('b', 2), ('a', 1)]
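When several values tie, sorting by value alone keeps the original order, because sorted() is stable. A sketch that adds a tie-breaker and uses operator.itemgetter as the key; the extra "d" entry is added for illustration:

```python
from operator import itemgetter

costs = {"b": 2, "a": 1, "c": 3, "d": 2}

# Value descending, then key ascending, so ties come out in a deterministic order
ranked = sorted(costs.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [('c', 3), ('b', 2), ('d', 2), ('a', 1)]

# itemgetter(1) is equivalent to lambda item: item[1]; ties keep insertion order
by_value = sorted(costs.items(), key=itemgetter(1))
print(by_value)  # [('a', 1), ('b', 2), ('d', 2), ('c', 3)]

# dict() preserves that order on Python 3.7+
print(list(dict(by_value)))  # ['a', 'b', 'd', 'c']
```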
Iteration
>>> costs_sorted = sorted(costs.items())
>>> for k, v in costs_sorted:
... print(k, v)
...
a 1
b 2
c 3
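Two related idioms: enumerate for a running index, and iterating over a snapshot of the keys when entries must be deleted. A sketch reusing the costs dict from above:

```python
costs = {"b": 2, "a": 1, "c": 3}

# items() yields (key, value) pairs; enumerate adds a running index starting at 1
lines = []
for rank, (k, v) in enumerate(sorted(costs.items()), start=1):
    lines.append("%d. %s=%d" % (rank, k, v))
print(lines)  # ['1. a=1', '2. b=2', '3. c=3']

# Deleting keys while iterating over the dict itself raises RuntimeError,
# so iterate over a snapshot of the keys instead
for k in list(costs):
    if costs[k] < 2:
        del costs[k]
print(costs)  # {'b': 2, 'c': 3}
```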
Sum
>>> sum({"b": 2, "a": 1, "c": 3}.values())
6
list
# range(start, stop, step)
# A negative step iterates in reverse
# Note that the interval [start, stop) is half-open
>>> [ _ for _ in range(3, 0, -1)]
[3, 2, 1]
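A few more range cases that follow from the half-open rule:

```python
print(list(range(3, 0, -1)))        # [3, 2, 1]
print(list(reversed(range(1, 4))))  # [3, 2, 1], the same numbers via reversed()
print(list(range(0, 10, 3)))        # [0, 3, 6, 9]; 10 itself is excluded
print(len(range(0, 10, 3)))         # 4, computed without building a list
print(list(range(3, 0)))            # [], a positive step never reaches a smaller stop
```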
Control flow
if-else
>>> -1 if True else 0
-1
>>> -1 if False else 0
0
Arithmetic
Floor division (the quotient rounded down to an integer)
>>> 1 // 1
1
>>> 2 // 1
2
>>> 3 // 1
3
>>> 1 // 2
0
>>> 2 // 2
1
>>> 3 // 2
1
>>> 4 // 2
2
>>> 5 // 2
2
>>> 6 // 2
3
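All the examples above use positive operands. With negative numbers, // rounds toward negative infinity rather than toward zero. A short sketch:

```python
print(7 // 2, 7 % 2)    # 3 1
# // rounds toward negative infinity, so negative results round down
print(-7 // 2, -7 % 2)  # -4 1
# int() truncates toward zero instead
print(int(-7 / 2))      # -3
# divmod returns quotient and remainder together
print(divmod(-7, 2))    # (-4, 1)
# With a float operand, the result is a float
print(7.5 // 2)         # 3.0

# The identity (a // b) * b + a % b == a always holds for ints
pairs = [(7, 2), (-7, 2), (7, -2), (-7, -2)]
print(all((a // b) * b + a % b == a for a, b in pairs))  # True
```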
Logical operators
& vs. and
>>> True & False
False
>>> True and False
False
>>> 10 > 1 & 10 < 1  # & binds tighter than comparisons: 10 > (1 & 10) < 1, i.e. 10 > 0 < 1
True
>>> 10 > 1 and 10 < 1
False
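Besides precedence, and/or differ from &/| in two more ways: they short-circuit, and they return one of their operands rather than a strict bool. A sketch:

```python
# & binds tighter than comparisons: 10 > (1 & 10) < 1, i.e. 10 > 0 and 0 < 1
print(10 > 1 & 10 < 1)      # True
# Parenthesize comparisons when & is meant as a boolean AND
print((10 > 1) & (10 < 1))  # False
# `and` short-circuits: 1 / 0 is never evaluated
print(0 and 1 / 0)          # 0
# `or` returns the first truthy operand (or the last one)
print([] or "default")      # default
```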
Bitwise operators
Operation | Operator | Rule
---|---|---
AND | `&` | 1 only when both A and B are 1, otherwise 0
OR | `\|` | 1 when either A or B is 1, otherwise 0
XOR | `^` | 1 when A and B differ, otherwise 0
NOT | `~` | Inverts every bit (0 becomes 1, 1 becomes 0); for Python ints, ~x == -x - 1
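Checking the bitwise rules on concrete bit patterns (the operands are arbitrary examples):

```python
a, b = 0b1100, 0b1010  # 12 and 10

print(bin(a & b))  # 0b1000: bits set in both
print(bin(a | b))  # 0b1110: bits set in either
print(bin(a ^ b))  # 0b110: bits set in exactly one
print(~a)          # -13: for Python ints, ~x == -x - 1
print(bin(a << 1), bin(a >> 2))  # 0b11000 0b11: shifts multiply / floor-divide by powers of two
```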
Slicing
Taking part of a list
>>> [1, 2, 3][:1]
[1]
>>> [1, 2, 3][:2]
[1, 2]
>>> [1, 2, 3][:3]
[1, 2, 3]
>>> [1, 2, 3][1::]
[2, 3]
Copying the whole list (a shallow copy)
>>> [1, 2, 3][:]
[1, 2, 3]
Reversing
# Reverse a list
>>> [1, 2, 3][::-1]
[3, 2, 1]
# Reverse a string
>>> 'nawuohzuy'[::-1]
'yuzhouwan'
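[:] and [::-1] both build new lists. A sketch contrasting them with reversed() and the in-place list.reverse():

```python
nums = [1, 2, 3]

copy = nums[:]               # shallow copy: a new list holding the same elements
print(copy == nums, copy is nums)  # True False

print(list(reversed(nums)))  # [3, 2, 1]; reversed() returns a lazy iterator
nums.reverse()               # reverses in place and returns None
print(nums)                  # [3, 2, 1]
print(nums[-2:])             # [2, 1]; negative indices count from the end
```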
Slice assignment
>>> l = list(range(10))
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l[0:3] = [0, -1, -2]
>>> l
[0, -1, -2, 3, 4, 5, 6, 7, 8, 9]
>>> l[2::3] = [0, 0, 0]
>>> l
[0, -1, 0, 3, 4, 0, 6, 7, 0, 9]
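Slice assignment has one rule that the examples above do not show: a plain slice may change the list's length, but an extended slice (step other than 1) must receive a sequence of exactly the same length. A sketch:

```python
l = list(range(10))

l[0:3] = ["x"]       # a plain slice can change the list's length
print(l)             # ['x', 3, 4, 5, 6, 7, 8, 9]

try:
    l[::2] = [0, 0]  # an extended slice needs a sequence of the same length (4 here)
except ValueError as e:
    print("ValueError:", e)

del l[1:3]           # delete a slice in place
print(l)             # ['x', 5, 6, 7, 8, 9]
```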
Python Standard Library
argparse
datetime
# Get the current time of day
import datetime
datetime.datetime.now().time()
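A few more datetime operations that often come up next: formatting, arithmetic with timedelta, parsing, and conversion to a POSIX timestamp. The fixed date is illustrative, and the timestamp is computed in UTC so the result does not depend on the machine's time zone:

```python
import datetime

dt = datetime.datetime(2016, 3, 1, 0, 0, 0)
print(dt.strftime("%Y-%m-%d %H:%M:%S"))          # 2016-03-01 00:00:00
print(dt + datetime.timedelta(days=1, hours=2))  # 2016-03-02 02:00:00

# strptime accepts the non-zero-padded form used later in this post
parsed = datetime.datetime.strptime("2016-3-1 0:0:0", "%Y-%m-%d %H:%M:%S")
print(parsed == dt)  # True

# Attach UTC explicitly, then convert to seconds since the epoch
utc = dt.replace(tzinfo=datetime.timezone.utc)
print(int(utc.timestamp()))  # 1456790400
```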
ftplib
gettext
Creating a PO file
# Generate the template
$ python D:\apps\Python\Python35\Tools\i18n\pygettext.py
$ cat messages.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2017-12-28 11:24+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=cp936\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
# Change charset to UTF-8 and fill in the other basic information
$ vim messages.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2017 yuzhouwan.com
# Benedict Jin <benedictjin2016@gmail.com>, 2017.
#
msgid ""
msgstr ""
"Project-Id-Version: Yuzhouwan v1.0.2\n"
"POT-Creation-Date: 2017-12-28 11:24+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: Benedict Jin <benedictjin2016@gmail.com>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
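# Open it in Poedit and save it as a PO file (messages.pot -> messages.po)
# Move it into the locale directory; gettext loads the compiled locale/<language>/LC_MESSAGES/messages.mo
$ mv messages.po locale/zh_CN/LC_MESSAGES
# Add two translations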
$ vim messages.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2017 yuzhouwan.com
# Benedict Jin <benedictjin2016@gmail.com>, 2017.
#
msgid ""
msgstr ""
"Project-Id-Version: Yuzhouwan v1.0.2\n"
"POT-Creation-Date: 2017-12-28 11:39+0800\n"
"PO-Revision-Date: 2017-12-28 11:43+0800\n"
"Language-Team: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
"X-Generator: Poedit 2.0.1\n"
"Last-Translator: \n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Language: zh\n"
msgid "Hello, world!"
msgstr "世界,你好!"
msgid "yuzhouwan.com"
msgstr "宇宙湾"
Using the translations in a program
import gettext
import os

def getLocStrings():
    # Use the "locale" directory next to this script
    current_dir = os.path.dirname(os.path.realpath(__file__))
    locale_dir = os.path.join(current_dir, "locale")
    print("Locale directory:", locale_dir)
    # Loads locale/<language>/LC_MESSAGES/messages.mo, trying zh_CN first, then en_US
    return gettext.translation('messages', locale_dir, ["zh_CN", "en_US"]).gettext

_ = getLocStrings()
print(_("Hello, world!"))
print(_("yuzhouwan.com"))
Locale directory: E:\Core Code\leetcode\i18n\locale
世界,你好!
宇宙湾
json
time
import time
# Get the current time as a string
time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
# Convert a time string (interpreted as local time) into a timestamp in seconds
int(time.mktime(time.strptime("2016-3-1 0:0:0", "%Y-%m-%d %H:%M:%S")))
# Get the current timestamp in seconds
int(time.time())
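Putting the time conversions together: timestamp to string and back, plus a monotonic clock for measuring durations. A sketch; the round trip goes through local time, so it can be off by an hour for ambiguous times around a DST change:

```python
import time

now = int(time.time())  # seconds since the epoch, independent of the local time zone

# Timestamp -> local time string -> timestamp round trip
s = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
back = int(time.mktime(time.strptime(s, "%Y-%m-%d %H:%M:%S")))
print(s, back == now)  # usually True

# Measure elapsed time with a monotonic clock, which never jumps backwards
start = time.monotonic()
time.sleep(0.01)
elapsed = time.monotonic() - start
print("elapsed: %.3fs" % elapsed)
```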
urllib
Third-party Python Libraries
Core data analysis libraries
Pandas
SciPy
NumPy
import numpy as np
arr = [2, 4, 6, 8, 10]
print(np.mean(arr))    # mean
print(np.median(arr))  # median
print(np.std(arr))     # standard deviation (population, ddof=0 by default)
6.0
6.0
2.8284271247461903
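Without NumPy, the standard library's statistics module (Python 3.4+) gives the same numbers. A sketch for comparison; np.std defaults to the population standard deviation (ddof=0), which corresponds to statistics.pstdev, while statistics.stdev is the sample version, like np.std(arr, ddof=1):

```python
import statistics

arr = [2, 4, 6, 8, 10]
print(statistics.mean(arr))    # mean
print(statistics.median(arr))  # median (middle value of the sorted data)
print(statistics.pstdev(arr))  # 2.8284271247461903, population std, like np.std(arr)
print(statistics.stdev(arr))   # 3.1622776601683795, sample std, like np.std(arr, ddof=1)
```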
Tips: Full code is here.
Statistics
Scrapy
StatsModels
NLP
NLTK
Gensim
Machine learning
Scikit-learn
Artificial intelligence
TensorFlow
Theano
Keras
Visualization
Matplotlib
import numpy as np
import matplotlib.pyplot as plt

plt.figure(1)
plt.figure(2)
plt.figure(3)
x = np.linspace(0, 6, 100)
for i in range(3):
    # Add one curve per frequency i to each of the three figures
    plt.figure(1)
    plt.plot(x, np.sin(i * x))
    plt.figure(2)
    plt.plot(x, np.cos(i * x))
    plt.figure(3)
    plt.plot(x, np.tan(i * x))
plt.show()
plt.close("all")  # close every figure, not just the current one
https://picture.yuzhouwan.com/python_matplotlib_sin.png?imageslim
https://picture.yuzhouwan.com/python_matplotlib_cos.png?imageslim
https://picture.yuzhouwan.com/python_matplotlib_tan.png?imageslim
Seaborn
Bokeh
Plotly
Maps
GeoplotLib
MapBox
Image processing
PIL
Web scraping
lxml
from lxml import etree
import requests

def get_ide_id(job_id, tag_name):
    # view-source:http://historyserver-yuzhouwan:19888/jobhistory/conf/job_1010101010101_0101010
    url = "http://historyserver-yuzhouwan:19888/jobhistory/conf/" + job_id
    page = requests.get(url)
    html = page.text
    selector = etree.HTML(html)
    # Text of every cell in the job's configuration table
    tds = selector.xpath("//*[@id='conf']//tbody//tr//td//text()")
    exist = False
    for td in tds:
        if tag_name in td:
            # The value sits in the cell right after the one holding the property name
            exist = True
            continue
        if exist:
            return td.strip()
    return None  # property not found

print(get_ide_id("job_1010101010101_0101010", "hive.ide.job.id"))
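The cell-scanning logic in get_ide_id can be tried offline. This sketch swaps lxml and requests for the standard library's xml.etree.ElementTree applied to a hypothetical, well-formed snippet of the conf table; real JobHistory pages are HTML and are better parsed with lxml as above, and the helper name find_value_after is illustrative:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of the job configuration table
SAMPLE = """
<table id="conf"><tbody>
  <tr><td>mapreduce.job.name</td><td>demo</td></tr>
  <tr><td>hive.ide.job.id</td><td> 42 </td></tr>
</tbody></table>
"""

def find_value_after(markup, tag_name):
    """Return the stripped text of the <td> following the first <td> containing tag_name."""
    root = ET.fromstring(markup.strip())
    texts = [td.text or "" for td in root.iter("td")]
    for i, text in enumerate(texts[:-1]):
        if tag_name in text:
            return texts[i + 1].strip()
    return None

print(find_value_after(SAMPLE, "hive.ide.job.id"))  # 42
```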
Scientific analysis tools
IPython Notebook
Installation
# Before installing, make sure pip is up to date and %PYTHON_HOME%\Scripts is on PATH
$ python -m pip install --upgrade pip
# Download the Enthought Canopy suite (https://www.enthought.com/canopy-subscriptions/)
# After installing, add it to PATH
$ set PATH=D:\apps\Enthought\Canopy\App;%PATH%
# Install
$ pip install "ipython[all]"
# Launch
$ mkdir ipython
$ cd ipython
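$ ipython notebook  # newer versions: jupyter notebook
$ ipython notebook --pylab  # pylab mode (removed in newer versions; run %matplotlib inline in the notebook instead)
$ ipython notebook --pylab inline  # display Matplotlib figures inline in the page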
Configuration
# Generate the default config file
$ jupyter notebook --generate-config
Writing default config to: C:\Users\BenedictJin\.jupyter\jupyter_notebook_config.py
# Change the default working directory
$ vim ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.notebook_dir = r'F:\Github\_draft\ipython'  # raw string keeps the backslashes literal
# Restart and verify
$ ipython notebook
Format conversion
$ jupyter nbconvert --to markdown --execute Basic.ipynb
# Alternatively, convert with notedown (https://github.com/aaren/notedown)
$ pip install notedown
Practical tips
Embedding in Markdown
Once IPython has created the .ipynb file, you can embed it in Markdown with an <iframe> tag:
<iframe src="https://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>
This way, the plots drawn by matplotlib are displayed, not just the Python script. The result looks like this:
<iframe src="https://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>
Tips: If your blog is served entirely over HTTPS, the resources loaded inside the iframe must also use https; otherwise Chrome blocks the mixed content.
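Help documentation
A single question mark `?` shows the documentation of a function, class, or variable; a double question mark `??` shows its source code.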
$ a = 1
$ a?
Type: int
String form: 1
Docstring:
int(x=0) -> int or long
int(x, base=10) -> int or long
Convert a number or string to an integer, or return 0 if no arguments
are given. If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.
If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base. The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
$ a??
Type: int
String form: 1
# Also recommended: Shift+Tab quickly shows a method's detailed description
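Outside IPython, the inspect module offers rough counterparts of `?` and `??`. A sketch using json.dumps as the example target:

```python
import inspect
import json

# Rough standard-library counterparts of IPython's `json.dumps?` and `json.dumps??`
doc = inspect.getdoc(json.dumps)     # the cleaned-up docstring
src = inspect.getsource(json.dumps)  # the source text; only works for code written in Python
print(doc.splitlines()[0])
print(src.splitlines()[0])

# C-implemented builtins such as int have no Python source to show
try:
    inspect.getsource(int)
except (TypeError, OSError) as e:
    print("no source:", e)
```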
Configuring IPython Notebook to support Python 3
# Install python3
$ which python
/d/apps/Python/Python35/python
# Install the IPython kernel
$ python -m pip install ipykernel
$ python -m ipykernel install --user
# Install notebook
$ which pip
/d/apps/Python/Python35/Scripts/pip
$ pip install notebook
Python project tooling
Tox
VirtualEnv
Practical tricks
Setting a proxy
$ export http_proxy="http://127.0.0.1:1080"
$ export https_proxy="http://127.0.0.1:1080"  # a local proxy is usually reached over plain HTTP, even for HTTPS traffic
$ export socks5_proxy="socks5://127.0.0.1:1080"
# pip install --upgrade pip
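Python's urllib reads these *_proxy variables too, and getproxies() shows what it picked up. A sketch assuming the same hypothetical local proxy on 127.0.0.1:1080, reached over plain HTTP:

```python
import os
import urllib.request

# Hypothetical local proxy, matching the exports above
os.environ["http_proxy"] = "http://127.0.0.1:1080"
os.environ["https_proxy"] = "http://127.0.0.1:1080"

# getproxies() returns the proxies urllib will use, read from the *_proxy variables
proxies = urllib.request.getproxies()
print(proxies.get("http"), proxies.get("https"))
```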
Remote Debug
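The goal: set breakpoints locally to debug and edit the Python code; on Ctrl+S, the change is uploaded to the remote server over SFTP; once all changes are deployed, Flask automatically reloads the latest code and restarts the remote Python process, so the effect of the change on the server can be seen right away from the local machine. (Airbnb's Superset project serves as the example here.)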
PyCharm
Windows development machine
## local
# Shut down the local firewall first
$ cd .\JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg
$ easy_install pycharm-debug.egg
# If running under Python 3, use pycharm-debug-py3k.egg instead
# Run/Debug Configuration - Superset Remote Debug - 192.168.3.10 (local IP) - 12345 (port > 10000); this generates:
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
# Path mappings
E:/Core Code/superset=/root/superset
# SFTP
# Copy the project to a local directory
# configure: tools - deployment, to upload this local copy to remote server
# config remote host
192.168.1.10 SFTP 192.168.1.10 22 /root/superset-0.15.4 root/****** UTF-8 # redacted
# Tools - Deployment - Options - Upload changed files automatically to the default server (On explicit save action (Ctrl+S))
# make deployment automatic: tools - deployment - "automatic upload"
# add remote interpreter: file - settings - python interpreters - "+" - "Remote.."
# Start Debug
Starting debug server at port 12345
Use the following code to connect to the debugger:
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
Waiting for process connection...
Connected to pydev debugger (build 162.1967.10)
Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:9097 --limit-request-line 0 --limit-request-field_size 0 superset:app
Remote Linux runtime environment
## remote
$ cd /root/superset
$ source bin/activate
$ cd /root/superset/lib
# Copy \JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg into the lib directory
$ easy_install pycharm-debug.egg
# Troubleshooting: check that the module can be imported
>>> import pydevd
# restart
$ vim /root/superset/bin/superset
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
# After local debug, then start superset
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &
# Flask - Werkzeug debugger
2017-02-07 15:47:03,905:WARNING:werkzeug: * Debugger is active!
2017-02-07 15:47:03,905:INFO:werkzeug: * Debugger pin code: 330-765-812
$ pip install django-debug-toolbar
$ vim lib/python2.7/site-packages/pycharm-debug.egg/tests_pydevd_python/my_django_proj_17/my_django_proj_17/settings.py
INSTALLED_APPS = (
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'debug_toolbar', # add
'my_app',
)
# enable django
Setting - Language & Frameworks - Django - "Enable Django Support"
E:\Core Code\superset-0.15.4\bin\superset runserver -a '0.0.0.0' -p 9097
############################# PyDevd is too rigid! Let's try Remote Python. #############################
# Configure SFTP (same as above)
# Configure Remote Python
File - Settings - Project: superset-0.15.4 - Project Interpreter - show all(+) -
name: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
SSH Credentials
Host: 192.168.1.10 Port: 22
User name: root
Auth type: Password # redacted
Python interpreter path: /root/superset-0.15.4/bin/python
PyCharm helpers path: /root/superset-0.15.4/.pycharm_helpers
# If the interpreter is not recognized, python may be missing the execute permission
$ cd /root/superset-0.15.4/bin && chmod +x *
PyCharm configuration
# Configure the run configuration for the project
Run - Run/Debug Configurations(+) - Python -
Name: superset
Script: E:\Core Code\superset-0.15.4\bin\superset
Script parameters: runserver -d -p 9097
Environment Variables: VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python) # the remote Python configured above
Working directory: E:\Core Code\superset-0.15.4\bin
Path mapping: E:/Core Code/superset-0.15.4=/root/superset-0.15.4
# Before remote debugging with the remote Python, activate the virtualenv
# The activate file may not be found; in that case, add it directly
File - Settings - Tools - Terminal - Shell path
/bin/bash --rcfile ~/.pycharmrc
$ vim '/e/Core Code/superset-0.15.4/.pycharmrc' # add .pycharmrc to the local project
VIRTUAL_ENV="/root/superset-0.15.4" # virtualenv directory on the remote server (you can also copy in the contents of bin/activate)
export VIRTUAL_ENV
# New processes on the remote server (a bash wrapper plus the pydevd Python processes)
$ ps -ef | grep superset | grep -v grep
root 8638 10912 0 15:24 pts/1 00:00:00 bash -c cd /root/superset-0.15.4/bin; env "IDE_PROJECT_ROOTS"="/root/superset-0.15.4" "IPYTHONENABLE"="True" "PYTHONPATH"="/root/superset-0.15.4:/root/superset-0.15.4/.pycharm_helpers/pydev" "PYTHONUNBUFFERED"="1" "PYCHARM_HOSTED"="1" "VIRTUALENVWRAPPER_PYTHON"="E:\Core Code\superset-0.15.4\bin\python" "LIBRARY_ROOTS"="C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/544046706;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/964856790;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1532312494;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/python_stubs/250609560;D:/apps/JetBrains/PyCharm 2016.3.2/helpers/python-skeletons" "PYTHONDONTWRITEBYTECODE"="1" "JETBRAINS_REMOTE_RUN"="1" "PYTHONIOENCODING"="UTF-8" /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8660 8638 11 15:24 pts/1 00:00:17 /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8715 8660 28 15:24 pts/1 00:00:38 /root/superset-0.15.4/bin/python /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
Done
# Access from the local Windows machine
http://192.168.1.10:9097/login/
Visual Studio Code
Not good for me! You can still try it if you are interested.
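Pitfalls
Gunicorn pre-forks multiple worker processes, so remote debugging does not work
Description
On the local Windows development machine, I connected remotely to Superset running inside a virtualenv on Linux. Debugging worked, but Superset's gunicorn uses a prefork model and starts many worker subprocesses.
Solution
a) Plain remote debug: not OK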
Connected to pydev debugger (build 162.1967.10)
[2017-02-06 18:13:22 +0000] [13609] [INFO] Starting gunicorn 19.6.0
[2017-02-06 18:13:22 +0000] [13609] [INFO] Listening at: http://0.0.0.0:9097 (13609)
[2017-02-06 18:13:22 +0000] [13609] [INFO] Using worker: sync
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13624)
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13623)
b) So debug with "Django server" instead of "Python Remote Debug": not OK
The configured Remote Python is /root/superset/bin/python, yet the error message shows /usr/local/bin/python being used
c) ipdb: not good
Bring the gunicorn process to the foreground and debug it with ipdb on the command line
d) Add the -w flag to control the number of workers: not OK
@manager.option(
'-w', '--workers', default=config.get("SUPERSET_WORKERS", 2), # default: 2
help="Number of gunicorn web server workers to fire up")
$ superset runserver -a 0.0.0.0 -p 9097 -w 0
e) Turn off gunicorn: OK
gunicorn is only needed for load testing
$ superset runserver -d -p 9097
Trying to add breakpoint to file that does not exist
Description
pydev debugger: warning: trying to add breakpoint to file that does not exist: /root/superset/d:/apps/python27/lib/site-packages/gunicorn/arbiter.py
Solution
a) Add a mapping for Python's site-packages: not good
E:/Core Code/superset=/root/superset;D:/apps/Python27=/root/superset/lib/python2.7
b) Use the Python from the Superset project instead of the local Python: OK
The Python synced to the local machine is not a python.exe: no
Use the remote Python: OK
Couldn't obtain remote socket
Description
Error running superset
Can't run remote python interpreter: Couldn't obtain remote socket from output ('0.0.0.0', 52703), stderr /usr/local/bin/python: No module named virtualenvwrapper virtualenvwrapper.sh: There was a problem running the initialization hooks.
If Python could not import the module virtualenvwrapper.hook_loader, check that virtualenvwrapper has been installed for VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is set properly.
Solution
# Check whether PATH includes the virtualenvwrapper entries
$ echo $PATH
# If not, check ~/.bashrc and comment these lines out
# Source global definitions
# export WORKON_HOME=~/virtualenv
# source /usr/local/bin/virtualenvwrapper.sh
Vagrant
Vagrant is software that automates the installation and configuration of virtual machines
Download
# Vagrant
https://www.vagrantup.com/downloads.html
# VirtualBox
https://www.virtualbox.org/wiki/Downloads
http://download.virtualbox.org/virtualbox/5.1.12/ # better
https://hashicorp-files.hashicorp.com/lucid32.box # not good
https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box # best
# Box images
https://atlas.hashicorp.com/boxes/search
http://chef.github.io/bento/
# After installation, it's best to reboot so that cmd, PyCharm, Git Bash, etc. pick up the changes
Usage
$ vagrant box add superset /f/软件库/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'superset' (v0) for provider:
box: Unpacking necessary files from: file:///F:/%C8%ED%BC%FE%BF%E2/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
box:
==> box: Successfully added box 'superset' (v0) for 'virtualbox'!
$ vagrant box list
superset (virtualbox, 0)
$ vagrant init
A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.
$ vim /e/vagrant/superset-0.15.4/Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrant.configure("2") do |config|
# config.vm.box = "superset"
# config.vm.box_check_update = false
# config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
# config.vm.synced_folder "./", "/root/superset-0.15.4"
#
# config.vm.network "public_network"
# config.vm.provider "virtualbox" do |vb|
# vb.gui = true
# vb.memory = "1024"
# end
# config.vm.provision "shell", inline: <<-SHELL
# apt-get update
# SHELL
# end
$ vagrant up --provider=virtualbox
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'superset'...
==> default: Matching MAC address for NAT networking...
==> default: Setting the name of the VM: superset-0154_default_1486969836220_44233
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
default: Adapter 2: hostonly
==> default: Forwarding ports...
default: 22 (guest) => 2122 (host) (adapter 1)
default: 80 (guest) => 6080 (host) (adapter 1)
default: 6079 (guest) => 6079 (host) (adapter 1)
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
Pitfalls
Provider 'virtualbox' not found
Description
$ vagrant up
==> Provider 'virtualbox' not found. We'll automatically install it now...
The installation process will start below. Human interaction may be required at some points. If you're uncomfortable with automatically installing this provider, you can safely Ctrl-C this process and install it manually.
==> Downloading VirtualBox 5.0.10...
This may not be the latest version of VirtualBox, but it is a version that is known to work well. Over time, we'll update the version that is installed.
Solution
vagrant up --provider=virtualbox
Timed out while waiting for the machine to boot
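Description
A subdirectory or file -p already exists.
Error occurred while processing: -p.
A subdirectory or file charms already exists.
Error occurred while processing: charms.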
Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.
If you look above, you should be able to see the error(s) that Vagrant had when attempting to connect to the machine. These errors are usually good hints as to what may be wrong.
If you're using a custom box, make sure that networking is properly working and you're able to connect to the machine. It is a common problem that networking isn't setup properly in these boxes. Verify that authentication configurations are also setup properly, as well.
If the box appears to be booting properly, you may want to increase the timeout ("config.vm.boot_timeout") value.
Solution
Upgrade VirtualBox to 5.1.12
default: stdin: is not a tty
Description
default: stdin: is not a tty
Solution
config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
Unittest
-t changes the top-level package directory
The discover sub-command has the following options:
-v, --verbose Verbose output
-s, --start-directory directory Directory to start discovery (. default)
-p, --pattern pattern Pattern to match test files (test*.py default)
-t, --top-level-directory directory Top level directory of project (defaults to start directory)
Name druid_tests
Script E:\Core Code\superset-0.15.4\code\tests\druid_tests.py
Environment variables VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
Interpreter options -m tests.druid_tests
Working directory E:\Core Code\superset-0.15.4\code\
Path mappings E:/Core Code/superset-0.15.4=/root/superset-0.15.4
$ export SUPERSET_CONFIG=tests.superset_test_config
$ python -m unittest discover . "druid_tests.py"
# After testing, unset SUPERSET_CONFIG
$ unset SUPERSET_CONFIG
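For reference, `python -m unittest discover` finds TestCase classes in matching files and runs them. A self-contained sketch of the same load-and-run flow for a single, hypothetical DemoTest class:

```python
import unittest

class DemoTest(unittest.TestCase):
    def test_floor_division(self):
        self.assertEqual(-7 // 2, -4)

    def test_split_maxsplit(self):
        self.assertEqual("a b c".split(" ", 1), ["a", "b c"])

# Roughly what discovery does per module: load the test cases, then run them
suite = unittest.TestLoader().loadTestsFromTestCase(DemoTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())  # True
```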
Pitfalls
UnicodeDecodeError: 'gbk' codec can't decode byte 0x87 in position illegal multibyte sequence
Solution
# Declare the source encoding at the top of the file, and pass the encoding argument when opening files
# -*- coding:utf8 -*-
open(fname, "r", encoding="utf8")
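The root cause is usually that open() without encoding= falls back to the locale's preferred encoding (cp936/GBK on Chinese Windows). A small sketch of the explicit-encoding habit, using a throwaway temporary file:

```python
import locale
import os
import tempfile

# Without encoding=, open() uses this locale-dependent default (e.g. cp936 on Chinese Windows)
print("default text encoding:", locale.getpreferredencoding(False))

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w", encoding="utf8") as fh:
    fh.write("\u5b87\u5b99\u6e7e\n")  # non-ASCII text stored as UTF-8

# Reading back with the same explicit encoding round-trips regardless of the locale
with open(path, "r", encoding="utf8") as fh:
    text = fh.read()
print(text == "\u5b87\u5b99\u6e7e\n")  # True

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising
print(b"abc\xff".decode("utf8", errors="replace") == "abc\ufffd")  # True

os.remove(path)
os.rmdir(tmp_dir)
```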
connection broken by SSLError
Solution
$ python -m pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip
ModuleNotFoundError: No module named 'yaml'
Solution
$ pip install pyyaml
Feel free to visit my personal blog directly for a better reading experience: https://yuzhouwan.com/posts/43687/