美文网首页大数据 爬虫Python AI SqlPython小哥哥
为Python回测代码提升十倍性能,具体做了哪些?

为Python回测代码提升十倍性能,具体做了哪些?

作者: 14e61d025165 | 来源:发表于2019-06-13 15:06 被阅读0次

我们的回测程序是用Python写的,因为使用Jupter Notebook显示结果非常方便。但是,最近在一次运行整个回测代码时,整整花了20分钟才出现结果。于是,我们打算好好优化一下。最终,性能提升10倍以上,耗时在1分29秒左右。

优化流程

Python学习交流群:1004391443,这里是python学习者聚集地,有大牛答疑,有资源共享!小编也准备了一份python学习资料,有想学习python编程的,或是转行,或是大学生,还有工作中想提升自己能力的,正在学习的小伙伴欢迎加入学习。

我们优化主要分成两部分,第一部分是程序内部逻辑,第二部分是Python提速。所以,我们整个流程可以分成下面三步:

  • 第一步,Python性能检测
  • 第二步,优化程序内部逻辑
  • 第三步,为Python提速

一、Python性能检测

Python性能检测,我们使用的是 line_profiler ,它可以检测代码每行消耗的时间,非常方便好用。

1、line_profiler 安装

使用pip安装就行了

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ pip3 install line_profiler
</pre>

2、line_profiler使用

首先,在需要测试的函数中加上修饰器 @profile

例如:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">@profile
def max_drawdown(data_list):
"""
求最大回撤, 回撤公式=max(Di-Dj)/Di
"""
max_return = 0
for i, v1 in enumerate(data_list):
for j, v2 in enumerate(data_list[i + 1:]):
if v1 == 0:
continue
tmp_value = (v1 - v2) / v1
if tmp_value > max_return:
max_return = tmp_value
return max_return
</pre>

然后,运行脚本,生成 .lprof 文件

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ kernprof -l lineProfilerDemo.py
</pre>

最后,查看各行代码运行的时间

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 -m line_profiler lineProfilerDemo.py.lprof
Timer unit: 1e-06 s
Total time: 0.000475 s
File: lineProfilerDemo.py
Function: max_drawdown at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def max_drawdown(data_list):
6 """
7 求最大回撤, 回撤公式=max(Di-Dj)/Di
8 """
9 1 15.0 15.0 3.2 max_return = 0
10 11 8.0 0.7 1.7 for i, v1 in enumerate(data_list):
11 55 56.0 1.0 11.8 for j, v2 in enumerate(data_list[i + 1:]):
12 45 346.0 7.7 72.8 if v1 == 0:
13 continue
14 45 25.0 0.6 5.3 tmp_value = (v1 - v2) / v1
15 45 23.0 0.5 4.8 if tmp_value > max_return:
16 3 1.0 0.3 0.2 max_return = tmp_value
17 1 1.0 1.0 0.2 return max_return
</pre>

最后% Time那一列就是本行代码消耗的时间占比。

注意:每次只能检测一个函数,如果检测某一行代码执行比较慢。可以将@profile移动到这一行代码对应的方法,然后继续往内部检测。

3、在Jupyter Notebook中使用

先使用 %load_ext line_profiler 加载,然后使用 lprun -s -f 语句打印结果,具体使用如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">%load_ext line_profiler
import numpy as np
def max_drawdown(data_list):
"""
求最大回撤, 回撤公式=max(Di-Dj)/Di
"""
max_return = 0
for i, v1 in enumerate(data_list):
for j, v2 in enumerate(data_list[i + 1:]):
if v1 == 0:
continue
tmp_value = (v1 - v2) / v1
if tmp_value > max_return:
max_return = tmp_value
return max_return
value_array = np.random.randint(1, high=10, size=10)
%lprun -s -f max_drawdown max_drawdown(value_array)
</pre>

效果如下:

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1560409268185" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

image-20190612160711154.png

二、优化程序内部逻辑

我们的回测代码,主要分为数据读取、数据处理、交易逻辑、回测结果显示四部分。其中,我们主要优化了三方面的代码:

  • 优化数据读取性能
  • 简化数据
  • 改进算法

1、优化数据读取性能

最初,我们读取数据是这样的:

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1560409268192" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

image.png

最终,优化之后读取数据过程是这样的:

<tt-image data-tteditor-tag="tteditorTag" contenteditable="false" class="syl1560409268197" data-render-status="finished" data-syl-blot="image" style="box-sizing: border-box; cursor: text; color: rgb(34, 34, 34); font-family: "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "WenQuanYi Micro Hei", "Helvetica Neue", Arial, sans-serif; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-wrap; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: block;"> image

<input class="pgc-img-caption-ipt" placeholder="图片描述(最多50字)" value="" style="box-sizing: border-box; outline: 0px; color: rgb(102, 102, 102); position: absolute; left: 187.5px; transform: translateX(-50%); padding: 6px 7px; max-width: 100%; width: 375px; text-align: center; cursor: text; font-size: 12px; line-height: 1.5; background-color: rgb(255, 255, 255); background-image: none; border: 0px solid rgb(217, 217, 217); border-radius: 4px; transition: all 0.2s cubic-bezier(0.645, 0.045, 0.355, 1) 0s;"></tt-image>

image.png

具体流程如下:

第一次回测的时候,我们会到局域网的Mongodb上读取数据,同时缓存一份json文件在本地。并且,把对应的数据缓存一份在Jupyter notebook的内存里面。

那么,当下一次回测的时候:

  • 如果Jupter notebook未重启,我们直接从内存中拿取数据,这个就不用耗费时间重新读取
  • 如果Jupter notebook重启了,我们就从本地json文件中读取数据,这个也会比从局域网Mongodb中读取快很多
  • 如果,json文件有24小时以上没有更新了,我们则再次从局域网Mongodb中读取,并更新本地缓存

2、简化数据

在检测处理数据的时候,发现计算天的指标时间基本可以忽略,最主要的时间放在遍历分钟数据的for循环里面。其实,这个for循环代码特别简单,但是因为分钟数据有超过9百万条,导致处理它成为了主要花费时间。那我们是否可以简化数据,优化这个性能?

我们的分钟数据主要有:high、low、open、close、volume、timestamp、date_time、frame、symbol等数据,但是我们暂时用到的只有high、low、open、close、timestamp,于是最终我们只保留了6个字段high、low、open、close、timestamp、volume,volume怕以后可能会用到。

同时, 除了减少字段以为,我们将本来存储是字典的数据结构,改成数组 ,以open、close、high、low、volume、timestamp的固定顺序存储。

这是因为,我们在测试性能的时候发现,数组不管是读取数据还是遍历,都比字典快,测试代码如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
def test_dic1(info):
a, b = info['name'], info['height']
def test_dic2(info):
for key in info:
pass
def test_array1(info):
a, b = info[0], info[1]
def test_array2(info):
for value in info:
pass
start_timestamp = time.time()
for i in range(106):
test_array1(['Jack', 1.8, 1])
print("取元素数组消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
start_timestamp = time.time()
for i in range(10
6):
test_dic1({
'name': 'Jack',
'height': 1.8,
'sex': 1
})
print("取元素字典消耗时间: {:.4f}秒\n".format(time.time() - start_timestamp))
start_timestamp = time.time()
for i in range(106):
test_array2(['Jack', 1.8, 1])
print("遍历数组消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
start_timestamp = time.time()
for i in range(10
6):
test_dic2({
'name': 'Jack',
'height': 1.8,
'sex': 1
})
print("遍历字典消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

结果:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 testDicAndArray.py
取元素数组消耗时间: 0.2128秒
取元素字典消耗时间: 0.2779秒
遍历数组消耗时间: 0.2346秒
遍历字典消耗时间: 0.2985秒
</pre>

比较之后,发现数组的速度比字典快了20%以上。

最终,在稍微牺牲了一点可读性,简化了数据之后,我们的性能提升了很大,主要有三方面:

  • 本地的json文件体积变成了以前的1/3,读取速度提升了将近80%
  • 处理数据性能得到提升
  • 交易逻辑性能得到提升

3、改进算法

(1)for循环内部数据提取到循环外面

通过line_profile检测,我发现有个判断语句时间占比比较大,代码逻辑大概如下所示:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">for i in range(10000000):
if i < self.every_coin_max_cash_unit() and (self.max_total_cash_unit() == -1 or total_cash_unit <= self.max_total_cash_unit()):
pass
</pre>

然后分析之后,我发现 every_coin_max_cash_unit() 方法和 max_total_cash_unit() 方法的值只是2个配置参数,配置好了就不会再变化。

于是,使用两个变量缓存下,放到for循环外面,优化成了下面这样:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">every_coin_max_cash_unit = self.every_coin_max_cash_unit()
max_total_cash_unit = self.max_total_cash_unit()

当为-1时, 给1个很大值,大于1000就够了

max_total_cash_unit = 100000 if max_total_cash_unit == -1 else max_total_cash_unit
for i in range(10000000):
if i < every_coin_max_cash_unit and total_cash_unit <= max_total_cash_unit:
pass
</pre>

像这种优化,至少优化了3处。例如,在for循环内部获取数组长度等等。

(2)最大回撤,将O(n²)算法优化成O(n)

在优化完测试的数据后,我们尝试跑整个数据,发现回测显示结果特别慢。然后使用line_profile去检测,一层一层往里面查的时候,发现计算最大回撤的时候特别耗时。下面是以前计算回撤的代码:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def max_drawdown(data_list):
"""
求最大回撤, 回撤公式=max(Di-Dj)/Di
"""
max_return = 0
for i, v1 in enumerate(data_list):
for j, v2 in enumerate(data_list[i + 1:]):
if v1 == 0:
continue
tmp_value = (v1 - v2) / v1
if tmp_value > max_return:
max_return = tmp_value
return max_return
</pre>

它的时间复杂度是O(n²),而收益数据有28000多条(每小时记录一次),那计算数据量达到了8.4亿次,这样就比较大了。

参考了 【Python量化】O(n)复杂度实现最大回撤的计算 的思路后,我们将代码优化成O(n),代码示例如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def new_max_drawdown(data_array):
"""
求最大回撤
:param array:
:return:
"""
drawdown_array = []
max_value = data_array[0]
for value in data_array:
# 当前值超过最大值, 则替换值, 并且当前是波峰, 所以当前值对应最大回撤为0
if value > max_value:
max_value = value
drawdown_array.append(0)
else:
drawdown_value = (max_value - value) / max_value
drawdown_array.append(drawdown_value)
return max(drawdown_array)
</pre>

性能,又得到提升。

(3)缓存好计算结果,直接传递参数

这方面,和第一点道理有点类似,这里主要有两个例子:

  • 我们的回测结果是比较丰富的,不单单会计算每年的收益率、夏普值、索提诺值等等,还会精确计算到每个月。虽然,最大回撤的算法优化了,这方面得到很大的提升,但是我们觉得还不是极限。在使用line_profile检测之后,发现计算每个月结果花费的时间占比比较大。分析之后,才知道每个月都会重新再计算一下市值、收益率等等,这个其实完全可以在最外面计算好,然后取某一段数据,当做参数传递过去。
  • 我们回测代码刚开始在回测数据的时候,是分成每年、2013至2019、2016至2018、2018至2019多个时间段回测,然后每回测一次,都会重新再处理一遍数据。这种重复计算,造成了好十几秒的浪费。于是,我们将整体代码逻辑改一下,先一次性将回测数据处理好,然后再分别取出来使用,不用再重新处理。

三、为Python提速

Python语言提速,我们分别尝试了三种方案:

  • Cython
  • Pypy
  • Numba

1、Cython

Cython将Python代码翻译成C语言版本,然后生成Python扩展模块,供我们使用。

Cython安装很简单,pip3 install Cython就行了。

怎么使用呢?

例如我们有一个Python脚本 test_array.py ,内容如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">def test_array(info):
for i in range(100):
sum = 0
for value in info:
if value % 2 == 0:
sum += value
else:
sum -= value
</pre>

我们在 test.py 脚本中引入 test_array 模块,代码如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
from test_array import test_array
start_timestamp = time.time()
test_array(range(1, 100000))
print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

我们执行 test.py 脚本,获得了下面结果:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 1.0676秒
</pre>

下面,我们使用Cython为它提速。

首先,我们将 test_array.py 中的内容copy到另一个文件 test_array1.pyx 中。

然后,我们创建一个 setup.py 文件,它的内容如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from distutils.core import setup
from Cython.Build import cythonize
setup(
name='test_array',
ext_modules=cythonize('test_array1.pyx'),
)
</pre>

之后,我们执行下面语句编译脚本

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">python3 setup.py build_ext --inplace
</pre>

这一步,它为我们生成了 test_array1.c 和 test_array1.cpython-37m-darwin.so 文件

接着一步,我们在 test.py 中导入 test_array1 模块

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time

from test_array import test_array

from test_array1 import test_array
start_timestamp = time.time()
test_array(range(1, 100000))
print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

最后,我们执行 test.py

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 0.4671秒
</pre>

什么代码都没有改,我们就为 Python提升2倍多的速度

不过,根据cython建议,我们在确定数据类型情况下,可以使用cdef先确定数据类型, test_array1.pyx 修改后代码如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">cpdef test_array(info):
cdef int i, sum, value
for i in range(100):
sum = 0
for value in info:
if value % 2 == 0:
sum += value
else:
sum -= value
</pre>

执行 test.py

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 0.1861秒
</pre>

又提升了2倍的速度,相比于原生的python代码, 提升了将近6倍的速度

其实,若是我们传递的数组确定类型,那么速度还能提升, test_array1.pyx 代码修改如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">cpdef test_array(int[:] info):
cdef int i, sum, value, j
cdef int n = len(info)
for i in range(100):
sum = 0
for j in range(n):
value = info[j]
if value % 2 == 0:
sum += value
else:
sum -= value
</pre>

此时,我们 test.py 也需要修改下,传递的数据是array类型,如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time

from test_array import test_array

from test_array1 import test_array
import array
start_timestamp = time.time()
a_array = array.array('i', range(1, 100000))
test_array(a_array)

test_array(range(1, 100000))

print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

最后,执行 test.py ,看下效果

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 0.0138秒
</pre>

相比于前面代码,提升了10多倍,比原生的python代码, 提升了75倍

总结:Cython完全兼容Python,也支持对C语言代码的支持。不过,它对代码优化方面,需要下一些功夫才能达到极致效果。

2、Pypy

Pypy是Python语言的JIT编译器,它的运行速度在某些方面比原生更快。

安装Pypy3

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ brew install pypy3
:beer: /usr/local/Cellar/pypy3/7.0.0: 5,776 files, 135.5MB
</pre>

下面,我们试试pypy的速度,执行前面 test.py 的原生python脚本,内容如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
from test_array import test_array
start_timestamp = time.time()
test_array(range(1, 100000))
print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

结果如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ pypy3 test.py
消耗时间: 0.0191秒
</pre>

不需要任何代码修改,它将近达到我们上面Cython最终效果, 比原生的Python快54倍

由于它是Python的另外一套编译器了,需要重新安装依赖环境,下面是我们安装过程遇到的一些问题:

(1)、pypy安装numpy后,运行报错,错误信息:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Traceback (most recent call last):
File "/usr/local/Cellar/pypy3/7.0.0/libexec/site-packages/numpy/core/init.py", line 40, in <module>
from . import multiarray
File "/usr/local/Cellar/pypy3/7.0.0/libexec/site-packages/numpy/core/multiarray.py", line 12, in <module>
from . import overrides
File "/usr/local/Cellar/pypy3/7.0.0/libexec/site-packages/numpy/core/overrides.py", line 6, in <module>
from numpy.core._multiarray_umath import (
ImportError: dlopen(/usr/local/Cellar/pypy3/7.0.0/libexec/site-packages/numpy/core/_multiarray_umath.pypy3-70-darwin.so, 6): Symbol not found: _PyStructSequence_InitType2
Referenced from: /usr/local/Cellar/pypy3/7.0.0/libexec/site-packages/numpy/core/_multiarray_umath.pypy3-70-darwin.so
Expected in: dynamic lookup
</pre>

后来发现豆瓣源过时了,于是换成官方的源,重新安装就行了:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ vim ~/.pip/pip.conf
[global]

index-url = https://pypi.douban.com/simple/

index-url = http://mirrors.aliyun.com/pypi/simple/

index-url = https://pypi.python.org/pypi
</pre>

换回官方源,执行 pip_pypy3 install numpy , 安装的的numpy版本是numpy-1.16.4,果然就没问题了 。豆瓣源还是numpy-1.16.3

(2)、没有安装模块playhouse,具体信息:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Traceback (most recent call last):
File "BackTestFunction.py", line 1, in <module>
from utils import helper
File "/Users/liuchungui/Sites/python/BackTest/utils/helper.py", line 3, in <module>
from playhouse.shortcuts import model_to_dict
ModuleNotFoundError: No module named 'playhouse'
</pre>

原来是因为没有安装peewee这个库导致的,重新安装: pip_pypy3 install peewee

(3)、安装pandas失败,报错信息如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">Downloading https://files.pythonhosted.org/packages/b2/4c/b6f966ac91c5670ba4ef0b0b5613b5379e3c7abdfad4e7b89a87d73bae13/pandas-0.24.2.tar.gz (11.8MB)
100% |████████████████████████████████| 11.8MB 401kB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/qc/z1r6s9rs3h7dtlthq1dstq840000gn/T/pip-install-9j2r7tk5/pandas/setup.py", line 438, in <module>
if python_target < '10.9' and current_system >= '10.9':
File "/usr/local/Cellar/pypy3/7.0.0/libexec/lib-python/3/distutils/version.py", line 52, in lt
c = self._cmp(other)
File "/usr/local/Cellar/pypy3/7.0.0/libexec/lib-python/3/distutils/version.py", line 335, in _cmp
if self.version == other.version:
AttributeError: 'LooseVersion' object has no attribute 'version'

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/qc/z1r6s9rs3h7dtlthq1dstq840000gn/T/pip-install-9j2r7tk5/pandas/
</pre>

原来是安装最新版本会报错,所以安装0.23.4,命令如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ pip_pypy3 install pandas==0.23.4
</pre>

参考: centos7 pypy python3.6版本安装编译及numpy,pandas安装

(4)、安装talib

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pip_pypy3 install TA-Lib
</pre>

(5)Pypy在Jupter Notebook使用

只需要改下kernel就行了,操作如下:

复制一份kernel,命名为pypy3

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">cp ~/Library/Jupyter/kernels/python3/ ~/Library/Jupyter/kernels/pypy3
</pre>

编辑 kernel.json ,使用pypy3

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ vim kernel.json
{
"argv": [
"/usr/local/bin/pypy3",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
],
"display_name": "pypy3",
"language": "python"
}
</pre>

保存之后,我们在Jupter Notebook选择kernel时,选择pypy3就行了。

总结:Pypy不需要修改代码就能运行并得到加速,这方面很方便。不过,它也有缺点,它不兼容Cpython,在科学计算领域中兼容性也不行,有时候性能反而更差。例如,我曾经准备优化读取文件,默认情况下一个一个读取文件需要花费12秒;使用多线程没有效果(因为GIL,其实还是一个线程);使用多进程只需要花费4.4秒,提升效果明显;使用多进程+Pypy,检查了进程,根本没有用到多进程,而速度却反而变慢了。还有,在使用ujson解析读取json文件时,速度也反而慢了。

3、Numba

Numba是一个JIT编译器,可以在运行时将Python代码编译成机器代码,和Pypy类似,不过它对CPython兼容性更好。

不过,不同于Pypy,它是用装饰器 @jit() 来为指定函数加速。

test_array.py 代码如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">from numba import jit
@jit()
def test_array(info):
for i in range(100):
sum = 0
for value in info:
if value % 2 == 0:
sum += value
else:
sum -= value
</pre>

test.py 传递numpy数组参数,如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
from test_array import test_array
import numpy as np
start_timestamp = time.time()
a = np.array(range(1, 100000))
test_array(a)
print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

执行结果如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 0.2292秒
</pre>

而若是传递Numpy参数数组,不使用Numba加速,耗时是6.6148,所以它为Numpy加速将近30倍。

不过,Numba对原生的list不会加速,反而会让代码运行更慢。例如,上面 test.py 代码如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import time
from test_array import test_array
start_timestamp = time.time()
test_array(range(1, 100000))
print("消耗时间: {:.4f}秒".format(time.time() - start_timestamp))
</pre>

执行结果如下:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">$ python3 test.py
消耗时间: 1.6948秒
</pre>

总结:Numba使用@jit()装饰器来进行加速,它可以指定函数进行加速,但是不能整体加速,使用起来没有Pypy方便,而且对普通的Python代码没有加速效果。不过,它的优势在于支持Numpy等科学计算领域的框架,对Numpy有更一层的加速效果。

总结

最终我们选择了使用Pypy,理由如下:

1、Pypy不用我们修改任何代码,不像Cython那样麻烦,比Numba也方便

2、Pypy对纯Python代码加速效果很好,我们基本上都是纯Python代码,而且整体加速效果很明显

3、Pypy缺点是数据读取比原生更慢,而且不能使用ujson这种C语言框架来加速。不过在简化数据之后,暂时还能接受。而且,后期可以考虑通过Pypy的脚本调用Python脚本,然后获取数据来进行提速。

相关文章

网友评论

    本文标题:为Python回测代码提升十倍性能,具体做了哪些?

    本文链接:https://www.haomeiwen.com/subject/cgksfctx.html