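I. JIT speedup comparison
# When a calculation in pandas cannot be vectorized and has to be written as a loop, a numpy structured array has a clear advantage.
# This example compares the running time of a for loop over a structured array with and without jit acceleration.
# The structured array in this example has 26,400 rows.
# Before a structured array is passed to a jit-compiled function, any field whose dtype is object must be converted to a type numpy supports natively [for example, strings default to object].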
import numba as nb

@nb.jit
def update(struct_array):
    # jit-compiled version
    for row in struct_array:
        row['open'] = 200
        row['high'] = 250
        #print(row['day'])

def update1(struct_array):
    # plain Python version with the same loop body
    for row in struct_array:
        row['open'] = 200
        row['high'] = 250
        #print(row['day'])

%timeit update(struct_array3)
%timeit update1(struct_array3)
Results (jit version first, plain Python version second; the jit version is roughly 450 times faster)
68.6 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
31.3 ms ± 957 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
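The note above about object fields can be sketched as follows. This is a minimal example under stated assumptions, not the author's actual conversion: the sample data, the field names and the helper `replace_object_fields` are all hypothetical, and it assumes every object field holds strings that may be stored as fixed-width unicode.

```python
import numpy as np

# Hypothetical sample data; the field names are illustrative only.
raw = np.array(
    [(2016, 104.7, "abc"), (2017, 99.6, "abcdef")],
    dtype=[("year", "<i8"), ("ret", "<f8"), ("name", "O")],
)

def replace_object_fields(arr):
    """Return a copy of arr in which object fields become fixed-width unicode fields."""
    new_fields = []
    for name in arr.dtype.names:
        field_dtype = arr.dtype[name]
        if field_dtype == np.dtype("O"):
            # Size the string field to the longest value in the column.
            width = max((len(str(v)) for v in arr[name]), default=1)
            field_dtype = np.dtype(f"<U{max(width, 1)}")
        new_fields.append((name, field_dtype))
    out = np.empty(arr.shape, dtype=new_fields)
    # Copy field by field; numpy casts the object values to the new string type.
    for name in arr.dtype.names:
        out[name] = arr[name]
    return out

converted = replace_object_fields(raw)
print(converted.dtype)
```

The result no longer contains an object field, which is the precondition stated above for handing the array to a jit-compiled function.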
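II. Querying data in a structured array
1. Prefer a view over a copy wherever possible. A view is a reference to the same memory and involves no new allocation, so it is fast.
# Check whether the memory is shared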
year = struct_array['年']
year.base is struct_array
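A small sketch of the view-versus-copy distinction, using hypothetical sample data and field names. It uses `np.shares_memory` instead of the `base is` check, since the `base` comparison only holds when the structured array owns its own data. Selecting a field gives a view, while indexing with a boolean mask gives a new array.

```python
import numpy as np

# Hypothetical sample data; the field names are illustrative only.
arr = np.array(
    [(2015, 1.0), (2016, 2.0), (2017, 3.0)],
    dtype=[("year", "<i8"), ("ret", "<f8")],
)

year_view = arr["year"]                       # field access: a view on arr's buffer
year_copy = arr["year"][arr["year"] > 2015]   # boolean indexing: a separate array

print(np.shares_memory(year_view, arr))
print(np.shares_memory(year_copy, arr))

# Writing through the view changes the original array.
year_view[0] = 1999
print(arr["year"][0])
```

Because a write through the view changes the original array, take an explicit `.copy()` when the original must stay untouched.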
2. Query a 1-D array and return the actual matching values
import numpy
res = numpy.isin(struct_array['年'],[2016,2017])
struct_array['年'][res]
==========================
res:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
False, False, False, False, False, False, False, False, False])
struct_array['年'][res] :
array([2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016], dtype=int64)
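3. Query structured data with where and isin
First use where to get the matching indices, then use those indices to select the rows.
year = struct_array['年']
# numpy.where returns a tuple of index arrays, not a boolean array
index = numpy.where(numpy.isin(year,[2016,2017]))
display(index)
display(year[index])
final_result = struct_array[numpy.where(numpy.isin(struct_array['年'],[2016,2017]))]
display(final_result)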
==========================
(array([18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35], dtype=int64),)
array([2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016], dtype=int64)
array([(18, 2017, 0.1, 104.51972096, -10.82555183, '憨斑鸠'),
(19, 2017, 0.2, 104.3501145 , -10.96938717, '憨斑鸠'),
(20, 2017, 0.3, 103.54367631, -11.35928169, '憨斑鸠'),
(21, 2017, 0.4, 107.41392689, -9.6743072 , '憨斑鸠'),
(22, 2017, 0.5, 108.28510005, -9.85590002, '憨斑鸠'),
(23, 2017, 0.6, 104.48715011, -9.62250469, '憨斑鸠'),
(24, 2017, 0.7, 100.66455001, -9.81848412, '憨斑鸠'),
(25, 2017, 0.8, 99.66175183, -9.55695774, '憨斑鸠'),
(26, 2017, 0.9, 100.40963599, -7.01453746, '憨斑鸠'),
(27, 2016, 0.1, 104.70750137, -22.43171061, '憨斑鸠'),
(28, 2016, 0.2, 103.04499966, -22.55541852, '憨斑鸠'),
(29, 2016, 0.3, 99.48432722, -23.29792662, '憨斑鸠'),
(30, 2016, 0.4, 98.85926603, -23.8461711 , '憨斑鸠'),
(31, 2016, 0.5, 99.34908936, -22.30951175, '憨斑鸠'),
(32, 2016, 0.6, 97.82385895, -21.96773831, '憨斑鸠'),
(33, 2016, 0.7, 97.66852514, -22.1247624 , '憨斑鸠'),
(34, 2016, 0.8, 97.0840451 , -19.36211832, '憨斑鸠'),
(35, 2016, 0.9, 96.74356454, -19.47856185, '憨斑鸠')],
dtype=[('Unnamed: 0', '<i8'), ('年', '<i8'), ('百分比', '<f8'), ('个股最终收益', '<f8'), ('个股最大回撤', '<f8'), ('名字', 'O')])
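For comparison, the two query styles above (boolean mask from isin, and indices from where) select the same rows. A minimal sketch with hypothetical sample data and field names:

```python
import numpy as np

# Hypothetical sample data; the field names are illustrative only.
arr = np.array(
    [(0, 2015, 0.1), (1, 2016, 0.2), (2, 2017, 0.3), (3, 2018, 0.4)],
    dtype=[("id", "<i8"), ("year", "<i8"), ("pct", "<f8")],
)

mask = np.isin(arr["year"], [2016, 2017])  # boolean array, one entry per row
idx = np.where(mask)                       # tuple holding the matching row indices

by_mask = arr[mask]    # select rows directly with the boolean mask
by_index = arr[idx]    # select rows through the index tuple

print(by_mask)
print(by_mask.tolist() == by_index.tolist())
```

The where step is therefore optional when only the rows are needed; it is useful mainly when the row positions themselves are of interest.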
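III. An open problem
A jit-compiled function fn1 uses numpy internally. When fn1 is run through joblib, it raises an error saying that numpy is not defined.
To be investigated.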