Spent a whole afternoon fiddling with this and came away with a few small insights; glad I didn't give up too early. It always feels like everyone else is already playing with deep and sophisticated things while I'm only doing the most basic stuff...
Step 1
Change the data storage types
import numpy as np

# Downcast columns to smaller dtypes to cut memory usage
data[['lag', 'L', 'S', 'B']] = data[['lag', 'L', 'S', 'B']].astype(np.float16)
data[['T']] = data[['T']].astype(np.float32)
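To verify the savings, compare total memory usage before and after the cast. A minimal sketch, assuming data is already loaded with the columns above; memory_usage is a standard pandas method, and float16 columns take a quarter of the space of the default float64:

import numpy as np

print('before:', data.memory_usage(deep=True).sum() / 1024 ** 2, 'MB')
data[['lag', 'L', 'S', 'B']] = data[['lag', 'L', 'S', 'B']].astype(np.float16)
data[['T']] = data[['T']].astype(np.float32)
print('after:', data.memory_usage(deep=True).sum() / 1024 ** 2, 'MB')

One caveat: float16 only keeps about three significant decimal digits, so this downcast trades precision for memory.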
Step 2
Change the storage file format from CSV to HDF5 or Feather; binary storage really is faster than CSV by more than a little...
pandas.read_hdf(path_or_buf, key=None, mode='r', errors='strict', where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, **kwargs)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_hdf.html
pandas.DataFrame.to_hdf(path_or_buf, key, mode='a', complevel=None, complib=None, append=False, format=None, index=True, min_itemsize=None, nan_rep=None, dropna=None, data_columns=None, errors='strict', encoding='UTF-8')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html
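Besides the HDFStore object used in the demo below, the same write can be done in one call with DataFrame.to_hdf; a minimal sketch reusing the file name and key from this post:

data.to_hdf('data_1215.h5', key='D1215', mode='w')  # equivalent to the HDFStore write below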
pandas.DataFrame.to_feather(path)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_feather.html

pandas.read_feather(path, columns=None, use_threads=True)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_feather.html
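The timing demo below only covers HDF5; for completeness, a Feather round trip looks like this. A sketch, assuming the pyarrow package is installed (to_feather depends on it) and reusing the post's file naming; note that Feather does not store a custom index, so reset it first if yours is not the default:

import pandas as pd

data.reset_index(drop=True).to_feather('data_1215.feather')  # binary columnar write
data = pd.read_feather('data_1215.feather')                  # fast read back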
import time
import pandas as pd

time1 = time.time()
data_store = pd.HDFStore('data_1215.h5')
data_store['D1215'] = data  # store the DataFrame under the key 'D1215'
data_store.close()
time2 = time.time()
print("use hdf to write:", time2 - time1, "s")
## use hdf to write: 41.06633472442627 s
time1 = time.time()
data = pd.read_hdf('data_1215.h5', key='D1215')
time2 = time.time()
print("use hdf to read:", time2 - time1, "s")
print(data.head())
## use hdf to read: 11.263915061950684 s
Step 3
Still need to work out how to process the data in batches; to be continued, though a rough sketch of chunked reading follows below.
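As a starting point, read_hdf accepts iterator and chunksize (both appear in the signature above), but only for files written in table format; the default fixed format does not support chunked reads. A minimal sketch under that assumption:

import pandas as pd

# Rewrite the store in 'table' format so it can be read back in chunks
data.to_hdf('data_1215.h5', key='D1215', mode='w', format='table')

for chunk in pd.read_hdf('data_1215.h5', key='D1215', chunksize=100000):
    # process each chunk here instead of holding the full DataFrame in memory
    print(chunk.shape)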
References:
https://zhuanlan.zhihu.com/p/56541628
https://zhuanlan.zhihu.com/p/69221436
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#performance-considerations
https://blog.csdn.net/hzau_yang/article/details/78485879