Tips for faster file I/O and processing with pandas


Author: Aerosols | Source: published 2020-04-04 18:35

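    After an afternoon of tinkering I finally have a few small takeaways, and I'm glad I didn't give up too early. I always feel that other people are already working on sophisticated, elegant things while I'm still doing the most basic ones...
    Step 1
    Change the data storage types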

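    import numpy as np
    # Downcast to smaller floats to save memory; float16 keeps only about 3 significant digits
    data[['lag', 'L', 'S', 'B']] = data[['lag', 'L', 'S', 'B']].astype(np.float16)
    data[['T']] = data[['T']].astype(np.float32)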
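To check how much the downcast actually saves, memory use can be compared before and after. The following is a minimal sketch on synthetic data: the column names are borrowed from the snippet above, while the row count and random values are assumptions made only for illustration. It uses `DataFrame.astype` with a per-column mapping, which is equivalent in effect to the assignments above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (row count and values are assumed).
rng = np.random.default_rng(0)
n = 100_000
data = pd.DataFrame(rng.random((n, 5)), columns=["lag", "L", "S", "B", "T"])

before = data.memory_usage(deep=True).sum()

# Per-column downcast: float16 for the small-range columns, float32 for T.
data = data.astype({
    "lag": np.float16,
    "L": np.float16,
    "S": np.float16,
    "B": np.float16,
    "T": np.float32,
})

after = data.memory_usage(deep=True).sum()
print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")

# float16 has a narrow range and low precision, so check the limits first.
print("float16 max:", np.finfo(np.float16).max)
print("float16 eps:", np.finfo(np.float16).eps)
```

Whether float16 is acceptable depends on the value range and precision each column needs; float32 is the safer choice when in doubt.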
    

    Step 2
    Change the storage file format from CSV to HDF or Feather. Binary formats are far faster to read and write than CSV...

    pandas.read_hdf

    pandas.read_hdf(path_or_buf, key=None, mode: str = 'r', errors: str = 'strict', where=None, start: Union[int, NoneType] = None, stop: Union[int, NoneType] = None, columns=None, iterator=False, chunksize: Union[int, NoneType] = None, **kwargs)

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_hdf.html

    pandas.DataFrame.to_hdf

    DataFrame.to_hdf(self, path_or_buf, key: str, mode: str = 'a', complevel: Union[int, NoneType] = None, complib: Union[str, NoneType] = None, append: bool = False, format: Union[str, NoneType] = None, index: bool = True, min_itemsize: Union[int, Dict[str, int], NoneType] = None, nan_rep=None, dropna: Union[bool, NoneType] = None, data_columns: Union[List[str], NoneType] = None, errors: str = 'strict', encoding: str = 'UTF-8') → None

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html

    pandas.DataFrame.to_feather

    DataFrame.to_feather(self, path) → None

    pandas.read_feather

    pandas.read_feather(path, columns=None, use_threads: bool = True)

    import time
    import pandas as pd

    data_store = pd.HDFStore('data_1215.h5')
    # Put the DataFrame into the store under the key 'D1215'
    data_store['D1215'] = data
    data_store.close()
    # use hdf to write: 41.06633472442627 s

    time1 = time.time()
    data = pd.read_hdf('data_1215.h5', key='D1215')
    time2 = time.time()
    print("use hdf to read:", time2 - time1, "s")
    print(data.head())
    # use hdf to read: 11.263915061950684 s
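To compare formats on equal terms, the same timing can be wrapped in a small helper and run against CSV, Feather, and HDF. This is only a sketch under assumptions: the data is synthetic, the helper `time_roundtrip` and the file names are hypothetical, Feather requires `pyarrow`, and HDF requires PyTables (a format is skipped if its dependency is missing). Actual numbers depend heavily on the data and the disk.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_roundtrip(write, read, path):
    """Return (write_seconds, read_seconds, loaded_frame) for one format."""
    t0 = time.perf_counter()
    write(path)
    t1 = time.perf_counter()
    loaded = read(path)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, loaded


# Synthetic stand-in data (size and values are assumed for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.random((50_000, 4)), columns=["lag", "L", "S", "B"]
).astype(np.float32)

# Each entry maps a format name to a (writer, reader) pair.
formats = {
    "csv": (lambda p: df.to_csv(p, index=False), pd.read_csv),
    "feather": (lambda p: df.to_feather(p), pd.read_feather),
    "hdf": (
        lambda p: df.to_hdf(p, key="D1215", mode="w"),
        lambda p: pd.read_hdf(p, key="D1215"),
    ),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (write, read) in formats.items():
        path = os.path.join(tmp, f"data.{name}")
        try:
            w, r, loaded = time_roundtrip(write, read, path)
        except ImportError as exc:
            # Optional dependency (pyarrow or PyTables) is not installed.
            print(f"{name}: skipped ({exc})")
            continue
        results[name] = (w, r, loaded.shape)
        print(f"{name}: write {w:.3f} s, read {r:.3f} s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short durations.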
    
    

    Step 3
    I still need to look into how to do batch processing. To be continued.
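The post leaves this step open. One common way to process a large file in batches, which is not necessarily what the author had in mind, is the `chunksize` argument of `pandas.read_csv`, which yields the file piece by piece instead of loading it all at once. A minimal sketch, with a tiny in-memory CSV and column names assumed purely for illustration:

```python
import io

import numpy as np
import pandas as pd

# Tiny in-memory CSV standing in for a large file on disk (assumed data).
csv_text = "T,L\n" + "\n".join(f"{i},{i % 3}" for i in range(10))

total = 0.0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames.
reader = pd.read_csv(
    io.StringIO(csv_text),
    chunksize=4,  # rows per batch; tune to the available memory
    dtype={"T": np.float32, "L": np.float32},  # downcast while reading
)
for chunk in reader:
    # Aggregate per chunk so the full table never has to fit in memory.
    total += float(chunk["T"].sum())
    rows += len(chunk)

mean_T = total / rows
print("rows:", rows, "mean of T:", mean_T)
```

Passing `dtype` at read time combines this with the downcasting from Step 1, so each chunk already arrives in the smaller types.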

    References:
    https://zhuanlan.zhihu.com/p/56541628
    https://zhuanlan.zhihu.com/p/69221436
    https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#performance-considerations
    https://blog.csdn.net/hzau_yang/article/details/78485879
