Adding rows to a DataFrame: an alternative approach for large amounts of data

Author: 昵称违法 | Published 2019-11-17 23:33

    Python - Efficient way to add rows to dataframe

    I used this answer's df.loc[i] = [new_data] suggestion, but with more than 500,000 rows it was very slow.
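
For reference, the slow pattern being described grows the DataFrame one label at a time through df.loc ("setting with enlargement"). The sketch below is illustrative only: the helper name build_with_loc and the sample rows are hypothetical. Each enlargement rebuilds the frame's underlying arrays, so the total cost grows roughly quadratically with the row count.

```python
import pandas as pd


def build_with_loc(rows, columns):
    """Hypothetical helper: grow a DataFrame one row at a time via df.loc[i] = [...]."""
    df = pd.DataFrame(columns=columns)
    for i, row in enumerate(rows):
        # Assigning to a label that does not exist yet enlarges the frame;
        # the existing data is copied on each enlargement, which is what
        # makes this pattern slow for hundreds of thousands of rows.
        df.loc[i] = list(row)
    return df


rows = [(i, i * 2.0, f"name{i}") for i in range(5)]
df = build_with_loc(rows, ["id", "value", "label"])
print(df.shape)  # (5, 3)
```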

    The existing answers are good for the OP's question. However, when a large number of rows is available up front (rather than trickling in, as the OP described), I found it more efficient to use csv.writer to write the data into an in-memory CSV object and then call pandas.read_csv() once to build the DataFrame.

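    from io import StringIO
    from csv import writer
    import pandas as pd
    
    # Wrapper function (name is illustrative) so that `return df` is valid
    def rows_to_dataframe(iterable_object):
        # csv.writer writes str in Python 3, so the buffer must be StringIO, not BytesIO
        output = StringIO()
        csv_writer = writer(output)
    
        for row in iterable_object:
            csv_writer.writerow(row)
    
        output.seek(0)  # we need to get back to the start of the StringIO
        # read_csv treats the first row as the header by default;
        # pass header=None if iterable_object yields data rows only
        df = pd.read_csv(output)
        return df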
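
By default, read_csv treats the first row in the buffer as the header. If your iterable yields data rows only, you can pass header=None together with names to supply the column labels. The sketch below does this under that assumption. It also uses csv.writer's writerows method to write every row in one call. The helper name rows_to_dataframe and the sample data are hypothetical.

```python
from csv import writer
from io import StringIO

import pandas as pd


def rows_to_dataframe(rows, columns):
    """Hypothetical helper: build a DataFrame from data-only rows via an in-memory CSV."""
    buffer = StringIO()
    csv_writer = writer(buffer)
    csv_writer.writerows(rows)  # write all rows in one call
    buffer.seek(0)  # rewind so read_csv starts at the first row
    # header=None: no row in the buffer is a header; names supplies the labels
    return pd.read_csv(buffer, header=None, names=columns)


rows = ((i, i * 2.5, f"item{i}") for i in range(3))  # any iterable of row sequences
df = rows_to_dataframe(rows, ["id", "value", "label"])
print(df)
```

Because the data round-trips through text, read_csv re-infers the column dtypes. Here that gives integers for id, floats for value and strings for label.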
    

    For ~500,000 rows, the in-memory CSV approach was about 1000x faster. The speedup only grows with the row count, because df.loc[i] = [data] becomes comparatively much slower as the DataFrame gets larger.
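
To check the speed difference on your own machine, you could time both approaches side by side, as in the sketch below. The function names, column names and the small row count are assumptions chosen so that the slow path finishes quickly. The ratio it prints will vary by machine and pandas version, and it will not reproduce the ~1000x figure, which the answer measured at ~500,000 rows.

```python
import time
from csv import writer
from io import StringIO

import pandas as pd

COLUMNS = ["id", "value"]  # hypothetical column names


def grow_with_loc(rows):
    df = pd.DataFrame(columns=COLUMNS)
    for i, row in enumerate(rows):
        df.loc[i] = list(row)  # setting with enlargement, one row per assignment
    return df


def via_csv_buffer(rows):
    buffer = StringIO()
    writer(buffer).writerows(rows)  # write everything into the in-memory CSV
    buffer.seek(0)
    return pd.read_csv(buffer, header=None, names=COLUMNS)


def time_it(fn, rows):
    start = time.perf_counter()
    result = fn(rows)
    return result, time.perf_counter() - start


n = 1_000  # deliberately small; the gap widens as n grows
rows = [(i, i * 0.5) for i in range(n)]
slow_df, slow_s = time_it(grow_with_loc, rows)
fast_df, fast_s = time_it(via_csv_buffer, rows)
print(f"loc: {slow_s:.3f}s  csv buffer: {fast_s:.3f}s  ratio: {slow_s / fast_s:.0f}x")
```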

    Hope this helps anyone who needs efficiency when dealing with more rows than the OP.

    Answered Jan 16 '18 at 18:09 by Tom Harvey; edited Oct 22 '18 at 23:27 by ximiki.

Article link: https://www.haomeiwen.com/subject/wlnmictx.html