Python - Efficient way to add rows to dataframe
I used this answer's df.loc[i] = [new_data]
suggestion, but I have > 500,000 rows and that was very slow.
While the answers given are good for the OP's question, I found it more efficient, when dealing with a large number of rows up front (instead of the trickling in described by the OP), to use csvwriter to add data to an in-memory CSV object, then finally use pandas.read_csv(csv)
to generate the desired DataFrame output.
from io import StringIO  # csv.writer needs a text stream in Python 3; the original used BytesIO (Python 2)
from csv import writer
import pandas as pd

def rows_to_dataframe(iterable_object):  # wrapped in a function so the original return is valid
    output = StringIO()
    csv_writer = writer(output)
    for row in iterable_object:
        csv_writer.writerow(row)
    output.seek(0)  # we need to get back to the start of the in-memory buffer before pandas reads it
    df = pd.read_csv(output)
    return df
For ~500,000 rows this was about 1000x faster, and the speed improvement only grows with the row count (the df.loc[i] = [data]
approach gets comparatively slower and slower).
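For anyone who wants to check the difference themselves, here is a minimal timing sketch (not the original benchmark): it assumes toy rows of three numeric columns with placeholder column names, and compares per-row df.loc appends against the in-memory CSV approach above. Exact numbers will vary with data shape and hardware.

import time
from io import StringIO
from csv import writer
import pandas as pd

rows = [(i, i * 2, i * 3) for i in range(10_000)]  # toy data, placeholder columns

# Per-row appends with df.loc: the frame keeps growing, so each insert gets slower
start = time.perf_counter()
df_slow = pd.DataFrame(columns=["a", "b", "c"])
for i, row in enumerate(rows):
    df_slow.loc[i] = row
slow = time.perf_counter() - start

# Write all rows to an in-memory CSV buffer, then parse it once
start = time.perf_counter()
buf = StringIO()
csv_writer = writer(buf)
csv_writer.writerow(["a", "b", "c"])
for row in rows:
    csv_writer.writerow(row)
buf.seek(0)
df_fast = pd.read_csv(buf)
fast = time.perf_counter() - start

print(f"df.loc appends: {slow:.2f}s  CSV buffer: {fast:.2f}s")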
Hope this helps someone who needs efficiency when dealing with more rows than the OP.
answered Jan 16 '18 at 18:09
- Could one alternatively, efficiently use an in-memory structure or CSV instead of actually writing a CSV to file? – matanster Jun 18 '18 at 6:42
- @matanster To my understanding, and that is what the author actually states himself as well, this is already in memory. So hard to beat it. Is this actually faster than appending to a list and converting that? – Jochen Sep 5 '18 at 5:48
- Great! I tested and can confirm that this is much faster. – Floran Gmehlin Sep 17 '18 at 9:07
- Note: for Python 3 you need to use StringIO instead of BytesIO. – Romi Kuntsman Nov 27 '18 at 15:04
- This is the best performing solution I have found to the problem. Damn, thanks for sharing this. Saved the day. – MattSt Jul 3 at 11:49
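Regarding the question in the comments about appending to a plain list and converting that: a minimal sketch of that alternative, assuming the same iterable_object placeholder and made-up column names, would look like the following. In many cases it performs comparably and is simpler; it is worth timing both approaches on your own data.

import pandas as pd

rows = []
for row in iterable_object:  # same placeholder iterable as in the answer above
    rows.append(row)

# Build the DataFrame in a single call; the column names are just placeholders
df = pd.DataFrame(rows, columns=["col_a", "col_b", "col_c"])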