Python数据处理：concat与append快速合并数据「附

作者: 1a076099f916 | 来源:发表于2019-01-16 16:18 被阅读19次

Python数据处理：concat与append快速合并数据「附源码」

前言：

将不同的数据源进行合并是数据科学中最有趣的事情之一，将两个不同的数据集非常简单地拼接在一起，Pandas 的函数与方法让数据合并变得快速简单。

进群进群：700341555可以获取Python各类入门学习资料！

这是我的微信公众号【Python编程之家】各位大佬用空可以关注下，每天更新Python学习方法，感谢！

111111111111.png

一、介绍concat函数

pd.concat函数使用非常的简单，但是你可以看到它的参数比一般的函数要多，下面用一些例子，来证明这些参数怎么用。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
copy=True)
</pre>

二、通过pd.concat合并Series：

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">ser1= pd.Series(['张三', '李四', '王五'],index=[1, 2, 3])
ser2= pd.Series(['小明', '小华', '小花'], index=[1, 2, 3])
pd.concat([ser1, ser2])
</pre>

代码结果：

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">1 张三
2 李四
3 王五
1 小明
2 小华
3 小花
dtype: object
</pre>

三、通过pd.concat合并DataFrame

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df1= pd.DataFrame({'a':['张三', '李四', '王五'],
'b':['赵六','王七','周八']})
df2= pd.DataFrame({'a':['小明', '小花', '小华'],
'b':['小王','小黑','小白']})
pd.concat([df1, df2])
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

默认情况下，DataFrame 的合并都是逐行进行的(默认设置是 axis=0)。pd.concat 也可以设置合并按列(axis=1）

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat([df1, df2], axis=1)
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

四、concat函数常用参数使用：

保留索引:

Pandas在合并时会保留索引,即使索引是重复的也会保留。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">x= df1
y= df2
pd.concat([x, y])
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

捕捉索引重复:

如果你想要检测 pd.concat() 合并的结果中是否出现了重复的索引，可以设置 verify_integrity 参数。将参数设置为True，合并时若有索引重复就会触发异常。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">try:
pd.concat([x, y], verify_integrity=True)
exceptValueErrorase:
print("ValueError:", e)
</pre>

代码结果：

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">ValueError: Indexes have overlapping values: Int64Index([0, 1, 2], dtype='int64')
</pre>

忽略索引:

有时索引无关紧要，那么合并时就可以忽略它们，可以通过设置ignore_index参数来实现。如果将参数设置为True，那么合并时将会创建一个新的整数索引。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat([x, y], ignore_index=True)
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

增加多级索引:

另一种处理索引重复的方法是通过 keys 参数为数据源设置多级索引标签，这样结果数据就会带上多级索引.

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat([x, y], keys=['x', 'y'])
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

类似join的合并方法:

在实际工作中，需要合并的数据往往带有不同的列名，而pd.concat提供了一些选项来解决这类合并问题。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df5= pd.DataFrame({'员工': ['张三', '李四', '王五'],
'职务': ['经理', '主管', '工程师']})
df6= pd.DataFrame({'员工': ['张三', '李四', '赵六'],
'入职': [2004, 2008, 2012]})
pd.concat([df5, df6],sort=True)
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat([df5, df6], join='inner')
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

默认情况下，某个位置上缺失的数据会用 NaN 表示。如果不想这样，可以用 join 和 join_ axes 参数设置合并方式。当然也可以用 join='inner' 实现对输入列的交集合并。

另一种合并方式是设置 join_axes 参数，保留与df5的列标签一样的数据

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">pd.concat([df5, df6], join_axes=[df5.columns])
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

append()方法：

Series对象、DataFrame 对象都支持 append 方法，通过最少的代码实现合并功能。

Series对象

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">ser1.append(ser2)
</pre>

代码结果：

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">1 张三
2 李四
3 王五
1 小明
2 小华
3 小花
dtype: object
</pre>

DataFrame 对象

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">df5.append(df6,sort=True)
</pre>

代码结果：

Python数据处理：concat与append快速合并数据「附源码」

df5.append(df6)的效果与 pd.concat([df5, df6])一样，但是与Python 列表中的 append() 和 extend() 方法不同，Pandas 的 append() 不直接更新原有对象的值，而是为合并后的数据创建一个新对象。

总结：

1.axis=0, 对行操作 axis=1 ,对列操作。

2. join='outer', 取交集集，join='inner'取并集。

3. join_axes=[df5.columns]，保留与df5的列标签一样的数据。

4. ignore_index=False, 保留原索引 ignore_index=True,忽略原索引并生成新索引。

5. keys=['x', 'y'']为数据源设置多级索引标签。

网友评论

本文标题：Python数据处理：concat与append快速合并数据「附

本文链接：https://www.haomeiwen.com/subject/yrhmdqtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Python数据处理：concat与append快速合并数据「附

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

大数据爬虫Python AI Sql

Python学习

Python精选

Python数据处理：concat与append快速合并数据「附

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

大数据 爬虫Python AI Sql

Python学习

Python精选

大数据爬虫Python AI Sql