美文网首页我爱编程
pandas之concat and merge

pandas之concat and merge

作者: 阿发贝塔伽马 | 来源:发表于2018-04-07 15:24 被阅读0次

pandas 表横竖合concat,参考官方链接

竖合

# Merge, join, and concatenate
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'E': ['E0', 'E1', 'E2', 'E3', 'E4']
                   },
                           index=[0, 1, 2, 3,4])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'F': ['F4', 'F5', 'F6', 'F7'],
                    'D': ['D4', 'D5', 'D6', 'D7'],
                   },
                   index=[4, 5, 6, 7])

df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                            'B': ['B8', 'B9', 'B10', 'B11'],
                            'C': ['C8', 'C9', 'C10', 'C11'],
                            'D': ['D8', 'D9', 'D10', 'D11']})

frames = [df1, df2, df3]
# 可以将三个表数据尾部追加到一起,但是每个表都有自己的index,df3是默认的1,2,3
# 合并之后index
'''concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
   keys=None, levels=None, names=None, verify_integrity=False, copy=True)
'''
result = pd.concat(frames)
print result

保留了各自的index以及各自数据,缺失的用
NaN填充,concat函数默认参数

join='inner'相当于求交集,ignore_index=True,忽略各自的index,合并后数据集新的index

result = pd.concat(frames, join='inner', ignore_index=True)
print result

可以用下面的形式修改index

# 指定'A'列数据为index
result.index = result['A']
# 使用reser_inex放弃之前的index,用0,1,2取代
rl = result.reset_index(drop=True)

横合

result = pd.concat([df1, df2], axis=1)
print result

使用join = inner参数

result = pd.concat([df1, df2], axis=1, join='inner')
print result

如果我们只想用df1中数据,等于以df1数据为准

result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])

类似于数据库操作merge

函数原型

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=True,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)

对比数据库操作

Merge method SQL Join Name Description
left LEFT OUTER JOIN Use keys from left frame only
right RIGHT OUTER JOIN Use keys from right frame only
outer FULL OUTER JOIN Use union of keys from both frames
inner INNER JOIN Use intersection of keys from both frames
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3','K4'],
                     'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                     'B': ['B0', 'B1', 'B2', 'B3', 'B4']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K5'],
                      'C': ['C0', 'C1', 'C2', 'C3', 'C5'],
                      'D': ['D0', 'D1', 'D2', 'D3', 'D5']}) 

left

result = pd.merge(left, right,how='left', on='key')
print result

right

result = pd.merge(left, right,how='right', on='key')

inner

result = pd.merge(left, right,how='inner', on='key')
print result

outer

result = pd.merge(left, right,how='outer', on='key')
print result

复杂一点的例子

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],       
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

left

result = pd.merge(left, right, how = 'left', on=['key1', 'key2'])
print result

right

result = pd.merge(left, right, how = 'right', on=['key1', 'key2'])
print result

inner

result = pd.merge(left, right, how = 'inner', on=['key1', 'key2'])
print result

outer

result = pd.merge(left, right, how = 'outer', on=['key1', 'key2'])
print result

通过left_index,right_index代替on

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])
result = pd.merge(left, right, left_index=True, right_index=True, how='outer')
print result

下面两个表,以left表‘key’与right表index为标准merge

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key': ['K0', 'K1', 'K0', 'K1']})

right = pd.DataFrame({'C': ['C0', 'C1'],
                      'D': ['D0', 'D1']},
                     index=['K0', 'K1'])
result = pd.merge(left, right, left_on='key', right_index=True,how='left', sort=False);
print result

我的博客即将搬运同步至腾讯云+社区,邀请大家一同入驻:https://cloud.tencent.com/developer/support-plan

相关文章

网友评论

    本文标题:pandas之concat and merge

    本文链接:https://www.haomeiwen.com/subject/fojyhftx.html