美文网首页我爱编程
pandas之concat and merge

pandas之concat and merge

作者: 阿发贝塔伽马 | 来源:发表于2018-04-07 15:24 被阅读0次

    pandas 表横竖合concat,参考官方链接

    竖合

    # Merge, join, and concatenate
    import pandas as pd
    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                        'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                        'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                        'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                        'E': ['E0', 'E1', 'E2', 'E3', 'E4']
                       },
                               index=[0, 1, 2, 3,4])
    df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                        'B': ['B4', 'B5', 'B6', 'B7'],
                        'F': ['F4', 'F5', 'F6', 'F7'],
                        'D': ['D4', 'D5', 'D6', 'D7'],
                       },
                       index=[4, 5, 6, 7])
    
    df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                                'B': ['B8', 'B9', 'B10', 'B11'],
                                'C': ['C8', 'C9', 'C10', 'C11'],
                                'D': ['D8', 'D9', 'D10', 'D11']})
    
    frames = [df1, df2, df3]
    # 可以将三个表数据尾部追加到一起,但是每个表都有自己的index,df3是默认的1,2,3
    # 合并之后index
    '''concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
       keys=None, levels=None, names=None, verify_integrity=False, copy=True)
    '''
    result = pd.concat(frames)
    print result
    

    保留了各自的index以及各自数据,缺失的用
    NaN填充,concat函数默认参数

    join='inner'相当于求交集,ignore_index=True,忽略各自的index,合并后数据集新的index

    result = pd.concat(frames, join='inner', ignore_index=True)
    print result
    

    可以用下面的形式修改index

    # 指定'A'列数据为index
    result.index = result['A']
    # 使用reser_inex放弃之前的index,用0,1,2取代
    rl = result.reset_index(drop=True)
    

    横合

    result = pd.concat([df1, df2], axis=1)
    print result
    

    使用join = inner参数

    result = pd.concat([df1, df2], axis=1, join='inner')
    print result
    

    如果我们只想用df1中数据,等于以df1数据为准

    result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
    

    类似于数据库操作merge

    函数原型

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
             left_index=False, right_index=False, sort=True,
             suffixes=('_x', '_y'), copy=True, indicator=False,
             validate=None)
    

    对比数据库操作

    Merge method SQL Join Name Description
    left LEFT OUTER JOIN Use keys from left frame only
    right RIGHT OUTER JOIN Use keys from right frame only
    outer FULL OUTER JOIN Use union of keys from both frames
    inner INNER JOIN Use intersection of keys from both frames
    left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3','K4'],
                         'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                         'B': ['B0', 'B1', 'B2', 'B3', 'B4']})
    
    right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K5'],
                          'C': ['C0', 'C1', 'C2', 'C3', 'C5'],
                          'D': ['D0', 'D1', 'D2', 'D3', 'D5']}) 
    

    left

    result = pd.merge(left, right,how='left', on='key')
    print result
    

    right

    result = pd.merge(left, right,how='right', on='key')
    

    inner

    result = pd.merge(left, right,how='inner', on='key')
    print result
    

    outer

    result = pd.merge(left, right,how='outer', on='key')
    print result
    

    复杂一点的例子

    left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                         'key2': ['K0', 'K1', 'K0', 'K1'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3']})
    
    right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],       
                          'key2': ['K0', 'K0', 'K0', 'K0'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})
    

    left

    result = pd.merge(left, right, how = 'left', on=['key1', 'key2'])
    print result
    

    right

    result = pd.merge(left, right, how = 'right', on=['key1', 'key2'])
    print result
    

    inner

    result = pd.merge(left, right, how = 'inner', on=['key1', 'key2'])
    print result
    

    outer

    result = pd.merge(left, right, how = 'outer', on=['key1', 'key2'])
    print result
    

    通过left_index,right_index代替on

    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                         'B': ['B0', 'B1', 'B2']},
                        index=['K0', 'K1', 'K2'])
    
    right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                          'D': ['D0', 'D2', 'D3']},
                          index=['K0', 'K2', 'K3'])
    
    result = pd.merge(left, right, left_index=True, right_index=True, how='outer')
    print result
    

    下面两个表,以left表‘key’与right表index为标准merge

    left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3'],
                         'key': ['K0', 'K1', 'K0', 'K1']})
    
    right = pd.DataFrame({'C': ['C0', 'C1'],
                          'D': ['D0', 'D1']},
                         index=['K0', 'K1'])
    
    result = pd.merge(left, right, left_on='key', right_index=True,how='left', sort=False);
    print result
    

    我的博客即将搬运同步至腾讯云+社区,邀请大家一同入驻:https://cloud.tencent.com/developer/support-plan

    相关文章

      网友评论

        本文标题:pandas之concat and merge

        本文链接:https://www.haomeiwen.com/subject/fojyhftx.html