DataFrame Data Processing in Python


Author: xieyan0811 | Published 2017-12-18 16:58

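    1. Overview

     DataFrame is the Pandas data structure for tabular data. Working with it resembles working with a database table, and it is one of the most commonly used tools for data mining in Python. The sections below introduce some commonly used DataFrame methods.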

    2. Iterating over rows

    1) Code

    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
    print(df)
    # iterrows() yields (index label, row as a Series) pairs
    for idx, item in df.iterrows():
        print(idx)
        print(item)


    2) Output

      key  data1  data2
    0   a      1      4
    1   b      2      5
    2   c      3      6
    0
    key      a
    data1    1
    data2    4
    Name: 0, dtype: object
    … (remaining rows omitted)
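iterrows() builds a new Series for every row, which is slow on large tables and does not preserve column dtypes across a row. A common alternative is itertuples(), which yields one namedtuple per row. The sketch below uses the same example table and assumes the column names are valid Python identifiers, so they can be used as attribute names.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

rows = []
# itertuples() yields one namedtuple per row: the first field is the index
# label (row.Index), the other fields are named after the columns
for row in df.itertuples():
    rows.append((row.Index, row.key, row.data1 + row.data2))

print(rows)
```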
    

    3. Iterating over two DataFrames at once

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b'], 'data1': [1, 2]})
    df2 = pd.DataFrame({'key': ['c', 'd'], 'data2': [4, 5]})
    # zip() pairs the rows of the two tables by position
    for (idx1, item1), (idx2, item2) in zip(df1.iterrows(), df2.iterrows()):
        print("idx1", idx1)
        print(item1)
        print("idx2", idx2)
        print(item2)


    2) Output

    idx1 0
    key      a
    data1    1
    Name: 0, dtype: object
    idx2 0
    key      c
    data2    4
    Name: 0, dtype: object
    idx1 1
    key      b
    data1    2
    Name: 1, dtype: object
    idx2 1
    key      d
    data2    5
    Name: 1, dtype: object
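zip() stops as soon as the shorter iterator is exhausted. If the two tables have different lengths, rows are paired only up to the shorter one, and the extra rows are skipped without any error. In this small sketch, the third row of df1 (key 'e') is made up for the illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'e'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['c', 'd'], 'data2': [4, 5]})

# zip() pairs rows by position and stops when df2 runs out,
# so the row with key 'e' in df1 is never visited
pairs = [(r1.key, r2.key) for r1, r2 in zip(df1.itertuples(), df2.itertuples())]
print(pairs)  # [('a', 'c'), ('b', 'd')]

# Compare the lengths first if every row must be processed
print(len(df1) == len(df2))  # False
```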
    

    4. Selecting one or more rows

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    # Slicing with [] selects rows by position; [:1] keeps only the first row
    df2 = df1[:1]
    print(df2)


    2) Output

      key  data1
    0   a      1
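df1[:1] slices rows by position. The sketch below uses the same table to show three more explicit forms: positional (iloc), label-based (loc) and conditional (a boolean mask). These are usually clearer once the index is no longer simply 0, 1, 2, …

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

first_two = df1.iloc[0:2]          # by position: rows 0 and 1 (the end is exclusive)
by_label = df1.loc[[0, 2]]         # by index label: the rows labelled 0 and 2
filtered = df1[df1['data1'] > 1]   # by condition: rows where data1 is greater than 1

print(first_two)
print(by_label)
print(filtered)
```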
    

    5. Selecting one or more columns

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    df2 = pd.DataFrame()
    # Copy column 'key' of df1 into df2 under the new name 'key2'
    df2['key2'] = df1['key']
    print(df2)


    2) Output

      key2
    0    a
    1    b
    2    c
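The example above copies a single column. To select several columns at once, pass a list of names. Note the difference between single and double brackets. The data2 column is added here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

one_col = df1['key']                # a single name returns a Series
two_cols = df1[['key', 'data2']]    # a list of names returns a DataFrame
renamed = df1[['key']].rename(columns={'key': 'key2'})  # one column under a new name

print(type(one_col).__name__)    # Series
print(type(two_cols).__name__)   # DataFrame
print(renamed.columns.tolist())  # ['key2']
```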
    

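    6. Joining columns (horizontally, the table gets wider): merge

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    df2 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data2': [4, 5, 6]})
    # Without an 'on' argument, merge joins on all columns the tables share (here 'key')
    df3 = pd.merge(df1, df2)
    print(df1)
    print(df2)
    print(df3)


    2) Output

      key  data1
    0   a      1
    1   b      2
    2   c      3
      key  data2
    0   a      4
    1   b      5
    2   c      6
      key  data1  data2
    0   a      1      4
    1   b      2      5
    2   c      3      6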
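In the example, every key appears in both tables, so the result keeps the same three rows. When the keys differ, the how parameter decides which rows survive. The sketch below uses a modified df2, with key 'd' instead of 'c' chosen for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key')               # default how='inner': keys present in both
left = pd.merge(df1, df2, on='key', how='left')    # every row of df1; data2 is NaN for 'c'
outer = pd.merge(df1, df2, on='key', how='outer')  # keys from either table

print(inner)
print(left)
print(outer)
```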
    

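    7. Joining rows (vertically, the table gets longer): concat

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data': [1, 2, 3]})
    df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'data': [4, 5, 6]})
    # concat stacks the tables one below the other
    df3 = pd.concat([df1, df2])
    print(df1)
    print(df2)
    print(df3)


    2) Output

      key  data
    0   a     1
    1   b     2
    2   c     3
      key  data
    0   d     4
    1   e     5
    2   f     6
      key  data
    0   a     1
    1   b     2
    2   c     3
    0   d     4
    1   e     5
    2   f     6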
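concat keeps the original index of each input, which is why the labels 0 to 2 appear twice in the result above. The sketch below renumbers the rows with ignore_index=True. It also shows axis=1, which places the tables side by side, aligned on the index. add_suffix is used only to keep the column names distinct.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['d', 'e', 'f'], 'data': [4, 5, 6]})

# ignore_index=True discards the original labels and numbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3, 4, 5]

# axis=1 concatenates columns instead of rows, aligning rows on the index
side = pd.concat([df1, df2.add_suffix('_2')], axis=1)
print(side.columns.tolist())  # ['key', 'data', 'key_2', 'data_2']
```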
    

    8. Simple transformation of a column

    1) Code

    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    print(df)
    # Arithmetic on a column is applied element-wise
    df['data1'] = df['data1'] + 1
    print(df)


    2) Output

      key  data1
    0   a      1
    1   b      2
    2   c      3
      key  data1
    0   a      2
    1   b      3
    2   c      4
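Other element-wise operations follow the same pattern. This short sketch on the same table shows multiplication, a vectorized string method through the .str accessor, and a conditional update with loc. The threshold 15 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

df['data1'] = df['data1'] * 10         # element-wise: 10, 20, 30
df['key'] = df['key'].str.upper()      # string methods via .str: 'A', 'B', 'C'
df.loc[df['data1'] > 15, 'data1'] = 0  # change only the rows matching the mask
print(df)
```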
    

    9. Complex transformation of a column

    1) Code

    import math
    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    print(df)
    # apply() calls the function once for each element of the column
    df['data1'] = df['data1'].apply(lambda x: math.sin(x))
    print(df)


    2) Output

      key  data1
    0   a      1
    1   b      2
    2   c      3
      key     data1
    0   a  0.841471
    1   b  0.909297
    2   c  0.141120
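apply() calls the Python function once per element. For mathematical functions, the NumPy ufunc equivalent processes the whole column in a single call and gives the same values. Here np.sin is swapped in for math.sin. NumPy is assumed to be available, which it normally is because pandas depends on it.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})

looped = df['data1'].apply(lambda x: math.sin(x))  # one Python call per element
vectorized = np.sin(df['data1'])                   # one call for the whole column

print(np.allclose(looped, vectorized))  # True
```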
    

    10. Processing a column with a user-defined function

    1) Code

    import pandas as pd

    def testme(x):
        # x is a single element of the column; the print traces each call
        print("???", x)
        y = x + 3000
        return y

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    print(df)
    df['data1'] = df['data1'].apply(testme)
    print(df)


    2) Output

      key  data1
    0   a      1
    1   b      2
    2   c      3
    ??? 1
    ??? 2
    ??? 3
      key  data1
    0   a   3001
    1   b   3002
    2   c   3003
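apply() can also pass extra arguments to the function through args, and map() with a dictionary replaces values by lookup. In this sketch, add_offset and the fruit names are hypothetical, chosen only for illustration.

```python
import pandas as pd

def add_offset(x, offset):
    # hypothetical helper: x is one element of the column, offset comes from args
    return x + offset

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
df['data1'] = df['data1'].apply(add_offset, args=(3000,))
# map() with a dict looks up each value; values missing from the dict become NaN
df['key'] = df['key'].map({'a': 'apple', 'b': 'banana', 'c': 'cherry'})
print(df)
```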
    

    11. Creating a new column from several columns

    1) Code

    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
    print(df)
    # Column arithmetic combines the values row by row (aligned on the index)
    df['data3'] = df['data1'] + df['data2']
    print(df)


    2) Output

      key  data1  data2
    0   a      1      4
    1   b      2      5
    2   c      3      6
      key  data1  data2  data3
    0   a      1      4      5
    1   b      2      5      7
    2   c      3      6      9
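The same column can be created without modifying df in place by using assign(), which returns a new DataFrame. Each lambda receives the frame being built. The ratio column is an extra illustration not in the example above.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

df2 = df.assign(
    data3=lambda d: d['data1'] + d['data2'],
    ratio=lambda d: d['data1'] / d['data2'],
)
print(df2)
print(df.columns.tolist())  # ['key', 'data1', 'data2'] (df itself is unchanged)
```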
    

    12. Creating a new column from several columns with a function

    1) Code

    import pandas as pd

    def testme(x):
        # With axis=1, x is one row as a Series indexed by column name
        print(x['data1'], x['data2'])
        return x['data1'] + x['data2']

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
    print(df)
    df['data3'] = df.apply(testme, axis=1)
    print(df)


    2) Output

      key  data1  data2
    0   a      1      4
    1   b      2      5
    2   c      3      6
    1 4
    2 5
    3 6
      key  data1  data2  data3
    0   a      1      4      5
    1   b      2      5      7
    2   c      3      6      9
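If the row function should produce more than one value, it can return a Series. apply() then builds a DataFrame whose columns are that Series' labels, and the result can be joined back on the index. sum_and_diff is a hypothetical function for illustration. For plain arithmetic like this, the column expression of section 11 is faster than a row-wise apply.

```python
import pandas as pd

def sum_and_diff(row):
    # hypothetical helper: row is one row as a Series indexed by column name
    return pd.Series({'total': row['data1'] + row['data2'],
                      'diff': row['data2'] - row['data1']})

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
# apply() collects the returned Series into a DataFrame with columns 'total' and 'diff'
df = df.join(df.apply(sum_and_diff, axis=1))
print(df)
```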
    

    13. Deleting columns

    1) Code

    import pandas as pd

    df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})
    print(df)
    # axis=1 means the labels refer to columns, not rows; drop returns a new DataFrame
    df = df.drop(['data2'], axis=1)
    print(df)


    2) Output

      key  data1  data2
    0   a      1      4
    1   b      2      5
    2   c      3      6
      key  data1
    0   a      1
    1   b      2
    2   c      3
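Since version 0.21, pandas also accepts the columns= and index= keywords, so you do not have to remember what axis=1 means. Rows are dropped by index label in the same way, and del removes a column in place. A sketch on the same table:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3], 'data2': [4, 5, 6]})

no_data2 = df.drop(columns=['data2'])  # same result as df.drop(['data2'], axis=1)
no_row1 = df.drop(index=[1])           # drop the row labelled 1
del df['data1']                        # modifies df itself

print(no_data2.columns.tolist())  # ['key', 'data1']
print(no_row1.index.tolist())     # [0, 2]
print(df.columns.tolist())        # ['key', 'data2']
```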
    

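    14. One-hot encoding (turning one categorical column into several numeric columns)

    1) Code

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'data1': [1, 2, 3]})
    print(df1)
    # Encode one column: one indicator column per distinct value.
    # dtype=int gives 0/1; pandas 2.0 and later otherwise default to True/False
    df2 = pd.get_dummies(df1['key'], dtype=int)
    print(df2)
    # Encode a whole DataFrame: only non-numeric columns are expanded,
    # and the new columns are prefixed with the original column name
    df3 = pd.get_dummies(df1, dtype=int)
    print(df3)


    2) Output

      key  data1
    0   a      1
    1   b      2
    2   c      3
       a  b  c
    0  1  0  0
    1  0  1  0
    2  0  0  1
       data1  key_a  key_b  key_c
    0      1      1      0      0
    1      2      0      1      0
    2      3      0      0      1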
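get_dummies can also be told which columns to encode (columns=) and which prefix to use (prefix=). With drop_first=True it keeps k-1 indicators, and the dropped category is implied by all zeros. In this sketch the repeated 'a' value and the prefix 'k' are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'a'], 'data1': [1, 2, 3]})

# columns= selects what to encode; prefix= replaces the default 'key' prefix
encoded = pd.get_dummies(df, columns=['key'], prefix='k', dtype=int)
print(encoded.columns.tolist())  # ['data1', 'k_a', 'k_b']

# drop_first=True removes the first category's indicator column
reduced = pd.get_dummies(df, columns=['key'], drop_first=True, dtype=int)
print(reduced.columns.tolist())  # ['data1', 'key_b']
```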
    

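    15. Other commonly used methods (df is a DataFrame and f is a column name)

    1) Summary statistics: count, mean, standard deviation, min, quartiles (the 50% row is the median), max

    df[f].describe()

    2) Mean

    df[f].mean()

    3) Standard deviation (use df[f].var() for the variance)

    df[f].std()

    4) Drop rows that contain missing values (returns a new DataFrame)

    df.dropna()

    5) Fill missing values (a fill value is required, e.g. 0)

    df.fillna(0)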
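The calls above can be tried on a small table with one missing value. The column name f = 'data1' and the values are made up for the illustration. std() returns the sample standard deviation (ddof=1), and these statistics skip NaN by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'b', 'c', 'd'], 'data1': [1.0, 2.0, np.nan, 5.0]})
f = 'data1'  # hypothetical column name standing in for f above

print(df[f].describe())                # count is 3: the NaN is not counted
print(df[f].mean())                    # (1 + 2 + 5) / 3
print(df[f].median())                  # 2.0
print(df[f].std() ** 2, df[f].var())   # the variance is the square of std
print(df.dropna())                     # the row with key 'c' is removed
print(df.fillna({f: 0}))               # NaN in data1 replaced with 0
```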
