Python Pandas: Data Operations with [ ]

Author: Kaspar433 | Published: 2020-03-25 22:13

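    This article covers some common operations with the Pandas "[ ]" operator, such as selecting and modifying data.

    "[ ]" is probably the most basic way to select data. It accepts the following kinds of keys:

    • a single column label;
    • a list of column labels;
    • a slice;
    • a boolean index.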
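
As a quick orientation, the four kinds of keys listed above can be sketched on a small deterministic frame. The name `demo` and its values are illustrative assumptions, not part of the article's data.

```python
import numpy as np
import pandas as pd

# Small deterministic frame; `demo` and its values are illustrative only.
demo = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.date_range("2020-01-01", periods=4),
    columns=list("ABC"),
)

single_column = demo["A"]          # column label -> Series
column_list = demo[["C", "A"]]     # list of labels -> DataFrame, in list order
row_slice = demo[1:3]              # integer slice -> rows by position
bool_rows = demo[demo["A"] > 3]    # boolean Series -> rows where the mask is True

print(type(single_column).__name__)   # Series
print(list(column_list.columns))      # ['C', 'A']
print(row_slice.shape)                # (2, 3)
print(len(bool_rows))                 # 2
```

Note that a label or a list of labels selects columns, while a slice or a boolean mask selects rows.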

    Creating the data

    import pandas as pd
    import numpy as np

    # 8 consecutive days as the index, 4 columns of standard-normal random values
    dates = pd.date_range('1/1/2020', periods=8)
    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))
    df
    
    out:
        A   B   C   D
    2020-01-01  0.336131    -0.086456   0.096903    -1.230599
    2020-01-02  -0.106293   0.111821    1.165342    -1.378462
    2020-01-03  -0.933779   0.898738    0.013194    -0.593243
    2020-01-04  0.190229    -1.108908   0.597650    2.759475
    2020-01-05  -0.647080   1.573537    1.357191    -0.536916
    2020-01-06  -0.455373   1.342904    -0.316548   0.145119
    2020-01-07  -1.350214   -0.044642   0.501508    1.969973
    2020-01-08  -0.474602   -0.384916   1.829222    0.853519
    

    Passing a list

    Pass a list of column labels and the columns are returned in the order of the list, as a DataFrame object.

    df[['C','D']]
    
    out:
        C   D
    2020-01-01  0.096903    -1.230599
    2020-01-02  1.165342    -1.378462
    2020-01-03  0.013194    -0.593243
    2020-01-04  0.597650    2.759475
    2020-01-05  1.357191    -0.536916
    2020-01-06  -0.316548   0.145119
    2020-01-07  0.501508    1.969973
    2020-01-08  1.829222    0.853519
    

    Passing a single column

    Passing a single column label returns a Series object; passing a list returns a DataFrame object, even when the list has length 1.

    df['C']
    
    out:
    2020-01-01    0.096903
    2020-01-02    1.165342
    2020-01-03    0.013194
    2020-01-04    0.597650
    2020-01-05    1.357191
    2020-01-06   -0.316548
    2020-01-07    0.501508
    2020-01-08    1.829222
    Freq: D, Name: C, dtype: float64
    
    df[['C']]
    
    out:
        C
    2020-01-01  0.096903
    2020-01-02  1.165342
    2020-01-03  0.013194
    2020-01-04  0.597650
    2020-01-05  1.357191
    2020-01-06  -0.316548
    2020-01-07  0.501508
    2020-01-08  1.829222
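
The difference between the two return types can be checked directly. This is a minimal sketch on a made-up three-row frame (`demo` is an assumption for illustration):

```python
import pandas as pd

# Illustrative frame, not the article's random data.
demo = pd.DataFrame({"C": [1.0, 2.0, 3.0], "D": [4.0, 5.0, 6.0]})

as_series = demo["C"]     # bare label -> one-dimensional Series
as_frame = demo[["C"]]    # one-element list -> two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)   # Series (3,)
print(type(as_frame).__name__, as_frame.shape)     # DataFrame (3, 1)
```

The one-element list form is useful when downstream code expects a two-dimensional object with column labels.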
    

    Assigning to a list of columns can also be used to swap the values of two columns.

    df[['A','B']] = df[['B','A']]
    df
    
    out:
        A   B   C   D
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
    

    You might expect the following .loc assignment to swap the two columns as well.

    df.loc[:, ['A', 'B']] = df[['B', 'A']]
    df
    
    out:
        A   B   C   D
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
    

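    The operation above does not swap the column values: when a DataFrame is assigned through .loc, pandas aligns it on the column labels first, so each column receives its own values back. To swap, assign the raw values instead.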

    df.loc[:, ['A', 'B']] = df[['B', 'A']].values
    df
    
    out:
    A   B   C   D
    2020-01-01  0.336131    -0.086456   0.096903    -1.230599
    2020-01-02  -0.106293   0.111821    1.165342    -1.378462
    2020-01-03  -0.933779   0.898738    0.013194    -0.593243
    2020-01-04  0.190229    -1.108908   0.597650    2.759475
    2020-01-05  -0.647080   1.573537    1.357191    -0.536916
    2020-01-06  -0.455373   1.342904    -0.316548   0.145119
    2020-01-07  -1.350214   -0.044642   0.501508    1.969973
    2020-01-08  -0.474602   -0.384916   1.829222    0.853519
    

    to_numpy() can be used for the swap in the same way.

    df.loc[:, ['A', 'B']] = df[['B', 'A']].to_numpy()
    df
    
    out:
    A   B   C   D
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
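
The three assignment forms can be compared side by side on fixed values, which makes the effect of label alignment easier to see than with random data. The frame `demo` and the variable names below are illustrative assumptions.

```python
import pandas as pd

# Illustrative frame with easily distinguishable columns.
demo = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Plain [] with a list of labels: columns are assigned pairwise by position,
# so A receives B's values and B receives A's values.
bracket = demo.copy()
bracket[["A", "B"]] = bracket[["B", "A"]]

# .loc with a DataFrame on the right: columns are aligned by label first,
# so A receives A and B receives B, and nothing changes.
aligned = demo.copy()
aligned.loc[:, ["A", "B"]] = aligned[["B", "A"]]

# .loc with a bare NumPy array: no labels to align, so values go in by position.
swapped = demo.copy()
swapped.loc[:, ["A", "B"]] = swapped[["B", "A"]].to_numpy()

print(bracket["A"].tolist())   # [10, 20, 30]
print(aligned["A"].tolist())   # [1, 2, 3]
print(swapped["A"].tolist())   # [10, 20, 30]
```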
    

    Using slices

    Get the first two rows

    df[:2]
    
    out:
    A   B   C   D
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    

    Setting a step

    df[::2]
    
    out:
    A   B   C   D
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    
    df[1::2]
    
    out:
    A   B   C   D
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
    

    Reversing the row order

    df[::-1]
    
    out:
    A   B   C   D
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-02  0.111821    -0.106293   1.165342    -1.378462
    2020-01-01  -0.086456   0.336131    0.096903    -1.230599
    

    Assigning through a slice

    df[:2] = np.arange(8).reshape(2,4)
    df
    
    out:
    A   B   C   D
    2020-01-01  0.000000    1.000000    2.000000    3.000000
    2020-01-02  4.000000    5.000000    6.000000    7.000000
    2020-01-03  0.898738    -0.933779   0.013194    -0.593243
    2020-01-04  -1.108908   0.190229    0.597650    2.759475
    2020-01-05  1.573537    -0.647080   1.357191    -0.536916
    2020-01-06  1.342904    -0.455373   -0.316548   0.145119
    2020-01-07  -0.044642   -1.350214   0.501508    1.969973
    2020-01-08  -0.384916   -0.474602   1.829222    0.853519
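
The slicing behaviour above can be reproduced on a small frame of zeros, which makes the assigned cells easy to spot. This is a sketch under the assumption of a 4 x 2 frame named `demo`; only the shapes matter.

```python
import numpy as np
import pandas as pd

# Illustrative 4 x 2 frame of zeros with a date index.
demo = pd.DataFrame(
    np.zeros((4, 2)),
    index=pd.date_range("2020-01-01", periods=4),
    columns=list("AB"),
)

# An integer slice inside [] selects rows by position, like .iloc.
same_as_iloc = demo[:2].equals(demo.iloc[:2])

# The right-hand side must match the selected block: 2 rows x 2 columns here.
demo[:2] = np.arange(4).reshape(2, 2)

print(same_as_iloc)          # True
print(demo["A"].tolist())    # [0.0, 2.0, 0.0, 0.0]
print(demo["B"].tolist())    # [1.0, 3.0, 0.0, 0.0]
```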
    

    Using boolean indexing

    df = pd.DataFrame(np.random.randn(8,4),index=dates,columns=list('abcd'))
    df
    
    out:
    a   b   c   d
    2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
    2020-01-02  0.026334    0.158118    0.341183    -1.042534
    2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
    2020-01-04  1.719313    -1.417885   0.267647    -0.960537
    2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
    2020-01-06  -0.048619   -0.690095   0.759120    1.184295
    2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
    2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223
    
    mask = df['a'] > 0
    mask
    
    out:
    2020-01-01    False
    2020-01-02     True
    2020-01-03     True
    2020-01-04     True
    2020-01-05    False
    2020-01-06    False
    2020-01-07    False
    2020-01-08    False
    Freq: D, Name: a, dtype: bool
    
    df[mask]
    
    out:
    a   b   c   d
    2020-01-02  0.026334    0.158118    0.341183    -1.042534
    2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
    2020-01-04  1.719313    -1.417885   0.267647    -0.960537
    

    Multiple conditions

    mask2 = df['b'] < 0
    df[mask & mask2]
    
    out:
    a   b   c   d
    2020-01-03  0.513027    -0.127235   -0.454433   -0.162600
    2020-01-04  1.719313    -1.417885   0.267647    -0.960537
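
Conditions can also be combined inline, without naming each mask first. A minimal sketch on made-up values (`demo` is an assumption for illustration), covering AND, OR and NOT:

```python
import pandas as pd

# Illustrative frame with fixed values so the results are predictable.
demo = pd.DataFrame({"a": [1, -1, 2, -2], "b": [-5, -6, 7, 8]})

# Each comparison needs its own parentheses because & and | bind
# more tightly than > and <.
both = demo[(demo["a"] > 0) & (demo["b"] < 0)]      # a positive AND b negative
either = demo[(demo["a"] > 0) | (demo["b"] < 0)]    # a positive OR b negative
negated = demo[~(demo["a"] > 0)]                    # a NOT positive

print(both.index.tolist())      # [0]
print(either.index.tolist())    # [0, 1, 2]
print(negated.index.tolist())   # [1, 3]
```

The Python keywords `and`, `or` and `not` do not work element-wise on a Series, so the operators `&`, `|` and `~` are used instead.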
    

    Modifying data with boolean indexing

    df[mask & mask2] = np.arange(8).reshape(2,4)
    df
    
    out:
    a   b   c   d
    2020-01-01  -1.749988   -0.249398   -1.165277   -0.806687
    2020-01-02  0.026334    0.158118    0.341183    -1.042534
    2020-01-03  0.000000    1.000000    2.000000    3.000000
    2020-01-04  4.000000    5.000000    6.000000    7.000000
    2020-01-05  -0.259797   -0.851702   -0.873451   -0.476420
    2020-01-06  -0.048619   -0.690095   0.759120    1.184295
    2020-01-07  -0.748535   -1.252718   0.386220    -0.415996
    2020-01-08  -0.497471   -0.550428   -0.867333   -0.109223
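
The assignment above works because exactly two rows match the mask. With random data the number of matching rows changes from run to run, so a hard-coded `reshape(2,4)` can fail with a shape error. One possible way around this, sketched on an illustrative frame (`demo` and `n_rows` are assumed names), is to size the replacement from the mask:

```python
import numpy as np
import pandas as pd

# Illustrative frame; rows 0 and 2 satisfy both conditions.
demo = pd.DataFrame({"a": [1.0, -1.0, 2.0], "b": [-5.0, 6.0, -7.0]})
mask = (demo["a"] > 0) & (demo["b"] < 0)

# Derive the block shape from the mask instead of hard-coding the row count.
n_rows = int(mask.sum())
n_cols = demo.shape[1]
demo[mask] = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)

print(demo["a"].tolist())   # [0.0, -1.0, 2.0]
print(demo["b"].tolist())   # [1.0, 6.0, 3.0]
```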
    


        Article link: https://www.haomeiwen.com/subject/ebjfuhtx.html