Pandas_Select_Data_where()

Author: Kaspar433 | Published 2020-03-28 22:09

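    Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that the selection output has the same shape as the original data, you can use the where method of Series and DataFrame.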

    import pandas as pd
    import numpy as np

    # Random data: your values will differ from the output shown below
    dates = pd.date_range('2020-01-01', periods=5)
    data = pd.DataFrame(np.random.randn(5, 4), index=dates, columns=list('abcd'))
    data

    out:
        a   b   c   d
    2020-01-01  -1.017523   -0.838623   -0.284684   1.723855
    2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
    2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
    2020-01-04  -0.456445   -0.557138   -0.227323   0.390099
    2020-01-05  0.681782    -0.380826   0.989172    0.164163
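The examples in this article all use a DataFrame, although the introduction also mentions Series. A minimal Series sketch of the shape difference, assuming a small hand-written Series s (illustrative values, not the article's data):

```python
import pandas as pd

# Small illustrative Series (hypothetical data, not from the article)
s = pd.Series([1.5, -2.0, 3.0, -0.5], index=list("wxyz"))

# Boolean indexing keeps only the matching entries: the result is shorter
subset = s[s > 0]

# where keeps every label and fills the non-matching entries with NaN
same_shape = s.where(s > 0)

print(subset.shape, same_shape.shape)   # (2,) (4,)
print(same_shape.isna().sum())          # 2
```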
    

    Boolean indexing returns only the selected rows, while where keeps every row and fills the rest with NaN:

    data[data.a>0]
    
    out:
        a   b   c   d
    2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
    2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
    2020-01-05  0.681782    -0.380826   0.989172    0.164163
    
    data.where(data.a>0)
    out:
        a   b   c   d
    2020-01-01  NaN NaN NaN NaN
    2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
    2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
    2020-01-04  NaN NaN NaN NaN
    2020-01-05  0.681782    -0.380826   0.989172    0.164163
    
    data.where(data>0)
    
    out:
        a   b   c   d
    2020-01-01  NaN NaN NaN 1.723855
    2020-01-02  0.926578    NaN NaN NaN
    2020-01-03  1.973570    NaN NaN NaN
    2020-01-04  NaN NaN NaN 0.390099
    2020-01-05  0.681782    NaN 0.989172    0.164163
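Because the condition data.a > 0 is a single column, where applies it across every column of each row, so the excluded rows come back as all-NaN rows. A sketch of that relationship, assuming a small deterministic DataFrame df in place of the random data above: dropping the all-NaN rows recovers the boolean-indexing result, provided no row of the original is already entirely NaN.

```python
import pandas as pd

# Deterministic stand-in for the article's random data (assumed values)
dates = pd.date_range("2020-01-01", periods=4)
df = pd.DataFrame(
    {"a": [-1.0, 0.9, 2.0, -0.5],
     "b": [0.3, -0.4, -1.2, 0.6],
     "c": [1.1, 2.2, -0.3, 0.4]},
    index=dates,
)

mask = df.a > 0                  # 1-D condition on column a

rows = df[mask]                  # boolean indexing: 2 rows
kept = df.where(mask)            # where: all 4 rows, excluded rows are all NaN

# The row-wise condition is broadcast across every column,
# so dropping the all-NaN rows gives back the boolean-indexing result.
recovered = kept.dropna(how="all")
print(recovered.equals(rows))    # True
```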
    

    The other parameter

    where replaces the values for which the condition is False with the optional other argument, and returns the result as a copy.

    data.where(data>0, -data)
    
    out:
        a   b   c   d
    2020-01-01  1.017523    0.838623    0.284684    1.723855
    2020-01-02  0.926578    0.374901    1.038738    1.901277
    2020-01-03  1.973570    1.225851    0.450821    0.550839
    2020-01-04  0.456445    0.557138    0.227323    0.390099
    2020-01-05  0.681782    0.380826    0.989172    0.164163
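other can be a scalar as well as an object of the same shape. A sketch on a small hypothetical DataFrame, showing a scalar replacement and confirming that where(df > 0, -df) acts as an elementwise absolute value, as the output above suggests:

```python
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})   # illustrative data

# other as a scalar: every value failing the condition becomes 0
clipped = df.where(df > 0, 0)

# other as a DataFrame: values come from the same position in -df,
# which turns this into an elementwise absolute value
absolute = df.where(df > 0, -df)

print(clipped.to_dict("list"))    # {'a': [0.0, 2.0], 'b': [3.0, 0.0]}
print(absolute.equals(df.abs()))  # True
```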
    

    The inplace parameter

    By default, where returns a modified copy of the data. The optional inplace parameter lets you modify the original data without creating a copy:

    data
    
    out:
        a   b   c   d
    2020-01-01  -1.017523   -0.838623   -0.284684   1.723855
    2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
    2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
    2020-01-04  -0.456445   -0.557138   -0.227323   0.390099
    2020-01-05  0.681782    -0.380826   0.989172    0.164163
    
    data.where(data>0, -data, inplace=True)
    data
    
    out:
        a   b   c   d
    2020-01-01  1.017523    0.838623    0.284684    1.723855
    2020-01-02  0.926578    0.374901    1.038738    1.901277
    2020-01-03  1.973570    1.225851    0.450821    0.550839
    2020-01-04  0.456445    0.557138    0.227323    0.390099
    2020-01-05  0.681782    0.380826    0.989172    0.164163
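One practical consequence worth checking: with inplace=True the method returns None rather than the DataFrame, so its result should not be assigned back to a variable. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})   # illustrative data

copy_result = df.where(df > 0, -df)    # default: returns a new DataFrame
print(df.loc[0, "a"])                  # -1.0 (original unchanged)

ret = df.where(df > 0, -df, inplace=True)
print(ret)                             # None (inplace returns nothing)
print(df.loc[0, "a"])                  # 1.0 (original now modified)
```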
    

    Difference from numpy.where()

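    The signature of DataFrame.where() differs from that of numpy.where(): roughly, df1.where(m, df2) corresponds to np.where(m, df1, df2). The two calls below therefore produce the same values: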

    data.where(data>1, 0) == np.where(data>1, data, 0)
    
    out:
        a   b   c   d
    2020-01-01  True    True    True    True
    2020-01-02  True    True    True    True
    2020-01-03  True    True    True    True
    2020-01-04  True    True    True    True
    2020-01-05  True    True    True    True
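The main practical difference is the return type: np.where returns a plain ndarray without the index and column labels. A sketch with hypothetical data, re-attaching the labels so the two results can be compared as DataFrames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0.5, 1.5], "b": [2.5, -1.0]}, index=["x", "y"])  # illustrative
m = df > 1

pd_result = df.where(m, 0)          # pandas form: df1.where(m, df2)
np_result = np.where(m, df, 0)      # numpy form:  np.where(m, df1, df2)

print(type(np_result).__name__)     # ndarray (labels are lost)

# Re-attach the labels so the two results can be compared directly
rebuilt = pd.DataFrame(np_result, index=df.index, columns=df.columns)
print(rebuilt.equals(pd_result))    # True
```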
    

    The axis parameter

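    where() also accepts an axis parameter, which determines how other is aligned with the data. Here other is column a aligned on the index, so each value that fails the condition is replaced by the value of column a in the same row. axis='index' and axis=0 are equivalent: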

    data_2 = data.copy()
    data_2.where(data_2 > 1, data_2.a, axis='index')
    
    out:
        a   b   c   d
    2020-01-01  1.017523    1.017523    1.017523    1.723855
    2020-01-02  0.926578    0.926578    1.038738    1.901277
    2020-01-03  1.973570    1.225851    1.973570    1.973570
    2020-01-04  0.456445    0.456445    0.456445    0.456445
    2020-01-05  0.681782    0.681782    0.681782    0.681782
    
    data_2.where(data_2 > 1, data_2.a, axis=0)
    
    out:
        a   b   c   d
    2020-01-01  1.017523    1.017523    1.017523    1.723855
    2020-01-02  0.926578    0.926578    1.038738    1.901277
    2020-01-03  1.973570    1.225851    1.973570    1.973570
    2020-01-04  0.456445    0.456445    0.456445    0.456445
    2020-01-05  0.681782    0.681782    0.681782    0.681782
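The same replacement can be written column by column with apply; the pandas documentation describes where with axis as equivalent to, but faster than, this apply form. A sketch on hypothetical data, using pandas' own testing helper to compare the two:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1.2, 0.4, 1.9], "b": [0.8, 1.5, 0.2], "c": [2.0, 0.1, 0.3]}
)  # illustrative data

# Fill failing values from column "a" of the same row;
# axis="index" aligns the Series `other` on the row labels.
by_axis = df.where(df > 1, df["a"], axis="index")

# Column-by-column version: each column is filled from the same Series
by_apply = df.apply(lambda col, fill: col.where(col > 1, fill), fill=df["a"])

pd.testing.assert_frame_equal(by_axis, by_apply)   # raises if they differ
print(by_axis["b"].tolist())                       # [1.2, 1.5, 1.9]
```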
    

    Using a callable

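    where() can accept a callable for both the condition and other. The function must take a single argument (the calling Series or DataFrame) and return valid output to use as the condition or other.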

    data_2.where(data_2 > 1, lambda x: x + 10)
    
    out:
        a   b   c   d
    2020-01-01  1.017523    10.838623   10.284684   1.723855
    2020-01-02  10.926578   10.374901   1.038738    1.901277
    2020-01-03  1.973570    1.225851    10.450821   10.550839
    2020-01-04  10.456445   10.557138   10.227323   10.390099
    2020-01-05  10.681782   10.380826   10.989172   10.164163
    
    data_2.where(lambda x: x > 1, lambda x: x + 10)
    
    out:
        a   b   c   d
    2020-01-01  1.017523    10.838623   10.284684   1.723855
    2020-01-02  10.926578   10.374901   1.038738    1.901277
    2020-01-03  1.973570    1.225851    10.450821   10.550839
    2020-01-04  10.456445   10.557138   10.227323   10.390099
    2020-01-05  10.681782   10.380826   10.989172   10.164163
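Callables are mainly useful in method chains, where the intermediate object has no variable name of its own. A sketch with hypothetical data, comparing a chained call against the same computation with the intermediate stored explicitly:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.2, 0.9], "b": [0.7, 0.1]})   # illustrative data

# Both lambdas receive the intermediate df.mul(2) as their argument
result = (
    df.mul(2)
      .where(lambda x: x > 1, lambda x: x + 10)
)

# Same computation with the intermediate spelled out explicitly
tmp = df.mul(2)
expected = tmp.where(tmp > 1, tmp + 10)
print(result.equals(expected))   # True
```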
    

    mask()

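    mask() is the inverse boolean operation of where(): it replaces the values where the condition is True (with NaN by default).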

    data_2.mask(data_2 > 1)
    
    out:
        a   b   c   d
    2020-01-01  NaN 0.838623    0.284684    NaN
    2020-01-02  0.926578    0.374901    NaN NaN
    2020-01-03  NaN NaN 0.450821    0.550839
    2020-01-04  0.456445    0.557138    0.227323    0.390099
    2020-01-05  0.681782    0.380826    0.989172    0.164163
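A sketch of that inverse relationship on hypothetical data: mask(cond) gives the same result as where(~cond), and mask also accepts other in the same way as where.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.5, 1.5], "b": [2.5, -1.0]})   # illustrative data
cond = df > 1

masked = df.mask(cond)          # NaN where cond is True
inverted = df.where(~cond)      # NaN where ~cond is False, i.e. cond is True
print(masked.equals(inverted))  # True

# mask also takes `other`, like where
print(df.mask(cond, 0).to_dict("list"))   # {'a': [0.5, 0.0], 'b': [0.0, -1.0]}
```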

Article link: https://www.haomeiwen.com/subject/gvheuhtx.html