Pandas_Select_Data_where()

Author: Kaspar433 | Published: 2020-03-28 22:09

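Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that the selection output has the same shape as the original data, you can use the where method, which is available on both Series and DataFrame.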

import pandas as pd
import numpy as np

dates = pd.date_range('2020-01-01',periods=5)
data = pd.DataFrame(np.random.randn(5,4), index=dates, columns=list('abcd'))
data
    a   b   c   d
2020-01-01  -1.017523   -0.838623   -0.284684   1.723855
2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
2020-01-04  -0.456445   -0.557138   -0.227323   0.390099
2020-01-05  0.681782    -0.380826   0.989172    0.164163

Returning only the selected rows

data[data.a>0]

out:
    a   b   c   d
2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
2020-01-05  0.681782    -0.380826   0.989172    0.164163

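Passing the same condition to where keeps every row and fills the rows that fail the condition with NaN:

data.where(data.a>0)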
out:
    a   b   c   d
2020-01-01  NaN NaN NaN NaN
2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
2020-01-04  NaN NaN NaN NaN
2020-01-05  0.681782    -0.380826   0.989172    0.164163

With an element-wise condition, each value is tested on its own:

data.where(data>0)

out:
    a   b   c   d
2020-01-01  NaN NaN NaN 1.723855
2020-01-02  0.926578    NaN NaN NaN
2020-01-03  1.973570    NaN NaN NaN
2020-01-04  NaN NaN NaN 0.390099
2020-01-05  0.681782    NaN 0.989172    0.164163
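
The same contrast holds for a Series. The following is a minimal sketch on a small hand-written Series (the name s and its values are illustrative and not taken from the random data above): boolean indexing drops the entries that fail the condition, while where keeps every label and fills those entries with NaN.

```python
import pandas as pd

# Illustrative data; the values are made up for this sketch.
s = pd.Series([-2.0, 1.0, -0.5, 3.0], index=list("wxyz"))

subset = s[s > 0]            # boolean indexing: only matching entries survive
same_shape = s.where(s > 0)  # where: same index, failing entries become NaN

print(subset.shape)             # (2,)
print(same_shape.shape)         # (4,)
print(same_shape.isna().sum())  # 2
```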

The other parameter

In the returned copy, where uses the optional other argument to replace the values for which the condition is False.

data.where(data>0, -data)

out:
    a   b   c   d
2020-01-01  1.017523    0.838623    0.284684    1.723855
2020-01-02  0.926578    0.374901    1.038738    1.901277
2020-01-03  1.973570    1.225851    0.450821    0.550839
2020-01-04  0.456445    0.557138    0.227323    0.390099
2020-01-05  0.681782    0.380826    0.989172    0.164163
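
As a small deterministic sketch of the same idea (the frame df and its values are assumptions made for illustration), other can be a scalar as well as a same-shaped object:

```python
import pandas as pd

# Illustrative frame with fixed values so the result is reproducible.
df = pd.DataFrame({"a": [-1.0, 2.0, -3.0], "b": [4.0, -5.0, 6.0]})

clipped = df.where(df > 0, 0.0)  # scalar other: failing values become 0.0
flipped = df.where(df > 0, -df)  # same-shaped other: failing values are negated

print(clipped)
print(flipped.equals(df.abs()))  # True
```

The second call reproduces the absolute value of the frame, which is what data.where(data>0, -data) does above.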

The inplace parameter

By default, where returns a modified copy of the data. The optional inplace parameter lets you modify the original data without creating a copy.

data

out:
    a   b   c   d
2020-01-01  -1.017523   -0.838623   -0.284684   1.723855
2020-01-02  0.926578    -0.374901   -1.038738   -1.901277
2020-01-03  1.973570    -1.225851   -0.450821   -0.550839
2020-01-04  -0.456445   -0.557138   -0.227323   0.390099
2020-01-05  0.681782    -0.380826   0.989172    0.164163

data.where(data>0, -data, inplace=True)
data

out:
    a   b   c   d
2020-01-01  1.017523    0.838623    0.284684    1.723855
2020-01-02  0.926578    0.374901    1.038738    1.901277
2020-01-03  1.973570    1.225851    0.450821    0.550839
2020-01-04  0.456445    0.557138    0.227323    0.390099
2020-01-05  0.681782    0.380826    0.989172    0.164163
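
A minimal sketch of the difference, on an illustrative frame (df, backup, and their values are assumptions for this example): the call with inplace=True is made for its side effect on df, and a copy taken beforehand is left untouched.

```python
import pandas as pd

# Illustrative frame; a copy is taken first to show what changes.
df = pd.DataFrame({"a": [-1.0, 2.0], "b": [3.0, -4.0]})
backup = df.copy()

# Modifies df itself instead of returning a new, modified frame.
df.where(df > 0, -df, inplace=True)

print(df)                 # all values are now positive
print(backup)             # the earlier copy still holds the original values
print(df.equals(backup))  # False
```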

Differences from numpy.where()

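The signature of DataFrame.where() differs from that of numpy.where(). Roughly, df.where(cond, other) is equivalent to np.where(cond, df, other), as the following comparison shows.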

data.where(data>1, 0) == np.where(data>1, data, 0)

out:
    a   b   c   d
2020-01-01  True    True    True    True
2020-01-02  True    True    True    True
2020-01-03  True    True    True    True
2020-01-04  True    True    True    True
2020-01-05  True    True    True    True
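
One practical difference is the return type. The sketch below, on an illustrative frame (df and its labels are assumptions), suggests that DataFrame.where returns a labelled DataFrame while numpy.where returns a plain array holding the same values:

```python
import numpy as np
import pandas as pd

# Illustrative frame with explicit row labels.
df = pd.DataFrame({"a": [0.5, 2.0], "b": [3.0, 0.1]}, index=["r1", "r2"])

pd_result = df.where(df > 1, 0)      # DataFrame: index and columns preserved
np_result = np.where(df > 1, df, 0)  # ndarray: labels are dropped

print(type(pd_result).__name__)  # DataFrame
print(type(np_result).__name__)  # ndarray
print(np.array_equal(pd_result.to_numpy(), np_result))  # True
```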

The axis parameter

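where() also accepts an axis parameter, which sets the axis along which other is aligned when it has fewer dimensions than the caller (here, a Series used to fill a DataFrame).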

data_2 = data.copy()
data_2.where(data_2 > 1, data_2.a, axis='index')

out:
    a   b   c   d
2020-01-01  1.017523    1.017523    1.017523    1.723855
2020-01-02  0.926578    0.926578    1.038738    1.901277
2020-01-03  1.973570    1.225851    1.973570    1.973570
2020-01-04  0.456445    0.456445    0.456445    0.456445
2020-01-05  0.681782    0.681782    0.681782    0.681782

axis=0 is equivalent to axis='index':

data_2.where(data_2 > 1, data_2.a, axis=0)

out:
    a   b   c   d
2020-01-01  1.017523    1.017523    1.017523    1.723855
2020-01-02  0.926578    0.926578    1.038738    1.901277
2020-01-03  1.973570    1.225851    1.973570    1.973570
2020-01-04  0.456445    0.456445    0.456445    0.456445
2020-01-05  0.681782    0.681782    0.681782    0.681782
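
To make the alignment direction concrete, here is a small sketch on an illustrative frame (df, row_fill, and col_fill are hypothetical names and values). With axis='index' the replacement Series is matched against the row labels; with axis='columns' it is matched against the column labels.

```python
import pandas as pd

# Illustrative frame; values of 3.0 or less will be replaced.
df = pd.DataFrame({"a": [1.0, 5.0], "b": [6.0, 2.0]}, index=["r1", "r2"])
cond = df > 3

row_fill = pd.Series([100.0, 200.0], index=["r1", "r2"])  # one value per row
col_fill = pd.Series([-1.0, -2.0], index=["a", "b"])      # one value per column

by_row = df.where(cond, row_fill, axis="index")    # failing cells take their row's value
by_col = df.where(cond, col_fill, axis="columns")  # failing cells take their column's value

print(by_row)
print(by_col)
```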

Using callables

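where() accepts a callable for the condition and for the other argument. The function must take one argument (the calling Series or DataFrame) and return output that is valid as the condition or as other.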

data_2.where(data_2 > 1, lambda x: x + 10)

out:
    a   b   c   d
2020-01-01  1.017523    10.838623   10.284684   1.723855
2020-01-02  10.926578   10.374901   1.038738    1.901277
2020-01-03  1.973570    1.225851    10.450821   10.550839
2020-01-04  10.456445   10.557138   10.227323   10.390099
2020-01-05  10.681782   10.380826   10.989172   10.164163

Passing a callable for the condition as well gives the same result:

data_2.where(lambda x: x > 1, lambda x: x + 10)

out:
    a   b   c   d
2020-01-01  1.017523    10.838623   10.284684   1.723855
2020-01-02  10.926578   10.374901   1.038738    1.901277
2020-01-03  1.973570    1.225851    10.450821   10.550839
2020-01-04  10.456445   10.557138   10.227323   10.390099
2020-01-05  10.681782   10.380826   10.989172   10.164163
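
Callables are most useful in a method chain, where the intermediate result has no variable name to refer to. A minimal sketch, assuming an illustrative frame df and an arbitrary doubling step:

```python
import pandas as pd

# Illustrative frame with fixed values.
df = pd.DataFrame({"a": [1.0, -2.0], "b": [-3.0, 4.0]})

result = (
    df.mul(2)  # unnamed intermediate frame: a = [2, -4], b = [-6, 8]
    # both callables receive that doubled frame, not the original df
    .where(lambda x: x > 0, lambda x: x - 100)
)

print(result)
```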

mask()

mask() is the inverse of where(): it replaces the values for which the condition is True and keeps the rest.

data_2.mask(data_2 > 1)

out:
    a   b   c   d
2020-01-01  NaN 0.838623    0.284684    NaN
2020-01-02  0.926578    0.374901    NaN NaN
2020-01-03  NaN NaN 0.450821    0.550839
2020-01-04  0.456445    0.557138    0.227323    0.390099
2020-01-05  0.681782    0.380826    0.989172    0.164163
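
The relationship can be checked directly. In this sketch (df and cond are illustrative), masking with a condition gives the same result as calling where with the negated condition, and mask also accepts an other value:

```python
import pandas as pd

# Illustrative frame and condition.
df = pd.DataFrame({"a": [0.5, 2.0], "b": [3.0, 0.1]})
cond = df > 1

masked = df.mask(cond)       # values > 1 become NaN
via_where = df.where(~cond)  # keep values where the condition is NOT met

print(masked.equals(via_where))  # True

capped = df.mask(cond, 1.0)  # replace values > 1 with 1.0 instead of NaN
print(capped)
```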

Article link: https://www.haomeiwen.com/subject/gvheuhtx.html
