Python pandas: handling missing values in a DataFrame - pd.

Author: 悟空Oo | Published: 2019-09-25 16:26

    pandas.DataFrame.dropna

    DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
    Reference: pandas.DataFrame.dropna

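    • axis : {0 or ‘index’, 1 or ‘columns’}, default 0
      0 or ‘index’ : operate on rows; a row that contains missing values is dropped,
      1 or ‘columns’ : operate on columns; a column that contains missing values is dropped;
    • how : {‘any’, ‘all’}, default ‘any’
      ‘any’ : drop the row/column if it contains any NA value,
      ‘all’ : drop the row/column only if all of its values are NA;
    • thresh : int, optional. The required number of non-NA values: a row/column is kept only if it has at least thresh non-NA values;
    • subset : array-like, optional. Labels along the other axis to check for missing values; when dropping rows, these are the columns to inspect;
    • inplace : bool, default False. If True, modify the original DataFrame and return None.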
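The parameters above can be compared side by side on a tiny frame. The following is a minimal sketch: the DataFrame `df` and its column names `a`, `b`, `c` are made up for illustration and do not come from the original article. Row 1 is entirely missing, so `how='all'` has something to drop.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame (illustration only):
# row 0 has 1 non-NA value, row 1 has none, row 2 has 2, row 3 has 3.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [np.nan, np.nan, 6.0, 7.0],
        "c": [np.nan, np.nan, 8.0, 9.0],
    }
)

any_rows = df.dropna(how="any")        # keep only rows with no NA at all
all_rows = df.dropna(how="all")        # drop only the row that is entirely NA
thresh_rows = df.dropna(thresh=2)      # keep rows with at least 2 non-NA values
subset_rows = df.dropna(subset=["a"])  # check for NA in column "a" only

# Print the surviving row labels for each variant.
for name, result in [
    ("how='any'", any_rows),
    ("how='all'", all_rows),
    ("thresh=2", thresh_rows),
    ("subset=['a']", subset_rows),
]:
    print(name, list(result.index))
```

Each call returns a new DataFrame; `df` itself is left unchanged because `inplace` defaults to False.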

    Code demonstration:

    >>> import numpy as np
    >>> import pandas as pd
    >>> data = np.eye(6)
    >>> datanan = np.where(data,data,np.nan)
    >>> datapdnan = pd.DataFrame(datanan)
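    >>> datapd = datapdnan.ffill()  # forward-fill; same result as the older fillna(method='ffill'), which is deprecated in recent pandas
    >>> datapd  # the steps above build a DataFrame for testing: datapd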
         0    1    2    3    4    5
    0  1.0  NaN  NaN  NaN  NaN  NaN
    1  1.0  1.0  NaN  NaN  NaN  NaN
    2  1.0  1.0  1.0  NaN  NaN  NaN
    3  1.0  1.0  1.0  1.0  NaN  NaN
    4  1.0  1.0  1.0  1.0  1.0  NaN
    5  1.0  1.0  1.0  1.0  1.0  1.0
    >>> datapd.dropna()  # drop by row: any row that contains a missing value is removed
         0    1    2    3    4    5
    5  1.0  1.0  1.0  1.0  1.0  1.0
    >>> datapd.dropna(how='all')  # drop by row: a row is removed only if all of its values are missing (no row qualifies here)
         0    1    2    3    4    5
    0  1.0  NaN  NaN  NaN  NaN  NaN
    1  1.0  1.0  NaN  NaN  NaN  NaN
    2  1.0  1.0  1.0  NaN  NaN  NaN
    3  1.0  1.0  1.0  1.0  NaN  NaN
    4  1.0  1.0  1.0  1.0  1.0  NaN
    5  1.0  1.0  1.0  1.0  1.0  1.0
    >>> datapd.dropna(axis='columns', thresh=3)  # drop by column: keep columns with at least 3 non-NaN values
         0    1    2    3
    0  1.0  NaN  NaN  NaN
    1  1.0  1.0  NaN  NaN
    2  1.0  1.0  1.0  NaN
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    >>> datapd.dropna(axis='index', subset=[1,2])  # subset: drop rows that have a missing value in column 1 or 2
         0    1    2    3    4    5
    2  1.0  1.0  1.0  NaN  NaN  NaN
    3  1.0  1.0  1.0  1.0  NaN  NaN
    4  1.0  1.0  1.0  1.0  1.0  NaN
    5  1.0  1.0  1.0  1.0  1.0  1.0
    >>> datapd.dropna(axis=1, how='any', subset=[2,3])  # subset: drop columns that have a missing value in row 2 or 3
         0    1    2
    0  1.0  NaN  NaN
    1  1.0  1.0  NaN
    2  1.0  1.0  1.0
    3  1.0  1.0  1.0
    4  1.0  1.0  1.0
    5  1.0  1.0  1.0
    >>> print(datapd.dropna(inplace=True))  # modifies the original DataFrame in place; the return value is None
    None
    >>> datapd
         0    1    2    3    4    5
    5  1.0  1.0  1.0  1.0  1.0  1.0
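Because `inplace=True` overwrites the test frame, it can be safer to inspect the missing values first and assign the result of `dropna` to a new name. The sketch below assumes the same lower-triangular test frame as the demonstration; the names `missing_per_column` and `cleaned` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Rebuild the test frame from the demonstration: 1.0 on and below
# the diagonal, NaN above it.
datapd = pd.DataFrame(np.where(np.eye(6), 1.0, np.nan)).ffill()

# Count the missing values in each column before dropping anything.
missing_per_column = datapd.isna().sum()
print(missing_per_column.tolist())

# Without inplace=True, dropna returns a new DataFrame
# and leaves datapd untouched.
cleaned = datapd.dropna()
print(cleaned.shape, datapd.shape)
```

With this pattern the original data stays available, so different `dropna` settings can be tried without rebuilding the frame each time.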
    
