pandas.DataFrame.dropna
DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
Reference: the pandas.DataFrame.dropna documentation
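- axis : {0 or 'index', 1 or 'columns'}, default 0
    0, or 'index' : work row by row; a row that contains missing values is dropped.
    1, or 'columns' : work column by column; a column that contains missing values is dropped.
- how : {'any', 'all'}, default 'any'
    'any' : drop the row/column if it contains any NA value.
    'all' : drop the row/column only if all of its values are NA.
- thresh : int, optional. The minimum number of non-NA values required; a row/column is kept only if it has at least thresh non-NA values.
- subset : array-like, optional. Labels along the other axis to check for missing values: column labels when dropping rows, row labels when dropping columns.
- inplace : bool, default False. If True, the original DataFrame is modified and the return value is None.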
Code demonstration:
>>> import numpy as np
>>> import pandas as pd
>>> data = np.eye(6)
>>> datanan = np.where(data,data,np.nan)
>>> datapdnan = pd.DataFrame(datanan)
>>> datapd = datapdnan.ffill()  # forward-fill down each column; fillna(method='ffill') is deprecated in newer pandas
>>> datapd  # the steps above build a DataFrame for testing: datapd
     0    1    2    3    4    5
0  1.0  NaN  NaN  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN  NaN  NaN
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna()  # drop by row: any row that contains a NaN is removed
     0    1    2    3    4    5
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna(how='all')  # drop by row: a row is removed only if all of its values are NaN (none are here)
     0    1    2    3    4    5
0  1.0  NaN  NaN  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN  NaN  NaN
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
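In the test frame no row is entirely NaN, so how='all' returns it unchanged. To see the difference from how='any', here is a minimal sketch on a small hypothetical frame (the name demo and its columns a and b are made up for illustration and are not part of the original example):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely NaN, row 2 is only partly NaN.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, np.nan]})

dropped_all = demo.dropna(how="all")  # removes only the all-NaN row 1
dropped_any = demo.dropna(how="any")  # removes every row with a NaN: rows 1 and 2

print(dropped_all.index.tolist())  # [0, 2]
print(dropped_any.index.tolist())  # [0]
```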
>>> datapd.dropna(axis='columns', thresh=3)  # drop by column: keep columns with at least 3 non-NaN values
     0    1    2    3
0  1.0  NaN  NaN  NaN
1  1.0  1.0  NaN  NaN
2  1.0  1.0  1.0  NaN
3  1.0  1.0  1.0  1.0
4  1.0  1.0  1.0  1.0
5  1.0  1.0  1.0  1.0
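One way to check what thresh does is to count the non-NaN values yourself and compare. The sketch below rebuilds the same lower-triangular test frame with np.tri (a shortcut assumed here for brevity, not the construction used above) and reproduces thresh=3 with a boolean mask; it illustrates the rule and is not how pandas implements it internally.

```python
import numpy as np
import pandas as pd

# Rebuild the lower-triangular test frame: 1.0 on and below the diagonal, NaN above.
datapd = pd.DataFrame(np.where(np.tri(6) > 0, 1.0, np.nan))

# Count the non-NaN values in each column.
non_na_per_column = datapd.notna().sum(axis=0)

# Keep the columns that reach the threshold; this mirrors thresh=3.
by_mask = datapd.loc[:, non_na_per_column >= 3]
by_dropna = datapd.dropna(axis="columns", thresh=3)

print(non_na_per_column.tolist())   # [6, 5, 4, 3, 2, 1]
print(by_mask.columns.tolist())     # [0, 1, 2, 3]
print(by_dropna.columns.tolist())   # [0, 1, 2, 3]
```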
>>> datapd.dropna(axis='index', subset=[1,2])  # subset: drop rows that have a NaN in column 1 or column 2
     0    1    2    3    4    5
2  1.0  1.0  1.0  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  NaN  NaN
4  1.0  1.0  1.0  1.0  1.0  NaN
5  1.0  1.0  1.0  1.0  1.0  1.0
>>> datapd.dropna(axis=1, how='any', subset=[2,3])  # subset: drop columns that have a NaN in row 2 or row 3
     0    1    2
0  1.0  NaN  NaN
1  1.0  1.0  NaN
2  1.0  1.0  1.0
3  1.0  1.0  1.0
4  1.0  1.0  1.0
5  1.0  1.0  1.0
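With axis=1 the labels in subset refer to rows, which is easy to misread. A rough sketch of the same selection written out by hand, assuming the test frame is rebuilt with np.tri (a shortcut that is not in the original steps):

```python
import numpy as np
import pandas as pd

# Rebuild the lower-triangular test frame: 1.0 on and below the diagonal, NaN above.
datapd = pd.DataFrame(np.where(np.tri(6) > 0, 1.0, np.nan))

# With axis=1, subset names ROW labels: only rows 2 and 3 are inspected.
rows_checked = datapd.loc[[2, 3]]

# A column survives only if it has no NaN in the inspected rows.
keep = rows_checked.notna().all(axis=0)
by_mask = datapd.loc[:, keep]

print(by_mask.columns.tolist())  # [0, 1, 2]
```

Rows 0 and 1 still contain NaN in the result, because they were never part of the check.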
>>> print(datapd.dropna(inplace=True))  # modifies the original DataFrame in place; the return value is None
None
>>> datapd
     0    1    2    3    4    5
5  1.0  1.0  1.0  1.0  1.0  1.0
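Because inplace=True returns None, writing df = df.dropna(inplace=True) would replace the frame with None. A small sketch contrasting the two calling styles on a hypothetical one-column frame (the names original and x are illustrative only):

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# Default (inplace=False): a new frame is returned and the original is unchanged.
cleaned = original.dropna()
print(len(original), len(cleaned))  # 3 2

# inplace=True: the frame itself is modified and the call returns None.
result = original.dropna(inplace=True)
print(result is None, len(original))  # True 2
```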