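The features covered here closely resemble everyday operations in SQL and in pivot tables, and they come up constantly in data analysis.
1. Deduplication: .unique()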
Input:
![](https://img.haomeiwen.com/i309562/d59f0dff81bfcd8b.png)
output:
![](https://img.haomeiwen.com/i309562/34bf4205716e940c.png)
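
Since the example above only exists as a screenshot, here is a minimal text sketch of the same idea. The sample values are made up for illustration and are not taken from the images.

```python
import pandas as pd

# Hypothetical sample data (not the values from the screenshots).
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

# unique() returns the distinct values in order of first appearance.
uniques = obj.unique()
print(uniques)
```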
2. Counting: .value_counts()
input
Series version
![](https://img.haomeiwen.com/i309562/5b351ae419896a11.png)
output
![](https://img.haomeiwen.com/i309562/e16b5d6469527036.png)
DataFrame version
![](https://img.haomeiwen.com/i309562/75d276a4bb6e379c.png)
Count how many times each value in the frame appears in each of the Qu columns
![](https://img.haomeiwen.com/i309562/531b70ef06460ab4.png)
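
A rough text version of both variants follows. The column names `Qu1` to `Qu3` and all values are assumptions chosen for illustration; the screenshots may use different data.

```python
import pandas as pd

# Series version: counts per distinct value, most frequent first.
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])
counts = obj.value_counts()

# DataFrame version: hypothetical frame with three "Qu" columns.
data = pd.DataFrame({
    "Qu1": [1, 3, 4, 3, 4],
    "Qu2": [2, 3, 1, 2, 3],
    "Qu3": [1, 5, 2, 4, 4],
})

# Run value_counts on every column; values missing from a column
# come back as NaN, so fill those with 0.
per_column = data.apply(lambda col: col.value_counts()).fillna(0)
print(counts)
print(per_column)
```

The row labels of `per_column` are the distinct values found anywhere in the frame, and each cell says how often that value occurs in that column.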
3. Membership test: .isin(['b','c'])
![](https://img.haomeiwen.com/i309562/61c5ca6b0ad0c16d.png)
output:
![](https://img.haomeiwen.com/i309562/7361b1ee3c4254da.png)
![](https://img.haomeiwen.com/i309562/7eb9c885dbd5ce6a.png)
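
A small sketch of how `.isin()` is typically used as a filter, with made-up sample values:

```python
import pandas as pd

# Hypothetical sample data.
obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

# isin() returns a boolean mask: True where the value is in the given list.
mask = obj.isin(["b", "c"])

# The mask can then be used to select the matching rows.
subset = obj[mask]
print(subset)
```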
Handling missing values
1. Detecting missing values: .isnull() works for both NaN and None
![](https://img.haomeiwen.com/i309562/2c69f5f2f3b64d5c.png)
output
![](https://img.haomeiwen.com/i309562/1c97ef81ea02f613.png)
![](https://img.haomeiwen.com/i309562/751d7d489a7f9362.png)
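
To illustrate that both `None` and `NaN` are treated as missing, here is a minimal sketch with invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing both None and NaN.
string_data = pd.Series([None, "artichoke", np.nan, "avocado"])

# isnull() flags both kinds of missing value.
null_mask = string_data.isnull()
print(null_mask)
```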
.dropna() filters out missing values; on a Series it is equivalent to data[data.notnull()]
Series version
![](https://img.haomeiwen.com/i309562/78a4353f767d1726.png)
output
![](https://img.haomeiwen.com/i309562/8d39057123da6425.png)
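
The equivalence between `.dropna()` and boolean indexing with `.notnull()` can be checked with a short sketch (sample values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with two missing entries.
data = pd.Series([1, np.nan, 3.5, np.nan, 7])

cleaned = data.dropna()          # drop the missing entries
same = data[data.notnull()]      # boolean-mask form of the same filter

print(cleaned.equals(same))
```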
DataFrame version
![](https://img.haomeiwen.com/i309562/6a28f4f9c44cd9b7.png)
(1) By default, any row that contains an NA is dropped; with how='all', only rows that are entirely NA are dropped
![](https://img.haomeiwen.com/i309562/ace27bc889553fcd.png)
(2) Drop by column with axis=1: .dropna(axis=1, how='all')
![](https://img.haomeiwen.com/i309562/1e9905b56db57a29.png)
output
![](https://img.haomeiwen.com/i309562/c052559fdb9fe094.png)
(3) Set a tolerance for missing values with thresh: a row is kept only if it has at least thresh non-NA values
![](https://img.haomeiwen.com/i309562/467ac0248f00311d.png)
![](https://img.haomeiwen.com/i309562/960fd326f35c2f3e.png)
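
The three DataFrame options above (`how`, `axis`, `thresh`) can be compared side by side in one sketch. The frame below is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 2 is entirely NA, row 1 has one valid value.
df = pd.DataFrame([
    [1.0, 6.5, 3.0],
    [1.0, np.nan, np.nan],
    [np.nan, np.nan, np.nan],
    [np.nan, 6.5, 3.0],
])

any_na = df.dropna()              # drop rows containing any NA
all_na = df.dropna(how="all")     # drop only rows that are all NA

# Add an all-NA column to show column-wise dropping.
df2 = df.copy()
df2[3] = np.nan
cols = df2.dropna(axis=1, how="all")   # drops column 3 only

# Keep rows that have at least 2 non-NA values.
at_least_two = df.dropna(thresh=2)
```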
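Filling missing data
(1) .fillna({1: 0.5}): passing a dict lets you choose, by column label, which column is filled with which value
df.fillna(0) returns a new object; _ = df.fillna(0, inplace=True) applies the same fill to df in place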
![](https://img.haomeiwen.com/i309562/3be29171a3fa55bc.png)
![](https://img.haomeiwen.com/i309562/6d4fe56412b1463b.png)
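
A short sketch of the scalar, dict, and in-place forms of `.fillna()`. Column names and values are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

filled_all = df.fillna(0)                        # same value everywhere
filled_by_col = df.fillna({"a": 0.5, "b": -1})   # one fill value per column

# In-place form: modifies df_inplace directly instead of returning a new frame.
df_inplace = df.copy()
df_inplace.fillna(0, inplace=True)
```

Note that the non-in-place calls leave `df` itself untouched.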
(2) Forward fill with method='ffill'; limit caps how many consecutive missing values are filled
![](https://img.haomeiwen.com/i309562/825db6499c045d33.png)
output
![](https://img.haomeiwen.com/i309562/25453f17b4fcbb7a.png)
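
Here is a minimal forward-fill sketch with invented data. It uses the `.ffill()` method rather than `fillna(method='ffill')`, because recent pandas releases deprecate the `method` argument of `fillna`; the behavior shown is the same.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a run of three missing values.
df = pd.DataFrame({"x": [1.0, np.nan, np.nan, np.nan, 5.0]})

# Carry the last valid value forward, but fill at most 2 gaps in a row.
filled = df.ffill(limit=2)
print(filled)
```

With `limit=2`, the third missing value in the run stays NaN.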
Alternatively, fill with a summary statistic: .fillna(data.mean())
![](https://img.haomeiwen.com/i309562/8710399f39527f2a.png)
output
![](https://img.haomeiwen.com/i309562/cae6ef0149a44254.png)
![](https://img.haomeiwen.com/i309562/237b9e6e75303860.png)
![](https://img.haomeiwen.com/i309562/22fae1b47097f8a9.png)
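
A sketch of filling gaps with the mean, using made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical Series; mean() skips NaN by default.
data = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

# Replace each missing value with the mean of the valid ones.
filled = data.fillna(data.mean())
print(filled)
```

On a DataFrame, `df.fillna(df.mean())` works analogously and fills each column with its own mean.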
Hierarchical indexing
(1) An object can have multiple index levels
![](https://img.haomeiwen.com/i309562/9afea6f27b8a2c4c.png)
output
![](https://img.haomeiwen.com/i309562/58b94e389acd1e30.png)
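
One common way to get a multi-level index is to pass a list of arrays as the index. The labels and values below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical Series with a two-level index: outer letters, inner integers.
data = pd.Series(
    range(9),
    index=[
        ["a", "a", "a", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 3, 1, 2, 2, 3],
    ],
)

print(data)
print(data.index.nlevels)  # number of index levels
```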
(2) Several ways to select data
input
![](https://img.haomeiwen.com/i309562/f7145671e6060085.png)
output
![](https://img.haomeiwen.com/i309562/23bf2e269b3c2cea.png)
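
The selection styles might look like the following sketch, which uses an invented two-level Series:

```python
import pandas as pd

# Hypothetical two-level Series.
data = pd.Series(
    range(9),
    index=[
        ["a", "a", "a", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 3, 1, 2, 2, 3],
    ],
)

one_key = data["b"]               # all rows under outer label "b"
key_range = data.loc["b":"c"]     # slice of outer labels (inclusive)
key_list = data.loc[["b", "d"]]   # list of outer labels
inner = data.loc[:, 2]            # select on the inner level
```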
(3) .unstack() pivots the inner index level into columns, giving a pivot-table layout; .stack() reverses it
![](https://img.haomeiwen.com/i309562/a1e2c5cc5ada2856.png)
output
![](https://img.haomeiwen.com/i309562/24cc69d1414647e5.png)
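
A round trip through `.unstack()` and `.stack()` can be sketched as below, again on invented data. Whether `.stack()` drops the NaN placeholders by itself depends on the pandas version, so this sketch calls `.dropna()` explicitly.

```python
import pandas as pd

# Hypothetical two-level Series.
data = pd.Series(
    range(9),
    index=[
        ["a", "a", "a", "b", "b", "c", "c", "d", "d"],
        [1, 2, 3, 1, 3, 1, 2, 2, 3],
    ],
)

# Inner level becomes columns; missing combinations show up as NaN.
wide = data.unstack()

# Back to long form; dropna() removes the NaN placeholders
# regardless of whether stack() already dropped them.
long = wide.stack().dropna()
print(wide)
```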
(4) With a hierarchical index and hierarchical columns, you can name the index and column levels
![](https://img.haomeiwen.com/i309562/3135e089a62f7151.png)
output
![](https://img.haomeiwen.com/i309562/fdf3788bd5e65f43.png)
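
A sketch of a frame with two levels on each axis. The level names (`key1`, `key2`, `state`, `color`) and labels are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with hierarchical rows and columns.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)

# Give each level a name.
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Selecting an outer column label returns the sub-frame beneath it.
ohio = frame["Ohio"]
print(frame)
```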
(5) Swapping the order of index levels
![](https://img.haomeiwen.com/i309562/127fef7e6f3d8e3f.png)
output
![](https://img.haomeiwen.com/i309562/b7e08c0572ac25b4.png)
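
Swapping levels is usually done with `.swaplevel()`; the screenshot may use a different call. The frame and level names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a named two-level index.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=["x", "y", "z"],
)
frame.index.names = ["key1", "key2"]

# Exchange the two levels; the data itself is not reordered.
swapped = frame.swaplevel("key1", "key2")
print(swapped)
```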
(6) sort_index(level=1) sorts by the chosen index level
![](https://img.haomeiwen.com/i309562/3be60582f7524343.png)
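
A sketch of sorting by the inner level, on an invented frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level index.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=["x", "y", "z"],
)

# Sort rows by the second index level (level numbering starts at 0).
sorted_frame = frame.sort_index(level=1)
print(sorted_frame)
```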
(7) Sum across rows or down columns by level, just like a pivot table
![](https://img.haomeiwen.com/i309562/2b168cf4a3d4b46c.png)
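
Older tutorials often write this as `frame.sum(level=...)`, which pandas 2.0 removed. The sketch below uses `groupby(level=...)` instead, which is one way to get the same per-level totals; the frame and level names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with named levels on both axes.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Sum rows that share the same key2 value.
by_key2 = frame.groupby(level="key2").sum()

# Sum columns that share the same color: transpose, group, transpose back.
by_color = frame.T.groupby(level="color").sum().T
```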
(8) Use two columns of the frame as the index for computation, again much like a pivot table
![](https://img.haomeiwen.com/i309562/eabbccd860718023.png)
output
![](https://img.haomeiwen.com/i309562/1113f8e40c7125dd.png)
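
Turning columns into an index is done with `.set_index()`. The column names and values in this sketch are invented.

```python
import pandas as pd

# Hypothetical flat frame.
frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

# Columns c and d become a two-level index and leave the data columns.
frame2 = frame.set_index(["c", "d"])
print(frame2)
```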
(9) Use columns as the index while also keeping them as data columns: drop=False
![](https://img.haomeiwen.com/i309562/b7dddb3d3aa7e658.png)
output
![](https://img.haomeiwen.com/i309562/d4da597d753770d7.png)
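
The effect of `drop=False` in a minimal sketch (data is hypothetical):

```python
import pandas as pd

# Hypothetical flat frame.
frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})

# c and d become the index but also stay as ordinary columns.
kept = frame.set_index(["c", "d"], drop=False)
print(kept)
```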
(10) .reset_index() moves the hierarchical index levels back into ordinary columns
![](https://img.haomeiwen.com/i309562/7bdc58e0f940dd83.png)
output
![](https://img.haomeiwen.com/i309562/91538d972d1dbe6a.png)
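
A sketch showing `.reset_index()` undoing a `.set_index()` call, with invented data:

```python
import pandas as pd

# Hypothetical frame indexed by two of its former columns.
frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "c": ["one", "one", "one", "two", "two", "two", "two"],
    "d": [0, 1, 2, 0, 1, 2, 3],
})
frame2 = frame.set_index(["c", "d"])

# The index levels return as columns, placed before the data columns,
# and the frame gets a default integer index again.
restored = frame2.reset_index()
print(restored)
```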