![](https://img.haomeiwen.com/i1128202/d6c656015971b28f.png)
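1. Field extraction
Extract the characters at given positions from a string column with str.slice(start, stop); start is inclusive and stop is exclusive.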
![](https://img.haomeiwen.com/i1128202/987852ce89f46b85.png)
![](https://img.haomeiwen.com/i1128202/15e02caa5f7d2a20.png)
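As a minimal sketch of field extraction, the snippet below cuts a string column into pieces by position. The DataFrame, the `tel` column, and the derived column names are made up for illustration and do not come from the screenshots above.

```python
import pandas as pd

# Hypothetical data: phone numbers stored as strings
df = pd.DataFrame({"tel": ["13812345678", "13987654321"]})

# .str methods need string values, so convert the column first
df["tel"] = df["tel"].astype(str)

# Positions 0 to 2 (stop is exclusive)
df["prefix"] = df["tel"].str.slice(0, 3)
# Positions 3 to 6
df["area"] = df["tel"].str.slice(3, 7)
# Position 7 to the end of the string
df["number"] = df["tel"].str.slice(7)

print(df)
```

Leaving out stop, as in the last call, takes everything from start to the end of the string.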
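2. Field splitting
Split an existing string column on a given separator sep.
str.split(sep, n, expand=False)
sep is the separator (pandas names this parameter pat); n is the maximum number of splits, so at most n+1 pieces are produced; expand controls whether the pieces are expanded into DataFrame columns, and defaults to False, which returns a Series of lists.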
![](https://img.haomeiwen.com/i1128202/aed949cfdf8e17d1.png)
![](https://img.haomeiwen.com/i1128202/38cd54a14721be61.png)
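The following sketch shows how n and expand change the result of a split. The `name` column and its values are hypothetical, chosen only so that one value contains more than one separator.

```python
import pandas as pd

# Hypothetical data: one value has two spaces, the other has one
df = pd.DataFrame({"name": ["Zhang San", "Li Si Wu"]})

# expand=False (the default): each row becomes a list of pieces
lists = df["name"].str.split(" ", n=1)

# expand=True: the pieces become columns of a new DataFrame
parts = df["name"].str.split(" ", n=1, expand=True)
parts.columns = ["first", "rest"]

print(lists)
print(parts)
```

With n=1 only the first separator is used, so "Li Si Wu" yields two pieces rather than three.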
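3. Resetting the index
df.set_index('column_name') replaces the current row index with the given column. It returns a new DataFrame and leaves df unchanged; df.reset_index() moves the index back into an ordinary column.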
![](https://img.haomeiwen.com/i1128202/4417df224f7e4027.png)
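A small sketch of the index change, assuming a hypothetical `id` column that is suitable as a row label:

```python
import pandas as pd

# Hypothetical data with an id column to use as the index
df = pd.DataFrame({"id": [101, 102, 103], "title": ["a", "b", "c"]})

# Returns a new DataFrame indexed by id; df itself is not modified
indexed = df.set_index("id")

# reset_index moves the index back into a regular column
restored = indexed.reset_index()

print(indexed)
print(restored)
```

Assign the result back (df = df.set_index("id")) if the original variable should carry the new index.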
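4. Extracting records
Select rows by condition: df[condition] returns a DataFrame.
Kinds of condition:
Comparison operators ==, <, >: df[df.comments > 100]
Range test: between(left, right), inclusive at both ends by default: df[df.comments.between(10, 100)]
Null test: pandas.isnull(column) or column.isnull(): df[df.title.isnull()]
String matching: str.contains(pattern, na=False): df[df.title.str.contains('字段', na=False)]
Logical operators: & (and), | (or), ~ (not); wrap each comparison in parentheses, since Python's and, or, and not keywords do not work element-wise on a Series.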
![](https://img.haomeiwen.com/i1128202/aa7fa35c2b1c70fd.png)
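The condition types listed above can be tried on a small frame. The data here is invented for illustration; only the column names `title` and `comments` follow the examples in the text.

```python
import pandas as pd

# Hypothetical data; one title is missing on purpose
df = pd.DataFrame({
    "title": ["field extraction", None, "random sampling", "field split"],
    "comments": [5, 50, 150, 100],
})

# Comparison
gt = df[df.comments > 100]
# Range, inclusive at both ends
in_range = df[df.comments.between(10, 100)]
# Null test
missing = df[df.title.isnull()]
# Substring match; na=False treats missing titles as non-matching
has_field = df[df.title.str.contains("field", na=False)]

# Logical combinations: parenthesize each comparison
both = df[(df.comments >= 10) & df.title.str.contains("field", na=False)]
either = df[(df.comments > 100) | df.title.isnull()]
negated = df[~df.title.isnull()]

print(both)
```

Without na=False, str.contains returns a missing value for the missing title, and using that mask to index the frame raises an error.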
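5. Random sampling
Draw a random subset of rows from the data, for example a given proportion of them.
numpy.random.randint(start, end, num) generates random row positions; the positions can repeat, so this samples with replacement.
start is the lower bound of the range (inclusive)
end is the upper bound of the range (exclusive)
num is the number of samples to draw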
![](https://img.haomeiwen.com/i1128202/2df37b091456272a.png)
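The sketch below draws random row positions with numpy.random.randint and uses them to pick rows. The data is hypothetical. As an alternative that is not part of the original notes, DataFrame.sample is shown as well, since it samples without replacement by default and accepts a proportion directly.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten rows
df = pd.DataFrame({"value": range(100, 110)})

# Three random positions in [0, len(df)); duplicates are possible
positions = np.random.randint(0, len(df), 3)
sample = df.iloc[positions]

# Alternative: pandas' own sampler, without replacement by default
sample_no_repeat = df.sample(n=3, random_state=0)
# Sample a proportion of the rows instead of a fixed count
frac_sample = df.sample(frac=0.2, random_state=0)

print(sample)
print(frac_sample)
```

random_state fixes the seed so that the pandas samples are reproducible; the randint draw changes on every run unless NumPy's generator is seeded.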
6. Extracting data by index
Select by index name (label): df.loc[row_label, column_label]
![](https://img.haomeiwen.com/i1128202/c1667b0e98017236.png)
Select by index position: df.iloc[row_position, column_position]
![](https://img.haomeiwen.com/i1128202/2d3303c188938b63.png)
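loc is short for location and iloc for integer location. Older pandas versions also offered .ix, a more general indexer that decided from the type of the key whether to treat it as a position or a label; .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead. In summary:
iloc is position-based and accepts only integer positions;
loc is label-based and accepts index labels of any type, not only strings;
ix combined iloc and loc and accepted either positions or labels, but when the index labels were integers it interpreted integer keys as labels only.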
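The difference between labels and positions is easiest to see when the index labels are integers that do not match the row order. The frame below is a made-up example with that property.

```python
import pandas as pd

# Hypothetical data: integer labels deliberately out of order
df = pd.DataFrame(
    {"title": ["a", "b", "c"], "comments": [10, 20, 30]},
    index=[2, 0, 1],
)

# loc: 0 is a label, so this is the second row
by_label = df.loc[0, "title"]
# iloc: 0 is a position, so this is the first row
by_position = df.iloc[0, 0]

# Several labels and one column by name
some_labels = df.loc[[0, 1], "comments"]
# First two rows by position, whatever their labels are
first_two = df.iloc[0:2]

print(by_label, by_position)
```

Here the same key 0 selects different rows depending on the indexer, which is the ambiguity that .ix had to resolve by guessing.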
7. Extracting data from a dictionary
![](https://img.haomeiwen.com/i1128202/b3eb0d0a10367523.png)
![](https://img.haomeiwen.com/i1128202/00e235226ca04347.png)
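The screenshots above are not reproduced in text, so the following is only one plausible reading of extracting dictionary data into pandas objects, not necessarily the approach they show. The dictionary d and all column names are hypothetical.

```python
import pandas as pd

# Hypothetical dictionary
d = {"a": 1, "b": 2, "c": 3}

# Keys and values as two columns, one row per dictionary entry
kv = pd.DataFrame({"key": list(d.keys()), "value": list(d.values())})

# Keys as the index of a Series
s = pd.Series(d)

# Each key as its own column, in a single-row DataFrame
wide = pd.DataFrame([d])

print(kv)
print(wide)
```

Which shape is appropriate depends on whether the keys are data (two-column form) or field names (one column per key).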