This note covers:
- Dropping rows
- Dropping columns
- groupby (please stop forgetting this one)
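Dropping rows
df[df['colname'] != 'notthis']
Drops rows based on a condition on a column: only rows whose colname value is not 'notthis' are kept.
df[df.index != idx]
Drops a row by its index, where idx stands for the integer index label of the row to remove.
df.dropna()
Drops every row that contains a NaN value. It returns a new DataFrame and leaves df unchanged.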
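To make the three row-dropping idioms above concrete, here is a minimal sketch on a toy DataFrame. The data and the variable names (by_condition, by_index, by_drop, no_na) are made up for illustration; only the pandas calls themselves come from the note.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not from the Titanic set.
df = pd.DataFrame(
    {
        "colname": ["keep", "notthis", "keep", "keep"],
        "value": [1.0, 2.0, np.nan, 4.0],
    }
)

# Keep only rows whose 'colname' is not 'notthis' (drops index 1).
by_condition = df[df["colname"] != "notthis"]

# Drop the row with index label 0 using a boolean mask on the index.
by_index = df[df.index != 0]

# Same result using drop() with an index label.
by_drop = df.drop(index=0)

# Drop every row that contains at least one NaN (drops index 2).
no_na = df.dropna()

print(by_condition.index.tolist())  # [0, 2, 3]
print(by_index.index.tolist())      # [1, 2, 3]
print(no_na.index.tolist())         # [0, 1, 3]
```

None of these modify df in place, so the result has to be assigned to a name if you want to keep it.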
Dropping columns
df.drop(df.columns[1], axis=1)
Drops the second column (selected by position).
df.drop('colname', axis=1)
Drops the column named colname.
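A small sketch of the column-dropping calls above, assuming a toy DataFrame with made-up columns a, b and colname. The columns= keyword shown at the end is an alternative spelling of axis=1 that is not in the original note.

```python
import pandas as pd

# Hypothetical toy data with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "colname": [5, 6]})

# Drop the second column by position (here that is 'b').
second_dropped = df.drop(df.columns[1], axis=1)

# Drop a column by name.
named_dropped = df.drop("colname", axis=1)

# Equivalent spelling with the columns= keyword; a list drops several at once.
same_thing = df.drop(columns="colname")
several = df.drop(columns=["a", "b"])

print(list(second_dropped.columns))  # ['a', 'colname']
print(list(named_dropped.columns))   # ['a', 'b']
print(list(several.columns))         # ['colname']
```

As with the row examples, drop() returns a new DataFrame, so df still has all three columns afterwards.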
groupby
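Examples below use the Titanic data set:
df.groupby('colname')
This is only the split step. Printing the result directly shows nothing useful, because groupby returns a GroupBy object rather than a computed table. You still need to apply an operation to it, such as .mean() or .sum().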
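The split-then-aggregate idea can be sketched on a tiny frame. The data below is invented for illustration (it only borrows the Sex and Age column names), and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data reusing two Titanic column names.
df = pd.DataFrame(
    {
        "Sex": ["female", "male", "female", "male"],
        "Age": [29.0, 30.0, 25.0, 40.0],
    }
)

# The split step alone: a GroupBy object, nothing is computed yet.
grouped = df.groupby("Sex")
print(type(grouped).__name__)  # DataFrameGroupBy

# Applying an aggregation produces the actual result.
mean_age = grouped["Age"].mean()
total_age = grouped["Age"].sum()

print(mean_age["female"], mean_age["male"])    # 27.0 35.0
print(total_age["female"], total_age["male"])  # 54.0 70.0
```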
In [2]: import pandas as pd
In [3]: url = 'https://tinyurl.com/titanic-csv'
In [4]: df = pd.read_csv(url)
In [5]: df.head()
Out[5]:
                                            Name  PClass    Age     Sex  \
0                   Allen, Miss Elisabeth Walton     1st  29.00  female
1                    Allison, Miss Helen Loraine     1st   2.00  female
2            Allison, Mr Hudson Joshua Creighton     1st  30.00    male
3  Allison, Mrs Hudson JC (Bessie Waldo Daniels)     1st  25.00  female
4                  Allison, Master Hudson Trevor     1st   0.92    male

   Survived  SexCode
0         1        1
1         0        1
2         0        0
3         0        1
4         1        0
In [8]: df.groupby("Sex").mean()
Out[8]:
              Age  Survived  SexCode
Sex
female  29.396424  0.666667      1.0
male    31.014338  0.166863      0.0
In [11]: df.groupby("PClass")['Age'].mean()
Out[11]:
PClass
*            NaN
1st    39.667788
2nd    28.300142
3rd    25.208585
Name: Age, dtype: float64
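One caveat when re-running these aggregations: the session above was recorded on an older pandas, which silently skipped string columns such as Name. As far as I know, pandas 2.0 and later raise a TypeError for .mean() on a frame that still contains non-numeric columns. A minimal sketch of two ways around that, on invented data with hypothetical variable names:

```python
import pandas as pd

# Hypothetical toy data with one string column besides the group key.
df = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Sex": ["female", "male", "female", "male"],
        "Age": [29.0, 30.0, 25.0, 40.0],
        "Survived": [1, 0, 1, 0],
    }
)

# Option 1: restrict the aggregation to numeric columns, so 'Name' is skipped.
numeric_means = df.groupby("Sex").mean(numeric_only=True)

# Option 2: select the column(s) of interest before aggregating.
age_means = df.groupby("Sex")["Age"].mean()

print(list(numeric_means.columns))          # ['Age', 'Survived']
print(numeric_means.loc["female", "Age"])   # 27.0
print(age_means["male"])                    # 35.0
```

Selecting the column first, as in In [11] above, is usually the clearer choice when only one statistic is needed.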
df.groupby(['col1', 'col2']).mean()
Groups df by col1 and col2 together and returns a DataFrame with a MultiIndex built from the two keys.
In [9]: df.groupby(["Sex","PClass"]).mean()
Out[9]:
                     Age  Survived  SexCode
Sex    PClass
female 1st     37.772277  0.937063      1.0
       2nd     27.388235  0.878505      1.0
       3rd     22.776176  0.377358      1.0
male   *             NaN  0.000000      0.0
       1st     41.199360  0.329609      0.0
       2nd     28.910472  0.145349      0.0
       3rd     26.357222  0.116232      0.0
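Once the result has a MultiIndex, a few follow-up steps are handy: looking up one (Sex, PClass) pair, flattening the index back into columns, or pivoting one level into columns. The sketch below uses invented numbers and hypothetical variable names; reset_index and unstack are not part of the original note.

```python
import pandas as pd

# Hypothetical toy data reusing three Titanic column names.
df = pd.DataFrame(
    {
        "Sex": ["female", "female", "male", "male", "male"],
        "PClass": ["1st", "2nd", "1st", "1st", "2nd"],
        "Age": [30.0, 20.0, 40.0, 50.0, 10.0],
    }
)

# Group by two keys; the result is a Series with a two-level MultiIndex.
by_two = df.groupby(["Sex", "PClass"])["Age"].mean()

# Look up a single group with a tuple of keys.
male_first = by_two.loc[("male", "1st")]

# Turn the index levels back into ordinary columns.
flat = by_two.reset_index()

# Pivot the PClass level into columns: rows are Sex, columns are PClass.
wide = by_two.unstack("PClass")

print(male_first)                  # 45.0
print(list(flat.columns))          # ['Sex', 'PClass', 'Age']
print(wide.loc["female", "2nd"])   # 20.0
```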
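groupby is commonly combined with apply to run a custom operation on each group (fun below stands for any function of your own):
df.groupby('colname').apply(lambda x: fun(x))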
In [13]: df.groupby("PClass").apply(lambda x: x.count())
Out[13]:
        Name  PClass  Age  Sex  Survived  SexCode
PClass
*          1       1    0    1         1        1
1st      322     322  226  322       322      322
2nd      279     279  212  279       279      279
3rd      711     711  318  711       711      711
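A minimal sketch of groupby plus apply with a user-defined function. The helper age_range and the data are hypothetical. The column is selected before apply here, which is my own choice rather than the note's: it keeps the example independent of how different pandas versions treat the grouping column inside DataFrame-level apply.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing age.
df = pd.DataFrame(
    {
        "PClass": ["1st", "1st", "2nd", "2nd", "2nd"],
        "Age": [29.0, 2.0, 30.0, np.nan, 28.0],
    }
)


def age_range(ages):
    """Hypothetical helper: spread between oldest and youngest in a group."""
    return ages.max() - ages.min()


# Each group's 'Age' Series is passed to age_range in turn.
ranges = df.groupby("PClass")["Age"].apply(age_range)

# count() ignores NaN, which is why Age counts are lower than row counts.
non_null = df.groupby("PClass")["Age"].count()

print(ranges["1st"], ranges["2nd"])      # 27.0 2.0
print(non_null["1st"], non_null["2nd"])  # 2 2
```

This also explains the Age column in Out[13]: count() reports non-null values, so passengers with a missing age are not counted there.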
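for (colname, group) in df.groupby('colname'):
Iterating over a groupby yields one pair per group. The first element (called colname here) is the group key, that is, one distinct value (level) of the column being grouped on. The second element, group, is the sub-DataFrame containing the rows for that value.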
In [14]: for (colname, group) in df.groupby('PClass'):
    ...:     print(colname)
    ...:
*
1st
2nd
3rd
In [15]: for (colname, group) in df.groupby('PClass'):
    ...:     print(group.head(2))
    ...:
                    Name PClass  Age   Sex  Survived  SexCode
456  Jacobsohn Mr Samuel      *  NaN  male         0        0
                           Name PClass   Age     Sex  Survived  SexCode
0  Allen, Miss Elisabeth Walton    1st  29.0  female         1        1
1   Allison, Miss Helen Loraine    1st   2.0  female         0        1
                           Name PClass   Age     Sex  Survived  SexCode
322          Abelson, Mr Samuel    2nd  30.0    male         0        0
323  Abelson, Mrs Samuel (Anna)    2nd  28.0  female         1        1
                              Name PClass   Age   Sex  Survived  SexCode
602             Abbing, Mr Anthony    3rd  42.0  male         0        0
603  Abbott, Master Eugene Joseph    3rd  13.0  male         0        0
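The same iteration pattern on a toy frame, collecting one number per group. The data and the names sizes and first_class are hypothetical; get_group is an addition not covered in the note, shown as a way to fetch a single group without looping.

```python
import pandas as pd

# Hypothetical toy data reusing two Titanic column names.
df = pd.DataFrame(
    {
        "PClass": ["1st", "2nd", "1st", "3rd"],
        "Name": ["A", "B", "C", "D"],
    }
)

# Each iteration yields (group key, sub-DataFrame for that key).
sizes = {}
for key, group in df.groupby("PClass"):
    sizes[key] = len(group)

# Fetch one group directly by its key.
first_class = df.groupby("PClass").get_group("1st")

print(sizes)                       # {'1st': 2, '2nd': 1, '3rd': 1}
print(first_class.index.tolist())  # [0, 2]
```

Note that each sub-DataFrame keeps the original row index, which matches the 456, 322 and 602 labels in the session output above.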