Learning Python: pandas study notes (5)

Author: GPZ_Lab | Published: 2018-12-11 23:01 | Views: 17

In these notes:

  • Dropping rows
  • Dropping columns
  • groupby (please, stop forgetting how it works)

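Dropping rows

df[df['colname'] != 'notthis'] drops rows based on a condition on a column
df[df.index != int] drops a row by its index label (int stands for the integer label to drop)
df.dropna() drops every row that contains an NA value

All three return a new DataFrame; df itself is not modified.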
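
To see the three row-dropping idioms side by side, here is a minimal sketch on a toy DataFrame. The data and the 'value' column are made up for illustration; only the three expressions come from the notes above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for this example.
df = pd.DataFrame({
    "colname": ["keep", "notthis", "keep", "other"],
    "value": [1.0, 2.0, np.nan, 4.0],
})

# Keep only the rows whose 'colname' is not 'notthis' (boolean mask).
by_condition = df[df["colname"] != "notthis"]

# Drop the row whose index label is 0.
by_index = df[df.index != 0]

# Drop every row that contains at least one NaN.
no_na = df.dropna()

print(by_condition.index.tolist())  # [0, 2, 3]
print(by_index.index.tolist())      # [1, 2, 3]
print(no_na.index.tolist())         # [0, 1, 3]
print(len(df))                      # 4, the original frame is untouched
```

For dropping by label, df.drop(index=0) should give the same result as the mask here.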

Dropping columns

df.drop(df.columns[1], axis=1) drops the second column (looked up by position)
df.drop('colname', axis=1) drops the column named colname
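
A small sketch of both forms on a made-up three-column frame (the column names 'a' and 'b' are assumptions for the example). It also shows the columns= keyword, which can be used instead of axis=1.

```python
import pandas as pd

# Hypothetical frame; only 'colname' appears in the notes above.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "colname": [5, 6]})

# df.columns[1] is the label of the second column ('b'), so this drops 'b'.
second_dropped = df.drop(df.columns[1], axis=1)

# Drop a column by its name.
named_dropped = df.drop("colname", axis=1)

# Alternative spelling: pass a list of labels to the columns= keyword.
several_dropped = df.drop(columns=["a", "b"])

print(second_dropped.columns.tolist())   # ['a', 'colname']
print(named_dropped.columns.tolist())    # ['a', 'b']
print(several_dropped.columns.tolist())  # ['colname']
```

As with dropping rows, drop returns a new DataFrame and leaves df unchanged.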

groupby

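Examples using the Titanic data:
df.groupby('colname') is only the "split" step. Printing it directly shows just a GroupBy object, not a result; you need to apply an operation to it, such as .mean() or .sum().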

In [2]: import pandas as pd

In [3]: url = 'https://tinyurl.com/titanic-csv'

In [4]: df = pd.read_csv(url)

In [5]: df.head()
Out[5]:
                                            Name PClass    Age     Sex  \
0                   Allen, Miss Elisabeth Walton    1st  29.00  female
1                    Allison, Miss Helen Loraine    1st   2.00  female
2            Allison, Mr Hudson Joshua Creighton    1st  30.00    male
3  Allison, Mrs Hudson JC (Bessie Waldo Daniels)    1st  25.00  female
4                  Allison, Master Hudson Trevor    1st   0.92    male

   Survived  SexCode
0         1        1
1         0        1
2         0        0
3         0        1
4         1        0

In [8]: df.groupby("Sex").mean(numeric_only=True)
Out[8]:
              Age  Survived  SexCode
Sex
female  29.396424  0.666667      1.0
male    31.014338  0.166863      0.0

In [11]: df.groupby("PClass")['Age'].mean()
Out[11]:
PClass
*            NaN
1st    39.667788
2nd    28.300142
3rd    25.208585
Name: Age, dtype: float64
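
The split / apply steps can also be tried without downloading anything. The sketch below uses a tiny hand-made frame that borrows the PClass, Age and Survived column names; the values are invented and are not the Titanic data.

```python
import pandas as pd

# Made-up rows that only mimic the shape of the Titanic columns.
df = pd.DataFrame({
    "PClass": ["1st", "1st", "2nd", "3rd", "3rd"],
    "Age": [30.0, 40.0, 28.0, 20.0, float("nan")],
    "Survived": [1, 0, 1, 0, 0],
})

# Split only: this is a lazy GroupBy object, nothing is computed yet.
grouped = df.groupby("PClass")
print(type(grouped).__name__)  # DataFrameGroupBy

# Apply an aggregation to one column; NaN ages are skipped by mean().
mean_age = grouped["Age"].mean()

# Different aggregations per column in one call.
summary = grouped.agg({"Age": "mean", "Survived": "sum"})

print(mean_age)
print(summary)
```

Selecting the column before aggregating, as in grouped["Age"].mean(), also avoids trying to average text columns such as Name.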

df.groupby(['col1', 'col2']).mean()
groups by both col1 and col2 and returns a DataFrame indexed by a MultiIndex built from the two columns

In [9]: df.groupby(["Sex","PClass"]).mean(numeric_only=True)
Out[9]:
                     Age  Survived  SexCode
Sex    PClass
female 1st     37.772277  0.937063      1.0
       2nd     27.388235  0.878505      1.0
       3rd     22.776176  0.377358      1.0
male   *             NaN  0.000000      0.0
       1st     41.199360  0.329609      0.0
       2nd     28.910472  0.145349      0.0
       3rd     26.357222  0.116232      0.0
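
Once the result has a MultiIndex, it helps to know how to read values back out of it. A minimal sketch, again on invented rows rather than the real data set: .loc with a tuple picks one (Sex, PClass) cell, reset_index flattens the index into ordinary columns, and unstack pivots one level into columns.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "PClass": ["1st", "2nd", "1st", "1st", "2nd"],
    "Survived": [1, 1, 0, 1, 0],
})

# Survival rate per (Sex, PClass) pair; the result is a Series with a MultiIndex.
rates = df.groupby(["Sex", "PClass"])["Survived"].mean()

# Look up a single group with a tuple of index labels.
male_first = rates.loc[("male", "1st")]
print(male_first)  # 0.5

# Turn the index levels back into ordinary columns.
flat = rates.reset_index()
print(flat.columns.tolist())  # ['Sex', 'PClass', 'Survived']

# Pivot the PClass level into columns: one row per Sex.
wide = rates.unstack("PClass")
print(wide)
```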

groupby is often combined with apply to run a custom operation on each group
df.groupby('colname').apply(lambda x: fun(x))

In [13]: df.groupby("PClass").apply(lambda x: x.count())
Out[13]:
        Name  PClass  Age  Sex  Survived  SexCode
PClass
*          1       1    0    1         1        1
1st      322     322  226  322       322      322
2nd      279     279  212  279       279      279
3rd      711     711  318  711       711      711
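
The lambda above just calls count(), which groupby already offers directly. apply becomes more useful with a function of your own. The sketch below uses a hypothetical helper, age_range, on invented data; selecting the 'Age' column first means the function receives one Series per group.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "PClass": ["1st", "1st", "1st", "2nd", "2nd"],
    "Age": [29.0, 2.0, 30.0, 30.0, 28.0],
})


def age_range(ages):
    """Hypothetical helper: spread between oldest and youngest in a group."""
    return ages.max() - ages.min()


# apply calls age_range once per group and combines the results.
spread = df.groupby("PClass")["Age"].apply(age_range)
print(spread["1st"])  # 28.0
print(spread["2nd"])  # 2.0

# For built-in aggregations, the direct method is shorter than apply.
counts = df.groupby("PClass")["Age"].count()
print(counts.to_dict())  # {'1st': 3, '2nd': 2}
```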

for (colname, group) in df.groupby('colname'):
Here colname takes, in turn, each level (distinct value) of the column being grouped on, and group is the sub-DataFrame extracted for that level

In [14]: for (colname, group) in df.groupby('PClass'):
    ...:     print(colname)
    ...:
*
1st
2nd
3rd

In [15]: for (colname, group) in df.groupby('PClass'):
    ...:     print(group.head(2))
    ...:
    ...:
                    Name PClass  Age   Sex  Survived  SexCode
456  Jacobsohn Mr Samuel      *  NaN  male         0        0
                           Name PClass   Age     Sex  Survived  SexCode
0  Allen, Miss Elisabeth Walton    1st  29.0  female         1        1
1   Allison, Miss Helen Loraine    1st   2.0  female         0        1
                           Name PClass   Age     Sex  Survived  SexCode
322          Abelson, Mr Samuel    2nd  30.0    male         0        0
323  Abelson, Mrs Samuel (Anna)    2nd  28.0  female         1        1
                             Name PClass   Age   Sex  Survived  SexCode
602            Abbing, Mr Anthony    3rd  42.0  male         0        0
603  Abbott, Master Eugene Joseph    3rd  13.0  male         0        0
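
A compact version of the same loop on invented data, collecting the size of each group as it goes. It also shows get_group, which fetches a single group's sub-DataFrame without looping. The names and values below are placeholders, not rows from the Titanic file.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "PClass": ["1st", "2nd", "1st", "3rd"],
})

sizes = {}
# Each iteration yields the group key and the matching sub-DataFrame.
for key, group in df.groupby("PClass"):
    sizes[key] = len(group)
    print(key, group["Name"].tolist())

print(sizes)  # {'1st': 2, '2nd': 1, '3rd': 1}

# Fetch one group directly by its key.
first_class = df.groupby("PClass").get_group("1st")
print(first_class["Name"].tolist())  # ['A', 'C']
```

Each group keeps its original row labels, which is why the output above shows indices such as 322 and 602 rather than restarting from 0.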
