Learning Python: pandas study notes (5)

Author: GPZ_Lab | Published 2018-12-11 23:01

    In these notes:

    • Dropping rows
    • Dropping columns
    • groupby (please, no more forgetting how it works)

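    Dropping rows

    df[df['colname'] != 'notthis']   drop rows by a condition on a column: keeps only the rows where colname is not 'notthis'
    df[df.index != n]   drop a row by its index: removes the row whose index label is n (an integer)
    df.dropna()   drop every row that contains a NaN value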
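
The three row-dropping idioms above can be tried on a small frame. This is only a sketch: the column names and values are made up for illustration, and `df.drop(index=0)` is shown as an alternative spelling that is not part of the original notes.

```python
import numpy as np
import pandas as pd

# Toy data, invented for this example only.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "group": ["keep", "notthis", "keep", "keep"],
    "value": [1.0, 2.0, np.nan, 4.0],
})

# Keep only the rows whose 'group' value is not 'notthis'.
by_condition = df[df["group"] != "notthis"]

# Remove the row whose index label is 0, via a mask or via drop().
by_index_mask = df[df.index != 0]
by_index_drop = df.drop(index=0)

# Remove every row that has at least one NaN.
no_nan = df.dropna()

print(by_condition)
print(by_index_mask)
print(no_nan)
```

All of these return a new DataFrame and leave `df` untouched, so the result has to be assigned to a name to be kept.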

    Dropping columns

    df.drop(df.columns[1], axis=1)   drop the second column (by position)
    df.drop('colname', axis=1)   drop the column named colname
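
A minimal sketch of the two column-dropping calls, using a made-up three-column frame. The `columns=` keyword is an equivalent spelling of `axis=1` and is added here for illustration; it does not appear in the original notes.

```python
import pandas as pd

# Toy frame, invented for this example only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Drop by position: df.columns[1] is the label of the second column ('b').
without_second = df.drop(df.columns[1], axis=1)

# Drop by name; columns="c" means the same as ('c', axis=1).
without_c = df.drop(columns="c")

# Several columns can be dropped at once by passing a list.
without_many = df.drop(columns=["a", "c"])

print(without_second.columns.tolist())
print(without_c.columns.tolist())
print(without_many.columns.tolist())
```

As with the row operations, `drop` returns a new DataFrame here and `df` keeps all three columns.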

    groupby

    df.groupby('colname') is only the split step. Printing it directly does not show any result; an operation such as .mean() or .sum() has to be applied to it.
    An example with the Titanic data:

    In [2]: import pandas as pd
    
    In [3]: url = 'https://tinyurl.com/titanic-csv'
    
    In [4]: df = pd.read_csv(url)
    
    In [5]: df.head()
    Out[5]:
                                                Name PClass    Age     Sex  \
    0                   Allen, Miss Elisabeth Walton    1st  29.00  female
    1                    Allison, Miss Helen Loraine    1st   2.00  female
    2            Allison, Mr Hudson Joshua Creighton    1st  30.00    male
    3  Allison, Mrs Hudson JC (Bessie Waldo Daniels)    1st  25.00  female
    4                  Allison, Master Hudson Trevor    1st   0.92    male
    
       Survived  SexCode
    0         1        1
    1         0        1
    2         0        0
    3         0        1
    4         1        0
    
    In [8]: df.groupby("Sex").mean()
    Out[8]:
                  Age  Survived  SexCode
    Sex
    female  29.396424  0.666667      1.0
    male    31.014338  0.166863      0.0
    
    In [11]: df.groupby("PClass")['Age'].mean()
    Out[11]:
    PClass
    *            NaN
    1st    39.667788
    2nd    28.300142
    3rd    25.208585
    Name: Age, dtype: float64
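
The session above was recorded in 2018. As far as I know, recent pandas releases (2.0 and later) no longer silently skip text columns such as Name in `df.groupby("Sex").mean()`, so selecting the numeric columns first (or passing `numeric_only=True`) is the safer spelling. The sketch below uses a tiny made-up frame rather than the Titanic file, and also shows that the bare groupby object is not a table yet.

```python
import pandas as pd

# Tiny stand-in for the Titanic data; the values are invented.
df = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4"],
    "Sex": ["female", "female", "male", "male"],
    "Age": [29.0, 2.0, 30.0, None],
    "Survived": [1, 0, 0, 1],
})

# Split step only: this is a lazy grouping object, not a result table.
grouped = df.groupby("Sex")
print(type(grouped).__name__)

# Apply step: pick the numeric columns explicitly, then aggregate.
means = grouped[["Age", "Survived"]].mean()
print(means)
```

Missing ages are skipped when averaging, so the male group's mean Age comes from its single known value.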
    

    df.groupby(['col1', 'col2']).mean()
    groups by both col1 and col2 and returns a DataFrame with a MultiIndex

    In [9]: df.groupby(["Sex","PClass"]).mean()
    Out[9]:
                         Age  Survived  SexCode
    Sex    PClass
    female 1st     37.772277  0.937063      1.0
           2nd     27.388235  0.878505      1.0
           3rd     22.776176  0.377358      1.0
    male   *             NaN  0.000000      0.0
           1st     41.199360  0.329609      0.0
           2nd     28.910472  0.145349      0.0
           3rd     26.357222  0.116232      0.0
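
A MultiIndex result like the one above can be indexed with a tuple, flattened back into ordinary columns, or pivoted. A small sketch with invented data follows; `reset_index` and `unstack` are standard pandas methods, but they are an addition here and not part of the original notes.

```python
import pandas as pd

# Invented data with the same column names as the Titanic example.
df = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "PClass": ["1st", "2nd", "1st", "1st", "2nd"],
    "Survived": [1, 1, 0, 1, 0],
})

# Grouping on two columns gives a result indexed by (Sex, PClass).
rate = df.groupby(["Sex", "PClass"])["Survived"].mean()
print(rate.index.nlevels)

# Look up one combination with a tuple of index labels.
print(rate.loc[("male", "1st")])

# Turn the index levels back into ordinary columns.
flat = rate.reset_index()

# Or move PClass from the index into the columns.
wide = rate.unstack("PClass")
print(flat)
print(wide)
```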
    

    groupby is commonly combined with apply to run a custom operation on each group:
    df.groupby('colname').apply(lambda x: fun(x))

    In [13]: df.groupby("PClass").apply(lambda x: x.count())
    Out[13]:
            Name  PClass  Age  Sex  Survived  SexCode
    PClass
    *          1       1    0    1         1        1
    1st      322     322  226  322       322      322
    2nd      279     279  212  279       279      279
    3rd      711     711  318  711       711      711
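
The count example works with a built-in method; a user-defined function can be passed to apply in the same way. The sketch below is an illustration on invented data, and `missing_share` is a hypothetical helper. It selects the Age column before calling apply, which sidesteps the question of whether the grouping column itself is handed to the function (my understanding is that this behaviour differs between pandas versions).

```python
import pandas as pd

# Invented data; None marks a missing age.
df = pd.DataFrame({
    "PClass": ["1st", "1st", "2nd", "2nd", "2nd"],
    "Age": [29.0, None, 30.0, 28.0, None],
})

def missing_share(ages):
    """Return the fraction of missing values in one group's Age values."""
    return ages.isna().mean()

# apply() calls missing_share once per PClass group.
share = df.groupby("PClass")["Age"].apply(missing_share)

# count() reports the number of non-missing values per group.
counts = df.groupby("PClass")["Age"].count()

print(share)
print(counts)
```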
    

    for (colname, group) in df.groupby('colname'):
    Here colname takes, in turn, each level (distinct value) of the column being grouped on, and group is the DataFrame of rows extracted for that level.

    In [14]: for (colname, group) in df.groupby('PClass'):
        ...:     print(colname)
        ...:
    *
    1st
    2nd
    3rd
    
    In [15]: for (colname, group) in df.groupby('PClass'):
        ...:     print(group.head(2))
        ...:
        ...:
                        Name PClass  Age   Sex  Survived  SexCode
    456  Jacobsohn Mr Samuel      *  NaN  male         0        0
                               Name PClass   Age     Sex  Survived  SexCode
    0  Allen, Miss Elisabeth Walton    1st  29.0  female         1        1
    1   Allison, Miss Helen Loraine    1st   2.0  female         0        1
                               Name PClass   Age     Sex  Survived  SexCode
    322          Abelson, Mr Samuel    2nd  30.0    male         0        0
    323  Abelson, Mrs Samuel (Anna)    2nd  28.0  female         1        1
                                 Name PClass   Age   Sex  Survived  SexCode
    602            Abbing, Mr Anthony    3rd  42.0  male         0        0
    603  Abbott, Master Eugene Joseph    3rd  13.0  male         0        0
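
The same loop as a standalone sketch on invented data. `get_group` is shown as a way to pull out a single group without looping; it is an addition for illustration and not part of the original notes.

```python
import pandas as pd

# Invented data with the same column names as the Titanic example.
df = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4"],
    "PClass": ["1st", "2nd", "1st", "3rd"],
})

# Each iteration yields the group key and that group's rows as a DataFrame.
seen = []
for key, group in df.groupby("PClass"):
    seen.append((key, len(group)))
    print(key)
    print(group.head(2))

# Fetch one group directly by its key.
first_class = df.groupby("PClass").get_group("1st")
print(first_class)
```

Groups are visited in sorted key order by default, and each group keeps its original row labels.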
    
