Summary Statistics with pandas in Python

Author: Brendansmisle | Published: 2020-03-26 10:57
    1. Import the module
    >>> import pandas as pd
    
    2. Show all rows and columns of a DataFrame without truncation
    >>> pd.set_option('display.max_rows', 100, 'display.max_columns', 1000, 'display.max_colwidth', 1000, 'display.width', 1000)
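
If the wider display is only needed for a single printout, a scoped setting may be preferable to a global one. The sketch below uses `pd.option_context`, which restores the previous values when the block exits. The option values are just the ones from the step above, not a recommendation.

```python
import pandas as pd

# Remember the current setting so we can confirm it is restored afterwards.
before = pd.get_option("display.max_rows")

# Options passed here apply only inside the `with` block.
with pd.option_context("display.max_rows", 100, "display.max_columns", 1000):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(before, inside, after)
```

Outside the `with` block the session keeps whatever display settings it had before.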
    
    3. Load the data table
    >>> titanic = pd.read_csv(r"C:\Users\Administrator\Desktop\titanic.csv")
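
The path above is local to the author's machine. To try `read_csv` without that file, the following sketch reads a few rows from an in-memory string instead. The rows are hypothetical sample data shaped like the Titanic table, and the name `sample` is chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real titanic.csv has more columns and 891 rows.
csv_text = """PassengerId,Sex,Pclass,Age,Fare
1,male,3,22,7.25
2,female,1,38,71.2833
3,female,3,,7.925
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (rows, columns)
print(sample.dtypes)         # Age is float64 because of the missing value
print(sample["Age"].isna())  # the empty field is read as NaN
```

The empty `Age` field is parsed as NaN, which is what the statistics in the following steps skip by default.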
    
    4. Compute the mean age
    >>> titanic["Age"].mean()
    29.69911764705882
    

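    Missing values (NaN) are skipped by default, and the statistic is computed for each column over all of its rows, not for each row across columns.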
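
A small sketch of both defaults, using a hypothetical three-row frame rather than the Titanic data: `skipna` controls whether NaN is ignored, and `axis` controls whether the result has one value per column or one per row.

```python
import pandas as pd

# Hypothetical toy data with one missing age.
toy = pd.DataFrame({
    "Age": [20.0, float("nan"), 40.0],
    "Fare": [10.0, 20.0, 30.0],
})

mean_skip = toy["Age"].mean()              # NaN ignored: (20 + 40) / 2
mean_keep = toy["Age"].mean(skipna=False)  # NaN propagates into the result
per_column = toy.mean()                    # default axis=0: one value per column
per_row = toy.mean(axis=1)                 # axis=1: one value per row

print(mean_skip, mean_keep)
print(per_column)
print(per_row)
```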

    5. Compute the median age and fare
    >>> titanic[["Age", "Fare"]].median()
    Age     28.0000
    Fare    14.4542
    dtype: float64
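
The median is the 50% quantile, which is why the same numbers appear in the `50%` row of `describe()` in the next step. The sketch below checks that equivalence on a hypothetical four-row frame, where the values are picked so the arithmetic is easy to verify by hand.

```python
import pandas as pd

# Hypothetical toy data; one age is missing and is skipped by both methods.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, float("nan"), 35.0],
    "Fare": [8.0, 72.0, 7.0, 53.0],
})

medians = toy[["Age", "Fare"]].median()
q50 = toy[["Age", "Fare"]].quantile(0.5)

print(medians)  # Age: middle of 22, 35, 38; Fare: midpoint of 8 and 53
print(q50)
```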
    
    6. Summary statistics for multiple columns with describe()
    >>> titanic[["Age", "Fare"]].describe()
                  Age        Fare
    count  714.000000  891.000000
    mean    29.699118   32.204208
    std     14.526497   49.693429
    min      0.420000    0.000000
    25%     20.125000    7.910400
    50%     28.000000   14.454200
    75%     38.000000   31.000000
    max     80.000000  512.329200
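
The quartile rows are only the default. As a sketch on hypothetical toy data, `describe()` also accepts a `percentiles` argument to report other cut points. Note that `count` differs between the columns here for the same reason as in the output above: missing values are not counted.

```python
import pandas as pd

# Hypothetical toy data with one missing age.
toy = pd.DataFrame({
    "Age": [10.0, 20.0, 30.0, 40.0, float("nan")],
    "Fare": [5.0, 10.0, 15.0, 20.0, 25.0],
})

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = toy.describe(percentiles=[0.1, 0.9])
print(summary)
```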
    
    7. Multi-column statistics with a custom set of aggregations per column
    >>> titanic.agg({'Age': ['min', 'max', 'median', 'skew'],
    ...              'Fare': ['min', 'max', 'median', 'mean']})
                  Age        Fare
    max     80.000000  512.329200
    mean          NaN   32.204208
    median  28.000000   14.454200
    min      0.420000    0.000000
    skew     0.389108         NaN
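
The NaN cells in this output do not indicate missing data. They mark statistics that were not requested for that column (`mean` for Age, `skew` for Fare). A minimal sketch on hypothetical toy data shows the same effect. The row order of the result can differ between pandas versions, so the example looks values up by label.

```python
import pandas as pd

# Hypothetical toy data, no missing values at all.
toy = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 30.0]})

# 'max' is requested only for Age, 'mean' only for Fare.
result = toy.agg({"Age": ["min", "max"], "Fare": ["min", "mean"]})
print(result)

print(result.loc["max", "Age"])    # computed
print(result.loc["max", "Fare"])   # NaN: not requested for Fare
```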
    
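    8. Grouped statistics by category
    [Figure: grouped statistics workflow (分类统计流程.png)]
    >>> titanic.groupby("Sex").mean(numeric_only=True)    # mean of every numeric column, by sex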
            PassengerId  Survived    Pclass        Age     SibSp     Parch       Fare
    Sex                                                                              
    female   431.028662  0.742038  2.159236  27.915709  0.694268  0.649682  44.479818
    male     454.147314  0.188908  2.389948  30.726645  0.429809  0.235702  25.523893
    >>> titanic.groupby("Sex")["Age"].mean()    # mean age by sex
    Sex
    female    27.915709
    male      30.726645
    Name: Age, dtype: float64
    >>> titanic.groupby(["Sex", "Pclass"])["Fare"].mean()    # mean fare for each combination of sex and cabin class
    Sex     Pclass
    female  1         106.125798
            2          21.970121
            3          16.118810
    male    1          67.226127
            2          19.741782
            3          12.661633
    Name: Fare, dtype: float64
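
Grouped statistics like these follow what is commonly called the split-apply-combine pattern: split the rows by key, apply a statistic to each piece, and combine the results. The sketch below spells those steps out by hand on a hypothetical four-row frame and compares them with `groupby`. The column names mirror the Titanic table, but the values are invented.

```python
import pandas as pd

# Hypothetical toy data; "Name" is a text column that has no mean.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Sex": ["female", "male", "female", "male"],
    "Pclass": [1, 3, 3, 1],
    "Fare": [80.0, 8.0, 12.0, 60.0],
})

# Split, apply, combine by hand.
manual = {}
for sex in toy["Sex"].unique():
    subset = toy[toy["Sex"] == sex]      # split: rows of one category
    manual[sex] = subset["Fare"].mean()  # apply: statistic on that subset
manual = pd.Series(manual)               # combine: one result per category

# The same computation with groupby.
by_sex = toy.groupby("Sex")["Fare"].mean()

# numeric_only=True leaves out text columns such as Name.
numeric_means = toy.groupby("Sex").mean(numeric_only=True)

# Two grouping keys give a two-level index; unstack turns one level into columns.
by_sex_class = toy.groupby(["Sex", "Pclass"])["Fare"].mean()
table = by_sex_class.unstack("Pclass")

print(manual)
print(by_sex)
print(table)
```

Selecting the column before aggregating, as in `groupby("Sex")["Fare"]`, avoids computing statistics for columns that are not needed.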
    
    9. Count the records in each category
    >>> titanic.groupby("Pclass")["Pclass"].count()
    Pclass
    1    216
    2    184
    3    491
    Name: Pclass, dtype: int64
    >>> 
    >>> titanic["Pclass"].value_counts()
    3    491
    1    216
    2    184
    Name: Pclass, dtype: int64
    

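    The value_counts() method counts the number of records for each category in a column. It is a shortcut for a groupby operation followed by counting the records in each group. Unlike the groupby result, which is ordered by category, its output is sorted by count in descending order by default.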
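
The equivalence can be checked on a hypothetical six-row column: both approaches give the same counts and differ only in ordering. The `normalize=True` variant shown at the end is an extra not used in the original and returns shares instead of counts.

```python
import pandas as pd

# Hypothetical toy column of cabin classes.
toy = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

by_group = toy.groupby("Pclass")["Pclass"].count()    # ordered by category
counts = toy["Pclass"].value_counts()                 # ordered by count, descending
shares = toy["Pclass"].value_counts(normalize=True)   # fractions that sum to 1

print(by_group)
print(counts)
print(shares)

# After sorting by category, the counts match the groupby result.
print(counts.sort_index().tolist() == by_group.tolist())
```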


Article link: https://www.haomeiwen.com/subject/akjquhtx.html