Pandas - 10.3 Aggregation with a Single Grouping

Author: 陈天睡懒觉 | Published 2022-07-31 14:08

    Single grouping

    import pandas as pd
    import seaborn as sns
    

    Saving the grouping

    tips_10 = sns.load_dataset('tips').sample(10, random_state=42)
    print(tips_10)
    '''
         total_bill   tip     sex smoker   day    time  size
    24        19.82  3.18    Male     No   Sat  Dinner     2
    6          8.77  2.00    Male     No   Sun  Dinner     2
    153       24.55  2.00    Male     No   Sun  Dinner     4
    211       25.89  5.16    Male    Yes   Sat  Dinner     4
    198       13.00  2.00  Female    Yes  Thur   Lunch     2
    176       17.89  2.00    Male    Yes   Sun  Dinner     2
    192       28.44  2.56    Male    Yes  Thur   Lunch     2
    124       12.48  2.52  Female     No  Thur   Lunch     2
    9         14.78  3.23    Male     No   Sun  Dinner     2
    101       15.38  3.00  Female    Yes   Fri  Dinner     2
    '''
    
    grouped = tips_10.groupby('sex')
    # Inspect the actual groups (each key maps to the row index labels in that group)
    print(grouped.groups)
    '''
    {'Male': [24, 6, 153, 211, 176, 192, 9], 'Female': [198, 124, 101]}
    '''
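
As a rough illustration of what `groups` holds, the sketch below rebuilds a few columns of the sample by hand. The name `tips_small` and the plain-string `sex` column are assumptions, made so that the example runs without downloading the dataset. It shows that each key maps to row index labels, which can be passed back to `.loc`.

```python
import pandas as pd

# Hand-copied columns of the 10-row sample (assumption: 'sex' is stored as
# plain strings here, not as the categorical column seaborn provides).
tips_small = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)

grouped_small = tips_small.groupby("sex")

# .groups maps each key to the index labels of the rows in that group
female_labels = grouped_small.groups["Female"].tolist()
print(female_labels)  # [198, 124, 101]

# The same labels select the underlying rows through .loc
female_rows = tips_small.loc[grouped_small.groups["Female"]]
print(female_rows)

# Number of groups, and number of rows in each group
print(grouped_small.ngroups)  # 2
group_sizes = grouped_small.size()
print(group_sizes)
```

Because `sex` is a plain string column in this sketch, the groups are sorted alphabetically, so Female is listed before Male.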
    

    Selecting a group

    female = grouped.get_group('Female')
    print(female)
    '''
         total_bill   tip     sex smoker   day    time  size
    198       13.00  2.00  Female    Yes  Thur   Lunch     2
    124       12.48  2.52  Female     No  Thur   Lunch     2
    101       15.38  3.00  Female    Yes   Fri  Dinner     2
    '''
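
For comparison, `get_group` can be checked against an ordinary boolean filter. This is a minimal sketch on a hand-built frame (`tips_small` is a hypothetical name, and `sex` is a plain string column here). It also shows that asking for a key that does not exist raises a `KeyError`.

```python
import pandas as pd

# Hand-copied rows from the sample; the names are illustrative assumptions.
tips_small = pd.DataFrame(
    {
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)
grouped_small = tips_small.groupby("sex")

# Rows of one group, obtained from the groupby object
female_a = grouped_small.get_group("Female")

# The same rows, obtained with a boolean mask on the original frame
female_b = tips_small[tips_small["sex"] == "Female"]

print(female_a.index.tolist())
print(female_b.index.tolist())

# A key that is not among the groups raises KeyError
missing_key_raised = False
try:
    grouped_small.get_group("Unknown")
except KeyError as err:
    missing_key_raised = True
    print("no such group:", err)
```

The mask is often enough for a one-off selection. `get_group` is convenient when the groupby object already exists and several groups are inspected in turn.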
    

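    Group-wise calculations across several variables

    Restrict the calculation to the columns where it is meaningful. Older pandas releases silently dropped columns that could not be averaged; since pandas 2.0 the default is numeric_only=False, so non-numeric columns must be excluded explicitly.

    avg = grouped.mean(numeric_only=True)
    # numeric_only=True skips columns such as smoker, day and time, for which a mean is meaningless
    print(avg)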
    '''
            total_bill       tip      size
    sex                                   
    Male         20.02  2.875714  2.571429
    Female       13.62  2.506667  2.000000
    '''
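
Instead of averaging every numeric column, the columns of interest can be selected on the groupby object first. The sketch below assumes a hand-built frame named `tips_small` (not part of the original code) and uses `agg` to compute more than one statistic at a time.

```python
import pandas as pd

# Hand-copied columns of the sample; 'sex' is a plain string column here.
tips_small = pd.DataFrame(
    {
        "total_bill": [19.82, 8.77, 24.55, 25.89, 13.00,
                       17.89, 28.44, 12.48, 14.78, 15.38],
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)
grouped_small = tips_small.groupby("sex")

# Mean of a single column per group: the result is a Series indexed by sex
tip_mean = grouped_small["tip"].mean()
print(tip_mean)

# Several statistics for several columns: the result has two-level column labels
summary = grouped_small[["total_bill", "tip"]].agg(["mean", "max"])
print(summary)

# One cell is addressed by (row key, (column, statistic))
male_max_bill = summary.loc["Male", ("total_bill", "max")]
print(male_max_bill)
```

Selecting columns explicitly also avoids the question of which columns can be averaged, because only numeric columns are requested.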
    

    Iterating over the groups

    for sex_group in grouped:
        print(sex_group)
        
    '''
    ('Male',      total_bill   tip   sex smoker   day    time  size
    24        19.82  3.18  Male     No   Sat  Dinner     2
    6          8.77  2.00  Male     No   Sun  Dinner     2
    153       24.55  2.00  Male     No   Sun  Dinner     4
    211       25.89  5.16  Male    Yes   Sat  Dinner     4
    176       17.89  2.00  Male    Yes   Sun  Dinner     2
    192       28.44  2.56  Male    Yes  Thur   Lunch     2
    9         14.78  3.23  Male     No   Sun  Dinner     2)
    ('Female',      total_bill   tip     sex smoker   day    time  size
    198       13.00  2.00  Female    Yes  Thur   Lunch     2
    124       12.48  2.52  Female     No  Thur   Lunch     2
    101       15.38  3.00  Female    Yes   Fri  Dinner     2)
    '''
    

    Each element sex_group yielded by grouped is a tuple: its first element is a string (akin to a key) and its second element is a DataFrame (akin to a value).

    for sex_group in grouped:
        print('the type is: {}'.format(type(sex_group)))
        print('the length is: {}\n'.format(len(sex_group)))
        first_element = sex_group[0]
        print('the first element is:{}'.format(first_element))
        print('it has a type of: {}\n'.format(type(first_element)))
        second_element = sex_group[1]
        print('the second element is:\n{}'.format(second_element))
        print('it has a type of: {}\n'.format(type(second_element)))
        print('what we have:')
        print(sex_group)
        break
        
    '''
    the type is: <class 'tuple'>
    the length is: 2
    
    the first element is:Male
    it has a type of: <class 'str'>
    
    the second element is:
         total_bill   tip   sex smoker   day    time  size
    24        19.82  3.18  Male     No   Sat  Dinner     2
    6          8.77  2.00  Male     No   Sun  Dinner     2
    153       24.55  2.00  Male     No   Sun  Dinner     4
    211       25.89  5.16  Male    Yes   Sat  Dinner     4
    176       17.89  2.00  Male    Yes   Sun  Dinner     2
    192       28.44  2.56  Male    Yes  Thur   Lunch     2
    9         14.78  3.23  Male     No   Sun  Dinner     2
    it has a type of: <class 'pandas.core.frame.DataFrame'>
    
    what we have:
    ('Male',      total_bill   tip   sex smoker   day    time  size
    24        19.82  3.18  Male     No   Sat  Dinner     2
    6          8.77  2.00  Male     No   Sun  Dinner     2
    153       24.55  2.00  Male     No   Sun  Dinner     4
    211       25.89  5.16  Male    Yes   Sat  Dinner     4
    176       17.89  2.00  Male    Yes   Sun  Dinner     2
    192       28.44  2.56  Male    Yes  Thur   Lunch     2
    9         14.78  3.23  Male     No   Sun  Dinner     2)
    '''
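
Since every item is a (key, DataFrame) pair, the loop can unpack it directly instead of indexing into the tuple. This is a minimal sketch on a hand-built frame; the names `tips_small`, `row_counts` and `tip_means` are illustrative assumptions.

```python
import pandas as pd

# Hand-copied columns of the sample; 'sex' is a plain string column here.
tips_small = pd.DataFrame(
    {
        "tip": [3.18, 2.00, 2.00, 5.16, 2.00, 2.00, 2.56, 2.52, 3.23, 3.00],
        "sex": ["Male", "Male", "Male", "Male", "Female",
                "Male", "Male", "Female", "Male", "Female"],
    },
    index=[24, 6, 153, 211, 198, 176, 192, 124, 9, 101],
)
grouped_small = tips_small.groupby("sex")

# Unpack each pair into the group key and the DataFrame of its rows
for sex, frame in grouped_small:
    print(sex, frame.shape)

# The same unpacking works in comprehensions
row_counts = {sex: len(frame) for sex, frame in grouped_small}
tip_means = {sex: round(float(frame["tip"].mean()), 2)
             for sex, frame in grouped_small}
print(row_counts)  # {'Female': 3, 'Male': 7}
print(tip_means)   # {'Female': 2.51, 'Male': 2.88}
```

For plain aggregations such as these, the built-in groupby methods are usually shorter. An explicit loop is mostly useful when each group needs custom handling.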
    
