美文网首页
DataFrame的pivot_table数据透视表

DataFrame的pivot_table数据透视表

作者: Chaweys | 来源:发表于2020-11-21 11:38 被阅读0次

DataFrame的pivot_table数据透视表

当我们建立数据透视表时,最简单的方法时一步一步地做。
添加项目并检查每个步骤,以验证正在或得预期的结果。
创建一个最简单的数据透视表,一定需要有一个索引,即index。



#coding=utf-8
import pandas as pd
import numpy as np

df=pd.read_excel("sales-funnel.xlsx")
print(df.head())   #默认读取前五行
'''
   Account                          Name  ...  Price     Status
0   714466               Trantow-Barrows  ...  30000  presented
1   714466               Trantow-Barrows  ...  10000  presented
2   714466               Trantow-Barrows  ...   5000    pending
3   737550  Fritsch, Russel and Anderson  ...  35000   declined
4   146832                  Kiehn-Spinka  ...  65000        won
'''


print(df.pivot_table(index="Manager"))
'''
解释:
1、指定行索引为Manaer列,
2、即对Manager进行分组(类似数据库中group by)数据透视表只自动获取带有数值的列,默认聚合方法为np.mean即求平均值

                     Account         Price  Quantity
Manager                                             
Debra Henley   513112.222222  26111.111111  1.444444
Fred Anderson  405039.000000  35875.000000  2.125000
'''


print(df.pivot_table(index=["Manager","Rep"]))
'''
解释:
1、指定行索引为两个值Manager、Rep
2、即对Manager分组后再对其中的Rep进行组(类似数据库的group by A,B),数据透视表默认自动获取带有数值的列,默认聚合方法为np.mean即求平均值
                              Account         Price  Quantity
Manager       Rep                                            
Debra Henley  Craig Booker   720237.0  20000.000000  1.250000
              Daniel Hilton  194874.0  38333.333333  1.666667
              John Smith     576220.0  20000.000000  1.500000
Fred Anderson Cedric Moss    196016.5  27500.000000  1.250000
              Wendy Yule     614061.5  44250.000000  3.000000
'''


print(df.pivot_table(index=["Manager","Rep"],values="Price"))
'''
解释:
指定values参数,表示仅聚合Price该列
                                    Price
Manager       Rep                        
Debra Henley  Craig Booker   20000.000000
              Daniel Hilton  38333.333333
              John Smith     20000.000000
Fred Anderson Cedric Moss    27500.000000
              Wendy Yule     44250.000000
'''


print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum))
'''
解释:
aggfunc指定聚合使用的函数名,这里指定为求和np.sum
                              Price
Manager       Rep                  
Debra Henley  Craig Booker    80000
              Daniel Hilton  115000
              John Smith      40000
Fred Anderson Cedric Moss    110000
              Wendy Yule     177000
'''


print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=[np.sum,len]))
'''
解释:
aggfuc可以传入列表参数,传递多个聚合函数名
                                sum   len
                              Price Price
Manager       Rep                        
Debra Henley  Craig Booker    80000     4
              Daniel Hilton  115000     3
              John Smith      40000     2
Fred Anderson Cedric Moss    110000     4
              Wendy Yule     177000     4
'''


print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=[np.sum,len,np.mean]))
'''
解释:
aggfunc=[np.sum,len,np.mean],即同时求和,求总个数,求平均值
                                sum   len          mean
                              Price Price         Price
Manager       Rep                                      
Debra Henley  Craig Booker    80000     4  20000.000000
              Daniel Hilton  115000     3  38333.333333
              John Smith      40000     2  20000.000000
Fred Anderson Cedric Moss    110000     4  27500.000000
              Wendy Yule     177000     4  44250.000000
'''


#显示相关产品销售情况
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product"))
'''
解释:
columns="Product"表示对Product该列的值区分开来列出,每一个值都进行Price的聚合求sum

Product                           CPU  Maintenance  Monitor  Software
Manager       Rep                                                    
Debra Henley  Craig Booker    65000.0       5000.0      NaN   10000.0
              Daniel Hilton  105000.0          NaN      NaN   10000.0
              John Smith      35000.0       5000.0      NaN       NaN
Fred Anderson Cedric Moss     95000.0       5000.0      NaN   10000.0
              Wendy Yule     165000.0       7000.0   5000.0       NaN
'''


#显示相关产品销售情况,NaN的情况置为0
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product",fill_value=0))
'''
解释:
columns="Product"表示对Product该列的值区分开来列出,每一个值都进行Price的聚合求sum;
对于聚合之后的值为NaN,添加参数fill_value=0,表示NaN值的自动置为0

Product                         CPU  Maintenance  Monitor  Software
Manager       Rep                                                  
Debra Henley  Craig Booker    65000         5000        0     10000
              Daniel Hilton  105000            0        0     10000
              John Smith      35000         5000        0         0
Fred Anderson Cedric Moss     95000         5000        0     10000
              Wendy Yule     165000         7000     5000         0
'''


print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product",fill_value=0,margins=True))
'''
解释:
margins=True表示对聚合统计的数值,再次进行一个行与列的总聚合。
如下:
行'All'表示对每行的各个数值进行聚合所得结果;
列'All'表示对每列的各个数值进行聚合所得结果;
注:聚合统一使用的都是aggfuc=np.sum指定的函数名

Product                         CPU  Maintenance  Monitor  Software     All
Manager       Rep                                                          
Debra Henley  Craig Booker    65000         5000        0     10000   80000
              Daniel Hilton  105000            0        0     10000  115000
              John Smith      35000         5000        0         0   40000
Fred Anderson Cedric Moss     95000         5000        0     10000  110000
              Wendy Yule     165000         7000     5000         0  177000
All                          465000        22000     5000     30000  522000
'''


#aggfunc可以传入字典,对指定列进行指定的聚合运算
print(df.pivot_table(index=["Manager","Rep"],values=["Price",'Quantity'],aggfunc={"Price":np.mean,"Quantity":np.sum},columns="Product",fill_value=0,margins=True))
'''
解释:
1、values=["Price",'Quantity']表示聚合之后,聚合函数使用到的列的数值,即这里会使用Price和Quantity的列的值进行聚合运算
2、aggfunc={"Price":np.mean,"Quantity":np.sum}表示对Price进行求平均值的聚合,对Quantity进行求和的聚合
3、指定margins=True,则对应Price的All的行与列的聚合使用np.mean,对应Quantity的All的行与列的聚合使用np.sum

                             Price                      ... Quantity             
Product                        CPU Maintenance Monitor  ...  Monitor Software All
Manager       Rep                                       ...                      
Debra Henley  Craig Booker   32500        5000       0  ...        0        1   5
              Daniel Hilton  52500           0       0  ...        0        1   5
              John Smith     35000        5000       0  ...        0        0   3
Fred Anderson Cedric Moss    47500        5000       0  ...        0        1   5
              Wendy Yule     82500        7000    5000  ...        2        0  12
All                          51666        5500    5000  ...        2        3  30
'''
sales-funnel.png

相关文章

网友评论

      本文标题:DataFrame的pivot_table数据透视表

      本文链接:https://www.haomeiwen.com/subject/wtgsiktx.html