DataFrame的pivot_table数据透视表
当我们建立数据透视表时,最简单的方法时一步一步地做。
添加项目并检查每个步骤,以验证正在或得预期的结果。
创建一个最简单的数据透视表,一定需要有一个索引,即index。
#coding=utf-8
import pandas as pd
import numpy as np
df=pd.read_excel("sales-funnel.xlsx")
print(df.head()) #默认读取前五行
'''
Account Name ... Price Status
0 714466 Trantow-Barrows ... 30000 presented
1 714466 Trantow-Barrows ... 10000 presented
2 714466 Trantow-Barrows ... 5000 pending
3 737550 Fritsch, Russel and Anderson ... 35000 declined
4 146832 Kiehn-Spinka ... 65000 won
'''
print(df.pivot_table(index="Manager"))
'''
解释:
1、指定行索引为Manaer列,
2、即对Manager进行分组(类似数据库中group by)数据透视表只自动获取带有数值的列,默认聚合方法为np.mean即求平均值
Account Price Quantity
Manager
Debra Henley 513112.222222 26111.111111 1.444444
Fred Anderson 405039.000000 35875.000000 2.125000
'''
print(df.pivot_table(index=["Manager","Rep"]))
'''
解释:
1、指定行索引为两个值Manager、Rep
2、即对Manager分组后再对其中的Rep进行组(类似数据库的group by A,B),数据透视表默认自动获取带有数值的列,默认聚合方法为np.mean即求平均值
Account Price Quantity
Manager Rep
Debra Henley Craig Booker 720237.0 20000.000000 1.250000
Daniel Hilton 194874.0 38333.333333 1.666667
John Smith 576220.0 20000.000000 1.500000
Fred Anderson Cedric Moss 196016.5 27500.000000 1.250000
Wendy Yule 614061.5 44250.000000 3.000000
'''
print(df.pivot_table(index=["Manager","Rep"],values="Price"))
'''
解释:
指定values参数,表示仅聚合Price该列
Price
Manager Rep
Debra Henley Craig Booker 20000.000000
Daniel Hilton 38333.333333
John Smith 20000.000000
Fred Anderson Cedric Moss 27500.000000
Wendy Yule 44250.000000
'''
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum))
'''
解释:
aggfunc指定聚合使用的函数名,这里指定为求和np.sum
Price
Manager Rep
Debra Henley Craig Booker 80000
Daniel Hilton 115000
John Smith 40000
Fred Anderson Cedric Moss 110000
Wendy Yule 177000
'''
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=[np.sum,len]))
'''
解释:
aggfuc可以传入列表参数,传递多个聚合函数名
sum len
Price Price
Manager Rep
Debra Henley Craig Booker 80000 4
Daniel Hilton 115000 3
John Smith 40000 2
Fred Anderson Cedric Moss 110000 4
Wendy Yule 177000 4
'''
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=[np.sum,len,np.mean]))
'''
解释:
aggfunc=[np.sum,len,np.mean],即同时求和,求总个数,求平均值
sum len mean
Price Price Price
Manager Rep
Debra Henley Craig Booker 80000 4 20000.000000
Daniel Hilton 115000 3 38333.333333
John Smith 40000 2 20000.000000
Fred Anderson Cedric Moss 110000 4 27500.000000
Wendy Yule 177000 4 44250.000000
'''
#显示相关产品销售情况
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product"))
'''
解释:
columns="Product"表示对Product该列的值区分开来列出,每一个值都进行Price的聚合求sum
Product CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000.0 5000.0 NaN 10000.0
Daniel Hilton 105000.0 NaN NaN 10000.0
John Smith 35000.0 5000.0 NaN NaN
Fred Anderson Cedric Moss 95000.0 5000.0 NaN 10000.0
Wendy Yule 165000.0 7000.0 5000.0 NaN
'''
#显示相关产品销售情况,NaN的情况置为0
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product",fill_value=0))
'''
解释:
columns="Product"表示对Product该列的值区分开来列出,每一个值都进行Price的聚合求sum;
对于聚合之后的值为NaN,添加参数fill_value=0,表示NaN值的自动置为0
Product CPU Maintenance Monitor Software
Manager Rep
Debra Henley Craig Booker 65000 5000 0 10000
Daniel Hilton 105000 0 0 10000
John Smith 35000 5000 0 0
Fred Anderson Cedric Moss 95000 5000 0 10000
Wendy Yule 165000 7000 5000 0
'''
print(df.pivot_table(index=["Manager","Rep"],values="Price",aggfunc=np.sum,columns="Product",fill_value=0,margins=True))
'''
解释:
margins=True表示对聚合统计的数值,再次进行一个行与列的总聚合。
如下:
行'All'表示对每行的各个数值进行聚合所得结果;
列'All'表示对每列的各个数值进行聚合所得结果;
注:聚合统一使用的都是aggfuc=np.sum指定的函数名
Product CPU Maintenance Monitor Software All
Manager Rep
Debra Henley Craig Booker 65000 5000 0 10000 80000
Daniel Hilton 105000 0 0 10000 115000
John Smith 35000 5000 0 0 40000
Fred Anderson Cedric Moss 95000 5000 0 10000 110000
Wendy Yule 165000 7000 5000 0 177000
All 465000 22000 5000 30000 522000
'''
#aggfunc可以传入字典,对指定列进行指定的聚合运算
print(df.pivot_table(index=["Manager","Rep"],values=["Price",'Quantity'],aggfunc={"Price":np.mean,"Quantity":np.sum},columns="Product",fill_value=0,margins=True))
'''
解释:
1、values=["Price",'Quantity']表示聚合之后,聚合函数使用到的列的数值,即这里会使用Price和Quantity的列的值进行聚合运算
2、aggfunc={"Price":np.mean,"Quantity":np.sum}表示对Price进行求平均值的聚合,对Quantity进行求和的聚合
3、指定margins=True,则对应Price的All的行与列的聚合使用np.mean,对应Quantity的All的行与列的聚合使用np.sum
Price ... Quantity
Product CPU Maintenance Monitor ... Monitor Software All
Manager Rep ...
Debra Henley Craig Booker 32500 5000 0 ... 0 1 5
Daniel Hilton 52500 0 0 ... 0 1 5
John Smith 35000 5000 0 ... 0 0 3
Fred Anderson Cedric Moss 47500 5000 0 ... 0 1 5
Wendy Yule 82500 7000 5000 ... 2 0 12
All 51666 5500 5000 ... 2 3 30
'''

sales-funnel.png
网友评论