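This article summarizes an analysis of the Adventure Works case project. The data is processed in Jupyter, the results are stored in a database, and Power BI connects to that database for visualization.
Contents
- Project overview
- Analysis approach and process
- Building the PPT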
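I. Project overview
- Company background
Adventure Works Cycle is a domestic manufacturer that produces metal and composite bicycles and sells them in markets across the country. It has two main sales channels. Early on it sold mainly through distributors; after meeting its 2018 revenue target, in 2019 it began acquiring online customers through its own website to expand the market further.
- Analysis background
In December 2019 the November 2019 bicycle sales results must be reported to management, to support fine-grained operations with data and to identify target customer groups precisely.
- Analysis goals
1. Set the sales strategy and adjust the product mix so as to sustain fast growth, increase revenue, and win more market share.
2. Continuously monitor and analyze company-wide bicycle sales to track the sales situation and its trend, as a basis for setting, adjusting, and reviewing the sales strategy and improving the product mix.
- Data sources
Based on the business requirements, three tables were selected from the database:
1. ods_sales_orders, the order detail table, used for user behavior analysis.
2. dw_customer_order, the table aggregated by date, region, and product, used for overall sales, regional sales, product sales, and hot-product analysis.
3. ods_customer, the daily new customer table, used for user behavior analysis.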
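II. Analysis approach and process
- Analysis approach
- Analysis process
0. Exploring the dataset
(1) Import the common packages
# Data-processing modules
import pandas as pd
import numpy as np
# pymysql is the MySQL driver used by SQLAlchemy below
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
import datetime
(2) Load the dataset
# Read dw_customer_order from MySQL into a DataFrame named gather_customer_order
# Create the database engine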
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83/adventure_ods?charset=utf8')
yuan = engine
gather_customer_order = pd.read_sql_query('select * from dw_customer_order ',con = yuan)
(3) A first look at the dataset
gather_customer_order.head()
gather_customer_order.info()
To make the later month-by-month analysis easier, add a month column, create_year_month, that stores the year and month.
# Derive the create_year_month column from create_date
gather_customer_order['create_year_month'] = gather_customer_order['create_date'].apply(lambda x: x.strftime('%Y-%m'))
gather_customer_order['create_year_month'].head()
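The month column can also be derived without a Python-level lambda. The following is a minimal sketch, not the author's code: the sample rows are made up, and only the column names come from the text above.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the article.
sample = pd.DataFrame({"create_date": ["2019-10-31", "2019-11-01", "2019-11-15"]})

# Parse to datetime first so the code works whether the column holds strings or dates.
sample["create_date"] = pd.to_datetime(sample["create_date"])

# .dt.strftime formats the whole column at once.
sample["create_year_month"] = sample["create_date"].dt.strftime("%Y-%m")

print(sample["create_year_month"].tolist())
```

Because the result is a zero-padded 'YYYY-MM' string, sorting it alphabetically also sorts it chronologically, which the later steps rely on.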
(4) Keep only the bicycle rows
# Keep the rows whose product category cplb_zw is '自行车' (bicycle) as the new gather_customer_order
gather_customer_order = gather_customer_order.loc[gather_customer_order['cplb_zw'] == '自行车']
gather_customer_order
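1. Overall sales performance: bicycle sales from January to November 2019
Field descriptions
create_date order date
product_name product name
cpzl_zw product subcategory
cplb_zw product category
order_num units sold
customer_num number of purchasing customers
sum_amount sales amount
is_current_year current year (1: yes, 0: no)
is_last_year previous year (1: yes, 0: no)
is_yesterday yesterday (1: yes, 0: no)
is_today today (1: yes, 0: no)
is_current_month current month (1: yes, 0: no)
is_current_quarter current quarter (1: yes, 0: no)
chinese_province province
chinese_city city
chinese_territory region
pd.set_option('display.float_format', lambda x: '%.6f' % x)  # turn off scientific notation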
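(1) Overall bicycle sales volume
# Aggregate order quantity and sales amount per month: group by month, sum order_num and sum_amount, reset the index, and sort by month in descending order
overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num' : sum , 'sum_amount' : sum}).reset_index().\
sort_values('create_year_month',ascending = False)
overall_sales_performance
# New column order_num_diff: month-over-month change in bicycle order volume (this month compared with the previous month)
order_num_diff =list((overall_sales_performance.order_num.diff())/(overall_sales_performance.order_num)/-1)
order_num_diff.pop(0)
order_num_diff.append(0)
order_num_diff
diff() returns each row minus the row above it. Because the table is sorted newest first, that is the earlier month minus the later month, so the code divides by the row's own value, flips the sign with /-1, and then shifts the list up by one position (pop(0), append(0)). Each month ends up with (this month - previous month) / previous month, and the earliest month gets 0. The list is then converted to a DataFrame with the column name order_num_diff and concatenated onto overall_sales_performance.
order_num_diff = pd.DataFrame(order_num_diff, columns=['order_num_diff'])
# Rebuild a 0..n-1 index so that the concat below aligns row by row
overall_sales_performance =overall_sales_performance.set_index('create_year_month').reset_index()
overall_sales_performance = pd.concat([overall_sales_performance ,order_num_diff] ,axis = 1)
overall_sales_performance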
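The same month-over-month figure can be obtained with pct_change() instead of the diff-and-shift arithmetic. This is a minimal sketch under the assumption that the months are sorted oldest first before the calculation; the numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly totals, listed newest first as in the article.
monthly = pd.DataFrame({
    "create_year_month": ["2019-11", "2019-10", "2019-09"],
    "order_num": [121, 110, 100],
})

# Sort oldest first so that each row's predecessor is the previous month;
# pct_change() then yields (this month - previous month) / previous month.
monthly = monthly.sort_values("create_year_month")
monthly["order_num_diff"] = monthly["order_num"].pct_change().fillna(0)

# Restore the newest-first order used for display.
monthly = monthly.sort_values("create_year_month", ascending=False).reset_index(drop=True)
print(monthly)
```

The earliest month has no predecessor, so its NaN is replaced by 0, matching the convention used in the text.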
(2) Overall bicycle sales amount
# New column sum_amount_diff: month-over-month change in bicycle sales amount. Same method as above, applied to sum_amount; the table stays sorted by month, newest first
sum_amount_diff = list((overall_sales_performance.sum_amount.diff())/(overall_sales_performance.sum_amount)/-1)
sum_amount_diff.pop(0)
sum_amount_diff.append(0)
sum_amount_diff
# Convert the list to a DataFrame
sum_amount_diff = pd.DataFrame(sum_amount_diff, columns=['sum_amount_diff'])
sum_amount_diff
# Concatenate overall_sales_performance and sum_amount_diff
overall_sales_performance = pd.concat([overall_sales_performance ,sum_amount_diff],axis = 1)
overall_sales_performance
Store the final overall_sales_performance DataFrame in the MySQL table pt_overall_sale_performance_1.
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
overall_sales_performance.to_sql('pt_overall_sale_performance_1_yuan',con = yuan ,if_exists= 'replace')
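To try the write-then-read round trip without access to the MySQL server, an in-memory SQLite database can stand in for it. This is only a sketch: the table name demo_overall_sales and the sample values are hypothetical, and SQLite is a substitute for the MySQL engine used in the article.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used purely as a local stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical result table.
result = pd.DataFrame({
    "create_year_month": ["2019-11", "2019-10"],
    "order_num": [121, 110],
})

# if_exists="replace" drops and recreates the table on every run;
# index=False keeps the DataFrame index out of the table.
result.to_sql("demo_overall_sales", con=conn, if_exists="replace", index=False)

# Read the table back to confirm what was written.
loaded = pd.read_sql_query("select * from demo_overall_sales", con=conn)
print(loaded)
conn.close()
```

The article's calls leave index at its default, so the DataFrame index is written as an extra column; passing index=False avoids that when the index carries no meaning.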
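Visualization
Bicycle sales volume trend
Over the past 11 months, November had the highest bicycle sales volume at 3,316 units, up 7.1% from October.
Bicycle sales amount trend
Over the past 11 months, November also had the highest sales amount at 61.9 million yuan, up 8.7% from October; the amount and volume trends are consistent.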
2. Regional bicycle sales performance in November 2019
(1) Sales by region in November 2019
Data cleaning: keep the October and November bicycle rows
# Keep the October and November bicycle rows in gather_customer_order_10_11
gather_customer_order_10_11 = gather_customer_order[(gather_customer_order[ 'create_year_month' ]=='2019-10')|(gather_customer_order[ 'create_year_month' ]=='2019-11') ]
gather_customer_order_10_11
# Group by region ('chinese_territory') and month ('create_year_month'), sum order volume and sales amount, and store the result in gather_customer_order_10_11_group; remember to reset the index
gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({'order_num':sum , 'sum_amount':sum}).reset_index()
gather_customer_order_10_11_group
Collect the distinct regions in a list, in preparation for computing the November month-over-month figures.
region_list = gather_customer_order_10_11['chinese_territory'].unique()
region_list
To hold the November month-over-month change in sales volume and sales amount for each region, the per-region results are collected in two lists, order_x and amount_x, and concatenated at the end. pct_change() computes (current value - previous value) / previous value.
order_x = []
amount_x = []
# This subset has no September data, so October's change is NaN; replace NaN with 0
for i in region_list:
    a = gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory'] == i]['order_num'].pct_change().fillna(0)
    b = gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory'] == i]['sum_amount'].pct_change().fillna(0)
    order_x.append(a)
    amount_x.append(b)
# pd.concat joins the per-region pieces; the assignment aligns them on the row index
gather_customer_order_10_11_group['order_diff'] = pd.concat(order_x)
gather_customer_order_10_11_group['amount_diff'] = pd.concat(amount_x)
gather_customer_order_10_11_group.head()
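The per-region loop can be condensed with groupby().pct_change(), which restarts the calculation for every group. A minimal sketch with made-up numbers, assuming the rows are ordered by month within each region:

```python
import pandas as pd

# Hypothetical October/November totals per region.
grouped = pd.DataFrame({
    "chinese_territory": ["华东", "华东", "华南", "华南"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [200, 220, 100, 125],
})

# Rows must be ordered by month within each region before taking the change.
grouped = grouped.sort_values(["chinese_territory", "create_year_month"])

# pct_change() is computed per region, so October (no earlier month) is NaN -> 0.
grouped["order_diff"] = (
    grouped.groupby("chinese_territory")["order_num"].pct_change().fillna(0)
)
print(grouped)
```

The result keeps the original row index, so it can be assigned straight back as a column without building intermediate Series.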
Store the final gather_customer_order_10_11_group DataFrame in the MySQL table pt_bicy_november_territory_2.
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_yaun',con = yuan ,if_exists= 'replace')
Export it to Excel as well.
gather_customer_order_10_11_group.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_territory_2_yaun.xlsx')
(2) Month-over-month change for the top 10 cities by bicycle sales volume, November 2019
Keep the November bicycle transactions in gather_customer_order_11
gather_customer_order_11 = gather_customer_order_10_11.loc[gather_customer_order_10_11['create_year_month']== '2019-11']
gather_customer_order_11
Group by city, sum the sales volume, sort in descending order, and look at the ten cities with the highest volume.
# Group gather_customer_order_11 by chinese_city and sum order_num
# Keep the ten cities with the highest November volume in gather_customer_order_11_head
gather_customer_order_11 = gather_customer_order_11.groupby('chinese_city').agg({'order_num': sum}).reset_index().sort_values(by = 'order_num', ascending = False)
gather_customer_order_11_head = gather_customer_order_11.head(10)
gather_customer_order_11_head
# Using the top 10 cities in gather_customer_order_11_head, select their October and November rows from gather_customer_order_10_11
# and store them in gather_customer_order_10_11_head
# Look at the October and November bicycle sales data
gather_customer_order_10_11.head()
# Keep the October and November rows for the November top 10 cities, using isin()
gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11.chinese_city.isin(list(gather_customer_order_11_head['chinese_city']))]
# Sales volume and sales amount per top 10 city and month
gather_customer_order_city_10_11 =gather_customer_order_10_11_head.groupby(['chinese_city','create_year_month']).agg({'order_num':sum ,'sum_amount':sum}).reset_index()
gather_customer_order_city_10_11
Note that isin() is used here to select, from gather_customer_order_10_11, the rows that belong to the November top 10 cities.
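The "rank on one month, then filter both months" pattern can be sketched on a small frame. The data below is made up, and the top 2 is used in place of the top 10 to keep the example short.

```python
import pandas as pd

# Hypothetical city-level rows for two months.
orders = pd.DataFrame({
    "chinese_city": ["北京市", "北京市", "上海市", "上海市", "郑州市", "郑州市"],
    "create_year_month": ["2019-10", "2019-11"] * 3,
    "order_num": [90, 100, 80, 95, 10, 12],
})

# Rank cities by November volume and keep the top 2 (top 10 in the article).
november = orders[orders["create_year_month"] == "2019-11"]
top_cities = november.groupby("chinese_city")["order_num"].sum().nlargest(2).index

# isin() keeps the rows of both months, but only for the selected cities.
top_rows = orders[orders["chinese_city"].isin(top_cities)]
print(top_rows)
```

Ranking on November alone and filtering afterwards guarantees that every selected city contributes both its October and its November row to the comparison.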
Compute the November month-over-month change in sales amount and sales volume
# From gather_customer_order_city_10_11, compute the month-over-month change in sales amount and volume for the top 10 cities
city_top_list = gather_customer_order_city_10_11.chinese_city.unique()
order_top_x = []
amount_top_x = []
for i in city_top_list:
    a =gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city']==i]['order_num'].pct_change().fillna(0)
    b =gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city']==i]['sum_amount'].pct_change().fillna(0)
    order_top_x.append(a)
    amount_top_x.append(b)
# pd.concat joins the per-city pieces; the assignment aligns them on the row index
gather_customer_order_city_10_11['order_diff'] = pd.concat(order_top_x)
gather_customer_order_city_10_11['amount_diff'] = pd.concat(amount_top_x)
gather_customer_order_city_10_11
Store the data in MySQL and export it to Excel.
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_yuan',con = yuan ,if_exists= 'replace')
gather_customer_order_city_10_11.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_october_city_3_yuan.xlsx')
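Visualization
Regional month-over-month growth
In November, East China had the highest bicycle sales volume of the 8 regions; South China grew fastest, up 23.6% from October.
Top 10 cities by sales volume
Market share of the top cities
Beijing and Shanghai sold the most units, and Zhengzhou had the fastest month-over-month growth at 4.8%.
The top cities together account for 13.41% of the market.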
3. Bicycle product sales performance in November 2019
(1) Sales by market segment
Group gather_customer_order by month to get the number of bicycles sold each month, and store it in gather_customer_order_group_month.
# Monthly bicycle sales volume
gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num':sum}).reset_index()
gather_customer_order_group_month
Use pd.merge to join the bicycle sales table (gather_customer_order) with the monthly total sales table (gather_customer_order_group_month).
# Store the merged result in order_num_proportion; order_num becomes order_num_x (row value) and order_num_y (monthly total)
order_num_proportion = pd.merge(gather_customer_order ,gather_customer_order_group_month, on = ['create_year_month'])
order_num_proportion
Divide each row's sales volume by that month's total bicycle sales volume to get the row's share of the month.
# Share of the monthly total, stored in the new column 'order_proportion'
order_num_proportion['order_proportion']=(order_num_proportion['order_num_x'])/(order_num_proportion['order_num_y'])
order_num_proportion
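The share of a monthly total can also be computed without a merge, using groupby().transform('sum') to broadcast each month's total back onto its rows. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical rows: two products over two months.
orders = pd.DataFrame({
    "create_year_month": ["2019-10", "2019-10", "2019-11", "2019-11"],
    "product_name": ["Road-150 Red", "Mountain-200 Silver"] * 2,
    "order_num": [30, 70, 40, 60],
})

# transform("sum") returns one value per original row: the total of that row's month.
monthly_total = orders.groupby("create_year_month")["order_num"].transform("sum")

# Row volume divided by the month's total gives the row's share of the month.
orders["order_proportion"] = orders["order_num"] / monthly_total
print(orders)
```

This avoids the order_num_x / order_num_y suffixes that pd.merge introduces when both tables contain a column named order_num.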
Store the monthly bicycle sales data in MySQL: write the final order_num_proportion DataFrame to the table ppt_bicycle_product_sales_month_4.
engine = create_engine('mysql+pymysql://frogdataXX:密码@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan = engine
order_num_proportion.to_sql('ppt_bicycle_product_sales_month_4_yuan',con = yuan ,if_exists= 'replace')
Export to Excel
order_num_proportion.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\ppt_bicycle_product_sales_month_4_yuan.xlsx')
Check which product subcategories cpzl_zw contains
# List the distinct product subcategories
gather_customer_order['cpzl_zw'].unique()
(2) Road bike segment
Filter the road bikes, group them by month and model, sum the sales volume, and reset the index.
# Keep only the road bikes (cpzl_zw == '公路自行车')
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
# Sales volume per road bike model ('product_name') and month, stored in gather_customer_order_road_month
gather_customer_order_road_month = gather_customer_order_road.groupby(['create_year_month','product_name']).agg({'order_num':sum}).reset_index()
gather_customer_order_road_month
# Total road bike sales per month, stored in gather_customer_order_road_month_sum; remember to reset the index
gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').agg({'order_num':sum}).reset_index()
gather_customer_order_road_month_sum.head()
# Merge the monthly road bike totals into gather_customer_order_road_month on the key 'create_year_month'
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month , gather_customer_order_road_month_sum,on='create_year_month')
gather_customer_order_road_month
(3) Mountain bikes
The processing is the same as for road bikes, using the variable gather_customer_order_Mountain: filter the mountain bikes, compute the sales volume per model, compute the monthly totals, and merge. The result is used later to compare month-over-month changes across product subcategories.
# Keep only the mountain bikes
gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw']=='山地自行车']
# Sales volume per mountain bike model and month
gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(['create_year_month','product_name']).agg({'order_num':sum}).reset_index()
gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
gather_customer_order_Mountain_month
# Total sales volume per month
gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').agg({'order_num': sum}).reset_index()
gather_customer_order_Mountain_month_sum
# Merge gather_customer_order_Mountain_month with gather_customer_order_Mountain_month_sum
gather_customer_order_Mountain_month=pd.merge(gather_customer_order_Mountain_month ,gather_customer_order_Mountain_month_sum ,on = 'create_year_month' )
gather_customer_order_Mountain_month
(4) Touring bikes
The processing is the same as for road bikes, using the variable gather_customer_order_tour: filter the touring bikes, compute the sales volume per model, compute the monthly totals, and merge. The result is used later to compare month-over-month changes across product subcategories.
# Keep only the touring bikes
gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
gather_customer_order_tour
# Sales volume per touring bike model and month
gather_customer_order_tour_month = gather_customer_order_tour.groupby(['create_year_month','product_name']).agg({'order_num':sum}).reset_index()
gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
gather_customer_order_tour_month
# Total sales volume per month
gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby('create_year_month').agg({'order_num':sum}).reset_index()
gather_customer_order_tour_month_sum
# Merge
gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month ,gather_customer_order_tour_month_sum ,on= 'create_year_month')
gather_customer_order_tour_month
Combine the monthly sales data for mountain, touring, and road bikes, and compute the shares
# Stack the monthly sales tables of the three subcategories; ignore_index=True gives the combined table a unique row index
gather_customer_order_month = pd.concat([gather_customer_order_road_month , gather_customer_order_Mountain_month,gather_customer_order_tour_month], ignore_index=True)
gather_customer_order_month
# New column 'order_num_proportio': each model's share of that month's total sales within its own subcategory
# (order_num_x is the model's monthly volume, order_num_y is the subcategory's monthly total)
gather_customer_order_month['order_num_proportio'] = (gather_customer_order_month['order_num_x'])/(gather_customer_order_month['order_num_y'])
gather_customer_order_month
Rename the columns
gather_customer_order_month = gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'})
gather_customer_order_month
Store the data in the database and export it to Excel
# Write to the database
engine = create_engine('mysql+pymysql://frogdataXXXX:密码@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_yuan',con = yuan,if_exists = 'replace')
gather_customer_order_month.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicycle_product_sales_order_month_4_yuan.xlsx')
(5) Month-over-month change for November 2019
Keep the October and November 2019 bicycle rows
gather_customer_order_month_10_11 =gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
gather_customer_order_month_10_11
Sort the October and November sales rows
# Sort by model and then by month, so that each model's October row comes directly before its November row
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
gather_customer_order_month_10_11.head()
List the bicycle models
product_name =list(gather_customer_order_month_10_11['product_name'].unique())
product_name
Compute the November month-over-month change for each model
# Month-over-month change per model; October has no earlier month in this subset, so its NaN is replaced with 0
order_top_x = []
for i in product_name:
    a =gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name']==i]['order_month_product'].pct_change().fillna(0)
    order_top_x.append(a)
gather_customer_order_month_10_11['order_num_diff'] = pd.concat(order_top_x)
gather_customer_order_month_10_11
Keep the November rows
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']== '2019-11']
gather_customer_order_month_11
(6) Cumulative product sales volume, January to November 2019
Keep the rows from January to November
# str.contains('2019') keeps the 2019 rows, and ~ negates the second condition to drop December; str.contains() works much like LIKE in SQL
gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].str.contains('2019') & ~gather_customer_order_month['create_year_month'].str.contains('12')]
gather_customer_order_month_1_11.head()
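Since create_year_month is a zero-padded 'YYYY-MM' string, a range comparison expresses the same January-to-November filter more directly. A minimal sketch with made-up rows:

```python
import pandas as pd

# Hypothetical monthly rows around the target range.
months = pd.DataFrame({
    "create_year_month": ["2018-12", "2019-01", "2019-11", "2019-12"],
    "order_month_product": [5, 7, 9, 11],
})

# between() is inclusive on both ends; string comparison matches
# chronological order because the format is zero-padded YYYY-MM.
in_range = months["create_year_month"].between("2019-01", "2019-11")
jan_to_nov = months[in_range]
print(jan_to_nov)
```

Unlike a substring match on '12', this cannot accidentally match the digits 12 in another part of the string.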
# Cumulative bicycle sales volume per model, January to November 2019
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby('product_name').agg({'order_month_product':sum}).reset_index()
gather_customer_order_month_1_11_sum
# Rename the column to sum_order_1_11: cumulative product sales volume for January to November
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
gather_customer_order_month_1_11_sum.head()
(7) Product sales volume, month-over-month change, and cumulative volume for November 2019
The cumulative volume is already in gather_customer_order_month_1_11_sum, and the November month-over-month change and sales share are already in gather_customer_order_month_11, so the two tables only need to be joined with pd.merge().
# Merge the two tables on the shared column product_name
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on = 'product_name')
gather_customer_order_month_11
Store the final gather_customer_order_month_11 DataFrame in the MySQL table pt_bicycle_product_sales_order_month_11.
# Write gather_customer_order_month_11 to MySQL
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_yuan',con = yuan,if_exists = 'replace')
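Sales by market segment
Sales by market segment (table)
In November, road bikes had the largest share; compared with October, touring bikes grew fastest.
Road bike segment sales
Road bike segment sales (table)
Among road bikes in November, every model except Road-350-W Yellow rose month over month. Road-650 grew fastest, up 14.29% from October. Road-150 Red had the highest sales share, about 19.63%.
Mountain bike segment sales
Mountain bike segment sales (table)
Among mountain bikes in November, every model except Mountain-200 Black rose month over month. Mountain-500 Silver grew fastest at 19.51%. Mountain-200 Silver had the largest sales share.
Touring bike segment sales
Touring bike segment sales (table)
Among touring bikes in November, every model except Touring-2000 Blue and Touring-3000 Blue rose month over month. Touring-1000 Yellow grew fastest, up 27.18% from October. Touring-1000 Blue had the largest sales share at 32.52%.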
4. User behavior analysis
This part uses the order detail table ods_sales_orders and the customer table ods_customer.
First read the customer table from the database.
# Read the customer table
# Load ods_customer
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
# Load ods_sales_orders
engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select * from ods_sales_orders where create_date>='2019-11-1' and create_date<'2019-12-1'",con = datafrog)
The sales order table contains no customer attributes such as age or gender, so it has to be merged with the customer table.
sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how= 'left')
sales_customer_order_11
Split sales_customer_order_11['birth_date'] on '-' to extract the customer's birth year as a new column, stored as a string.
# n must be passed by keyword in recent pandas versions; rows without a birth date stay missing
customer_birth_year = sales_customer_order_11['birth_date'].str.split('-', n=2).apply(lambda x: x[0] if isinstance(x, list) else x)
customer_birth_year.name='birth_year'
sales_customer_order_11 = pd.concat([sales_customer_order_11,customer_birth_year],axis = 1)
sales_customer_order_11
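An alternative to splitting the string is to parse the date and read its year. The sketch below is not the author's method: the sample values are made up, and missing birth dates are kept as missing values rather than filled.

```python
import pandas as pd

# Hypothetical customers; the second one has no birth date on record.
customers = pd.DataFrame({"birth_date": ["1975-03-12", None, "1988-11-30"]})

# errors="coerce" turns unparseable or missing values into NaT instead of raising.
birth_year = pd.to_datetime(customers["birth_date"], errors="coerce").dt.year

# The nullable Int64 dtype keeps whole-number years while still allowing missing values.
customers["birth_year"] = birth_year.astype("Int64")
print(customers)
```

Keeping the gap visible makes it an explicit decision whether to drop, fill, or separately report customers with unknown age.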
(1) Customer age analysis
# Convert the birth year to int; missing birth years are forward-filled from the previous row first
sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].ffill().astype('int')
# Compute the customer's age
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
sales_customer_order_11.head()
Use pd.cut() to bin the ages
# Bin customer_age into "30-34","35-39","40-44","45-49","50-54","55-59","60-64" and store the result in age_level
# The bin edges are 30, 35, ..., 65; right=False makes each bin include its left edge and exclude its right edge
customer_age_lst =[i for i in range(30 , 68 ,5)]
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'] , bins = customer_age_lst ,right =False, labels = ['30-34','35-39','40-44','45-49','50-54','55-59','60-64'])
sales_customer_order_11
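The effect of right=False is easiest to see at the bin boundaries. A minimal sketch using the same edges and labels as above; the ages are made up.

```python
import pandas as pd

# Hypothetical ages chosen to sit on or next to bin boundaries.
ages = pd.Series([30, 34, 35, 62, 64], name="customer_age")

edges = list(range(30, 68, 5))  # 30, 35, 40, ..., 65
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# right=False gives left-closed bins: [30, 35), [35, 40), ..., [60, 65).
age_level = pd.cut(ages, bins=edges, right=False, labels=labels)
print(age_level.tolist())
```

pd.cut needs exactly one more edge than labels, and any age outside 30 to 64 falls outside all bins and becomes NaN.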
Keep the sales orders whose product category is bicycle
# Keep the bicycle orders; .copy() lets the new columns below be added without a chained-assignment warning
df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车'].copy()
df_customer_order_bycle
Compute the age share
# Each row contributes 1 / (total number of rows), stored in df_customer_order_bycle['age_level_rate']
df_customer_order_bycle['age_level_rate'] = 1/(df_customer_order_bycle.customer_key.count())
Split the ages into 3 levels: '<=29', '30-39', and '>=40'. The oldest customer is 62, so the upper bound is set to 100.
# Split the ages into 3 levels: '<=29', '30-39', '>=40'
df_customer_order_bycle['age_level2'] = pd.cut(df_customer_order_bycle['customer_age'], bins = [0,30,40,100] ,right= False, labels = ['<=29','30-39','>=40'])
# Number of orders in each age level
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
age_level2_count
(2) Customer gender
Count the rows for each gender
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
gender_count
Compute each row's weight within its age level
# Merge age_level2_count into df_customer_order_bycle and rename the count column
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
Compute each row's weight within its gender
# Merge gender_count into df_customer_order_bycle and rename the count column
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
df_customer_order_bycle.head()
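The count-then-merge steps can be condensed with groupby().transform(), and value_counts(normalize=True) gives the overall share per gender directly. A minimal sketch with made-up rows; the column names follow the text, everything else is an assumption.

```python
import pandas as pd

# Hypothetical bicycle orders with the buyer's gender.
buyers = pd.DataFrame({"gender": ["M", "F", "M", "M"]})

# Share of each gender among all rows.
gender_share = buyers["gender"].value_counts(normalize=True)

# transform("count") attaches the size of each row's gender group to the row itself,
# so no separate count table, merge, or rename is needed.
buyers["gender_count"] = buyers.groupby("gender")["gender"].transform("count")
buyers["gender_rate"] = 1 / buyers["gender_count"]
print(gender_share)
print(buyers)
```

By construction the per-row weights of each gender sum to 1, which is what the 1 / count columns in the text encode.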
Store df_customer_order_bycle, the November bicycle customers, in the database
# Write to the database
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
df_customer_order_bycle.to_sql('pt_user_behavior_november_yuan',con = yuan ,if_exists='replace')
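Age distribution of online customers nationwide, November 2019
Analysis by age group
By age group, customers aged 35-39 account for the largest share of buyers at 29%; beyond that, the share declines as age increases.
Cross-analysis of age (over 30) and market segment: road bikes have the largest share of purchases and touring bikes the smallest.
Gender split nationwide
Analysis by gender
Men and women buy bicycles in almost equal proportions.
Cross-analysis of gender and market segment: for both men and women, road bikes have the largest share of purchases and touring bikes the smallest.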
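5. Hot-product sales analysis for November 2019
(1) Top 10 products by November sales volume, with volume and month-over-month change
Keep the November rows
# Keep the November rows
gather_customer_order_11 = gather_customer_order.loc[gather_customer_order['create_year_month'] == '2019-11']
gather_customer_order_11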
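Compute the sales volume per product, sort in descending order, and take the top 10 products
# Sum order_num per product to get units sold ('count' would only count rows); the trailing \ continues the statement on the next line
# Sort by volume in descending order and take the top 10 products
customer_order_11_top10 = gather_customer_order_11.groupby('product_name').agg({'order_num': 'sum' }).reset_index().\
sort_values(by = 'order_num',ascending = False).head(10)
# Names of the top 10 products by volume
list(customer_order_11_top10['product_name'])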
Compute the volume and month-over-month change for the top 10
# Look at the November month-over-month data
gather_customer_order_month_10_11.head()
Only five columns are needed here: create_year_month (month), product_name (product name), order_month_product (monthly volume), cpzl_zw (product subcategory), and order_num_diff (month-over-month change in volume)
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))].copy()
customer_order_month_10_11
Label the top 10 models by volume with a category column
customer_order_month_10_11['category'] = '本月TOP10销量'
customer_order_month_10_11.head()
(2) Top 10 products by November growth, with volume and month-over-month change
customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
sort_values(by = 'order_num_diff',ascending = False).head(10)
customer_order_month_11
Keep the models with the top 10 November growth
customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
Keep the five columns needed: create_year_month (month), product_name (product name), order_month_product (monthly volume), cpzl_zw (product subcategory), and order_num_diff (month-over-month change in volume)
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']].copy()
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
customer_order_month_11_top10_seep
Combine the top 10 growth table with the top 10 volume table
# axis = 0 stacks the tables by rows; axis = 1 joins them by columns
hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11
Store the data in the MySQL database
engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
yuan=engine
hot_products_11.to_sql('pt_hot_products_november_yuan',con = yuan,if_exists = 'replace')
In November, Mountain-200 Silver had the highest sales volume at 395 units, up 10.64% from October.
In November, Touring-1000 Yellow grew fastest, up 27.18% from October.