Adventure Project Summary

Author: Helluin92 | Published: 2020-11-03 14:58

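I. Project Background

Adventure Works Cycles is the fictitious company behind the Adventure Works sample database. It is a large multinational manufacturer that makes bicycles and sells them to commercial markets in North America, Europe and Asia. Its base of operations is in Bothell, Washington, with 290 employees, and several regional sales teams are spread across its markets.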

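1 Customer types

Individuals: customers who buy products through the online retail store;
Resellers: retail or wholesale stores that buy products for resale from Adventure Works Cycles sales representatives.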

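2 Products

Bicycles manufactured by Adventure Works Cycles;
Bicycle components, such as wheels, pedals or brake assemblies;
Bicycle apparel purchased from suppliers for resale to Adventure Works Cycles customers;
Bicycle accessories purchased from suppliers for resale to Adventure Works Cycles customers.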

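Data source: the sample database of Adventure Works Cycles.

3 Project goals

Use the existing data to monitor online and offline product sales, track the latest sales trends and their regional distribution, and give the company guidance on manufacturing and sales so that it can increase revenue.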

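II. Bicycle Business Analysis for November 2019

Contents:

  • 1. Overall bicycle sales performance
  • 2. Bicycle sales by region, November 2019
  • 3. Bicycle sales by product, November 2019
  • 4. User behavior analysis
  • 5. Hot product sales analysis, November 2019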

Preparation

The results are stored in the database in the following tables:

  • Overall bicycle sales performance: pt_overall_sale_performance_1
  • Bicycle sales by region, November 2019: pt_bicy_november_territory_2, pt_bicy_november_october_city_3
  • Bicycle sales by product, November 2019: pt_bicycle_product_sales_month_4, pt_bicycle_product_sales_order_month_4, pt_bicycle_product_sales_order_month_11
  • User behavior analysis: pt_user_behavior_november
  • Hot product sales analysis, November 2019: pt_hot_products_november
Import modules
import pandas as pd
import numpy as np
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine

1. Overall bicycle sales performance

1.1 Read the source data from the database: dw_customer_order
The source table holds daily product sales by city.
Create the database engine
engine = create_engine('mysql:/XXXXXXXXX/charset=gbk')
datafrog=engine
gather_customer_order=pd.read_sql_query("select * from dw_customer_order",con = datafrog)
Look at the first 5 rows to check that the data was read correctly
gather_customer_order.head()
Check the column data types
gather_customer_order.info()
Add a create_year_month column, used for month-level analysis
gather_customer_order['create_year_month']=gather_customer_order["create_date"].apply(lambda x:x.strftime("%Y-%m"))

Keep only the rows whose product category is bicycles
gather_customer_order = gather_customer_order.loc[gather_customer_order['cplb_zw']=='自行车']
gather_customer_order
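The month key can also be derived without a per-row lambda. The sketch below uses made-up dates and assumes create_date can be parsed by pd.to_datetime; it is an alternative to the apply call above, not the code used in the project.

```python
import pandas as pd

# Toy rows standing in for dw_customer_order (dates are invented)
orders = pd.DataFrame({"create_date": ["2019-10-31", "2019-11-01"]})

# Parse once, then format the whole column with the vectorized .dt accessor
orders["create_date"] = pd.to_datetime(orders["create_date"])
orders["create_year_month"] = orders["create_date"].dt.strftime("%Y-%m")

print(orders)
```

Because the result is a zero-padded "YYYY-MM" string, sorting it alphabetically also sorts it chronologically, which the later groupby and sort steps rely on.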
1.2 Overall bicycle sales volume
Group by month and sum order_num and sum_amount to get the monthly order quantity and sales amount
overall_sales_performance = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum","sum_amount":"sum"})

Sort by month in descending order to make the month-over-month calculation easier
overall_sales_performance.sort_values(by="create_year_month",ascending=False,inplace=True)
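Month-over-month change in monthly bicycle order quantity, to see the trend over the past year
With the rows in descending order, diff() on a row gives (that month - the following month), so dividing by that month's value and negating gives the following month's growth, stored one row too low
order_num_diff = list((overall_sales_performance.order_num.diff()/overall_sales_performance.order_num)/-1)
order_num_diff.pop(0) # drop the first element (NaN), which shifts every value up one row
order_num_diff.append(0) # pad the end with 0 for the earliest month, which has no previous month
overall_sales_performance["order_num_diff"] = order_num_diff
overall_sales_performance

Month-over-month change in monthly bicycle sales amount
sum_amount_diff = list((overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount)/-1)
sum_amount_diff.pop(0) # drop the first element (NaN)
sum_amount_diff.append(0) # pad the end with 0 for the earliest month
sum_amount_diff
Add the change to the DataFrame as a column
overall_sales_performance["sum_amount_diff"] = sum_amount_diff
overall_sales_performance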
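The same month-over-month figures can be obtained more directly with pct_change on data sorted in ascending order. This is a minimal sketch on made-up monthly totals; it is an equivalent formulation, not the code the project ran.

```python
import pandas as pd

# Toy monthly totals (numbers are invented for illustration)
monthly = pd.DataFrame(
    {"order_num": [100, 110, 99]},
    index=pd.Index(["2019-09", "2019-10", "2019-11"], name="create_year_month"),
)

# In ascending order each row is compared with the month before it:
# (this month - previous month) / previous month
monthly = monthly.sort_index()
monthly["order_num_diff"] = monthly["order_num"].pct_change().fillna(0)

print(monthly)
```

The earliest month has no predecessor, so fillna(0) gives it 0, which matches the 0 that the list-based version appends.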
Rename the quantity change to order_diff and the amount change to amount_diff
Turn create_year_month back into a column (reset_index without drop, otherwise the month is lost), then sort by month in ascending order
overall_sales_performance = overall_sales_performance.rename(columns={"order_num_diff":"order_diff","sum_amount_diff":"amount_diff"}).reset_index()\
.sort_values(by="create_year_month",ascending=True)
View the monthly bicycle order quantity, sales amount and month-over-month change
overall_sales_performance

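Field notes:
create_year_month: month,
order_num: total sales quantity for the month,
sum_amount: total sales amount for the month,
order_diff: month-over-month change in sales quantity,
amount_diff: month-over-month change in sales amount,
dw_customer_order: customer order table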

Store the data in the database
engine = create_engine('mysql://XXXXX/XXXXX?charset=gbk')
datafrog=engine
overall_sales_performance.to_sql('pt_overall_sale_performance_1',con = datafrog,if_exists='append', index=False)
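One thing to keep in mind with if_exists='append' is that re-running the notebook adds the same rows again. The sketch below shows the effect using an in-memory SQLite connection as a stand-in for the MySQL engine; the table name pt_demo and the numbers are hypothetical.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL engine
result = pd.DataFrame(
    {"create_year_month": ["2019-10", "2019-11"], "order_num": [10, 12]}
)

# Two runs with append: the table ends up with every row twice
result.to_sql("pt_demo", con=conn, if_exists="append", index=False)
result.to_sql("pt_demo", con=conn, if_exists="append", index=False)
rows_after_append = int(
    pd.read_sql_query("select count(*) as n from pt_demo", conn)["n"].iloc[0]
)

# replace drops and recreates the table, so a re-run is idempotent
result.to_sql("pt_demo", con=conn, if_exists="replace", index=False)
rows_after_replace = int(
    pd.read_sql_query("select count(*) as n from pt_demo", conn)["n"].iloc[0]
)

print(rows_after_append, rows_after_replace)
```

Whether append or replace is the right choice depends on whether the result table is meant to accumulate history or hold only the latest calculation.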

2. Bicycle sales by region, November 2019

2.1 Source data dw_customer_order: filter the October and November data
gather_customer_order was loaded from dw_customer_order in the overall analysis above and already contains bicycle rows only
gather_customer_order.head()
Filter the October and November bicycle data
gather_customer_order_10_11 = gather_customer_order.loc[ gather_customer_order["create_year_month"].isin(["2019-10","2019-11"])]

The October and November bicycle order data has 6266 rows
len(gather_customer_order_10_11)
6266
2.2 Bicycle sales volume by region, November 2019
Group by region and month, sum the order quantity and sales amount, and turn the group keys back into columns
gather_customer_order_10_11_group= gather_customer_order_10_11.groupby(['chinese_territory','create_year_month']).agg({"order_num":"sum","sum_amount":"sum"}).reset_index()

Collect the regions into a list
region_list=[]
x=gather_customer_order_10_11["chinese_territory"].value_counts()
for key in x.keys():
    region_list.append(key)
region_list
['华东', '华中', '西南', '华北', '华南', '西北', '东北', '台港澳']

Month-over-month change from October to November for each region
order_parts = []
amount_parts = []
for i in region_list:
    region_rows = gather_customer_order_10_11_group.loc[gather_customer_order_10_11_group['chinese_territory']==i]
    order_parts.append(region_rows['order_num'].pct_change())
    amount_parts.append(region_rows['sum_amount'].pct_change())
order_x = pd.concat(order_parts)
amount_x = pd.concat(amount_parts)

Add the two columns order_diff and amount_diff
gather_customer_order_10_11_group['order_diff']=order_x
gather_customer_order_10_11_group['amount_diff']=amount_x
Bicycle sales quantity, sales amount and month-over-month change for each region in October and November
gather_customer_order_10_11_group.head()
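The per-region loop can be replaced by a grouped pct_change, which computes the change within each region in one call. The sketch below uses invented numbers and English region names in place of the real chinese_territory values; it illustrates the technique rather than reproducing the project's output.

```python
import pandas as pd

# Toy region/month totals (values are made up)
grouped = pd.DataFrame({
    "chinese_territory": ["East", "East", "North", "North"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [200, 220, 100, 90],
    "sum_amount": [4000.0, 4600.0, 2000.0, 1900.0],
})

# Rows must be in month order inside each region before taking the change
grouped = grouped.sort_values(["chinese_territory", "create_year_month"])

# pct_change restarts at every region, so October rows come out as NaN
grouped["order_diff"] = grouped.groupby("chinese_territory")["order_num"].pct_change()
grouped["amount_diff"] = grouped.groupby("chinese_territory")["sum_amount"].pct_change()

print(grouped)
```

The result is aligned on the original row index, so no manual concatenation of per-region pieces is needed.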

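Field notes:
chinese_territory: region,
create_year_month: month,
order_num: sales quantity for the region,
sum_amount: sales amount for the region,
order_diff: month-over-month change in sales quantity,
amount_diff: month-over-month change in sales amount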

Store the data in the database
engine = create_engine('mysql://XXXXXXX/xxxxxx?charset=gbk')
datafrog=engine
gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2',con = datafrog,if_exists='append', index=False)
2.3 Month-over-month change for the top 10 cities by bicycle sales volume, November 2019
Filter the November bicycle transactions
gather_customer_order_11 = gather_customer_order.loc[gather_customer_order["create_year_month"]=="2019-11"]
Group gather_customer_order_11 by chinese_city and sum the sales quantity order_num
gather_customer_order_city_11= gather_customer_order_11.groupby("chinese_city").agg({"order_num":"sum"}).reset_index()
Top 10 cities by bicycle sales quantity in November
gather_customer_order_city_head = gather_customer_order_city_11.sort_values(by="order_num",ascending=False).iloc[0:10]
View the top 10 cities
gather_customer_order_city_head
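Sorting and slicing the first rows is one way to get a top-N list; DataFrame.nlargest expresses the same thing in a single call. A small sketch with invented city names and quantities:

```python
import pandas as pd

# Toy per-city totals (cities and numbers are placeholders)
city_sales = pd.DataFrame({
    "chinese_city": ["A", "B", "C", "D"],
    "order_num": [5, 20, 12, 7],
})

# Equivalent to sort_values(by="order_num", ascending=False).iloc[0:2]
top2 = city_sales.nlargest(2, "order_num")

print(top2)
```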
From the October and November bicycle sales data gather_customer_order_10_11, keep only the top 10 cities
gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11["chinese_city"].isin(list(gather_customer_order_city_head["chinese_city"]))]

Group by city and month to get the bicycle sales quantity and sales amount for the top 10 cities
gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(["chinese_city",'create_year_month']).agg({"order_num":"sum","sum_amount":"sum"}).reset_index()

Compute the month-over-month change for the top 10 cities
city_top_list = list(gather_customer_order_city_head["chinese_city"])
order_top_parts = []
amount_top_parts = []
for i in city_top_list:
    #print(i)
    city_rows = gather_customer_order_city_10_11.loc[gather_customer_order_city_10_11["chinese_city"]==i]
    order_top_parts.append(city_rows["order_num"].pct_change())
    amount_top_parts.append(city_rows["sum_amount"].pct_change())
order_top_x = pd.concat(order_top_parts)
amount_top_x = pd.concat(amount_top_parts)

Add order_diff (change in sales quantity) and amount_diff (change in sales amount) as columns
gather_customer_order_city_10_11['order_diff']=order_top_x
gather_customer_order_city_10_11['amount_diff']=amount_top_x
gather_customer_order_city_10_11.head(5)

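Field notes:
chinese_city: city,
create_year_month: month,
order_num: sales quantity for the month,
sum_amount: sales amount for the month,
order_diff: month-over-month change in sales quantity,
amount_diff: month-over-month change in sales amount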

Store in the database
engine = create_engine('mysql://XXXXX/xxxxx?charset=gbk')
datafrog=engine
gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3',con = datafrog,if_exists='append', index=False)

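3. Bicycle sales by product, November 2019

3.1 Market segment sales volume
Total bicycle sales quantity per month
gather_customer_order_group_month = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
Assumed step, inferred from the order_num_x and order_num_y columns used below: merge the monthly totals back onto the order rows
order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,on="create_year_month")
Share of each row's sales quantity in that month's total bicycle sales
order_num_proportion['order_proportion'] = order_num_proportion["order_num_x"]/order_num_proportion["order_num_y"]
order_num_proportion
Rename order_num_y to sum_month_order: total bicycle sales quantity for the month
order_num_proportion = order_num_proportion.rename(columns={"order_num_y":"sum_month_order"})
order_num_proportion.head()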
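A share-of-monthly-total column can also be computed without a merge, by broadcasting the group sum back onto each row with transform. The sketch below uses invented quantities; the column names follow the ones in this section.

```python
import pandas as pd

# Toy order rows (quantities are made up)
orders = pd.DataFrame({
    "create_year_month": ["2019-10", "2019-10", "2019-11", "2019-11"],
    "product_name": ["Road-150 Red", "Mountain-200 Black",
                     "Road-150 Red", "Mountain-200 Black"],
    "order_num": [30, 10, 45, 15],
})

# transform("sum") returns one value per row: the total of that row's month
orders["sum_month_order"] = (
    orders.groupby("create_year_month")["order_num"].transform("sum")
)
orders["order_proportion"] = orders["order_num"] / orders["sum_month_order"]

print(orders)
```

This avoids the _x and _y suffixes that a merge produces when both tables have an order_num column.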
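Field notes:

create_date: date,
product_name: product name,
cpzl_zw: product subcategory,
cplb_zw: product category,
order_num_x: product's sales quantity for the day,
customer_num: number of customers who bought that day,
sum_amount: product's sales amount for the day,
chinese_province: province,
chinese_city: city,
chinese_territory: region,
create_year_month: month,
sum_month_order: total sales quantity for the month,
order_proportion: product's share of sales quantity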

Store the monthly bicycle sales data in the database
engine = create_engine('mysql://frogdata05:XXXXX@xxxxx/datafrog05_adventure?charset=gbk')
datafrog=engine
order_num_proportion.to_sql('pt_bicycle_product_sales_month_4',con = datafrog,if_exists='append', index=False)

3.3 Road / mountain / touring bicycle segment performance

See which product subcategories bicycles have
gather_customer_order['cpzl_zw'].unique()
Road bicycle segment sales volume
gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
Sales quantity of each road bicycle model per month
gather_customer_order_road_month = gather_customer_order_road.groupby(['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()

Merge gather_customer_order_road_month with the monthly total sales quantity, gather_customer_order_road_month_sum (one order_num total per create_year_month),
which is used to compute each model's share
gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,on="create_year_month")
gather_customer_order_road_month 
Mountain and touring bicycles are analyzed the same way as road bicycles
Finally, combine the monthly sales of road, mountain and touring bicycles from the three tables
gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month],axis=0,ignore_index=True,sort=False)

Share of each model's sales quantity in the monthly total
gather_customer_order_month['order_num_proportio'] = gather_customer_order_month["order_num_x"]/gather_customer_order_month["order_num_y"]

Rename order_num_x to order_month_product (the product's sales quantity for the month) and order_num_y to sum_order_month (total bicycle sales quantity for the month)
gather_customer_order_month.rename(columns={"order_num_x":"order_month_product","order_num_y":"sum_order_month"},inplace=True)
gather_customer_order_month
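Since the three subcategories go through the same steps, the calculation can be wrapped in one function and applied to each. The sketch below is a hypothetical helper, not code from the project: the function name monthly_model_sales is made up, the toy rows are invented, and it assumes the monthly totals are taken over all bicycles.

```python
import pandas as pd

def monthly_model_sales(orders, subcategory, monthly_totals):
    """Per-model monthly quantity for one subcategory, joined to monthly totals."""
    subset = orders[orders["cpzl_zw"] == subcategory]
    per_model = (
        subset.groupby(["create_year_month", "product_name"])
        .agg({"order_num": "sum"})
        .reset_index()
    )
    # Both frames have order_num, so the merge yields order_num_x / order_num_y
    return pd.merge(per_model, monthly_totals, on="create_year_month")

# Toy order rows (quantities are made up)
orders = pd.DataFrame({
    "create_year_month": ["2019-11"] * 3,
    "product_name": ["Road-650 Red", "Road-650 Red", "Mountain-200 Black"],
    "cpzl_zw": ["公路自行车", "公路自行车", "山地自行车"],
    "order_num": [3, 2, 5],
})

# Assumption: totals are computed over all bicycle rows in the month
monthly_totals = (
    orders.groupby("create_year_month").agg({"order_num": "sum"}).reset_index()
)

combined = pd.concat(
    [monthly_model_sales(orders, s, monthly_totals)
     for s in ["公路自行车", "山地自行车"]],
    ignore_index=True,
)
combined["order_num_proportio"] = combined["order_num_x"] / combined["order_num_y"]

print(combined)
```

Passing the totals in as an argument keeps the choice of denominator explicit, so the same helper works whether the share is measured against all bicycles or against one subcategory.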
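Field notes:

create_year_month: month,
product_name: product name,
order_month_product: product's total sales quantity for the month,
sum_order_month: total bicycle sales quantity for the month,
order_num_proportio: product's share of sales quantity for the month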

Store the data in the database
engine = create_engine('mysql://XXXX@xxxxx/datafrog05_adventure?charset=gbk')
datafrog=engine
gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4',con = datafrog,if_exists='append', index=False)

Month-over-month change for bicycles, November 2019

To compute the November change, first filter the October and November data
gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]

Sort the October and November sales rows by product and month
gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])

Get the bicycle product names
product_name = list(gather_customer_order_month_10_11.product_name.drop_duplicates(keep='first'))

Compute the month-over-month change in sales quantity for each product
order_parts = []
for i in product_name:
    b=gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11["product_name"]==i]["order_month_product"].pct_change().fillna(0)
    order_parts.append(b)
order_top_x = pd.concat(order_parts)

Add the change as the order_num_diff column
gather_customer_order_month_10_11['order_num_diff'] = order_top_x

gather_customer_order_month_10_11.head()

Cumulative product sales from January to November 2019

Filter the bicycle data for January to November 2019
gather_customer_order_month_1_11 =  gather_customer_order_month[gather_customer_order_month['create_year_month'].isin(['2019-01','2019-02','2019-03','2019-04','2019-05','2019-06','2019-07','2019-08','2019-09','2019-10','2019-11'])]

Sum the sales quantity of each product from January to November 2019
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').order_month_product.sum().reset_index()

Rename the total to sum_order_1_11: cumulative product sales from January to November
gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})

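November 2019 bicycle sales quantity, month-over-month change and cumulative sales

Cumulative sales were computed in gather_customer_order_month_1_11_sum, and the November change and product share were computed in gather_customer_order_month_10_11, so the two tables only need to be joined

Take the November rows (assumed step: gather_customer_order_month_11 is not defined earlier, so it is filtered from gather_customer_order_month_10_11 here)
gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']
Merge the two tables on the shared product_name column
gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on="product_name")
gather_customer_order_month_11.head()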
Store in the database
engine = create_engine('mysql://XXXX@xxxxx/datafrog05_adventure?charset=gbk')
datafrog=engine
gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11',con = datafrog,if_exists='append', index=False)

4. User behavior analysis

Read the customer table from the database (the analysis covers 2019, so a filter is added to the query to speed up the read)

engine = create_engine('mysql://XXXX@xxxxx/adventure_ods?charset=gbk')
datafrog=engine
df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
Check the table structure
df_CUSTOMER.info()

Read the sales order table from the database

engine = create_engine('mysql://XXXX@xxxx/adventure_ods?charset=gbk')
datafrog=engine
df_sales_orders_11 = pd.read_sql_query("select *  from ods_sales_orders where create_date>='2019-11-1' and   create_date<'2019-12-1'",con = datafrog)
df_sales_orders_11.info()

The sales order table only has the customer key, with no age or gender, so it is merged with the customer table

sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how='left')
sales_customer_order_11.head(3)
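A left merge silently keeps orders whose customer is missing from the customer table and duplicates orders if a customer key appears twice. The sketch below shows how the validate and indicator arguments of pd.merge can surface both cases; the rows are invented and the check is an optional addition, not part of the original analysis.

```python
import pandas as pd

# Toy tables (keys and values are made up)
orders = pd.DataFrame({"sales_order_key": [1, 2, 3], "customer_key": [10, 11, 99]})
customers = pd.DataFrame({"customer_key": [10, 11], "gender": ["M", "F"]})

merged = pd.merge(
    orders,
    customers,
    on="customer_key",
    how="left",
    validate="many_to_one",  # raises MergeError if customer_key repeats in customers
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

# Orders with no matching customer carry NaN in the customer columns
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched)
```

Unmatched orders matter here because their missing birth date and gender would otherwise flow into the age and gender breakdowns as NaN.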

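4.1 User age analysis

Compute customer age

Extract the birth year from birth_date (the column read from ods_customer above), then subtract it from 2019
sales_customer_order_11['birth_year'] = pd.to_datetime(sales_customer_order_11['birth_date']).dt.year
sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']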
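Age brackets, level 1

Custom bracket function
def fenceng(age):
    if age>=30 and age<35:
        return "30-34"
    elif age>=35 and age<40:
        return "35-39"
    elif age>=40 and age<45:
        return "40-44"
    elif age>=45 and age<50:
        return "45-49"
    elif age>=50 and age<55:
        return "50-54"
    elif age>=55 and age<60:
        return "55-59"
    elif age>=60 and age<65:
        return "60-64"
    else:
        return None  # outside the 30-64 range
sales_customer_order_11['customer_age'].apply(fenceng)

Add the age_level bracket column; the bins are left-closed (right=False) so that each edge matches its label, for example age 30 falls in "30-34"
sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],[30,35,40,45,50,55,60,65],right=False,labels=["30-34","35-39","40-44","45-49","50-54","55-59","60-64"])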
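Because pd.cut uses right-closed intervals by default, it is worth checking how the bin edges map to the bracket labels. The sketch below runs the same edges and labels on a few boundary ages, once with right=False and once with the default; the ages are chosen for illustration.

```python
import pandas as pd

ages = pd.Series([30, 34, 35, 49, 50, 64])
bins = [30, 35, 40, 45, 50, 55, 60, 65]
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# Left-closed bins [30, 35), [35, 40), ...: each edge belongs to the bracket it starts
age_level = pd.cut(ages, bins=bins, right=False, labels=labels)

# Default right-closed bins (30, 35], (35, 40], ...: 35 lands in "30-34"
# and 30 falls outside every interval
default_level = pd.cut(ages, bins=bins, labels=labels)

print(pd.DataFrame({"age": ages, "left_closed": age_level, "default": default_level}))
```

With integer ages, only the left-closed version agrees with labels of the form "30-34".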

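Keep only the orders whose product category is bicycles

df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车'].copy()

Compute the age rate (each order's share of all bicycle orders)

df_customer_order_bycle['age_level_rate'] = 1 / len(df_customer_order_bycle)

Split age into coarser groups

Divide age into 3 groups; the bins are left-closed and the last one is open-ended so that it matches the '>=40' label
df_customer_order_bycle["age_level2"]=pd.cut(df_customer_order_bycle['customer_age'],[0,30,40,np.inf],right=False,labels=['<=29','30-39','>=40'])
Number of orders in each age group
age_level2_count_1 =df_customer_order_bycle['age_level2'].value_counts()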

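4.2 User gender analysis

Count orders by gender
gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
Count orders by age group (assumed step, inferred from the sales_order_key_y column renamed below)
age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
Merge the order table with the age group counts
df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
Compute the age group rate
df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
Merge the order table with the gender counts
df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
Compute the gender rate
df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']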
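The rate columns are per-row weights meant to be summed by a reporting tool. The sketch below, on invented rows, shows what the two kinds of weight add up to: a weight of 1 / (total rows) sums to each group's share of all orders, while a weight of 1 / (rows in the group) sums to 1 within each group. The gender codes "M" and "F" are placeholders.

```python
import pandas as pd

# Toy bicycle orders (values are made up)
orders = pd.DataFrame({
    "gender": ["M", "M", "M", "F"],
    "age_level2": ["30-39", "30-39", ">=40", ">=40"],
})

# Weight of 1 / total rows: summing by group gives the group's share of all orders
orders["age_level_rate"] = 1 / len(orders)
share_by_age = orders.groupby("age_level2")["age_level_rate"].sum()

# Weight of 1 / rows in the group: summing by group gives 1 for every group
orders["gender_count"] = orders.groupby("gender")["gender"].transform("count")
orders["gender_rate"] = 1 / orders["gender_count"]
gender_total = orders.groupby("gender")["gender_rate"].sum()

print(share_by_age)
print(gender_total)
```

The second kind of weight is useful when a chart breaks one group down further, for example product subcategory within each gender, because the pieces of each group then sum to 100%.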
Store in the database
engine = create_engine('mysql://XXXX/xxxx/datafrog05_adventure?charset=gbk')
datafrog=engine
df_customer_order_bycle.to_sql('pt_user_behavior_november',con = datafrog,if_exists='append', index=False)

5. Hot product sales analysis, November 2019

5.1 Top 10 products by November sales volume: sales quantity and month-over-month change

Compute the top 10 products
Sum the sales quantity per product, sort in descending order and take the top 10
customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.sum().reset_index().\
                        sort_values(by = 'order_num',ascending = False).head(10)
customer_order_11_top10.head()

Names of the top 10 products by sales volume

list(customer_order_11_top10['product_name'])
Sales quantity and month-over-month change for the top 10
customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]

customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].\
                                                        isin(list(customer_order_11_top10['product_name']))]

customer_order_month_10_11['category'] = '本月TOP10销量'
customer_order_month_10_11.head()

5.2 Top 10 products by November growth: sales quantity and month-over-month change

customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
                            sort_values(by = 'order_num_diff',ascending = False).head(10)

customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].\
                                                        isin(list(customer_order_month_11['product_name']))]
customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
customer_order_month_11_top10_seep['category'] = '本月TOP10增速'

Combine the top 10 by volume table and the top 10 by growth table row-wise

hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
hot_products_11.tail()
Store in the database
engine = create_engine('mysql://XXXXXXX@xxxxx/datafrog05_adventure?charset=gbk')
datafrog=engine
hot_products_11.to_sql('pt_hot_products_november',con = datafrog,if_exists='append', index=False)

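III. Visualization and Summary

Data visualization

1. Overall sales

(1) Overall bicycle sales volume

Over the past 11 months, November had the highest sales volume at 3316 units, up 7.1% from October. March had the largest month-over-month increase, up 12% from February, and February had the lowest volume of the year.

(2) Overall bicycle sales amount

Over the past 11 months, November also had the highest bicycle sales amount at 61.9 million yuan, up 8.7% from October. The sales amount follows the same trend as the sales volume.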

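2. Regional sales analysis

(1) Month-over-month growth by region

East China (华东) has higher overall sales than any other region, and South China (华南) has the fastest sales growth at 15%.

(2) Top 10 cities

Beijing and Shanghai led in sales in October and November, and Zhengzhou grew the fastest.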

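3. Product sales analysis

(1) Market segment sales volume

Road bicycles have the largest share of sales, about half of the market, and touring bicycles have the smallest, so consumers prefer road bicycles.

(2) Road bicycle sales

In November, every road bicycle model except Road-350-W Yellow grew month over month. Road-650 grew the fastest, up 14.29% from October.
Among road bicycles, the models 150red, 750black and 550 W-Yellow have similar sales shares and are the most popular with consumers.

(3) Mountain bicycle sales

In November, every mountain bicycle model except Mountain-200 Black grew month over month.
Mountain-500 Silver grew the fastest, at 19.51%.
Among mountain bicycles, Mountain-200 Silver and Mountain-200 Black have the largest sales shares, which suggests that the design of this model is popular with consumers.

(4) Touring bicycle sales

In November, every touring bicycle model except Touring-2000 Blue and Touring-3000 Blue grew month over month.
Touring-1000 Yellow grew the fastest from October, at 27.18%.
Among touring bicycles, Touring-1000 Blue and Touring-1000 Yellow have the largest sales shares, which suggests that the design of this model is popular with consumers.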

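4. User behavior analysis

(1) Age

By age bracket, customers aged 35-39 make up the largest share at 29%; beyond that bracket, the share declines as age increases.

(2) Gender

By gender, male customers are a slight majority at 55%. Road bicycles are the most popular product for both men and women, followed by mountain bicycles.
The product mix is roughly the same for both genders.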

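5. Hot product sales analysis

(1) Top 10 products by November sales volume

Mountain-200 Silver sold the most in November at 395 units, up 10.64% from October.

(2) Top 10 products by November growth

Touring-1000 Yellow grew the fastest in November, up 28.4% from October.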

Article link: https://www.haomeiwen.com/subject/vqevmktx.html