Adventure Project Analysis (Part 1)

Author: 我就是那个无敌大长腿 | Published 2020-08-07 18:16

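    This article summarizes an analysis of the Adventure project case. Jupyter is used for data processing. The processed data is stored in a database, and the database is connected to Power BI for visualization.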

    Contents

    • Project overview
    • Analysis approach and process
    • Building the PPT

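    I. Project Overview

    • Company background

    Adventure Works Cycle is a manufacturing company in China. It produces metal and composite bicycles and sells them in markets across the country. The company has used two sales channels. In its early years it relied mainly on distributors. In 2019, after meeting its 2018 revenue target, it began acquiring online customers through its own website to expand its market further.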

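    • Background

    In December 2019, the team had to report November 2019 bicycle sales to management. The report supports fine-grained operations and helps pinpoint target customer groups.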

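    • Objectives

    1. Shape the sales strategy and adjust the product mix in order to sustain rapid growth, earn more revenue, and win more market share.
    2. Monitor and analyze company-wide bicycle sales continuously to track their current state and trends. This gives the business a basis for setting, adjusting, and reviewing its sales strategy and for refining the product mix.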

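    • Data sources

    Based on the business requirements, three tables were selected from the database:
    1. ods_sales_orders, the order detail table, used for user behavior analysis.
    2. dw_customer_order, the time/region/product aggregate table, used for overall, regional, and product sales performance and for hot-product analysis.
    3. ods_customer, the daily new-customer table, used for user behavior analysis.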

    (Table previews: ods_sales_orders order detail table; dw_customer_order time/region aggregate table; ods_customer daily new-customer table)

    II. Analysis Approach and Process

    • Analysis approach

    • Analysis process

    0. Exploring the dataset

    (1) Import common packages

    # Data-processing modules
    import pandas as pd
    import numpy as np
    # pymysql serves as the MySQL driver
    import pymysql
    pymysql.install_as_MySQLdb()
    from sqlalchemy import create_engine
    import datetime
    

    (2) Load the dataset

    # Read the data source from MySQL: load dw_customer_order into a DataFrame named gather_customer_order
    # Create the database engine
    engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83/adventure_ods?charset=utf8')
    yuan = engine
    gather_customer_order = pd.read_sql_query('select * from dw_customer_order', con=yuan)
    

    (3) First look at the dataset

    gather_customer_order.head()
    
    gather_customer_order.info()
    

    Monthly analysis is easier with a dedicated month field, so add a create_year_month field that holds the year and month.

    # Derive the create_year_month field from create_date
    gather_customer_order['create_year_month'] = gather_customer_order['create_date'].apply(lambda x: x.strftime('%Y-%m'))
    gather_customer_order['create_year_month'].head()
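The `apply(lambda x: x.strftime('%Y-%m'))` call formats one value at a time in Python. The following is a minimal sketch of a vectorized alternative that uses the `.dt` accessor. The three-row frame is hypothetical sample data standing in for dw_customer_order. The sketch assumes `create_date` may come back from MySQL as plain date objects, so it converts the column with `pd.to_datetime` first.

```python
import pandas as pd

# Hypothetical sample standing in for dw_customer_order
orders = pd.DataFrame({
    "create_date": ["2019-10-03", "2019-10-28", "2019-11-15"],
    "order_num": [5, 3, 7],
})

# MySQL DATE columns may arrive as Python date objects; to_datetime gives a datetime64 column
orders["create_date"] = pd.to_datetime(orders["create_date"])
# Vectorized equivalent of .apply(lambda x: x.strftime('%Y-%m'))
orders["create_year_month"] = orders["create_date"].dt.strftime("%Y-%m")
print(orders["create_year_month"].tolist())  # ['2019-10', '2019-10', '2019-11']
```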
    

    (4) Keep only bicycle data

    # Keep rows whose product category cplb_zw is 自行车 (bicycle); the result becomes the new gather_customer_order
    gather_customer_order = gather_customer_order.loc[gather_customer_order['cplb_zw'] == '自行车']
    gather_customer_order
    

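    1. Overall sales performance: bicycle sales from January to November 2019

    Field definitions
    create_date: order date
    product_name: product name
    cpzl_zw: product subcategory
    cplb_zw: product category
    order_num: units sold
    customer_num: number of purchasing customers
    sum_amount: sales amount
    is_current_year: whether in the current year (1: yes, 0: no)
    is_last_year: whether in the previous year (1: yes, 0: no)
    is_yesterday: whether yesterday (1: yes, 0: no)
    is_today: whether today (1: yes, 0: no)
    is_current_month: whether in the current month (1: yes, 0: no)
    is_current_quarter: whether in the current quarter (1: yes, 0: no)
    chinese_province: province
    chinese_city: city
    chinese_territory: region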

    pd.set_option('display.float_format', lambda x: '%.6f' % x)  # Show floats with 6 decimals instead of scientific notation
    

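    (1) Overall bicycle sales volume

    # Aggregate monthly units and sales amount: group by month, sum order_num and sum_amount,
    # reset the index, and sort by month in descending order (newest first)
    overall_sales_performance = gather_customer_order.groupby('create_year_month').agg({'order_num': 'sum', 'sum_amount': 'sum'}).reset_index().\
                                 sort_values('create_year_month', ascending=False)
    overall_sales_performance

    # order_num_diff: month-over-month (MoM) change in units sold, this month vs. the previous month
    order_num_diff = list((overall_sales_performance.order_num.diff()) / (overall_sales_performance.order_num) / -1)
    order_num_diff.pop(0)     # drop the leading NaN so each rate lines up with the later month
    order_num_diff.append(0)  # the earliest month has no previous month
    order_num_diff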
    

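    The MoM change is computed with diff(), which subtracts the previous row from the current row. The months are sorted newest first, so the previous row is the later month and diff() gives (earlier month - later month). Dividing by the earlier month and multiplying by -1 yields (later - earlier) / earlier, which is the later month's MoM growth. This is why the leading NaN is popped. Convert the list to a DataFrame with the column name order_num_diff and concatenate it onto overall_sales_performance.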

    order_num_diff = pd.DataFrame(order_num_diff, columns=['order_num_diff'])

    # Reset to a 0..n-1 index in the current (descending) order so concat aligns rows by position
    overall_sales_performance = overall_sales_performance.set_index('create_year_month').reset_index()
    overall_sales_performance = pd.concat([overall_sales_performance, order_num_diff], axis=1)
    overall_sales_performance
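The sketch below shows why the diff()/-1 arithmetic and the pop(0)/append(0) steps produce month-over-month growth. It runs them on three hypothetical monthly totals, sorted newest first as in the article, and cross-checks the result against pct_change() on an ascending sort. The numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical monthly totals, sorted newest first as in the article
monthly = pd.DataFrame({
    "create_year_month": ["2019-11", "2019-10", "2019-09"],
    "order_num": [330, 300, 250],
})

# Article's approach: diff on the descending frame, divide by the earlier month, negate
trick = list(monthly["order_num"].diff() / monthly["order_num"] / -1)
trick.pop(0)     # drop the leading NaN so each rate lines up with the later month
trick.append(0)  # the earliest month has no previous month
monthly["order_num_diff"] = trick

# Cross-check: pct_change on an ascending sort gives (later - earlier) / earlier directly
check = monthly.sort_values("create_year_month")["order_num"].pct_change().fillna(0)
print(monthly["order_num_diff"].round(4).tolist())  # [0.1, 0.2, 0.0]
print(check.sort_index().round(4).tolist())          # [0.1, 0.2, 0.0]
```

Sorting ascending and calling pct_change() directly avoids the manual shift. The cost is a re-sort if the newest-first order is wanted for display.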
    

    (2) Overall bicycle sales amount

    # sum_amount_diff: MoM change in sales amount, same method applied to a different field
    sum_amount_diff = list((overall_sales_performance.sum_amount.diff()) / (overall_sales_performance.sum_amount) / -1)
    sum_amount_diff.pop(0)
    sum_amount_diff.append(0)
    sum_amount_diff
    # Convert the MoM list to a DataFrame
    sum_amount_diff = pd.DataFrame(sum_amount_diff, columns=['sum_amount_diff'])
    sum_amount_diff
    # Concatenate overall_sales_performance and sum_amount_diff, then sort by month in ascending order
    overall_sales_performance = pd.concat([overall_sales_performance, sum_amount_diff], axis=1)
    overall_sales_performance = overall_sales_performance.sort_values('create_year_month')
    overall_sales_performance
    

    Save the final overall_sales_performance DataFrame to the MySQL table pt_overall_sale_performance_1.

    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan = engine
    overall_sales_performance.to_sql('pt_overall_sale_performance_1_yuan',con = yuan ,if_exists= 'replace')
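Testing the to_sql/read_sql_query round trip normally needs the remote MySQL server. For local experimentation only, the same pandas calls also work against an in-memory SQLite database from Python's standard library. The table name below is hypothetical. The sketch passes index=False, which the to_sql calls above omit; without it, pandas writes the DataFrame index as an extra column named index.

```python
import sqlite3
import pandas as pd

perf = pd.DataFrame({"create_year_month": ["2019-10", "2019-11"], "order_num": [300, 330]})

# In-memory SQLite stands in for the MySQL target (local testing only)
conn = sqlite3.connect(":memory:")
# index=False keeps the DataFrame index out of the stored table
perf.to_sql("pt_overall_sale_performance_demo", con=conn, if_exists="replace", index=False)

back = pd.read_sql_query("select * from pt_overall_sale_performance_demo", con=conn)
print(back.columns.tolist())  # ['create_year_month', 'order_num']
conn.close()
```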
    
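    Visualization
    (Figure: bicycle sales volume trend)

    Over the past 11 months, November had the highest bicycle sales volume at 3,316 units, up 7.1% from October.


    (Figure: bicycle sales amount trend)

    Over the past 11 months, November also had the highest sales amount at 61.90 million yuan, up 8.7% from October. Sales amount moved in line with sales volume.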

    2. Regional bicycle sales performance, November 2019

    (1) Sales by region, November 2019

    Clean the data by keeping only the October and November bicycle rows.

    # Keep October and November bicycle rows in gather_customer_order_10_11
    gather_customer_order_10_11 = gather_customer_order[(gather_customer_order['create_year_month'] == '2019-10') | (gather_customer_order['create_year_month'] == '2019-11')]
    gather_customer_order_10_11

    # Group by region (chinese_territory) and month (create_year_month), sum units and sales amount,
    # reset the index, and assign the result to gather_customer_order_10_11_group
    gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(['chinese_territory', 'create_year_month']).agg({'order_num': 'sum', 'sum_amount': 'sum'}).reset_index()
    gather_customer_order_10_11_group
    

    Extract the list of regions. It drives the per-region MoM calculation for November.

    region_list = gather_customer_order_10_11['chinese_territory'].unique()
    region_list 
    

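    For each region, pct_change() computes (current value - previous value) / previous value over the October and November rows. The per-region results are combined into order_x and amount_x, which hold each region's MoM change in units and sales amount. Series.append was removed in pandas 2.0, so the pieces are collected in lists and joined with pd.concat.

    order_parts = []
    amount_parts = []
    # There is no September data in this subset, so October's MoM is NaN; replace it with 0
    for i in region_list:
        region_rows = gather_customer_order_10_11_group[gather_customer_order_10_11_group['chinese_territory'] == i]
        order_parts.append(region_rows['order_num'].pct_change().fillna(0))
        amount_parts.append(region_rows['sum_amount'].pct_change().fillna(0))
    order_x = pd.concat(order_parts)
    amount_x = pd.concat(amount_parts)
    # Assignment aligns on the index, so each rate lands on its own row
    gather_customer_order_10_11_group['order_diff'] = order_x
    gather_customer_order_10_11_group['amount_diff'] = amount_x
    gather_customer_order_10_11_group.head()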
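The per-region loop can also be written as a single grouped operation. Sort so that October precedes November within each region, then call pct_change() on the grouped columns. The sketch below uses hypothetical region totals, with English region labels standing in for the Chinese values in chinese_territory.

```python
import pandas as pd

# Hypothetical region/month totals shaped like gather_customer_order_10_11_group
grouped = pd.DataFrame({
    "chinese_territory": ["East", "East", "South", "South"],
    "create_year_month": ["2019-10", "2019-11", "2019-10", "2019-11"],
    "order_num": [200, 220, 100, 125],
    "sum_amount": [4000.0, 4400.0, 2000.0, 2600.0],
})

# Within each region, October must come before November for pct_change to be Nov vs. Oct
grouped = grouped.sort_values(["chinese_territory", "create_year_month"])
rates = grouped.groupby("chinese_territory")[["order_num", "sum_amount"]].pct_change().fillna(0)
grouped["order_diff"] = rates["order_num"]
grouped["amount_diff"] = rates["sum_amount"]
print(grouped[["chinese_territory", "create_year_month", "order_diff", "amount_diff"]])
```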
    

    Save the final gather_customer_order_10_11_group DataFrame to the MySQL table pt_bicy_november_territory_2:

    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan = engine
    gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_yaun',con = yuan ,if_exists= 'replace')
    

    Export it to Excel:

    gather_customer_order_10_11_group.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_territory_2_yaun.xlsx')
    

    (2) MoM change for the top 10 cities by bicycle sales volume, November 2019

    Filter the November bicycle transactions into gather_customer_order_11:

    gather_customer_order_11 = gather_customer_order_10_11.loc[gather_customer_order_10_11['create_year_month']== '2019-11']
    gather_customer_order_11
    

    Group by city, sum the units sold, sort in descending order, and take the 10 cities with the highest volume.

    # Group gather_customer_order_11 by city (chinese_city) and sum order_num;
    # the 10 cities with the highest November volume go into gather_customer_order_11_head
    gather_customer_order_11_head = gather_customer_order_11.groupby('chinese_city').agg({'order_num': 'sum'}).reset_index().sort_values(by='order_num', ascending=False).head(10)
    gather_customer_order_11_head

    # For those top 10 cities, look up the October and November bicycle data in gather_customer_order_10_11
    # and assign the result to gather_customer_order_10_11_head
    # Preview the October/November bicycle data
    gather_customer_order_10_11.head()
    # Keep only November's top 10 cities, using isin()
    gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11.chinese_city.isin(list(gather_customer_order_11_head['chinese_city']))]
    # Units sold and sales amount per top-10 city and month
    gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(['chinese_city', 'create_year_month']).agg({'order_num': 'sum', 'sum_amount': 'sum'}).reset_index()
    gather_customer_order_city_10_11
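The Top 10 step is a two-stage filter. First, rank cities by November volume. Then use isin() to keep both months for just those cities. The minimal sketch below runs on hypothetical data. It uses Series.nlargest() in place of sort_values().head(), and takes a top 2 instead of a top 10 so the sample stays small.

```python
import pandas as pd

# Hypothetical city/month volumes
sales = pd.DataFrame({
    "chinese_city": ["Beijing", "Beijing", "Shanghai", "Shanghai", "Zhengzhou", "Zhengzhou"],
    "create_year_month": ["2019-10", "2019-11"] * 3,
    "order_num": [50, 55, 48, 52, 20, 21],
})

# Rank cities by November volume (top 2 here instead of top 10)
nov = sales[sales["create_year_month"] == "2019-11"]
top_cities = nov.groupby("chinese_city")["order_num"].sum().nlargest(2).index

# isin keeps both October and November rows for those cities only
top_10_11 = sales[sales["chinese_city"].isin(top_cities)]
print(sorted(top_10_11["chinese_city"].unique()))  # ['Beijing', 'Shanghai']
```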
    

    Note that isin() keeps only the rows of gather_customer_order_10_11 whose city is among November's top 10. As a result, gather_customer_order_city_10_11 covers just those cities.
    Compute November's MoM change in sales amount and units sold:

    # From gather_customer_order_city_10_11, compute MoM sales amount and units for the top 10 cities
    city_top_list = gather_customer_order_city_10_11.chinese_city.unique()
    order_parts = []
    amount_parts = []
    for i in city_top_list:
        city_rows = gather_customer_order_city_10_11[gather_customer_order_city_10_11['chinese_city'] == i]
        order_parts.append(city_rows['order_num'].pct_change().fillna(0))
        amount_parts.append(city_rows['sum_amount'].pct_change().fillna(0))
    order_top_x = pd.concat(order_parts)
    amount_top_x = pd.concat(amount_parts)
    gather_customer_order_city_10_11['order_diff'] = order_top_x
    gather_customer_order_city_10_11['amount_diff'] = amount_top_x
    gather_customer_order_city_10_11
    

    Save the data to MySQL and export it to Excel:

    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan = engine
    gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_yuan',con = yuan ,if_exists= 'replace')
    
    gather_customer_order_city_10_11.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicy_november_october_city_3_yuan.xlsx')
    
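    Visualization
    (Figure: MoM sales growth by region)

    In November, East China had the highest bicycle sales volume of the 8 regions. Compared with October, South China grew fastest, up 23.6%.


    (Figure: sales volume of the top 10 cities)
    (Figure: market share of the top cities)

    Beijing and Shanghai had the highest sales volumes. Zhengzhou had the fastest MoM growth, at 4.8%.
    Together, the top cities account for 13.41% of the market.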

    3. Bicycle product sales performance, November 2019

    (1) Sales volume by market segment
    Group gather_customer_order by month to get each month's total bicycle units, and assign the result to gather_customer_order_group_month.

    # Group gather_customer_order by month and sum units sold into gather_customer_order_group_month
    gather_customer_order_group_month = gather_customer_order.groupby('create_year_month').agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_group_month
    

    Use pd.merge to join the bicycle sales table (gather_customer_order) with the monthly bicycle totals (gather_customer_order_group_month):

    # Join gather_customer_order with gather_customer_order_group_month on create_year_month;
    # in the result order_num_proportion, order_num_x is the row's units and order_num_y is the monthly total
    order_num_proportion = pd.merge(gather_customer_order, gather_customer_order_group_month, on=['create_year_month'])
    order_num_proportion
    

    Divide each row's units by the month's total units to get that row's share of monthly sales:

    # Row units / monthly total units, stored in the new column 'order_proportion'
    order_num_proportion['order_proportion'] = order_num_proportion['order_num_x'] / order_num_proportion['order_num_y']
    order_num_proportion
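Merging the monthly totals back and then dividing creates the order_num_x/order_num_y suffixes. The alternative sketch below assumes the same column names and uses groupby().transform('sum') to broadcast each month's total onto its rows without a merge. The rows are hypothetical sample data.

```python
import pandas as pd

# Hypothetical per-product monthly rows
rows = pd.DataFrame({
    "create_year_month": ["2019-10", "2019-10", "2019-11", "2019-11"],
    "product_name": ["Road-150 Red", "Mountain-200 Silver", "Road-150 Red", "Mountain-200 Silver"],
    "order_num": [30, 70, 40, 60],
})

# Each month's total, broadcast back to every row of that month (no merge needed)
month_total = rows.groupby("create_year_month")["order_num"].transform("sum")
rows["order_proportion"] = rows["order_num"] / month_total
print(rows["order_proportion"].tolist())  # [0.3, 0.7, 0.4, 0.6]
```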
    

    Save the monthly bicycle sales data to MySQL by storing the final order_num_proportion DataFrame in the table ppt_bicycle_product_sales_month_4:

    engine = create_engine('mysql+pymysql://frogdataXX:password@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan = engine
    order_num_proportion.to_sql('ppt_bicycle_product_sales_month_4_yuan',con = yuan ,if_exists= 'replace')
    

    Export to Excel:

    order_num_proportion.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\ppt_bicycle_product_sales_month_4_yuan.xlsx')
    

    List the product subcategories in cpzl_zw:

    # Show the distinct product subcategories in cpzl_zw
    gather_customer_order['cpzl_zw'].unique()
    

    (2) Road bike segment

    Keep road bikes, group by month and road bike model, sum the units sold, and reset the index.

    # Keep road bikes (cpzl_zw == '公路自行车')
    gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
    # Units sold per road bike model ('product_name') per month, assigned to gather_customer_order_road_month
    gather_customer_order_road_month = gather_customer_order_road.groupby(['create_year_month', 'product_name']).agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
    gather_customer_order_road_month

    # Total road bike units per month, assigned to gather_customer_order_road_month_sum (index reset)
    gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby('create_year_month').agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_road_month_sum.head()

    # Merge the monthly road bike totals into gather_customer_order_road_month on 'create_year_month'
    gather_customer_order_road_month = pd.merge(gather_customer_order_road_month, gather_customer_order_road_month_sum, on='create_year_month')
    gather_customer_order_road_month
    

    (3) Mountain bikes

    The process matches the road bike case and uses the variable gather_customer_order_Mountain: filter mountain bikes → sum units per mountain bike model → compute the monthly total → merge. The goal is to compare MoM changes across product subcategories.

    # Keep mountain bikes
    gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
    # Units sold per mountain bike model
    gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(['create_year_month', 'product_name']).agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
    gather_customer_order_Mountain_month

    # Total mountain bike units per month
    gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby('create_year_month').agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_Mountain_month_sum

    # Merge gather_customer_order_Mountain_month with gather_customer_order_Mountain_month_sum
    gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month, gather_customer_order_Mountain_month_sum, on='create_year_month')
    gather_customer_order_Mountain_month
    

    (4) Touring bikes

    The process matches the road bike case and uses the variable gather_customer_order_tour: filter touring bikes → sum units per touring bike model → compute the monthly total → merge. The goal is to compare MoM changes across product subcategories.

    # Keep touring bikes
    gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
    gather_customer_order_tour

    # Units sold per touring bike model
    gather_customer_order_tour_month = gather_customer_order_tour.groupby(['create_year_month', 'product_name']).agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
    gather_customer_order_tour_month

    # Total touring bike units per month
    gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby('create_year_month').agg({'order_num': 'sum'}).reset_index()
    gather_customer_order_tour_month_sum

    # Merge
    gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month, gather_customer_order_tour_month_sum, on='create_year_month')
    gather_customer_order_tour_month
    

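    Combine the monthly data for mountain, touring, and road bikes and compute each model's share:

    # Stack the monthly mountain, touring, and road bike tables; ignore_index gives a fresh 0..n-1 index
    # so later index-aligned assignments are unambiguous
    gather_customer_order_month = pd.concat([gather_customer_order_road_month, gather_customer_order_Mountain_month, gather_customer_order_tour_month], ignore_index=True)
    gather_customer_order_month
    # New column 'order_num_proportio': each model's units as a share of its subcategory's monthly units
    # (order_num_y is the monthly total of that subcategory, not of all bicycles)
    gather_customer_order_month['order_num_proportio'] = gather_customer_order_month['order_num_x'] / gather_customer_order_month['order_num_y']
    gather_customer_order_month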
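The road, mountain, and touring blocks repeat the same filter, group, total, and merge steps. The sketch below handles all subcategories at once with one groupby over cpzl_zw plus transform('sum'). It assumes the column names used above. The sample rows and English subcategory labels are hypothetical; the database holds the Chinese labels.

```python
import pandas as pd

# Hypothetical detail rows; in the real table cpzl_zw holds the Chinese subcategory names
detail = pd.DataFrame({
    "create_year_month": ["2019-11"] * 5,
    "cpzl_zw": ["Road", "Road", "Mountain", "Mountain", "Touring"],
    "product_name": ["Road-150 Red", "Road-650", "Mountain-200 Silver",
                     "Mountain-200 Silver", "Touring-1000 Blue"],
    "order_num": [8, 2, 5, 5, 4],
})

# Units per model per month within each subcategory (one groupby instead of three)
by_model = (detail.groupby(["create_year_month", "cpzl_zw", "product_name"], as_index=False)["order_num"]
                  .sum()
                  .rename(columns={"order_num": "order_month_product"}))

# Each subcategory's monthly total, broadcast back onto its model rows
by_model["sum_order_month"] = (by_model.groupby(["create_year_month", "cpzl_zw"])["order_month_product"]
                               .transform("sum"))
# Share of the subcategory's monthly units (column name kept as in the article)
by_model["order_num_proportio"] = by_model["order_month_product"] / by_model["sum_order_month"]
print(by_model)
```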
    

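    Rename the columns. order_num_x becomes order_month_product (the model's monthly units), and order_num_y becomes sum_order_month (the subcategory's monthly total):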

    gather_customer_order_month = gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'})
    gather_customer_order_month
    

    Save the data to the database and export it to Excel:

    # Save to the database
    engine = create_engine('mysql+pymysql://frogdataXXXX:password@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan=engine
    gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_yuan',con = yuan,if_exists = 'replace')
    
    gather_customer_order_month.to_excel('D:\\Users\\yuan\\Desktop\\linshi\\pt_bicycle_product_sales_order_month_4_yuan.xlsx')
    

    (5) Bicycle MoM change, November 2019

    Keep the October and November 2019 bicycle data:

    gather_customer_order_month_10_11 =gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
    gather_customer_order_month_10_11 
    

    Sort the October and November bicycle data:

    # Sort by model, then month, so October precedes November within each model
    gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
    gather_customer_order_month_10_11.head()
    

    List the bicycle models:

    product_name =list(gather_customer_order_month_10_11['product_name'].unique())
    product_name
    

    Compute each model's November MoM change:

    # MoM change in units for each model (October has no predecessor here, so its NaN becomes 0)
    order_parts = []
    for i in product_name:
        a = gather_customer_order_month_10_11[gather_customer_order_month_10_11['product_name'] == i]['order_month_product'].pct_change().fillna(0)
        order_parts.append(a)
    order_top_x = pd.concat(order_parts)
    gather_customer_order_month_10_11['order_num_diff'] = order_top_x
    gather_customer_order_month_10_11
    

    Keep the November rows:

    gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month']== '2019-11']
    gather_customer_order_month_11
    

    (6) Cumulative product sales, January to November 2019

    Keep the January to November data:

    # str.contains('2019') keeps 2019 months and ~str.contains('12') excludes December;
    # str.contains() works much like LIKE in SQL
    gather_customer_order_month_1_11 = gather_customer_order_month[gather_customer_order_month['create_year_month'].str.contains('2019') & ~gather_customer_order_month['create_year_month'].str.contains('12')]
    gather_customer_order_month_1_11.head()
    
    # Cumulative bicycle units per model, January to November 2019
    gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby('product_name').agg({'order_month_product':sum}).reset_index()
    gather_customer_order_month_1_11_sum 
    
    # Rename to sum_order_1_11: cumulative units, January to November
    gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
    gather_customer_order_month_1_11_sum.head()
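Month strings are zero-padded 'YYYY-MM', so they sort in calendar order. This means the January to November range can also be expressed directly with Series.between() instead of relying on which months happen to contain '12'. The sketch below compares both filters on hypothetical month values.

```python
import pandas as pd

months = pd.Series(["2018-12", "2019-01", "2019-06", "2019-11", "2019-12"])

# Pattern-based filter as used above
by_pattern = months[months.str.contains("2019") & ~months.str.contains("12")]
# Range-based filter: zero-padded 'YYYY-MM' strings compare in calendar order (bounds inclusive)
by_range = months[months.between("2019-01", "2019-11")]
print(by_range.tolist())  # ['2019-01', '2019-06', '2019-11']
```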
    

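    (7) November 2019 bicycle product sales, MoM change, and cumulative sales

    Cumulative sales are already in gather_customer_order_month_1_11_sum, and November's MoM change and sales share are in gather_customer_order_month_11. The two tables only need to be joined with pd.merge().

    # Merge the two tables on product_name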
    gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,on = 'product_name')
    gather_customer_order_month_11 
    

    Save the final gather_customer_order_month_11 DataFrame to the MySQL table pt_bicycle_product_sales_order_month_11:

    # Save gather_customer_order_month_11 to pt_bicycle_product_sales_order_month_11 in MySQL
    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan=engine
    gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_yuan',con = yuan,if_exists = 'replace')
    
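    (Figure: sales volume by market segment)
    (Table: sales volume by market segment)

    In November, road bikes had the largest share. Compared with October, touring bikes grew fastest.


    (Figure: road bike segment sales volume)
    (Table: road bike segment sales volume)

    In November, every road bike model except Road-350-W Yellow grew MoM. Road-650 grew fastest, up 14.29% from October, and Road-150 Red had the largest share, at about 19.63%.


    (Figure: mountain bike segment sales)
    (Table: mountain bike segment sales)
    In November, every mountain bike model except Mountain-200 Black grew MoM. Mountain-500 Silver grew fastest, at 19.51%, and Mountain-200 Silver had the largest share.
    (Figure: touring bike segment sales)
    (Table: touring bike segment sales)
    In November, every touring bike model except Touring-2000 Blue and Touring-3000 Blue grew MoM. Touring-1000 Yellow grew fastest, up 27.18% from October, and Touring-1000 Blue had the largest share, at 32.52%.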

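    4. User behavior analysis

    This part uses the order detail table ods_sales_orders and the customer table ods_customer.
    First, read the customer information table from the database:

    # Read the customer information table
    # Load the ods_customer table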
    engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
    datafrog=engine
    df_CUSTOMER = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = datafrog)
    
    # Load the ods_sales_orders table
    engine = create_engine('mysql+pymysql://frogXXXX:mima@106.13.128.83:3306/adventure_ods?charset=gbk')
    datafrog=engine
    df_sales_orders_11 = pd.read_sql_query("select *  from ods_sales_orders where create_date>='2019-11-1' and   create_date<'2019-12-1'",con = datafrog)
    

    The sales order table has no customer age or gender, so it has to be merged with the customer table:

    sales_customer_order_11=pd.merge(df_sales_orders_11,df_CUSTOMER,on='customer_key',how= 'left')
    sales_customer_order_11
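With how='left', every November order is kept even when its customer_key has no match in df_CUSTOMER. The customer columns of such rows become NaN, which is one reason missing birth years need handling later. The sketch below uses hypothetical keys and merge's indicator=True option to count unmatched orders.

```python
import pandas as pd

orders = pd.DataFrame({"sales_order_key": [1, 2, 3], "customer_key": [10, 11, 99]})
customers = pd.DataFrame({"customer_key": [10, 11], "birth_date": ["1985-04-02", "1990-07-19"]})

# how='left' keeps all orders; indicator=True adds a _merge column showing match status
merged = pd.merge(orders, customers, on="customer_key", how="left", indicator=True)
unmatched = (merged["_merge"] == "left_only").sum()
print(unmatched)  # 1
```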
    

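    Use split to extract the year from sales_customer_order_11['birth_date'] and store it as a new string column:

    # Take the part before the first '-' (n passed by keyword, as newer pandas requires); missing birth dates pass through unchanged
    customer_birth_year = sales_customer_order_11['birth_date'].str.split('-', n=2).apply(lambda x: x[0] if type(x) == list else x)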
    customer_birth_year.name='birth_year'
    sales_customer_order_11 = pd.concat([sales_customer_order_11,customer_birth_year],axis = 1)
    sales_customer_order_11
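For 'YYYY-MM-DD' strings, the year is simply the first four characters, so str[:4] gives the same result as splitting and taking the first piece. pd.to_datetime(...).dt.year returns a numeric year instead. The sketch below compares the three approaches on hypothetical values and assumes birth_date is stored as text.

```python
import pandas as pd

birth_date = pd.Series(["1985-04-02", None, "1990-07-19"])

# Split-based extraction as above; missing values are passed through
by_split = birth_date.str.split("-", n=2).apply(lambda x: x[0] if isinstance(x, list) else x)
# Slicing the first four characters gives the same year strings
by_slice = birth_date.str[:4]
# Parsing to datetime yields numeric years (NaN where the date is missing)
by_datetime = pd.to_datetime(birth_date).dt.year
print(by_slice.tolist(), by_datetime.tolist())
```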
    

    (1) Customer age

    # Convert the birth year to int; missing years are first forward-filled from the previous row
    sales_customer_order_11['birth_year'] = sales_customer_order_11['birth_year'].ffill().astype('int')
    # Customer age in 2019
    sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
    sales_customer_order_11.head()
    

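    Use pd.cut() to bin the ages into levels:

    # Bin customer_age into '30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64' to form age_level
    # (right=False makes each bin left-closed, e.g. [30, 35); ages outside [30, 65) become NaN)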
    customer_age_lst =[i for i in range(30 , 68 ,5)]
    sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'] , bins =  customer_age_lst ,right =False, labels = ['30-34','35-39','40-44','45-49','50-54','55-59','60-64'])
    sales_customer_order_11
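A small check of the binning: range(30, 68, 5) yields the edges 30, 35, ..., 65, and right=False closes each bin on the left. So 35 lands in '35-39', while 65 falls outside every bin and becomes NaN, as do ages under 30. The ages below are hypothetical.

```python
import pandas as pd

ages = pd.Series([30, 34, 35, 64, 65])
edges = list(range(30, 68, 5))  # [30, 35, 40, 45, 50, 55, 60, 65]
labels = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64"]

# right=False makes each bin closed on the left: [30, 35), [35, 40), ..., [60, 65)
levels = pd.cut(ages, bins=edges, right=False, labels=labels)
print(levels.tolist())
```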
    

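    Keep the bicycle orders:

    # Keep orders whose category is 自行车 (bicycle); .copy() avoids SettingWithCopyWarning when columns are added below
    df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车'].copy()
    df_customer_order_bycle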
    

    Compute the age ratio:

    # age_level_rate: every row gets 1 / (number of bicycle order rows), so summing it over a group gives that group's share
    df_customer_order_bycle['age_level_rate'] = 1 / (df_customer_order_bycle.customer_key.count())
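Assigning 1/N to every row means that summing the column over any group yields that group's share of the rows, presumably so shares can be aggregated in the Power BI report. Note that customer_key.count() counts order rows, not distinct customers. The sketch below uses hypothetical rows to confirm that the sum matches value_counts(normalize=True).

```python
import pandas as pd

bike = pd.DataFrame({"customer_key": [1, 2, 3, 4], "gender": ["M", "F", "M", "M"]})

# Each row carries 1/N, so the sum over a group is that group's share of all rows
bike["age_level_rate"] = 1 / bike["customer_key"].count()
share_by_sum = bike.groupby("gender")["age_level_rate"].sum()
share_direct = bike["gender"].value_counts(normalize=True).sort_index()
print(share_by_sum.round(2).to_dict())  # {'F': 0.25, 'M': 0.75}
```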
    

    Divide the ages into three levels: '<=29', '30-39', and '>=40'. The oldest customer is 62, so 100 is used as the upper bound.

    # Three age levels: '<=29', '30-39', '>=40'
    df_customer_order_bycle['age_level2'] = pd.cut(df_customer_order_bycle['customer_age'], bins=[0, 30, 40, 100], right=False, labels=['<=29', '30-39', '>=40'])
    # Number of orders in each age level
    age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
    age_level2_count
    

    (2) Customer gender
    Count bicycle buyers (order rows) by gender:

    gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
    gender_count
    

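    Compute each customer's weight within their age level (1 / the level's order count):

    # Merge age_level2_count into df_customer_order_bycle; the count column arrives as sales_order_key_y and is renamed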
    df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
    
    df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
    

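    Compute each customer's weight within their gender (1 / the gender's order count):

    # Merge gender_count into df_customer_order_bycle; the count column arrives as cplb_zw_y and is renamed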
    df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
    
    df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
    df_customer_order_bycle.head()
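The merge-then-rename pattern leaves suffixed duplicates such as sales_order_key_x and cplb_zw_x in the table. The alternative sketch below attaches the group sizes with groupby().transform('count'); the rows are hypothetical. If age_level2 is the categorical column produced by pd.cut, recent pandas versions may emit a FutureWarning about the observed parameter when grouping by it. Passing observed explicitly silences that warning.

```python
import pandas as pd

bike = pd.DataFrame({
    "sales_order_key": [1, 2, 3, 4, 5],
    "age_level2": ["30-39", "30-39", ">=40", "<=29", "30-39"],
    "gender": ["M", "F", "M", "F", "M"],
})

# Per-group counts broadcast to each row; no merge or column renaming needed
bike["age_level2_count"] = bike.groupby("age_level2")["sales_order_key"].transform("count")
bike["age_level2_rate"] = 1 / bike["age_level2_count"]
bike["gender_count"] = bike.groupby("gender")["sales_order_key"].transform("count")
bike["gender_rate"] = 1 / bike["gender_count"]
print(bike)
```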
    

    Save the November bicycle customers (df_customer_order_bycle) to the database:

    # Save the November bicycle customers in df_customer_order_bycle
    # Write to the database
    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan=engine
    df_customer_order_bycle.to_sql('pt_user_behavior_november_yuan',con = yuan ,if_exists='replace')
    
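    (Figure: age distribution of online customers nationwide, November 2019)
    (Figure: consumer groups by age)

    Among age groups, customers aged 35-39 account for the largest share, at 29%. Above that range, the share declines as age increases.
    An association analysis of age (over 30) and market segment shows that road bikes have the largest share and touring bikes the smallest.


    (Figure: male-to-female ratio nationwide)
    (Figure: consumer groups by gender)

    Men and women buy bicycles in nearly equal proportions.
    An association analysis of gender and market segment shows that both men and women buy road bikes most and touring bikes least.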

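    5. Hot-product analysis, November 2019

    (1) Top 10 products by November sales volume, with units sold and MoM change

    Keep the November data:

    # Keep November data
    gather_customer_order_11 = gather_customer_order.loc[gather_customer_order['create_year_month'] == '2019-11']
    gather_customer_order_11

    Compute units sold per product, sort by volume in descending order, and take the top 10 products:

    # Units sold per product (the trailing \ continues the statement on the next line);
    # sort by volume in descending order and take the top 10
    customer_order_11_top10 = gather_customer_order_11.groupby('product_name').agg({'order_num': 'sum'}).reset_index().\
                                sort_values(by='order_num', ascending=False).head(10)
    # Top 10 products by volume
    list(customer_order_11_top10['product_name'])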
    

    Compute the top 10 volumes and MoM changes:

    # Preview the November MoM data
    gather_customer_order_month_10_11.head()
    
    

    Only five fields are needed: create_year_month (month), product_name (product name), order_month_product (units this month), cpzl_zw (product subcategory), and order_num_diff (MoM change in units).

    customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month', 'product_name', 'order_month_product', 'cpzl_zw', 'order_num_diff']]
    customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))].copy()
    customer_order_month_10_11
    

    Tag the top 10 models by volume with a category field whose value is 本月TOP10销量 ("top 10 by volume this month"):

    customer_order_month_10_11['category'] = '本月TOP10销量'
    customer_order_month_10_11.head()
    

    (2) Top 10 products by November growth, with units sold and MoM change

    customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].\
                                sort_values(by = 'order_num_diff',ascending = False).head(10)
    customer_order_month_11
    

    Keep the models with the 10 highest November growth rates:

    customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
    

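    Keep the same five fields: create_year_month (month), product_name (product name), order_month_product (units this month), cpzl_zw (product subcategory), and order_num_diff (MoM change in units). Tag these rows with the category 本月TOP10增速 ("top 10 by growth this month").

    customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month', 'product_name', 'order_month_product', 'cpzl_zw', 'order_num_diff']].copy()
    customer_order_month_11_top10_seep['category'] = '本月TOP10增速'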
    customer_order_month_11_top10_seep
    

    Stack the top-10-by-growth table and the top-10-by-volume table:

    # axis=0 stacks rows; axis=1 would join the tables side by side as columns
    hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
    hot_products_11
    

    Save the data to MySQL:

    engine = create_engine('mysql+pymysql://frogdata05:Frogdata!123@106.15.121.232/datafrog05_adventure?charset=utf8')
    yuan=engine
    hot_products_11.to_sql('pt_hot_products_november_yuan',con = yuan,if_exists = 'replace')
    


    In November, Mountain-200 Silver had the highest sales volume at 395 units, up 10.64% from October.




    In November, Touring-1000 Yellow grew fastest, up 27.18% from October.

          Source: https://www.haomeiwen.com/subject/asshrktx.html