美文网首页
Adventure项目

Adventure项目

作者: kh辰辰辰 | 来源:发表于2020-11-10 19:26 被阅读0次

    一、项目概述

    1. 成果预览

    分析概述:本文是对Adventure项目的总结,记录项目分析与实现过程,主要任务是对MySQL数据库中原始数据通过pandasnumpy聚合、分析,并使用聚合后数据进行dashboard的搭建,为业务、运营人员提供自主分析工具。在聚合后数据的基础上,汇报2019年11月自行车销售情况,为精细化运营提供数据支持,精准定位目标客户群体。

    dashboard预览:dashboard看板连接

    主页

    PPT预览:

    PPT

    2. 背景介绍

    2.1 公司背景

    Adventure Works Cycles是基于微软AdventureWorks示例数据库的虚拟公司,是一家大型跨国制造公司。该公司生产金属和复合材料自行车,并在全国进行销售。2019年12月业务部门需要向上级汇报11月份的自行车销售情况,为精细化运营提供数据支持,精准定位目标客户群体。

    2.2 产品介绍

    产品分为三个大类:自行车(公司主要产品)、服装、配件

    • 自行车:分为三种类型,公路自行车、山地自行车、旅游自行车
    • 服装:包括帽子、手套、短裤、背心、袜子、运动衫
    • 配件:包括头盔、挡泥板、水袋、清洁工、瓶子和笼子、自行车架、自行车看台、车胎和内胎

    二、数据搭建

    1. 指标搭建

    1.1 查看数据

    adventure Works Cycles公司的mysql数据库中有三张表:

    • ods_sales_orders:订单明细表(一个订单表示销售一个产品),主要包括每个订单的详细信息:订单主键、订单日期、客户编号、产品编号、产品名、产品类别、产品子类、产品单价这些字段


      订单明细表
    • ods_customer:每日新增用户表,本表与“ods_sales_orders 订单明细表”可通过客户编号字段关联,本表记录每日新增客户系信,包括日期、出生日期、性别、婚否、年收入以及详细地理位置:省份、城市、地址、区域和经纬度等字段

    每日新增用户表
    • dim_date_df:日期维度表,本表可通过日期字段与上述两张表关联,本表通过1和0对是否是当日、当月、当年、等字段的记录,可在后续对表的聚合加工中,对订单表与用户表进行不同时间维度查询


      日期维度表

    1.2 确定需求

    结合业务要求和现有数据,确定数据指标,为后续dashboard搭建和相关报告输出,提供完善数据。对MySQL数据库中三张表数据进行聚合和各维度关联等操作,完成下面三张表格:

    • dw_order_by_day:每日环比表
    1. 本表是对每日数据的一个汇总聚合,通过对ods_sales_orders(订单明细表)中的“create_date”字段进行聚合,获得每日总销售金额、总销量、并计算出平均订单价
    2. 设定每日销售量与销售金额目标值,便于后续dashboard展示目标完成情况
    3. 通过“create_date”字段关联dim_date_df(日期维度表),方便后续对日、月、季度、年等时间维度的销售情况进行聚合展示
    4. 计算每日销售金额与昨日销售金额环比
    每日环比表
    • dw_customer_order:时间-地区-产品聚合表
    1. 提取当日维度下ods_sales_orders(订单明细表)和ods_customer(每日新增用户表),并通过“customer_key”字段关联两张表
    2. 提取订单主键、订单日期、客户编号、产品名、产品子类、产品类别、产品单价、所在区域、所在省份、所在城市这些字段,使用groupby方法对订单日期、产品名、产品子类、产品类别、所在区域、所在省份、所在城市的逐级聚合
    3. 通过“create_date”字段关联dim_date_df(日期维度表),方便后续对日、月、季度、年等时间维度产品信息和地区信息的聚合展示
    4. 获得相应的订单总量、客户总量和销售总金额,完成对当日时间维度下产品和地区的聚合,可用于dashboard地区维度的切片和地图可视化
    时间-地区-产品聚合表
    • dw_amount_diff:当日维度表
    1. 提取生成的的dw_order_by_day(每日环比表)的数据
    2. 通过“is_today”、“is_yesterday”、“is_current_year”、“is_last_year”等这些字段中的值:1和0来判断时间
    3. 对当日、昨日、今年、去年等的销售金额,销售量和客单价聚合起来,完成当日同比、昨日同比 、本月同比、本季度同比以及本年同比
    4. 使用“amount”、“order“、”avg“标记销售金额,销售量和客单价的同比结果,方便横向提取数据
    每日环比表

    2. python处理数据

    使用python,连接服务器中MySQL数据库,通过关联、聚合、计算以及一些格式的转换等步骤,完成上述指标的搭建,并将生成的表格上传至MySQL数据库。详细代码如下:

    2.1 生成dw_order_by_day表

    导入相关模块,连接数据库

    import pandas as pd
    import pymysql
    import random
    from sqlalchemy import create_engine
    pymysql.install_as_MySQLdb()
    import datetime
    import warnings
    warnings.filterwarnings("ignore")
    # 连接adventure_ods库
    adventure_ods = create_engine('mysql://root:password@localhost:3306/adventure_ods?charset=gbk')
    # 连接adventure_dw库
    adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')
    

    step1:读取ods_sales_orders(订单明细表),生成sum_amount_order(销量订单聚合表),求总销售金额与客单价

    # 1、读取ods_sales_orders(订单明细表)
    ods_sales_orders = pd.read_sql_query("select * from ods_sales_orders ", con=adventure_ods)
    # 2、根据create_date分组求总销量及客户数量
    sum_amount_order = ods_sales_orders.groupby(by='create_date').agg({'unit_price': sum, 'customer_key': pd.Series.nunique}).reset_index()
    # 3、修改对应列名(unit_price→sum_amount', 'customer_key'→ 'sum_order)
    sum_amount_order.rename(columns = {'unit_price': 'sum_amount', 'customer_key': 'sum_order'},inplace = True)
    # 4、新增amount_div_order客单价列
    sum_amount_order['amount_div_order'] = sum_amount_order['sum_amount'] / sum_amount_order['sum_order']
    

    step2:利用空列表及循环生成对应随机值,与销量订单聚合表合并形成sum_amount_order_goal(销量订单聚合目标表)

    sum_amount_goal_list = []
    sum_order_goal_list = []
    # 转为list类型,遍历每个日期
    create_date_list = list(sum_amount_order['create_date'])
    for i in create_date_list:
        # 1.生成一个在[0.85,1.1]随机数
        a = random.uniform(0.85,1.1)
        b = random.uniform(0.85,1.1)
        
        # 2.对应日期下生成总金额(sum_amount)*a的列
        amount_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_amount'])[0] * a
    
        # 3.对应日期下生成总订单数(sum_order)*a的列
        order_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_order'])[0] * b
        
        # 4.将生成的目标值加入空列表
        sum_amount_goal_list.append(amount_goal)
        sum_order_goal_list.append(order_goal)
        
    # 5.合并sum_amount_order表与刚生成的目标值列,形成sum_amount_order_goal
    sum_amount_order_goal = pd.concat([sum_amount_order, pd.DataFrame({'sum_amount_goal': sum_amount_goal_list, 'sum_order_goal': sum_order_goal_list})], axis=1)
    

    step3:读取dim_date_df日期维度表

    dim_date_df = """ 
            select create_date,
                    is_current_year,
                    is_last_year,
                    is_yesterday,
                    is_today,
                    is_current_month,
                    is_current_quarter
                    from dim_date_df"""
    dim_date_df = pd.read_sql_query(dim_date_df, con=adventure_ods)
    

    step4:进行数据的融合,生成dw_order_by_day表

    # 1.转化create_date格式为标准日期格式
    sum_amount_order_goal['create_date'] = sum_amount_order_goal['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))
    # 2.通过主键create_date连接日期维度,输出:dw_order_by_day每日环比表
    dw_order_by_day = pd.merge(sum_amount_order_goal, dim_date_df, on='create_date', how='inner')
    # 3.在sum_amount列基础上,增添环比数据列amount_diff
    dw_order_by_day['amount_diff'] = dw_order_by_day['sum_amount'].pct_change()
    dw_order_by_day['amount_diff'].fillna(0,inplace=True)
    

    step5:删除旧的dw_order_by_day(每日环比表),存储新的dw_order_by_day

    # 因为删除插入更新操作没有返回值,程序会抛出ResourceClosedError,并终止程序。使用try捕捉此异常。
    try:
        # 作业代码:删除旧表数据
        pd.read_sql_query("Truncate table dw_order_by_day", con=adventure_dw)
        
    except Exception as e:
        print("删除旧的dw_order_by_day表,error:{}".format(e))
    # 替换原先的dw_order_by_day
    dw_order_by_day.to_sql('dw_order_by_day', con=adventure_dw, if_exists='replace', index=False)
    

    2.2 生成dw_customer_order表

    导入相关模块,连接数据库

    import pandas as pd
    import pymysql
    from sqlalchemy import create_engine
    pymysql.install_as_MySQLdb()
    import datetime
    import warnings
    warnings.filterwarnings("ignore")
    # 连接adventure_ods库
    adventure_ods = create_engine('mysql://root:password@localhost:3306/adventure_ods?charset=gbk')
    # 连接adventure_dw库
    adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')
    

    step1:读取最新日期的ods_sales_orders(订单明细表)

    ods_sales_orders = \
            '''select sales_order_key,
                      create_date,
                      customer_key,
                      english_product_name,
                      cpzl_zw,
                      cplb_zw,
                      unit_price  
                from ods_sales_orders where create_date =(
                select create_date
                from dim_date_df
                order by create_date desc
                limit 1) '''
    ods_sales_orders = pd.read_sql_query(ods_sales_orders, con=adventure_ods)
    ods_sales_orders.info()
    

    step2:读取每日新增用户表ods_customer

    ods_customer = ''' 
           select customer_key,
                   chinese_territory,
                   chinese_province,
                   chinese_city
                   from ods_customer'''
    ods_customer = pd.read_sql_query(ods_customer, con=adventure_ods)
    ods_customer.info()
    

    step3:读取日期维度表dim_date_df

    dim_date_df = """ 
           select create_date,
                   is_current_year,
                   is_last_year,
                   is_yesterday,
                   is_today,
                   is_current_month,
                   is_current_quarter
                   from dim_date_df"""
    dim_date_df = pd.read_sql_query(dim_date_df, con=adventure_ods)
    dim_date_df.info()
    

    step4:进行数据的聚合

    # 将customer_key由int类型转为str类型
    ods_customer['customer_key'] = ods_customer['customer_key'].map(lambda x:str(x))
    # 通过客户id连接表
    sales_customer_order = pd.merge(ods_sales_orders, ods_customer, left_on='customer_key', right_on='customer_key', how='left')
    # 1.提取订单主键/订单日期/客户编号/产品名/产品子类/产品类别/产品单价/所在区域/所在省份/所在城市
    sales_customer_order = sales_customer_order[['sales_order_key', 'create_date', 'customer_key', 'english_product_name',\
                                                 'cpzl_zw','cplb_zw','unit_price', 'chinese_territory', 'chinese_province', 'chinese_city']]
    # 2.形成按照 订单日期/产品名/产品子类/产品类别/所在区域/所在省份/所在城市的逐级聚合表,获得订单总量/客户总量/销售总金额
    sum_customer_order = sales_customer_order.groupby(['create_date','english_product_name','cpzl_zw','cplb_zw','chinese_territory',\
                                                     'chinese_province', 'chinese_city'],as_index=False).\
                                                    agg({'sales_order_key':pd.Series.nunique,'customer_key':pd.Series.nunique,'unit_price':sum})
    sum_customer_order.rename(columns = {'sales_order_key':'order_num','customer_key':'customer_num',\
                                         'unit_price':'sum_amount','english_product_name':'product_name'},inplace=True)
    # 3.转化订单日期为字符型格式
    sum_customer_order['create_date'] = sum_customer_order['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))
    # 4.获取当日日期维度
    dw_customer_order = pd.merge(sum_customer_order, dim_date_df, on='create_date', how='inner')
    

    step5:进行数据的存储

    dw_customer_order.to_sql('dw_customer_order', con=adventure_dw,if_exists='replace', index=False)
    

    2.3 生成dw_amount_diff表

    导入相关模块,连接数据库

    import pandas as pd
    import pymysql
    from sqlalchemy import create_engine
    pymysql.install_as_MySQLdb()
    import datetime
    import warnings
    warnings.filterwarnings("ignore")
    # 连接adventure_dw库
    adventure_dw = create_engine('mysql://root:password@localhost:3306/adventure_dw?charset=gbk')
    

    step1:读取dw_order_by_day表

    dw_order_by_day = pd.read_sql_query("select * from dw_order_by_day", con=adventure_dw)
    dw_order_by_day['create_date'] = pd.to_datetime(dw_order_by_day['create_date'])
    

    step2:求取各阶段的总金额

    """当天"""
    # 当天的总金额
    today_amount = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_amount'].sum()
    # 去年同期的日期维度
    before_year_today = list(dw_order_by_day[dw_order_by_day['is_today'] == 1]['create_date'] + datetime.timedelta(days=-365))
    # 去年同期总金额
    before_year_today_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_today)]['sum_amount'].sum()
    
    """昨天"""
    # 昨天的总金额
    yesterday_amount = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_amount'].sum()
    # 去年同期的日期维度
    before_year_yesterday = list(dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['create_date'] + datetime.timedelta(days=-365))
    # 去年同期总金额
    before_year_yesterday_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_yesterday)]['sum_amount'].sum()
    
    """当前月份"""
    # 当月的总金额
    month_amount = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_amount'].sum()
    # 去年同期的日期维度
    before_year_month = list(dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['create_date'] + datetime.timedelta(days=-365))
    # 去年同期总金额
    before_year_month_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_month)]['sum_amount'].sum()
    
    """当前季度"""
    # 当季的总金额
    quarter_amount = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_amount'].sum()
    # 去年同期的日期维度
    before_year_quarter = list(dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['create_date'] + datetime.timedelta(days=-365))
    # 去年同期总金额
    before_year_quarter_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_quarter)]['sum_amount'].sum()
    
    """当前年份"""
    # 当年的总金额
    year_amount = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_amount'].sum()
    # 去年同期的日期维度
    before_year_year = list(dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['create_date'] + datetime.timedelta(days=-365))
    # 去年同期总金额
    before_year_year_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_year)]['sum_amount'].sum()
    

    step3:求取各阶段的总客户数

    """当天"""
    # 当天的总客户数
    today_order = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_order'].sum()
    # 去年同期总客户数
    before_year_today_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_today)]['sum_order'].sum()
    
    """昨天"""
    # 昨天的总客户数
    yesterday_order = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_order'].sum()
    # 去年同期总客户数
    before_year_yesterday_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_yesterday)]['sum_order'].sum()
    
    """当前月份"""
    # 当月的总客户数
    month_order = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_order'].sum()
    # 去年同期总客户数
    before_year_month_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_month)]['sum_order'].sum()
    
    """当前季度"""
    # 当季的总客户数
    quarter_order = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_order'].sum()
    # 去年同期总客户数
    before_year_quarter_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_quarter)]['sum_order'].sum()
    
    """当前年份"""
    # 当年的总客户数
    year_order = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_order'].sum()
    # 去年同期总客户数
    before_year_year_order = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_year)]['sum_order'].sum()
    

    step4:求取各阶段的总金额、订单数的同期对比数据

    amount_dict = {'today_diff': [today_amount / before_year_today_amount - 1,
                                  today_order / before_year_today_order - 1,
                                  (today_amount / today_order) / (before_year_today_amount /before_year_today_order) - 1],
                   'yesterday_diff': [yesterday_amount / before_year_yesterday_amount - 1,
                                      yesterday_order / before_year_yesterday_order - 1,
                                      (yesterday_amount / yesterday_order) / (before_year_yesterday_amount / before_year_yesterday_order) - 1],
                   'month_diff': [month_amount / before_year_month_amount - 1,
                                  month_order / before_year_month_order - 1,
                                  (month_amount / month_order) / (before_year_month_amount / before_year_month_order) - 1],
                   'quarter_diff': [quarter_amount / before_year_quarter_amount - 1,
                                    quarter_order / before_year_quarter_order - 1,
                                    (quarter_amount / quarter_order) / (before_year_quarter_amount / before_year_quarter_order) - 1],
                   'year_diff': [year_amount / before_year_year_amount - 1,
                                 year_order / before_year_year_order - 1,
                                 (year_amount / year_order) / (before_year_year_amount / before_year_year_order) - 1],
                   'flag': ['amount', 'order', 'avg']}
    
    dw_amount_diff = pd.DataFrame(amount_dict)
    

    step5:删除旧的dw_amount_diff表,存储新的dw_amount_diff表

    # 因为删除插入更新操作没有返回值,程序会抛出ResourceClosedError,并终止程序。使用try捕捉此异常。
    try:
        pd.read_sql_query("Truncate table dw_amount_diff", con=adventure_dw)
    except Exception as e:
        print("删除旧的dw_amount_diff表,error:{}".format(e))
    # 存储新的dw_amount_diff表
    dw_amount_diff.to_sql('dw_amount_diff', con=adventure_dw, if_exists='replace', index=False)
    

    3. 搭建Dashboard

    3.1 可视化看板布置

    1. 可视化工具:这里用到的可视化工具有切片器、KPI、卡片图、表、环形图、地图等。
    2. 筛选器:这里用于日期、区域等字段的筛选。
    3. 书签窗格:这里将按钮和书签结合使用,用于制作导航栏和动态图表。

    3.2 报表展示

    1、主页

    • 展示今日销售金额和目标完成情况、客户量和客单价,以及同比情况
    • 产品今日详细销售情况
    • 环形图直观表示各个品类订单销量的百分比
    • 通过地区和城市切片,可以动态展示不同地区和城市的上述内容
    • 通过地图动态展示各个城市销售订单量
    主页

    2、趋势详情页

    • 通过日期筛选器筛选出最近一段时间的销售趋势
    • 动态展示最近14天销售金额和目标销售金额的趋势
    • 动态展示最近14天顾客量和目标顾客量的趋势
    • 动态展示最近14天客单价的趋势
    趋势详情页

    3、地区详情页

    • 对区域和省份进行切片
    • 通过切片展示不同地区订单详细情况
    地区详情页

    三、报告输出

    1. 指标搭建

    1.1 目的

    • 根据需求,需要输出2019.11月自行车销售情况,为精细化运营提供数据支持,能精准的定位目标客户群体;
    • 制定销售策略,调整产品结构,才能保持高速增长,获取更多的收益,占领更多市场份额;
    • 报告通过对整个公司的自行车销量持续监测和分析,掌握公司自行车销售状况、走势的变化,为客户制订、调整和检查销售策略,完善产品结构提供依据。

    1.2 数据来源

    • dw_customer_order:产品销售信息事实表
    • ods_customer:每天新增客户信息表
    • dim_date_df:日期表
    • ods_sales_orders:订单明细表

    1.3 指标制定

    • 从整体的角度:分析2019.1—2019.11自行车整体销售表现
    • 从地域的角度:分析11月每个区域销售量表现、11月TOP10城市销售量表现
    • 从产品的角度:分析11月类别产品销售量表现、11月细分产品销售量表现
    • 热销产品:分析11月TOP10产品销量榜、11月TOP10销量增速榜
    • 从用户的角度:分析11月用户年龄分布及每个年龄段产品购买喜好、11月男女用户比例及产品购买喜好

    2. python加工数据

    代码如下:

    #导入模块
    import pandas as pd
    import numpy as np
    import pymysql
    pymysql.install_as_MySQLdb()
    from sqlalchemy import create_engine
    

    part1:自行车整体销售表现

    '''1. 从数据库读取源数据:dw_customer_order'''
    # 读取dw_customer_order
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods')
    gather_customer_order=pd.read_sql_query("select * from dw_customer_order",con = engine)
    # 查看源数据前5行,观察数据,判断数据是否正常识别
    gather_customer_order.head()
    # 查看源数据类型
    gather_customer_order.info()
    # 利用create_date字段增加create_year_month月份字段
    gather_customer_order['create_year_month'] = gather_customer_order["create_date"].apply(lambda x: x.strftime('%Y-%m'))
    # 筛选产品类型cplb_zw中的自行车作为新的gather_customer_order
    gather_customer_order = gather_customer_order[gather_customer_order['cplb_zw']=="自行车"]
    
    '''2. 自行车整体销售量表现'''
    # 取消科学计数法
    pd.set_option('float_format', lambda x: '%.6f' % x)
    #每月订单数量和销售金额,用groupby创建一个新的对象,需要将order_num、sum_amount求和,按日期降序排序,方便计算环比
    overall_sales_performance = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum","sum_amount":"sum"}).\
                                                        sort_values(by = "create_year_month",ascending = False).reset_index()
    # 新增一列order_num_diff,此为每月自行车销售订单量环比,本月与上月对比,例如本期2019-02月销售额与上一期2019-01月销售额做对比
    order_num_diff = list(-(overall_sales_performance.order_num.diff()/overall_sales_performance.order_num))
     # 删除列表中第一个元素
    order_num_diff.pop(0)
    # 将0新增到列表末尾
    order_num_diff.append(0) 
    # 将环比转化为DataFrame
    overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame(order_num_diff)],axis=1)
    # 新增一列sum_amount_diff,此为每月自行车销售金额环比,最后形成按照日期升序排列
    sum_amount_diff = list(-(overall_sales_performance.sum_amount.diff()/overall_sales_performance.sum_amount))
    sum_amount_diff.pop(0) 
    sum_amount_diff.append(0) 
    sum_amount_diff
    # 将环比转化为DataFrame
    overall_sales_performance = pd.concat([overall_sales_performance,pd.DataFrame(sum_amount_diff)],axis = 1)
    # 销量环比字段名order_diff,销售金额环比字段名sum_amount_diff
    overall_sales_performance.columns = ['create_year_month', 'order_num', 'sum_amount', 'order_diff','sum_amount_diff']
    # 按照日期排序,升序
    overall_sales_performance = overall_sales_performance.sort_values(by = "create_year_month",ascending = True)
    
    '''3. 将最终的overall_sales_performance的DataFrame存入数据库中'''
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    overall_sales_performance.to_sql('pt_overall_sale_performance_1_cc',con = engine,if_exists='replace')
    
    
    overall_sales_performance

    part2:2019年11月自行车地域销售表现

    '''1. 源数据dw_customer_order,数据清洗筛选10月11月数据'''
    # 筛选10、11月的自行车数据,赋值变量为gather_customer_order_10_11
    gather_customer_order_10_11 = gather_customer_order[gather_customer_order["create_year_month"].isin(['2019-10','2019-11'])]
    # 查看10月、11月的自行车订单数量
    len(gather_customer_order_10_11)
    
    '''2. 2019年11月自行车区域销售量表现'''
    #按照区域、月分组,订单量求和,销售金额求和
    gather_customer_order_10_11_group = gather_customer_order_10_11.groupby(["chinese_territory","create_year_month"]).\
                                                agg({"order_num":"sum","sum_amount":"sum"}).reset_index()
    #将区域存为列表
    region_list = list(gather_customer_order["chinese_territory"].unique())
    # pct_change()当前元素与先前元素的相差百分比,求不同区域10月11月环比
    order_x = pd.Series([])
    amount_x = pd.Series([])
    for i in region_list:
        a = gather_customer_order_10_11_group.loc[gather_customer_order_10_11_group['chinese_territory']==i]['order_num'].pct_change()
        b = gather_customer_order_10_11_group.loc[gather_customer_order_10_11_group['chinese_territory']==i]['sum_amount'].pct_change()
        order_x = order_x.append(a)
        amount_x = amount_x.append(b)
    gather_customer_order_10_11_group['order_diff'] = order_x
    gather_customer_order_10_11_group['amount_diff'] = amount_x
    # 填充NaN值
    gather_customer_order_10_11_group.fillna(value = 0,inplace = True)
    # 将数据存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    gather_customer_order_10_11_group.to_sql('pt_bicy_november_territory_2_cc',con = engine,if_exists='replace')
    
    gather_customer_order_10_11_group
    '''3. 2019年11月自行车销售量TOP10城市环比'''
    #筛选11月自行车交易数据
    gather_customer_order_11 = gather_customer_order[(gather_customer_order["create_year_month"]=='2019-11')]
    # 将gather_customer_order_11按照chinese_city城市分组,求和销售数量order_num,最终查看11月自行车销售数量前十城市,赋予变量gather_customer_order_city_head
    gather_customer_order_city_11 = gather_customer_order_11.groupby("chinese_city").agg({"order_num":"sum"}).reset_index().\
                                                        sort_values(by = "order_num",ascending = False)
    #11月自行车销售数量前十城市
    gather_customer_order_city_head = gather_customer_order_city_11.head(10)
    #筛选销售前十城市的10月11月自行车销售数据
    gather_customer_order_10_11_head = gather_customer_order_10_11[gather_customer_order_10_11["chinese_city"].isin(list(gather_customer_order_city_head.chinese_city))]
    #分组计算前十城市,自行车销售数量销售金额
    gather_customer_order_city_10_11 = gather_customer_order_10_11_head.groupby(["chinese_city","create_year_month"]).\
                                                agg({"order_num":"sum","sum_amount":"sum"}).reset_index()
    #计算前十城市环比
    city_top_list = gather_customer_order_city_head["chinese_city"]
    order_top_x = pd.Series([])
    amount_top_x = pd.Series([])
    for i in city_top_list:
        a=gather_customer_order_city_10_11.loc[gather_customer_order_city_10_11['chinese_city']==i]['order_num'].pct_change()
        b=gather_customer_order_city_10_11.loc[gather_customer_order_city_10_11['chinese_city']==i]['sum_amount'].pct_change()
        order_top_x = order_top_x.append(a)
        amount_top_x = amount_top_x.append(b)
    #order_diff销售数量环比,amount_diff销售金额环比
    gather_customer_order_city_10_11['order_diff']=order_top_x
    gather_customer_order_city_10_11['amount_diff']=amount_top_x
    # 填充NaN值
    gather_customer_order_city_10_11.fillna(value = 0,inplace = True)
    # 11月销售数量前十城市的销售数量、销售金额环比。存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    gather_customer_order_city_10_11.to_sql('pt_bicy_november_october_city_3_cc',con = engine,if_exists='replace')
    
    gather_customer_order_city_10_11

    part3:2019年11月自行车产品销售表现

    # 细分市场销量表现
    '''1. 数据源 dw_customer_order'''
    #查看数据源
    gather_customer_order.head(5)
    
    '''2.细分市场销量表现'''
    #求每个月自行车累计销售数量
    gather_customer_order_group_month = gather_customer_order.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
    #合并自行车销售信息表+自行车每月累计销售数量表
    order_num_proportion = pd.merge(gather_customer_order,gather_customer_order_group_month,how = "left",on = "create_year_month")
    #计算自行车销量/自行车每月销量占比
    order_num_proportion['order_proportion'] = order_num_proportion['order_num_x']/order_num_proportion['order_num_y']
    #重命名sum_month_order:自行车每月销售量
    order_num_proportion = order_num_proportion.rename(columns = {"order_num_x":"order_num","order_num_y":"sum_month_order"})
    #将每月自行车销售信息存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    order_num_proportion.to_sql('pt_bicycle_product_sales_month_4_cc',con = engine,if_exists = 'replace')
    
    order_num_proportion
    '''3. 公路/山地/旅游自行车细分市场表现'''
    #查看自行车有那些产品子类
    gather_customer_order['cpzl_zw'].unique()
    '''3.1 公路自行车细分市场销量表现'''
    gather_customer_order_road = gather_customer_order[gather_customer_order['cpzl_zw'] == '公路自行车']
    #求公路自行车不同型号产品销售数量
    gather_customer_order_road_month = gather_customer_order_road.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
    gather_customer_order_road_month['cpzl_zw'] = '公路自行车'
    #每个月公路自行车累计销售数量
    gather_customer_order_road_month_sum = gather_customer_order_road_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
    #合并公路自行车gather_customer_order_road_month与每月累计销售数量
    #用于计算不同型号产品的占比
    gather_customer_order_road_month = pd.merge(gather_customer_order_road_month,gather_customer_order_road_month_sum,how='left',on='create_year_month')
    '''3.2 山地自行车细分市场销量表现'''
    # 筛选
    gather_customer_order_Mountain = gather_customer_order[gather_customer_order['cpzl_zw'] == '山地自行车']
    #求山地自行车不同型号产品销售数量
    gather_customer_order_Mountain_month = gather_customer_order_Mountain.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
    gather_customer_order_Mountain_month['cpzl_zw'] = '山地自行车'
    #每个月山地自行车累计销售数量
    gather_customer_order_Mountain_month_sum = gather_customer_order_Mountain_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
    #合并山地自行车hz_customer_order_Mountain_month与每月累计销售数量
    #用于计算不同型号产品的占比
    gather_customer_order_Mountain_month = pd.merge(gather_customer_order_Mountain_month,gather_customer_order_Mountain_month_sum,how='left',on='create_year_month')
    
    '''3.3 旅游自行车细分市场销量表现'''
    gather_customer_order_tour = gather_customer_order[gather_customer_order['cpzl_zw'] == '旅游自行车']
    #求旅游自行车不同型号产品销售数量
    gather_customer_order_tour_month = gather_customer_order_tour.groupby(by = ['create_year_month','product_name']).agg({"order_num":"sum"}).reset_index()
    gather_customer_order_tour_month['cpzl_zw'] = '旅游自行车'
    gather_customer_order_tour_month_sum = gather_customer_order_tour_month.groupby("create_year_month").agg({"order_num":"sum"}).reset_index()
    gather_customer_order_tour_month = pd.merge(gather_customer_order_tour_month,gather_customer_order_tour_month_sum,how='left',on='create_year_month')
    
    '''3.4 将山地自行车、旅游自行车、公路自行车每月销量信息合并'''
    gather_customer_order_month = pd.concat([gather_customer_order_road_month,gather_customer_order_Mountain_month,gather_customer_order_tour_month])
    gather_customer_order_month
    #各类自行车,销售量占每月自行车总销售量比率
    gather_customer_order_month['order_num_proportio'] = gather_customer_order_month['order_num_x']/gather_customer_order_month['order_num_y']
    # 重命名:order_month_product当月产品累计销量,sum_order_month当月自行车总销量
    gather_customer_order_month.rename(columns = {'order_num_x':'order_month_product','order_num_y':'sum_order_month'})
    #将数据存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    gather_customer_order_month.to_sql('pt_bicycle_product_sales_order_month_4_cc',con = engine,if_exists = 'replace')
    
    gather_customer_order_month
    '''4. 计算2019年11月自行车环比'''
    #计算11月环比,先筛选10月11月数据
    gather_customer_order_month_10_11 = gather_customer_order_month[gather_customer_order_month.create_year_month.isin(['2019-10','2019-11'])]
    #排序。将10月11月自行车销售信息排序
    gather_customer_order_month_10_11 = gather_customer_order_month_10_11.sort_values(by = ['product_name','create_year_month'])
    product_name  = list(gather_customer_order_month_10_11.product_name.drop_duplicates())
    
    '''5. 计算自行车销售数量环比'''
    order_top_x = pd.Series([])
    for i in product_name:
        a = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name']==i]["order_num_x"].pct_change()
        order_top_x = order_top_x.append(a)
    gather_customer_order_month_10_11['order_num_diff'] = order_top_x
    gather_customer_order_month_10_11.fillna(value = 0,inplace = True)
    gather_customer_order_month_10_11.rename(columns = {"order_num_x":"order_month_product","order_num_y":"sum_order_month"},inplace = True)
    #筛选出11月自行车数据
    gather_customer_order_month_11 = gather_customer_order_month_10_11[gather_customer_order_month_10_11['create_year_month'] == '2019-11']
    
    '''6. 计算2019年1月至11月产品累计销量'''
    #筛选2019年1月至11月自行车数据,这里使用的是‘~’取对立面
    gather_customer_order_month_1_11 = gather_customer_order_month[~gather_customer_order_month['create_year_month'].\
                                    isin(['2019-12','2020-01','2020-02','2020-03','2020-04','2020-05','2020-06','2020-07','2020-08','2020-09','2020-10','2020-11'])]
    gather_customer_order_month_1_11.rename(columns = {"order_num_x":"order_month_product","order_num_y":"sum_order_month"},inplace = True)
    #计算2019年1月至11月自行车累计销量
    gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11.groupby(by = 'product_name').agg({'order_month_product':'sum'}).reset_index()
    #重命名sum_order_1_11:1-11月产品累计销量
    gather_customer_order_month_1_11_sum = gather_customer_order_month_1_11_sum.rename(columns = {'order_month_product':'sum_order_1_11'})
    
    '''7. 2019年11月自行车产品销量、环比、累计销量'''
    #按相同字段product_name产品名,合并两张表
    gather_customer_order_month_11 = pd.merge(gather_customer_order_month_11,gather_customer_order_month_1_11_sum,how='left',on='product_name')
    
    ''' 8. 存入数据库'''
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    gather_customer_order_month_11.to_sql('pt_bicycle_product_sales_order_month_11_cc',con = engine, if_exists = 'replace')
    
    
    gather_customer_order_month_11

    part4:用户行为分析

    # 读取数据库客户信息表
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods?charset=gbk')
    df_customer = pd.read_sql_query("select customer_key,birth_date,gender,marital_status from ods_customer where create_date < '2019-12-1'",con = engine)
    # 查看数据类型
    df_customer.info()
    # 将customer_key由int类型转为str类型
    df_customer['customer_key'] = df_customer.customer_key.map(lambda x:str(x))
    
    # 读取数据库销售订单表
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_ods?charset=gbk')
    df_sales_orders_11 = pd.read_sql_query("select *  from ods_sales_orders where create_date>='2019-11-1' and   create_date<'2019-12-1'",con = engine)
    
    # pd.merge,how = 'inner join'
    sales_customer_order_11 = pd.merge(df_sales_orders_11,df_customer,on='customer_key',how='inner')
    
    # 根据sales_customer_order_11['birth_date'],获取客人的年份作为新的一列,以字符串类型存储
    birth_year = sales_customer_order_11.birth_date.astype(str).apply(lambda x: x.split('-')[0]).rename('birth_year')
    sales_customer_order_11 = pd.concat([sales_customer_order_11,birth_year],axis = 1)
    
    '''1. 用户年龄分析'''
    #修改出生年为int数据类型
    sales_customer_order_11['birth_year'] = sales_customer_order_11.birth_year.astype(int)
    # 计算用户年龄
    sales_customer_order_11['customer_age'] = 2019 - sales_customer_order_11['birth_year']
    # 利用customer_age字段,进行年龄分层,
    #年龄分层1
    listBins = [30,35,40,45,50,55,60,65]
    listLabels = ['31-35','36-40','41-45','46-50','51-55','56-60','61-65']
    #新增'age_level'分层区间列
    sales_customer_order_11['age_level'] = pd.cut(sales_customer_order_11['customer_age'],bins=listBins, labels=listLabels, include_lowest=False)
    #筛选销售订单为自行车的订单信息
    df_customer_order_bycle = sales_customer_order_11.loc[sales_customer_order_11['cplb_zw'] == '自行车']
    # 计算年龄比率
    df_customer_order_bycle['age_level_rate'] = 1/df_customer_order_bycle['age_level'].count()
    # 年龄分层2
    # 将年龄分为3个层次,分别为'<=29'、'30-39'、'>=40'
    listBins = [0,29,39,65]
    listLabels = ['<=29','30-39','>=40']
    df_customer_order_bycle['age_level2'] = pd.cut(sales_customer_order_11['customer_age'],bins=listBins, labels=listLabels, include_lowest=False)
    # 求每个年龄段人数
    age_level2_count = df_customer_order_bycle.groupby(by = 'age_level2').sales_order_key.count().reset_index()
    
    '''2. 用户性别'''
    gender_count = df_customer_order_bycle.groupby(by = 'gender').cplb_zw.count().reset_index()
    df_customer_order_bycle = pd.merge(df_customer_order_bycle,age_level2_count,on = 'age_level2').rename(columns = {'sales_order_key_y':'age_level2_count'})
    df_customer_order_bycle['age_level2_rate'] = 1/df_customer_order_bycle['age_level2_count']
    df_customer_order_bycle = pd.merge(df_customer_order_bycle,gender_count,on = 'gender').rename(columns = {'cplb_zw_y':'gender_count'})
    df_customer_order_bycle['gender_rate'] = 1/df_customer_order_bycle['gender_count']
    
    #df_customer_order_bycle 将11月自行车用户存入数据库
    #存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    df_customer_order_bycle.to_sql('pt_user_behavior_november_cc',con = engine,if_exists='replace', index=False)
    
    
    df_customer_order_bycle

    part5:2019年11月热品销售分析

    '''1. 11月产品销量TOP10产品,销售数量及环比'''
    # 筛选11月数据
    gather_customer_order_11 = gather_customer_order.loc[gather_customer_order['create_year_month'] == '2019-11']
    # 按照销量降序,取出TOP10产品
    customer_order_11_top10 = gather_customer_order_11.groupby(by = 'product_name').order_num.count().reset_index().sort_values(by = 'order_num',ascending = False).head(10)
    # TOP10销量产品信息
    list(customer_order_11_top10['product_name'])
    
    # 计算TOP10销量及环比
    # 查看11月环比数据
    gather_customer_order_month_10_11.head()
    customer_order_month_10_11 = gather_customer_order_month_10_11[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
    customer_order_month_10_11 = customer_order_month_10_11[customer_order_month_10_11['product_name'].isin(list(customer_order_11_top10['product_name']))]
    customer_order_month_10_11['category'] = '本月TOP10销量'
    
    '''2. 11月增速TOP10产品,销售数量及环比'''
    customer_order_month_11 = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['create_year_month'] == '2019-11'].sort_values(by = 'order_num_diff',ascending = False).head(10)
    customer_order_month_11_top10_seep = gather_customer_order_month_10_11.loc[gather_customer_order_month_10_11['product_name'].isin(list(customer_order_month_11['product_name']))]
    customer_order_month_11_top10_seep = customer_order_month_11_top10_seep[['create_year_month','product_name','order_month_product','cpzl_zw','order_num_diff']]
    customer_order_month_11_top10_seep['category'] = '本月TOP10增速'
    
    # 合并TOP10销量表customer_order_month_10_11,TOP10增速customer_order_month_11_top10_seep
    #axis = 0按照行维度合并,axis = 1按照列维度合并
    hot_products_11 = pd.concat([customer_order_month_10_11,customer_order_month_11_top10_seep],axis = 0)
    #存入数据库
    engine = create_engine('mysql+pymysql://root:password@localhost:3306/adventure_dw')
    hot_products_11.to_sql('pt_hot_products_november_cc',con = engine,if_exists='replace', index=False)
    
    hot_products_11

    3. PPT汇报输出

    通过上述数据加工,将存入mysql的数据接入powerbi与Excel,分别绘制图像与表格,通过以下5个方面,完成线上自行车业务分析报告:

    • 整体销售表现
    • 地域销售表现
    • 产品销售表现
    • 热品销售表现
    • 用户行为分析
    幻灯片1.PNG 幻灯片2.PNG 幻灯片3.PNG 幻灯片4.PNG 幻灯片5.PNG 幻灯片6.PNG 幻灯片7.PNG 幻灯片8.PNG 幻灯片9.PNG 幻灯片10.PNG 幻灯片11.PNG 幻灯片12.PNG 幻灯片13.PNG 幻灯片14.PNG 幻灯片15.PNG 幻灯片16.PNG 幻灯片17.PNG 幻灯片18.PNG 幻灯片19.PNG 幻灯片20.PNG 幻灯片21.PNG 幻灯片22.PNG 幻灯片23.PNG 幻灯片24.PNG

    相关文章

      网友评论

          本文标题:Adventure项目

          本文链接:https://www.haomeiwen.com/subject/asrjvktx.html