Python in Practice: UCI Data Cohort Analysis

Author: 许志辉Albert | Published: 2019-10-18 23:24

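    The data comes from https://archive.ics.uci.edu/ml/datasets/online+retail# and records the actual transactions of all customers of a UK online retailer between 01/12/2010 and 09/12/2011 (day/month/year), about 500,000 rows in total.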

    1. Load modules

    import pandas as pd
    import numpy as np
    import datetime as dt
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    

    2. Load the data file

    df = pd.read_csv('uci_csv')
    df.head(10)
    

    Output:

    First 10 rows of the UCI data
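
    The later steps group by CustomerID and parse InvoiceDate row by row, so it may help to drop rows without a customer ID and convert the date column once up front. The following is a minimal sketch under assumptions, not part of the original walkthrough: the helper name prepare_transactions is hypothetical, the sample rows are invented, and the date strings are assumed to be in a format that pd.to_datetime can infer.

```python
import pandas as pd

def prepare_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows with a customer ID and parse the dates."""
    out = raw.dropna(subset=["CustomerID"]).copy()
    # Parse once here so later steps work with real timestamps, not strings.
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    return out

# Invented rows standing in for the real file; only the column names matter.
sample = pd.DataFrame({
    "InvoiceNo": ["A001", "A002", "A003"],
    "InvoiceDate": ["2010-12-01 08:26:00", "2010-12-01 08:28:00", "2011-01-04 10:00:00"],
    "CustomerID": [17850.0, None, 13047.0],
})

clean = prepare_transactions(sample)
print(clean.dtypes)
```

    With the real data this would be called as df = prepare_transactions(df) right after read_csv. If the file stores dates day-first, passing an explicit format to pd.to_datetime is the safer choice.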

    3. Define date helper functions

    def get_month(x):
        # Truncate a timestamp to the first day of its month
        date_time = pd.Timestamp(x)
        return dt.datetime(date_time.year, date_time.month, 1)
    
    def month_differ(x, y):
        # Number of calendar months from y to x
        date_time_x = pd.Timestamp(x)
        date_time_y = pd.Timestamp(y)
        return (date_time_x.year - date_time_y.year) * 12 + (date_time_x.month - date_time_y.month)
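
    Calling these helpers through apply runs Python code once per row, which can be slow on roughly 500,000 rows. As a possible alternative, the same two calculations can be sketched with pandas' vectorized .dt accessor. This is an illustration on invented dates, not the approach used in the rest of the article; the variable names are assumptions.

```python
import pandas as pd

# Invented timestamps for illustration.
dates = pd.Series(pd.to_datetime(
    ["2010-12-01 08:26", "2011-01-15 12:00", "2011-12-09 09:30"]
))

# Equivalent of get_month: first day of each month, for the whole column at once.
order_month = dates.dt.to_period("M").dt.to_timestamp()

# Equivalent of month_differ, measured against the earliest month in the column.
first = order_month.min()
month_index = (
    (order_month.dt.year - first.year) * 12
    + (order_month.dt.month - first.month)
)
print(month_index.tolist())  # [0, 1, 12]
```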
    

    4. Get each customer's earliest purchase month (CohortMonth)

    df['OrderMonth'] = df['InvoiceDate'].apply(get_month)
    # transform broadcasts each customer's earliest OrderMonth back onto every one of their rows
    df['CohortMonth'] = df.groupby("CustomerID")["OrderMonth"].transform('min')
    df.head()
    

    Output:
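
    To illustrate what this step produces, here is a minimal sketch on a few invented rows. Unlike an aggregation, groupby(...).transform("min") returns one value per original row, so every order carries its customer's first purchase month. The orders frame and its values are assumptions made up for the example.

```python
import pandas as pd

# Invented orders: customer 1 buys in Dec and Feb, customer 2 in Jan and Mar.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "OrderMonth": pd.to_datetime(
        ["2010-12-01", "2011-02-01", "2011-01-01", "2011-03-01", "2011-02-01"]
    ),
})

# One result per row (not per group): the customer's earliest OrderMonth.
orders["CohortMonth"] = orders.groupby("CustomerID")["OrderMonth"].transform("min")
print(orders)
```

    Both rows of customer 1 end up with CohortMonth 2010-12-01, even though the second order was placed in February.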

    5. Work out how many months after the customer's first purchase each order was placed

    df["CohortIndex"] = df.apply(lambda x: month_differ(x.OrderMonth, x.CohortMonth), axis=1)
    df.head()
    

    Output:

    6. Group by to count the distinct customers in each cohort group from month 0 to month n

    cohort_data = df.groupby(['CohortMonth','CohortIndex'])['CustomerID'].agg('nunique').reset_index()
    cohort_data.head()
    

    Output:

    7. Pivot

    cohort_pivot = cohort_data.pivot_table(index = 'CohortIndex',columns = 'CohortMonth',values = 'CustomerID')
    cohort_pivot.columns = cohort_pivot.columns.date
    cohort_pivot.fillna(' ')
    

    Output:

    8. Compute percentages

    cohort_base = cohort_pivot.iloc[0,:]
    retention = cohort_pivot.divide(cohort_base,axis=1)
    # Blank out NaN for display only; retention itself stays numeric for plotting
    retention.fillna(' ')
    

    Output:
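
    Steps 6 to 8 can be followed end to end on a small invented table. The sketch below assumes the same column names as the article (CustomerID, CohortMonth, CohortIndex); the toy rows are made up. It counts distinct customers per cohort and month offset, pivots, and divides each column by its month-0 value.

```python
import pandas as pd

# Invented data: customers 1-3 first bought in Dec 2010, customer 4 in Jan 2011.
toy = pd.DataFrame({
    "CustomerID":  [1, 2, 3, 1, 2, 4, 4],
    "CohortMonth": pd.to_datetime(["2010-12-01"] * 5 + ["2011-01-01"] * 2),
    "CohortIndex": [0, 0, 0, 1, 2, 0, 1],
})

# Step 6: distinct customers per (cohort, months since first purchase).
counts = (
    toy.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
    .nunique()
    .reset_index()
)

# Step 7: one column per cohort, one row per month offset.
pivot = counts.pivot_table(index="CohortIndex", columns="CohortMonth", values="CustomerID")

# Step 8: divide every column by its month-0 size to get the retention rate.
retention = pivot.divide(pivot.iloc[0, :], axis=1)
print(retention.round(2))
```

    Row 0 is 1.0 by construction. Cells where a cohort has no activity at that offset stay NaN, which is what the fillna(' ') calls in the article hide when displaying the table.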

    9. Plot

    # Plot the retention curves of the first five cohorts
    retention.iloc[:,:5].plot()
    plt.show()
    

    Output:
