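The data comes from https://archive.ics.uci.edu/ml/datasets/online+retail# and records all real transactions made by the customers of a UK online retailer between 01/12/2010 and 09/12/2011 (day/month/year), about 500,000 rows in total.
1. Import modules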
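The raw file needs two bits of cleaning before any cohort step. InvoiceDate has to become a real datetime, and rows with no CustomerID (the dataset contains many) cannot be placed in a cohort. Below is a minimal sketch on a hypothetical three-row sample (`sample` is an illustrative name). It assumes the CSV stores dates as text like `12/1/2010 8:26` in month/day/year order, so check your own export before reusing the format string.

```python
import io
import pandas as pd

# Hypothetical sample in the Online Retail column layout.
# Assumption: InvoiceDate is stored as text in month/day/year order.
raw = io.StringIO(
    "InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID,Country\n"
    "536365,85123A,6,12/1/2010 8:26,2.55,17850,United Kingdom\n"
    "536366,22633,6,12/1/2010 8:28,1.85,,United Kingdom\n"
    "548000,22423,2,3/29/2011 10:05,12.75,17850,United Kingdom\n"
)

sample = pd.read_csv(raw)
# An explicit format means "12/1/2010" can never be read as 12 January.
sample["InvoiceDate"] = pd.to_datetime(sample["InvoiceDate"], format="%m/%d/%Y %H:%M")
# Cohort analysis needs a customer key, so drop rows where it is missing.
sample = sample.dropna(subset=["CustomerID"])
print(sample.dtypes["InvoiceDate"], len(sample))
```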
import pandas as pd
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib as mpl
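2. Load the data file
df = pd.read_csv('uci_csv', parse_dates=['InvoiceDate'])
# Rows without a CustomerID cannot be assigned to a cohort
df = df.dropna(subset=['CustomerID'])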
df.head(10)
Output:
(Figure: first 10 rows of the UCI data)
3. Define date-related helper functions
def get_month(x):
    # Truncate a timestamp to the first day of its month
    date_time = pd.Timestamp(x)
    return dt.datetime(date_time.year, date_time.month, 1)

def month_differ(x, y):
    # Number of calendar months from y to x (the day of the month is ignored)
    date_time_x = pd.Timestamp(x)
    date_time_y = pd.Timestamp(y)
    diff = (date_time_x.year - date_time_y.year) * 12 + (date_time_x.month - date_time_y.month)
    return diff
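As a worked check of the two helpers, the month difference is (year_x - year_y) * 12 + (month_x - month_y). For an order on 2011-02-14 from a customer whose cohort month is December 2010, that gives (2011 - 2010) * 12 + (2 - 12) = 12 - 10 = 2. The sketch re-declares both functions so it runs on its own.

```python
import datetime as dt
import pandas as pd

def get_month(x):
    """Return the first day of the month containing timestamp x."""
    ts = pd.Timestamp(x)
    return dt.datetime(ts.year, ts.month, 1)

def month_differ(x, y):
    """Whole calendar months from y to x; the day of the month is ignored."""
    tx, ty = pd.Timestamp(x), pd.Timestamp(y)
    return (tx.year - ty.year) * 12 + (tx.month - ty.month)

# (2011 - 2010) * 12 + (2 - 12) = 2
print(get_month("2011-02-14 15:30"))
print(month_differ("2011-02-14", "2010-12-01"))
# Same calendar month gives 0 even though the dates are 30 days apart,
# while 31 Dec -> 1 Jan counts as one month despite being one day apart.
print(month_differ("2010-12-31", "2010-12-01"))
print(month_differ("2011-01-01", "2010-12-31"))
```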
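4. Get each customer's earliest purchase month (CohortMonth)
df['OrderMonth'] = df['InvoiceDate'].apply(get_month)
# Earliest OrderMonth per customer, broadcast back to every row of that customer
df['CohortMonth'] = df.groupby('CustomerID')['OrderMonth'].transform('min')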
df.head()
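`transform('min')` returns one value per original row rather than one per group, which is what lets CohortMonth sit next to every order. The toy sketch below shows that on two hypothetical customers (`orders` is an illustrative name). It also swaps in `.dt.to_period("M").dt.to_timestamp()` as a vectorized alternative to `get_month`; both should yield the first day of the month.

```python
import pandas as pd

# Hypothetical toy orders: customer 1 first buys in Dec 2010, customer 2 in Jan 2011.
orders = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 1],
    "InvoiceDate": pd.to_datetime([
        "2010-12-05", "2011-02-10", "2011-01-20", "2011-01-25", "2011-03-01",
    ]),
})

# Truncate each timestamp to the first day of its month without a Python-level apply.
orders["OrderMonth"] = orders["InvoiceDate"].dt.to_period("M").dt.to_timestamp()
# transform keeps the original row count: each row gets its customer's earliest month.
orders["CohortMonth"] = orders.groupby("CustomerID")["OrderMonth"].transform("min")
print(orders)
```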
5. Compute CohortIndex: how many months after the customer's first purchase month each order was placed
df['CohortIndex'] = df.apply(lambda row: month_differ(row.OrderMonth, row.CohortMonth), axis=1)
df.head()
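`df.apply(..., axis=1)` calls a Python function once per row, which is slow on roughly 500,000 rows. A vectorized sketch of the same `month_differ` formula on whole columns is shown below; `df_toy` is a hypothetical frame that already carries the two month columns from step 4, and on real data this should give the same CohortIndex values as the row-wise version.

```python
import pandas as pd

# Hypothetical toy frame with the OrderMonth and CohortMonth columns from step 4.
df_toy = pd.DataFrame({
    "OrderMonth": pd.to_datetime(["2010-12-01", "2011-02-01", "2011-01-01", "2011-12-01"]),
    "CohortMonth": pd.to_datetime(["2010-12-01", "2010-12-01", "2011-01-01", "2010-12-01"]),
})

# Same arithmetic as month_differ, applied to entire columns at once.
df_toy["CohortIndex"] = (
    (df_toy["OrderMonth"].dt.year - df_toy["CohortMonth"].dt.year) * 12
    + (df_toy["OrderMonth"].dt.month - df_toy["CohortMonth"].dt.month)
)
print(df_toy["CohortIndex"].tolist())
```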
6. Group by to count the distinct customers in each cohort group for month 0 through month n
cohort_data = df.groupby(['CohortMonth','CohortIndex'])['CustomerID'].agg('nunique').reset_index()
cohort_data.head()
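The `'nunique'` aggregation matters here: a customer with several order lines in the same cohort month must be counted once. The toy sketch below (hypothetical data, illustrative name `toy`) contrasts distinct customers with the raw number of order lines per group.

```python
import pandas as pd

# Hypothetical toy input: customer 1 has two order lines in month 0 of its cohort.
toy = pd.DataFrame({
    "CohortMonth": pd.to_datetime(["2010-12-01"] * 4 + ["2011-01-01"] * 2),
    "CohortIndex": [0, 0, 0, 1, 0, 1],
    "CustomerID":  [1, 1, 2, 2, 3, 3],
})

counts = (
    toy.groupby(["CohortMonth", "CohortIndex"])["CustomerID"]
    .nunique()           # distinct customers, not number of order lines
    .reset_index()
)
print(counts)
# For comparison: order lines per group, which overstates active customers.
print(toy.groupby(["CohortMonth", "CohortIndex"]).size().tolist())
```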
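7. Pivot
# Rows: months since first purchase; columns: cohort (first-purchase month)
cohort_pivot = cohort_data.pivot_table(index='CohortIndex', columns='CohortMonth', values='CustomerID')
# Keep only the date part of the column labels
cohort_pivot.columns = cohort_pivot.columns.date
# Display with blanks instead of NaN (returns a copy; cohort_pivot is unchanged)
cohort_pivot.fillna(' ')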
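Because each (CohortIndex, CohortMonth) pair appears only once after step 6, a plain `DataFrame.pivot` reshapes the table without the implicit mean aggregation that `pivot_table` applies. The sketch uses hypothetical counts; note the NaN in the bottom-right cell, since later cohorts have had fewer months to be observed, which is what produces the typical triangular cohort table.

```python
import pandas as pd

# Hypothetical long-format counts shaped like cohort_data from step 6.
cohort_data = pd.DataFrame({
    "CohortMonth": pd.to_datetime(
        ["2010-12-01", "2010-12-01", "2010-12-01", "2011-01-01", "2011-01-01"]
    ),
    "CohortIndex": [0, 1, 2, 0, 1],
    "CustomerID":  [100, 40, 35, 50, 20],
})

# Each pair occurs once, so no aggregation is needed.
cohort_pivot = cohort_data.pivot(index="CohortIndex", columns="CohortMonth", values="CustomerID")
cohort_pivot.columns = cohort_pivot.columns.date  # drop the 00:00:00 part from the labels
print(cohort_pivot)
```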
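8. Compute retention rates
# Row 0 (CohortIndex 0) holds each cohort's size
cohort_base = cohort_pivot.iloc[0, :]
# Divide every column by its own cohort size
retention = cohort_pivot.divide(cohort_base, axis=1)
# Display with blanks instead of NaN; retention itself stays numeric for plotting
retention.fillna(' ')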
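Retention in month k is the number of a cohort's customers active in month k divided by the cohort's month-0 size; for example, 40 of 100 December customers buying again one month later gives 40 / 100 = 40%. The sketch below runs the same `divide(..., axis=1)` call on a hypothetical pivot, where `axis=1` aligns the Series of cohort sizes with the column labels.

```python
import pandas as pd

# Hypothetical pivot: rows = CohortIndex, columns = cohort months, values = active customers.
cohort_pivot = pd.DataFrame(
    {"2010-12": [100, 40, 35], "2011-01": [50, 20, None]},
    index=pd.Index([0, 1, 2], name="CohortIndex"),
)

cohort_base = cohort_pivot.iloc[0, :]                 # month-0 size of each cohort
retention = cohort_pivot.divide(cohort_base, axis=1)  # match the Series to the column labels
print(retention.round(2))
```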
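9. Plot the retention curves
# One line per cohort for the first five cohorts; the x axis is CohortIndex
ax = retention.iloc[:, :5].plot()
ax.set_ylabel('Retention rate')
plt.show()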
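Cohort retention is also commonly shown as a heatmap, with one row per cohort and one column per month since first purchase. This is an alternative to the line plot above, not the author's figure. Below is a minimal matplotlib-only sketch on a hypothetical retention table; it selects the non-interactive Agg backend so it also runs without a display, so call `plt.show()` yourself in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical retention table: rows = CohortIndex, columns = cohort months.
retention = pd.DataFrame(
    {"2010-12": [1.0, 0.40, 0.35], "2011-01": [1.0, 0.40, np.nan]},
    index=pd.Index([0, 1, 2], name="CohortIndex"),
)

# Transpose so each row is a cohort and each column is months since first purchase.
data = retention.T
fig, ax = plt.subplots(figsize=(6, 3))
im = ax.imshow(data.to_numpy(dtype=float), cmap="Blues", vmin=0, vmax=1, aspect="auto")
ax.set_xticks(range(data.shape[1]))
ax.set_xticklabels(data.columns)
ax.set_yticks(range(data.shape[0]))
ax.set_yticklabels(data.index)
ax.set_xlabel("CohortIndex (months since first purchase)")
ax.set_ylabel("CohortMonth")

# Write the percentage into every cell that has a value (NaN cells stay empty).
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        value = data.iat[i, j]
        if not np.isnan(value):
            ax.text(j, i, f"{value:.0%}", ha="center", va="center")

fig.colorbar(im, ax=ax)
fig.tight_layout()
```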