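These visualization notes are less detailed than my earlier ones, particularly for seaborn and the parameters of its plotting functions. I will look up the detailed parameters when I actually need a given plot; the point of these notes is simply to know which plots are available.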
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
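Distribution
- distplot: histogram with a density curve (deprecated since seaborn 0.11 in favor of histplot and displot)
- kdeplot: kernel density estimate plot
- jointplot: joint distribution plot
- pairplot: pairwise multivariate plot
Categorical
- boxplot: box plot
- violinplot: violin plot
- barplot: bar plot
- factorplot: factor plot (renamed catplot in seaborn 0.9)
Linear
- lmplot: regression plot
- heatmap: heat map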
columns = ['use_id','order_dt','order_products','order_amount']
df = pd.read_table('CDNOW_master.txt', names=columns, sep=r'\s+')  # raw string avoids an invalid-escape warning
sns.distplot(df.order_amount)  # distribution plot: histogram plus KDE curve
sns.kdeplot(df.order_amount)  # kernel density estimate plot
grouped_user = df.groupby('use_id').sum()
# joint distribution plot; recent seaborn versions require x and y as keyword arguments
sns.jointplot(x=grouped_user.order_products, y=grouped_user.order_amount)
df['order_dt'] = pd.to_datetime(df.order_dt, format='%Y%m%d')  # dates in CDNOW_master.txt look like 19970101
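rfm = df.pivot_table(index='use_id',
                     values=['order_products', 'order_amount', 'order_dt'],
                     aggfunc={'order_products': 'max',
                              'order_amount': 'sum',
                              'order_dt': 'max'  # most recent order date per user
                              })
# R (recency): days between each user's last order and the latest order in the data
rfm['R'] = (rfm.order_dt.max() - rfm.order_dt).dt.days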
rfm.rename(columns={'order_products':'F','order_amount':'M'},inplace = True)
sns.jointplot(x=rfm.R, y=rfm.F)  # joint distribution plot
sns.pairplot(rfm[['R', 'F', 'M']])  # pairwise multivariate plot
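To make the R, F and M columns easier to check by hand, here is a small sketch on a toy order table. It uses one common reading of RFM (R as days since the last order, F as the number of orders, M as total spend), which may differ from the aggregation used above; the toy rows and the names `orders`, `toy_rfm` and `last_order` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy orders; only the column names follow the notes above.
orders = pd.DataFrame({
    "use_id": [1, 1, 2, 3, 3, 3],
    "order_dt": pd.to_datetime(
        ["1997-01-01", "1997-01-12", "1997-01-05",
         "1997-01-02", "1997-01-20", "1997-01-31"]),
    "order_products": [1, 2, 5, 1, 1, 3],
    "order_amount": [11.77, 20.00, 77.00, 9.99, 12.50, 40.00],
})

# One row per user: last order date, number of orders, total amount.
toy_rfm = orders.groupby("use_id").agg(
    last_order=("order_dt", "max"),
    F=("order_dt", "count"),
    M=("order_amount", "sum"),
)
# R: days between the user's last order and the latest order in the table.
toy_rfm["R"] = (orders["order_dt"].max() - toy_rfm["last_order"]).dt.days
print(toy_rfm[["R", "F", "M"]])
```

Because R, F and M are all plain numeric columns here, `toy_rfm[["R", "F", "M"]]` can be passed straight to `sns.pairplot`.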
plt.rcParams['font.sans-serif'] ='SimHei'
df = pd.read_csv('cy.csv',encoding='gbk')
plt.figure(figsize = (20,5))
sns.boxplot(x='类型', y='口味', data=df)  # box plot
df2 =df.query("(城市 == '上海')|(城市 =='北京')")
plt.figure(figsize = (20,5))
sns.violinplot(x='类型',y='口味',hue='城市',data=df2,split =True)
plt.figure(figsize = (20,5))
sns.violinplot(x='类型', y='口味', hue='城市', data=df2, split=True)  # violin plot
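barplot appears in the list of categorical plots but is not demonstrated in these notes, so here is a minimal sketch. The data is made up, and the English column names `category`, `city` and `taste` are placeholders rather than the columns of cy.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical restaurant ratings; the column names are placeholders.
ratings = pd.DataFrame({
    "category": ["hotpot", "hotpot", "noodles", "noodles", "bbq", "bbq"] * 2,
    "city": ["Shanghai"] * 6 + ["Beijing"] * 6,
    "taste": [7.5, 8.0, 6.5, 7.0, 8.5, 8.0, 7.0, 7.2, 6.8, 7.4, 8.1, 7.9],
})

fig, ax = plt.subplots(figsize=(8, 4))
# Each bar is the mean of "taste" for one category/city pair;
# the error bars are a bootstrap confidence interval by default.
sns.barplot(x="category", y="taste", hue="city", data=ratings, ax=ax)
```

Unlike the box and violin plots, barplot shows only an aggregate (the mean by default), so it hides the shape of the distribution within each group.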
sns.catplot(x='类型', y='口味', data=df2, kind='point', height=10)  # factor plot; seaborn 0.9 renamed factorplot to catplot and size to height
sns.lmplot(x='口味', y='环境', data=df)  # regression plot
pt =df.pivot_table(index='城市',columns='类型',values='口味',aggfunc='mean')
plt.figure(figsize=(20,20))
sns.heatmap(pt)  # heat map
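A heat map of a pivot table is often easier to read with the cell values printed on it. The sketch below shows the same pivot-then-heatmap pattern on made-up data, with `annot=True` added; the names `scores` and `pt_demo` and the English column names are placeholders, not the contents of cy.csv.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical ratings; the column names are placeholders.
scores = pd.DataFrame({
    "city": ["Shanghai", "Shanghai", "Shanghai", "Beijing", "Beijing", "Beijing"],
    "category": ["hotpot", "hotpot", "noodles", "hotpot", "noodles", "noodles"],
    "taste": [8.0, 7.0, 6.5, 7.2, 7.0, 8.0],
})

# Rows are cities, columns are categories, cells are mean taste scores.
pt_demo = scores.pivot_table(index="city", columns="category",
                             values="taste", aggfunc="mean")

fig, ax = plt.subplots(figsize=(4, 3))
# annot=True writes each cell's value on the map; fmt controls the number format.
sns.heatmap(pt_demo, annot=True, fmt=".1f", ax=ax)
```

For a large table such as one with every city and category, the annotations may get crowded, in which case leaving `annot` off as in the notes above is reasonable.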