First, set up the environment:
```python
%matplotlib inline
import numpy as np
import pandas as pd
from scipy import stats, integrate
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
```
1. Histogram
A histogram is the usual way to inspect how a single feature (variable) is distributed:
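```python
# Draw 100 random samples from a standard normal (Gaussian) distribution
x = np.random.normal(size=100)
# Plot a histogram; kde=True also draws a kernel density estimate (KDE) curve.
# stat="density" scales the bars to match the curve. This replaces the older
# sns.distplot(x, kde=True), which is deprecated since seaborn 0.11.
sns.histplot(x, kde=True, stat="density")
```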
![](https://img.haomeiwen.com/i13764292/e4d41ace1a37c3e7.png)
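To see what the bars and the kde=True curve represent, here is a minimal NumPy-only sketch that computes both by hand. It assumes a Gaussian kernel with Scott's rule-of-thumb bandwidth; seaborn's own KDE options (bandwidth method, evaluation grid, cut) may differ, so treat this as an illustration of the idea rather than a reproduction of the plot. The helper kde_curve is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
x = rng.normal(size=100)

# Density-normalized histogram: bar areas sum to 1, the same scale as a KDE
counts, edges = np.histogram(x, bins=10, density=True)
hist_area = float((counts * np.diff(edges)).sum())

# Scott's rule bandwidth for 1-D data: h = sample std * n ** (-1/5)
h = x.std(ddof=1) * len(x) ** (-1 / 5)

def kde_curve(points, samples, bandwidth):
    """Hypothetical helper: average of one Gaussian bump centered on each sample."""
    z = (points[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Evaluate the curve on a grid extending 3 bandwidths past the data
grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, 1000)
density = kde_curve(grid, x, h)
kde_area = float(density.sum() * (grid[1] - grid[0]))  # Riemann-sum integral

print(f"histogram area = {hist_area:.3f}, KDE area = {kde_area:.3f}")
```

Both areas come out at (almost exactly) 1, which is why the curve and the density-scaled bars can share one y-axis.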
2. Scatter plot
A scatter plot shows the relationship between two variables:
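```python
# mean is the mean vector of the 2-D normal distribution; cov is its covariance
# matrix. Note: a covariance matrix must be symmetric and positive semidefinite.
mean, cov = [0, 3], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=["x", "y"])
# Scatter plot of y against x, with a histogram of each variable on the margins
sns.jointplot(x="x", y="y", data=df)
```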
![](https://img.haomeiwen.com/i13764292/f7e1d92c60519086.png)
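The covariance comment requires cov to be symmetric and positive semidefinite. A minimal NumPy sketch of that check relies on the fact that a symmetric matrix is positive semidefinite exactly when none of its eigenvalues is negative. The helper is_valid_cov and its tolerance are hypothetical, not part of NumPy.

```python
import numpy as np

def is_valid_cov(m, tol=1e-10):
    """Hypothetical helper: True if m is a symmetric positive semidefinite matrix."""
    m = np.asarray(m, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1] or not np.allclose(m, m.T):
        return False
    # eigvalsh assumes a symmetric matrix and returns its real eigenvalues
    return bool(np.linalg.eigvalsh(m).min() >= -tol)

mean, cov = [0, 3], [(1, .5), (.5, 1)]
print(is_valid_cov(cov))               # True  (eigenvalues 0.5 and 1.5)
print(is_valid_cov([(1, 2), (2, 1)]))  # False (eigenvalues -1 and 3)

# Samples drawn with a valid cov reproduce it approximately
rng = np.random.default_rng(0)
data = rng.multivariate_normal(mean, cov, 200)
sample_cov = np.cov(data, rowvar=False)
print(sample_cov.round(2))             # close to cov; exact values depend on the sample
```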
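When there are very many samples, the points overlap heavily; a hexbin density plot (kind="hex") shades regions by how many points fall in them instead: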
```python
x, y = np.random.multivariate_normal(mean, cov, 1000).T
with sns.axes_style("white"):
    sns.jointplot(x=x, y=y, kind="hex", color="k")
```
![](https://img.haomeiwen.com/i13764292/ffd36e68d56c9d7a.png)
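kind="hex" works by counting how many points fall into each hexagonal cell and shading each cell by its count. Computing hexagonal cells by hand is fiddly, so this sketch swaps in rectangular cells via np.histogram2d to show the same counting idea; the 20 by 20 grid is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, cov = [0, 3], [(1, .5), (.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 1000).T

# Count points per rectangular cell (a stand-in for hexagonal cells)
counts, x_edges, y_edges = np.histogram2d(x, y, bins=20)

# Every point lands in exactly one cell, so the counts add up to n
print(counts.shape, int(counts.sum()))   # (20, 20) 1000

# The darkest cell in a density plot is the one with the highest count
i, j = np.unravel_index(counts.argmax(), counts.shape)
print("densest cell x-range:", x_edges[i], x_edges[i + 1])
print("densest cell y-range:", y_edges[j], y_edges[j + 1])
```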
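3. Pairwise feature plots
pairplot shows the pairwise relationships between features: each pair of variables is drawn as a scatter plot, and the diagonal shows each variable's own distribution.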
```python
iris = sns.load_dataset("iris")
sns.pairplot(iris)
```
![](https://img.haomeiwen.com/i13764292/60376c25b5677b03.png)
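For n numeric columns, pairplot lays out an n-by-n grid: off-diagonal panels plot one column against another, and when no hue is given the diagonal panels show histograms. This sketch builds that layout plan with pandas on a hypothetical iris-like DataFrame filled with random numbers, and prints the correlation matrix as a numeric summary of the same pairwise relationships.

```python
import itertools
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris data: 4 numeric columns + 1 label column
rng = np.random.default_rng(0)
cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame(rng.normal(size=(150, 4)), columns=cols)
df["species"] = np.repeat(["setosa", "versicolor", "virginica"], 50)

# pairplot uses the numeric columns by default; the string column is left out
numeric = df.select_dtypes("number").columns.tolist()

# One panel per (row variable, column variable) pair
layout = {
    (row, col): "histogram" if row == col else "scatter"
    for row, col in itertools.product(numeric, numeric)
}
print(len(layout))                                           # 16
print(sum(kind == "histogram" for kind in layout.values()))  # 4

# Pearson correlations summarize each off-diagonal scatter panel as one number
print(df[numeric].corr().round(2))
```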
4. stripplot
```python
tips = sns.load_dataset("tips")
# jitter=False keeps every point exactly on its category's position;
# recent seaborn versions jitter stripplot points by default
sns.stripplot(x="day", y="total_bill", data=tips, jitter=False)
```
![](https://img.haomeiwen.com/i13764292/7f1862b65b8e8c0a.png)
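With many samples, the points for each category pile up along a single line and you cannot see how often each value occurs. Adding jitter spreads them out:
```python
# jitter=True adds a small random horizontal offset to each point
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True)
```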
![](https://img.haomeiwen.com/i13764292/45e6b471e2cb9575.png)
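Jitter changes only where points are drawn, not the data: each point's position on the categorical axis gets a small random offset. A minimal NumPy sketch, assuming uniform noise with a hypothetical half-width of 0.1 category units (seaborn chooses its own default width):

```python
import numpy as np

rng = np.random.default_rng(0)
days = np.array(["Thur", "Fri", "Sat", "Sun"])

# Hypothetical data: a day label and a bill amount per observation
day_labels = rng.choice(days, size=200)
bills = rng.gamma(shape=4.0, scale=5.0, size=200)

# Each category sits at an integer position on the x-axis
position = {day: i for i, day in enumerate(days)}
x_base = np.array([position[d] for d in day_labels], dtype=float)

# Jitter: add uniform noise in [-0.1, 0.1) to x; y (the data) is untouched
half_width = 0.1
x_jittered = x_base + rng.uniform(-half_width, half_width, size=x_base.size)

print(bool(np.all(np.abs(x_jittered - x_base) <= half_width)))  # True
```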
5. swarmplot
swarmplot spreads points that share the same value out to both sides, so overlapping observations stay visible.
```python
# hue colors the points by the sex column
sns.swarmplot(x="day", y="total_bill", data=tips, hue="sex")
```
![](https://img.haomeiwen.com/i13764292/f8d4c21605df79f3.png)
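The idea behind swarmplot can be sketched as a greedy "beeswarm": place points in order of value, and shift each one sideways by the smallest step that keeps it from overlapping points already placed. This is a simplified variant with hypothetical helper names and a fixed step of one point diameter, not seaborn's actual implementation.

```python
import numpy as np

def candidate_offsets(step):
    """Yield 0, +step, -step, +2*step, -2*step, ... (hypothetical search order)."""
    yield 0.0
    k = 1
    while True:
        yield k * step
        yield -k * step
        k += 1

def beeswarm_offsets(values, diameter):
    """Greedy beeswarm: sideways offset for each value so no two points overlap."""
    values = np.asarray(values, dtype=float)
    offsets = np.zeros(len(values))
    placed = []  # (offset, value) of points placed so far
    for i in np.argsort(values, kind="stable"):
        for cand in candidate_offsets(diameter):
            # Accept the first offset at least one diameter from every placed point
            if all(np.hypot(cand - px, values[i] - py) >= diameter - 1e-9
                   for px, py in placed):
                break
        offsets[i] = cand
        placed.append((cand, values[i]))
    return offsets

offsets = beeswarm_offsets([1, 1, 1, 5], diameter=0.5)
print(offsets.tolist())  # [0.0, 0.5, -0.5, 0.0]
```

The three tied values fan out to both sides, while the isolated value 5 stays on the axis.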
6. Box plot
```python
# hue splits each day's box by meal time (Lunch or Dinner)
sns.boxplot(x="day", y="total_bill", data=tips, hue="time")
```
![](https://img.haomeiwen.com/i13764292/085d439aa6cbbdc3.png)
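Each box summarizes one group: the box spans the first to the third quartile (Q1 to Q3) with a line at the median, and the whiskers reach the most extreme data points within 1.5 times the IQR (IQR = Q3 - Q1) of the box; points beyond that are drawn as outliers. A NumPy sketch of that rule, assuming seaborn's default whisker length whis=1.5 and NumPy's default linear percentile interpolation; box_stats is a hypothetical helper.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends, and outliers of a box plot."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = (float(q) for q in np.percentile(v, [25, 50, 75]))
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences
        "whiskers": (float(inside.min()), float(inside.max())),
        "outliers": v[(v < low_fence) | (v > high_fence)].tolist(),
    }

summary = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["q1"], summary["median"], summary["q3"])  # 3.25 5.5 7.75
print(summary["whiskers"], summary["outliers"])         # (1.0, 9.0) [100.0]
```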
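7. Violin plot
```python
# hue colors the violins by sex; split=True draws one sex on each half of a
# single violin instead of two separate violins per day
sns.violinplot(x="day", y="total_bill", data=tips, hue="sex", split=True)
```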
![](https://img.haomeiwen.com/i13764292/5de9973a050c5c06.png)
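A violin is a kernel density estimate mirrored around the category's axis. With split=True and a two-level hue, each half shows a different level: one density extends to the left and the other to the right. This sketch computes that geometry from hypothetical data with a hand-written Gaussian KDE; Scott's-rule bandwidth, a common scale factor for both halves, and a maximum half-width of 0.4 are all assumptions, not seaborn's exact settings.

```python
import numpy as np

def kde_curve(grid, samples):
    """Gaussian KDE with Scott's rule bandwidth (h = std * n ** (-1/5))."""
    h = samples.std(ddof=1) * len(samples) ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Hypothetical total_bill values for the two hue levels on one day
male = rng.normal(21, 8, size=60)
female = rng.normal(18, 6, size=40)

grid = np.linspace(0, 50, 200)
dens_m, dens_f = kde_curve(grid, male), kde_curve(grid, female)

# Scale both halves by one common factor so the widest point is 0.4
scale = 0.4 / max(dens_m.max(), dens_f.max())
left = -dens_m * scale    # first hue level drawn to the left of the axis
right = dens_f * scale    # second hue level drawn to the right
print(np.isclose(max(-left.min(), right.max()), 0.4))  # True
```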
8. Bar plot
To show the central tendency of a value for each category, use a bar plot. By default each bar's height is the mean, and the error bar shows a 95% confidence interval around it:
```python
titanic = sns.load_dataset("titanic")
# hue splits each sex's bar by passenger class; since survived is 0/1,
# each bar's height is the survival rate of that group
sns.barplot(x="sex", y="survived", data=titanic, hue="class")
```
![](https://img.haomeiwen.com/i13764292/f49b282bddba3d9f.png)
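Each bar is the mean of y within one (x, hue) group, and because survived is coded 0/1 that mean is a survival rate. The error bar is a bootstrap confidence interval: resample the group with replacement many times, take the mean of each resample, and keep the middle 95% of those means. A sketch with a small hypothetical DataFrame in place of the titanic data; the helper bootstrap_ci, the 1000 resamples, and the seed are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the titanic columns used above
df = pd.DataFrame({
    "sex": ["male"] * 6 + ["female"] * 6,
    "class": ["First", "Third"] * 6,
    "survived": [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
})

# Bar heights: mean of survived for every (sex, class) group
heights = df.groupby(["sex", "class"])["survived"].mean()
print(heights)

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row is one resample (with replacement) of the original values
    samples = rng.choice(values, size=(n_boot, values.size), replace=True)
    means = samples.mean(axis=1)
    tail = (100 - level) / 2
    return tuple(np.percentile(means, [tail, 100 - tail]))

survived = df.loc[(df["sex"] == "female") & (df["class"] == "Third"), "survived"]
print(bootstrap_ci(survived))
```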
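9. Point plot
A point plot makes differences easier to compare: it draws each category's estimate as a point and connects the points of the same hue level with lines, so the slopes show how the value changes.
```python
sns.pointplot(x="sex", y="survived", data=titanic, hue="class")
```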
![](https://img.haomeiwen.com/i13764292/22ce082e3bf4eba9.png)
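A point plot draws the same group estimates as the bar plot, but connects the points that share a hue level across the x categories; the slope of each line is the change between categories. This sketch computes those lines and their differences with pandas on hypothetical data.

```python
import pandas as pd

# Hypothetical stand-in for the titanic columns used above
df = pd.DataFrame({
    "sex": ["male"] * 6 + ["female"] * 6,
    "class": ["First", "Third"] * 6,
    "survived": [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1],
})

# Rows are x categories (sex), columns are hue levels (class):
# each column is one connected line in the point plot
lines = df.groupby(["sex", "class"])["survived"].mean().unstack("class")
print(lines)

# Difference between the two x categories (male minus female) along each line
slopes = lines.loc["male"] - lines.loc["female"]
print(slopes)
```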
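Customizing the appearance through parameters:
```python
# palette maps each hue level to a color; markers and linestyles assign
# one marker and one line style to each hue level, in hue order
sns.pointplot(x="class", y="survived", data=titanic, hue="sex",
              palette={"male": "g", "female": "m"},
              markers=["^", "o"], linestyles=["-", "--"])
```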
![](https://img.haomeiwen.com/i13764292/ff17acdaa7988c00.png)
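10. factorplot (catplot)
factorplot wraps several of the plot types above in one function and picks among them with the kind parameter. It was renamed catplot in seaborn 0.9, and newer versions no longer provide the old name, so the examples call catplot:
```python
sns.catplot(x="sex", y="survived", data=titanic, hue="class", kind="bar")
```
![](https://img.haomeiwen.com/i13764292/31da530d56ed519e.png)
```python
sns.catplot(x="day", y="total_bill", data=tips, hue="smoker", kind="swarm")
```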
![](https://img.haomeiwen.com/i13764292/636516b4038de617.png)
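11. FacetGrid
FacetGrid splits the data by the values of one or more variables and draws the same kind of plot for each subset in its own subplot:
```python
# One subplot per value of the time column; points are colored by smoker
g = sns.FacetGrid(tips, col="time", hue="smoker")
# Draw a scatter of tip against total_bill in every subplot
g.map(plt.scatter, "total_bill", "tip", alpha=.5)
g.add_legend()
```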
![](https://img.haomeiwen.com/i13764292/de746d00bb7e4b10.png)
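Conceptually, FacetGrid splits the DataFrame by its col (and row) variables and calls the mapped function once per subset, each on its own axes. A pandas-only sketch of that split-and-apply step on hypothetical tips-like data; summarize_facet is a hypothetical stand-in for the plotting function passed to g.map, and the per-smoker counts mirror what hue does inside each facet.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
# Hypothetical stand-in for the tips columns used above
demo = pd.DataFrame({
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "smoker": rng.choice(["Yes", "No"], size=n),
    "total_bill": rng.uniform(5, 50, size=n),
})
demo["tip"] = demo["total_bill"] * rng.uniform(0.1, 0.2, size=n)

def summarize_facet(subset):
    """Hypothetical stand-in for the function given to g.map."""
    return {
        "points": len(subset),
        "by_smoker": subset["smoker"].value_counts().to_dict(),  # the hue split
        "mean_tip_rate": float((subset["tip"] / subset["total_bill"]).mean()),
    }

# One facet per value of the col variable, as in FacetGrid(tips, col="time")
facets = {value: summarize_facet(subset) for value, subset in demo.groupby("time")}
for value, summary in facets.items():
    print(value, summary)
```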
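```python
# One subplot per day; height is each subplot's height in inches (this
# parameter was called size before seaborn 0.9), aspect is width / height
g = sns.FacetGrid(tips, col="day", height=4, aspect=.5)
g.map(plt.bar, "sex", "total_bill")
```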
![](https://img.haomeiwen.com/i13764292/bd44776a72228e4f.png)
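12. Heatmap
```python
# Monthly airline passenger counts
flights = sns.load_dataset("flights")
# Reshape from long to wide: one row per month, one column per year
# (pandas 2.0+ requires keyword arguments for pivot)
flights = flights.pivot(index="month", columns="year", values="passengers")
ax = sns.heatmap(flights)
```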
![](https://img.haomeiwen.com/i13764292/8d1f77ba01423e4c.png)
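The pivot step turns the long table (one row per month and year) into the matrix that the heatmap colors, one cell per (month, year). A sketch on a small hypothetical long-format table; the min-max rescaling at the end is only an illustration of mapping values onto a color scale, not seaborn's colormap logic.

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, year) with a count
months = ["Jan", "Feb", "Mar"]
years = [1949, 1950]
long_df = pd.DataFrame(
    [(m, y, 100 + 10 * i + 50 * j)
     for j, y in enumerate(years) for i, m in enumerate(months)],
    columns=["month", "year", "passengers"],
)

# Wide matrix: rows = month, columns = year, cell = passengers.
# pivot sorts string labels alphabetically, so reindex restores calendar order.
wide = long_df.pivot(index="month", columns="year", values="passengers").reindex(months)
print(wide)

# A heatmap maps each cell to a color; a 0-1 rescaling shows the idea
lo, hi = wide.values.min(), wide.values.max()
norm = (wide - lo) / (hi - lo)
print(norm.round(2))
```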