2 Distributions of bivariate datasets

Author: readilen | Published: 2018-03-24 10:48

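The classic way to quantify the association between two variables is to compute the Pearson or Spearman correlation coefficient. The simplest way to see the association is seaborn's jointplot, which draws a multi-panel figure: the joint relationship of the two variables in the center, and the distribution of each variable along the margins. The setup below draws 200 samples from a bivariate normal distribution: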

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats, integrate

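# Fix the random seed so the sample is reproducible
np.random.seed(sum(map(ord, 'distributions')))

# 200 draws from a bivariate normal: means 0 and 1, unit variances, covariance 0.5
mean, cov = [0, 1], [(1, .5), (.5, 1)]
data = np.random.multivariate_normal(mean, cov, 200)
df = pd.DataFrame(data, columns=['x', 'y'])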
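
The Pearson and Spearman coefficients mentioned at the start can be computed directly with scipy.stats. The following is a minimal sketch rather than part of the original post: it draws its own sample from the same bivariate normal distribution, and the seed and variable names are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

# Same distribution as the article's sample; the seed is an arbitrary choice.
rng = np.random.default_rng(0)
mean, cov = [0, 1], [(1, 0.5), (0.5, 1)]
x, y = rng.multivariate_normal(mean, cov, 200).T

# Pearson measures linear association; Spearman measures monotonic (rank) association.
pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)

print(f"Pearson r    = {pearson_r:.3f} (p = {pearson_p:.2g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.2g})")
```

Because the covariance is 0.5 and both variances are 1, both coefficients should come out moderately positive, although the exact values depend on the random sample.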

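Scatter plot (scatterplot)

The most direct way to look at the relationship is a scatter plot, with the x values on the x-axis and the y values on the y-axis. plt.scatter would also work; here we use jointplot, whose default kind is a scatter plot: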

sns.jointplot(x="x", y="y", data=df);

Figure: scatter

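Hexbin plot (Hexbin)

A hexbin plot divides the plane into hexagonal bins and uses the color of each hexagon to show how many observations fall inside it.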

sns.jointplot(x="x", y="y", data=df, kind='hex')

Figure: hexbin
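
To make the idea of "counting points per hexagon" concrete, the sketch below uses matplotlib's own hexbin, which returns a collection holding one count per hexagon. This is an illustration under assumptions, not the code seaborn runs internally: the grid size, seed, and variable names are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 1], [(1, 0.5), (0.5, 1)], 200).T

fig, ax = plt.subplots()
# gridsize=15 (number of hexagons across the x direction) is an arbitrary choice
hb = ax.hexbin(x, y, gridsize=15, cmap="Blues")
counts = hb.get_array()  # one value per hexagon: the number of points inside it

fig.colorbar(hb, ax=ax, label="points per hexagon")
print("hexagons drawn:", counts.size)
print("fullest hexagon holds", int(counts.max()), "points")
print("points counted in total:", int(counts.sum()))
```

A smaller gridsize gives larger hexagons and a smoother but coarser picture; a larger one gives finer detail at the cost of many nearly empty bins.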

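Kernel density estimation (KDE)

The joint distribution of two variables can also be shown with a kernel density estimate. It is drawn as contour lines, so the result looks much like a topographic contour map.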

sns.jointplot(x="x", y="y", data=df, kind='kde')

Figure: kde
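
The contours are level sets of an estimated density function. As a rough sketch of what such an estimate is, the code below fits scipy's gaussian_kde to a sample and evaluates it on a grid. It illustrates the general technique and is not guaranteed to match seaborn's bandwidth or output exactly; the grid limits, seed, and names are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 1], [(1, 0.5), (0.5, 1)], 200)

# gaussian_kde expects an array of shape (n_dims, n_points), hence the transpose
kde = stats.gaussian_kde(data.T)

# Evaluate the estimated density on a regular grid covering the sample
xs = np.linspace(-4, 4, 81)
ys = np.linspace(-3, 5, 81)
xx, yy = np.meshgrid(xs, ys)
density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# A density should integrate to about 1; check with a simple Riemann sum
cell_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
integral = density.sum() * cell_area
print("approximate integral over the grid:", round(float(integral), 3))

# Locate the grid cell where the estimated density is highest
iy, ix = np.unravel_index(density.argmax(), density.shape)
peak_x, peak_y = xs[ix], ys[iy]
print("density peaks near x =", round(float(peak_x), 2), "y =", round(float(peak_y), 2))
```

Passing the `density` array to a contour-drawing function such as matplotlib's `contour` would give a picture of the same kind as the joint panel above.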

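A KDE can also be drawn another way, by calling kdeplot on a matplotlib axes and adding a rug plot along each axis: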

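f, ax = plt.subplots(figsize=(12, 8))
# seaborn 0.11+ takes the variables as keywords x= and y=; older releases took them positionally
sns.kdeplot(x=df.x, y=df.y, ax=ax)
# Rug plots mark each observation along the x axis and along the y axis
sns.rugplot(x=df.x, color="g", ax=ax)
sns.rugplot(y=df.y, ax=ax)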

Figure: kde.png

If you want the density to look more continuous, increase the number of contour levels and fill them:

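f, ax = plt.subplots(figsize=(12, 8))
# Reversed cubehelix colormap spanning the full dark-to-light range
cmap = sns.cubehelix_palette(as_cmap=True, dark=0, light=1, reverse=True)
# seaborn 0.11+ spelling: levels= and fill= replace the older n_levels= and shade=
sns.kdeplot(x=df.x, y=df.y, cmap=cmap, levels=60, fill=True, ax=ax)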

Figure: continuous

jointplot uses a JointGrid to manage the figure and returns it, so you can call JointGrid methods directly to add more layers, for example:

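# Draw the KDE joint plot in magenta; jointplot returns the underlying JointGrid
g = sns.jointplot(x="x", y="y", data=df, kind="kde", color="m")
# Overlay the raw observations on the joint axes as white "+" markers
g.plot_joint(plt.scatter, c="w", s=30, linewidth=1, marker="+")
# Make the first collection drawn on the joint axes fully transparent
g.ax_joint.collections[0].set_alpha(0)
# Label the axes using math text
g.set_axis_labels("$X$", "$Y$");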

Figure: JointGrid

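Finally, here is how to plot the pairwise relationships in a dataset with more than two variables. pairplot creates a matrix of axes in which each panel shows the relationship between one pair of variables; by default, the diagonal shows the univariate distribution of each variable.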

iris = sns.load_dataset("iris")
sns.pairplot(iris);

Figure: pairplot

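jointplot and pairplot are closely related: jointplot manages its figure with a JointGrid, and pairplot manages its figure with a PairGrid. Using PairGrid directly gives you more flexibility: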

g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, cmap="Blues_d", n_levels=6);

Figure: PairGrid

Article link: https://www.haomeiwen.com/subject/fiqqcftx.html