Plotting with Matplotlib

Author: 大石兄 | Published: 2019-02-17 22:15

Matplotlib

Matplotlib official website

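matplotlib is Python's foundational plotting library, an open-source library modeled on MATLAB's plotting tools. Many other third-party Python plotting libraries, including the pandas plotting API and seaborn, are built on top of it. This lesson focuses on three ways of plotting:

  1. Basic figures with matplotlib
  2. pandas plot API
  3. Statistical figures with seaborn

The focus of this visualization course is using plots to understand data, not making plots look attractive. The plots covered are therefore simple ones based on statistical analysis; more complex charts such as radar charts are not covered.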

Hello World

import numpy as np 
import matplotlib.pyplot as plt
%matplotlib inline
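plt.rcParams['font.sans-serif']=['SimHei'] # font with Chinese glyphs, so Chinese labels render
plt.rcParams['axes.unicode_minus']=False # render the minus sign correctly with that font


X = np.linspace(0, 2*np.pi,100)# 100 evenly spaced points on [0, 2*pi]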
Y = np.sin(X)
Y1 = np.cos(X)

plt.title("Hello World!!")
plt.plot(X,Y)
plt.plot(X,Y1)

# Next cell: draw sine and cosine in two vertically stacked subplots
X = np.linspace(0, 2*np.pi,100)
Y = np.sin(X)
plt.subplot(211) # equivalent to subplot(2,1,1)
plt.plot(X,Y)

plt.subplot(212)
plt.plot(X,np.cos(X),color = 'r')

BAR CHART

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally.

Vertical

data = [5,25,50,20]
plt.bar(range(len(data)),data)

Horizontal

data = [5,25,50,20]
plt.barh(range(len(data)),data)

Multiple bars

data = [[5,25,50,20],
        [4,23,51,17],
        [6,22,52,19]]
X = np.arange(4)

plt.bar(X + 0.00, data[0], color = 'b', width = 0.25,label = "A")
plt.bar(X + 0.25, data[1], color = 'g', width = 0.25,label = "B")
plt.bar(X + 0.50, data[2], color = 'r', width = 0.25,label = "C")

plt.legend()
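The offsets 0.00, 0.25 and 0.50 above are written by hand for three series. As a rough sketch, the same layout can be computed for any number of series. The helper name `grouped_bars` and its `group_width` parameter are illustrative assumptions, not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def grouped_bars(ax, data, labels, group_width=0.75):
    """Draw len(data) series side by side and return the x offsets used.

    Hypothetical helper for illustration only.
    """
    n_series = len(data)
    bar_width = group_width / n_series
    x = np.arange(len(data[0]))
    offsets = [i * bar_width for i in range(n_series)]
    for offset, series, label in zip(offsets, data, labels):
        ax.bar(x + offset, series, width=bar_width, label=label)
    # Put each tick under the middle of its group of bars
    ax.set_xticks(x + bar_width * (n_series - 1) / 2)
    ax.set_xticklabels([str(i) for i in x])
    ax.legend()
    return offsets


data = [[5, 25, 50, 20],
        [4, 23, 51, 17],
        [6, 22, 52, 19]]
fig, ax = plt.subplots()
offsets = grouped_bars(ax, data, labels=["A", "B", "C"])
```

With three series and the default group width, the computed offsets match the hand-written ones.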

Stacked

data = [[5,25,50,20],
        [4,23,51,17],
        [6,22,52,19]]
X = np.arange(4)

plt.bar(X, data[0], color = 'b', width = 0.25)
plt.bar(X, data[1], color = 'g', width = 0.25,bottom = data[0])
plt.bar(X, data[2], color = 'r', width = 0.25,bottom = np.array(data[0]) + np.array(data[1]))

plt.show()
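In the stacked chart, each `bottom` is the sum of all series drawn beneath the current one. A minimal sketch of computing those bottoms with a cumulative sum, so the code does not grow with the number of series (the labels "A", "B", "C" are assumed for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[5, 25, 50, 20],
                 [4, 23, 51, 17],
                 [6, 22, 52, 19]])
x = np.arange(data.shape[1])

# bottoms[i] is the total height of all series stacked below series i
bottoms = np.cumsum(data, axis=0) - data

fig, ax = plt.subplots()
for series, bottom, label in zip(data, bottoms, ["A", "B", "C"]):
    ax.bar(x, series, width=0.25, bottom=bottom, label=label)
ax.legend()
```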

SCATTER POINTS

A scatter plot is used to assess the correlation between two continuous variables.
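What the eye reads off a scatter plot can also be expressed as a number. The sketch below, on synthetic data chosen purely for illustration, computes the Pearson correlation coefficient with `np.corrcoef` for one related and one unrelated pair of variables; either pair could be passed to `plt.scatter` to see the corresponding cloud of points.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = rng.random(200)
y_related = 2 * x + rng.normal(scale=0.1, size=200)  # strong linear dependence on x
y_unrelated = rng.random(200)                        # generated independently of x

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is the x-y coefficient
r_related = np.corrcoef(x, y_related)[0, 1]
r_unrelated = np.corrcoef(x, y_unrelated)[0, 1]
print(f"related: r = {r_related:.2f}, unrelated: r = {r_unrelated:.2f}")
```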


N = 50
x = np.random.rand(N)
y = np.random.rand(N)

plt.scatter(x, y)

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.randn(N)
area = np.pi * (15 * np.random.rand(N))**2  # marker areas (in points^2), from random radii up to 15

plt.scatter(x, y, c=colors, alpha=0.5, s = area)

N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.randint(0,2,size = N)  # two classes, coded 0 and 1
plt.scatter(x, y, c=colors, alpha=0.5,s = area)

Histogram

A histogram is an approximate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (quantitative variable) and was first introduced by Karl Pearson. It is a kind of bar graph. To construct a histogram, the first step is to "bin" the range of values (that is, divide the entire range of values into a series of intervals) and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.

In short: a histogram approximates the probability distribution of a continuous variable. Before building one, we define the bins (value ranges), that is, we split the continuous range into intervals, and then count how many data points fall into each interval.
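The binning-and-counting step can be looked at on its own with `np.histogram`, which matplotlib's `hist` relies on for the counting. This is a small sketch on random data; the seed and the choice of 20 bins over [0, 1] are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(100)  # 100 values in [0, 1)

# Split [0, 1] into 20 equal-width bins and count the values in each bin
counts, edges = np.histogram(a, bins=20, range=(0.0, 1.0))

print(len(edges))    # number of bin edges: one more than the number of bins
print(counts.sum())  # every value falls into exactly one bin
```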

a = np.random.rand(100)
plt.hist(a,bins= 20)
plt.ylim(0,15)

a = np.random.randn(10000)
plt.hist(a,bins=50)
plt.title("Standard Normal Distribution")

BOXPLOTS

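A boxplot shows the percentile (quartile) distribution of a continuous feature. In statistics it is often used to detect outliers in a single variable, or to examine the relationship between a discrete feature and a continuous feature.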
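The numbers behind a boxplot can be computed directly. The sketch below uses a small made-up sample and the 1.5 × IQR rule, which is also matplotlib's default whisker setting (`whis=1.5`), to flag outliers.

```python
import numpy as np

# Made-up sample with one obviously extreme value
data = np.array([10, 12, 13, 14, 15, 16, 17, 18, 19, 60])

# Box edges (Q1, Q3) and the median line
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, median, q3, outliers)
```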

x = np.random.randint(20,100,size = (30,3))

plt.boxplot(x)
plt.ylim(0,120)
plt.xticks([1,2,3],['A','B','C'])
plt.hlines(y = np.mean(x,axis = 0)[1] ,xmin =0,xmax=3)  # reference line at the mean of column B

np.mean(x,axis = 0)
array([ 55.63333333,  62.13333333,  66.26666667])
np.median(x,axis = 0)
array([ 52.5,  71. ,  71.5])

COLORS/TEXTS/annotate

NAMED COLORS


fig, ax = plt.subplots(facecolor='darkseagreen')
data = [[5,25,50,20],
        [4,23,51,17],
        [6,22,52,19]]
X = np.arange(4)

plt.bar(X, data[0], color = 'darkorange', width = 0.25,label = 'A')
plt.bar(X, data[1], color = 'steelblue', width = 0.25,bottom = data[0],label = 'B')
plt.bar(X, data[2], color = 'violet', width = 0.25,bottom = np.array(data[0]) + np.array(data[1]),label = 'C')
ax.set_title("Figure 1")
plt.legend()

Adding text

plt.text(x, y, s, fontdict=None, **kwargs)
fig, ax = plt.subplots(facecolor='teal')
data = [[5,25,50,20],
        [4,23,51,17],
        [6,22,52,19]]
X = np.arange(4)

plt.bar(X+0.00, data[0], color = 'darkorange', width = 0.25,label = 'A')
plt.bar(X+0.25, data[1], color = 'steelblue', width = 0.25,label = 'B')
plt.bar(X+0.50, data[2], color = 'violet', width = 0.25,label = 'C')
ax.set_title("Figure 2")
plt.legend()
# Label each bar with its value
W = [0.00,0.25,0.50]
for i in range(3):
    for a,b in zip(X+W[i],data[i]):
        plt.text(a,b,"%.0f"% b,ha="center",va= "bottom")
plt.xlabel("Group")
plt.ylabel("Num")
plt.text(0.0,48,"TEXT")

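In data visualization, text inside a figure is often used to point out particular features. The annotate() method makes this kind of annotation easy. When using annotate, two coordinates matter: the point being annotated, xy=(x, y), and the position where the text is placed, xytext=(x, y).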
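As a small sketch of the two coordinates, the following marks the maximum of a sine curve. The label text and the offset used for `xytext` are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(0, 2 * np.pi, 100)
Y = np.sin(X)
peak = np.argmax(Y)  # index of the highest sampled point

fig, ax = plt.subplots()
ax.plot(X, Y)
ann = ax.annotate("maximum",
                  xy=(X[peak], Y[peak]),        # the point being annotated
                  xytext=(X[peak] + 1.5, 0.8),  # where the label text is placed
                  arrowprops=dict(arrowstyle="->"))
```

The arrow is drawn from the text position toward the annotated point, so moving `xytext` changes only where the label sits, not what it points at.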


X = np.linspace(0, 2*np.pi,100)# 100 evenly spaced points on [0, 2*pi]
Y = np.sin(X)
Y1 = np.cos(X)

plt.plot(X,Y)
plt.plot(X,Y1)
plt.annotate('Points',
         xy=(1, np.sin(1)),
         xytext=(2, 0.5), fontsize=16,
         arrowprops=dict(arrowstyle="->"))
plt.title("这是一副测试图!")  # Chinese title ("This is a test figure!") to check that the font renders

?plt.annotate

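To display Chinese text correctly, matplotlib needs the following settings, which select a font that contains Chinese glyphs (SimHei must be installed on the system):


import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei'] # font with Chinese glyphs, so Chinese labels render
plt.rcParams['axes.unicode_minus']=False # render the minus sign correctly with that font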
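SimHei is not present on every machine. A hedged sketch of checking which fonts matplotlib can see before setting `font.sans-serif`: the candidate list and the helper name `pick_cjk_font` are assumptions for illustration and should be adapted to the fonts actually installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Hypothetical candidate list; adjust it to the fonts installed on your system
CANDIDATES = ["SimHei", "Microsoft YaHei", "Noto Sans CJK SC", "WenQuanYi Micro Hei"]


def pick_cjk_font(candidates=CANDIDATES):
    """Return the first candidate font known to matplotlib, or None."""
    installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


font = pick_cjk_font()
if font is not None:
    # Prefer the found font, keeping the existing fallbacks after it
    plt.rcParams['font.sans-serif'] = [font] + list(plt.rcParams['font.sans-serif'])
plt.rcParams['axes.unicode_minus'] = False
```

If `pick_cjk_font()` returns None, Chinese labels will most likely render as empty boxes until a suitable font is installed.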

Subplots

matplotlib.pyplot.subplots(nrows=1, ncols=1, sharex=False, sharey=False, squeeze=True, subplot_kw=None, gridspec_kw=None, **fig_kw)

Drawing multiple plots with subplot

subplot(nrows, ncols, index, **kwargs)

plt.rcParams['figure.figsize'] = (10, 6) # set the default figure size

np.random.seed(19680801)

n_bins = 10
x = np.random.randn(1000, 3)

fig, axes = plt.subplots(nrows=2, ncols=2,facecolor='darkslategray')
ax0, ax1, ax2, ax3 = axes.flatten()

# colors = ['red', 'tan', 'lime']
# ax0.hist(x, n_bins, density=True, histtype='bar', color=colors, label=colors)
# ax0.legend(prop={'size': 10})
# ax0.set_title('bars with legend')

# density=True replaces the older normed argument, which current Matplotlib no longer accepts
ax1.hist(x, n_bins, density=True, histtype='bar', stacked=True)
ax1.set_title('stacked bar')

ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False)
ax2.set_title('stack step (unfilled)')

# Make a multiple-histogram of data-sets with different length.
# x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]]
# ax3.hist(x_multi, n_bins, histtype='bar')
# ax3.set_title('different sample sizes')

fig.tight_layout() # Adjust subplot parameters to give specified padding.
plt.show()

# ShareX or ShareY
N_points = 100000
n_bins = 20

# Generate a normal distribution, center at x=0 and y=5
x = np.random.randn(N_points)
y = .4 * x + np.random.randn(N_points) + 5

fig, axs = plt.subplots(1, 2, sharey=True, tight_layout=True)

# We can set the number of bins with the `bins` kwarg
axs[0].hist(x, bins=n_bins)
axs[1].hist(y, bins=n_bins)
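The effect of `sharey=True` can be checked directly: both panels end up with the same y-axis limits, so bar heights are comparable across them. A minimal sketch with a smaller sample than the cell above (the sample size of 1000 is an arbitrary choice):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)
x = rng.standard_normal(1000)
y = 0.4 * x + rng.standard_normal(1000) + 5

fig, axs = plt.subplots(1, 2, sharey=True)
axs[0].hist(x, bins=20)
axs[1].hist(y, bins=20)

# With sharey=True both panels report identical y limits
same_limits = axs[0].get_ylim() == axs[1].get_ylim()
print(same_limits)
```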

PANDAS API

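import pandas as pd
# NBAPlayers.txt is a tab-separated file expected in the working directory
df = pd.read_csv("NBAPlayers.txt",sep='\t')
df.head()

# Scatter plot of height against weight, colored by the "born" column
df.plot.scatter(x = "height",y = "weight",c = "born")

# Horizontal bars: the 50 most common birth states
df['birth_state'].value_counts()[:50].plot.barh()

# Vertical bars: states with at least 10 players, sorted by count
grouped = df.groupby("birth_state")
gs = grouped.size()
gs[gs >=10].sort_values().plot.bar()

# Overlaid histograms of two columns
df[['height','weight']].plot.hist()

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                     'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
df4.plot.hist(alpha=0.5)

# Box plot of each column
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot(kind = "box")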
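Since NBAPlayers.txt is not included with this text, here is a sketch of the group-count-then-plot pattern on a tiny made-up table. The rows are hypothetical stand-in values, not real player data; only the column names follow the cells above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the NBA file; the values are invented for illustration
players = pd.DataFrame({
    "height": [198, 201, 185, 211, 190, 203],
    "weight": [95, 100, 82, 115, 88, 104],
    "birth_state": ["California", "Texas", "California",
                    "New York", "Texas", "California"],
})

# Count rows per state, keep states with at least 2 players, plot sorted counts
counts = players.groupby("birth_state").size()
ax = counts[counts >= 2].sort_values().plot.bar()
ax.set_ylabel("players")
```

The pandas plot methods return a matplotlib Axes, so labels, titles and limits can be adjusted afterwards with the usual matplotlib calls.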

Seaborn: statistical data visualization

Visualizing the distribution of a dataset

import seaborn as sns
sns.set()

tips = sns.load_dataset("tips")
iris = sns.load_dataset("iris")

# distplot is deprecated since seaborn 0.11; histplot and kdeplot replace it
sns.histplot(iris.sepal_length, kde=True, stat="density")
?sns.histplot
sns.histplot(iris.sepal_length, bins=20)

# Compare several variables in one figure
sns.histplot(iris.sepal_length, bins=20, label="Length")
sns.histplot(iris.sepal_width, bins=20, label="Width")
plt.legend()

# Density curves only, without the histogram bars
sns.kdeplot(iris.sepal_length, label="Length")
sns.kdeplot(iris.sepal_width, label="Width")
plt.legend()

Plotting bivariate distributions

iris.dtypes
# Returns a scatter plot together with the histograms of the two variables
sns.jointplot(x = "sepal_length",y = "sepal_width",data=iris)
sns.jointplot(x = "sepal_length",y = "sepal_width",data=iris,kind = 'kde')

Visualizing pairwise relationships in a dataset

sns.pairplot(iris)

Plotting with categorical data

sns.stripplot(x="day", y="total_bill", data=tips);
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True,hue = "smoker");
# swarmplot positions the points so that they do not overlap
sns.swarmplot(x="day", y="total_bill", data=tips);
tips.dtypes
sns.barplot(x="tip", y="day",hue = "smoker", data=tips)
?sns.barplot

Visualizing linear relationships

sns.lmplot(x="total_bill", y="tip", data=tips)
anscombe = sns.load_dataset("anscombe")
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
            scatter_kws={"s": 80});
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           ci=None, scatter_kws={"s": 80});
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           order=2, ci=None, scatter_kws={"s": 80});
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           ci=None, scatter_kws={"s": 80});
sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           robust=True, ci=None, scatter_kws={"s": 80});
sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips)
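By default, the straight line that lmplot draws is an ordinary least-squares fit. As a sketch, the same kind of line can be computed with `np.polyfit`; the values below are dataset I of Anscombe's quartet, typed in directly so the example does not need to download anything.

```python
import numpy as np

# Anscombe's quartet, dataset I
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Degree-1 least-squares fit: y is approximately slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
```

Passing `deg=2` gives a quadratic fit, which corresponds to the `order=2` call used above for dataset II.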

Article link: https://www.haomeiwen.com/subject/pofcsqtx.html