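Contents:
- Environment setup
- Line plots
- Bar plots
- Histograms
- Density plots
- Bimodal normal distribution plot
- Scatter plots
Takeaways:
The 2,000+ page, English-only manual (whose details change frequently) is hard to read cover to cover, and there is no need to: search the official PDF documentation when you need something. Functions and plots exist to present data better, so what matters more is understanding what each plot type is good for and which parameters are important.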
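1. Environment setup
Raw matplotlib code gets long; pandas wraps it in plotting methods so you can write less.
Sure enough, halfway through the book the author says [This book is outdated! Go read the material on the pandas website!]. (stunned.jpg)
- Upgrade to the latest release from the official site
The commands below were run in the Anaconda environment.
# the installed version turns out to be old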
lee>conda list pandas
# packages in environment at C:\Users\****\Anaconda3:
# Name Version Build Channel
pandas 0.20.3 py36hce827b7_2
# upgrade it
lee> conda update pandas
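- Download the latest RN from the official site, and discover in despair that it runs 2,573 pages
Accept that you will never finish it; search by keyword when you need something, and read a little more each time.
release.png
- Note: for live plotting, start IPython from the Anaconda Prompt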
ipython --pylab
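The sessions below use `np`, `plt`, `Series` and `DataFrame` without qualification, so they presumably ran after imports along these lines. Here is a rough sketch of the same setup for a plain Python script. The `Agg` backend line is my assumption, added so the script also runs on a machine without a display; it is not part of the original session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed inside ipython --pylab

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Check which pandas version is actually in use before copying code from the book
print(pd.__version__)
```

With `ipython --pylab` the figure window updates as each command runs; in a script you would call `plt.show()` or save the figure instead.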
2. Line plots
In [2]: s = Series(np.random.randn(10).cumsum(),index=np.arange(0,100,10))
In [3]: s
Out[3]:
0 0.630734
10 -0.497936
20 0.499530
30 -0.242562
40 0.479425
50 2.252005
60 3.065480
70 1.579776
80 0.616986
90 2.368518
dtype: float64
In [4]: s.plot()
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x451e694f28>
plot.png
In [5]: df = DataFrame(np.random.randn(10,4).cumsum(0),
...: columns=['A','B','C','D'],
...: index=np.arange(0,100,10))
In [6]:
In [6]: df.plot()
Out[6]: <matplotlib.axes._subplots.AxesSubplot at 0x451f98d6a0>
zx2.png
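For reference, the same DataFrame line plot as a standalone script might look like the sketch below. The fixed seed and the explicit `fig, ax` pair are my additions so the result is reproducible and lands on a known axes; the session above relied on the interactive figure instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, only for reproducibility

# Four random walks (cumulative sums down each column), indexed 0, 10, ..., 90
df = pd.DataFrame(rng.randn(10, 4).cumsum(axis=0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))

fig, ax = plt.subplots()
df.plot(ax=ax)  # one line per column, index on the x axis
lines = ax.get_lines()
```

Each column becomes its own line, and the index supplies the x values, which is why no x/y arguments are needed.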
3. Bar plots
- Vertical bar plot
In [29]: data = Series(np.random.rand(16),index=['a', 'b', 'c', 'd', 'e', 'f',
...: 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p'])
In [30]: data
Out[30]:
a 0.653354
b 0.388024
c 0.341464
d 0.275227
e 0.968719
f 0.085227
g 0.496338
h 0.276607
i 0.302645
j 0.954232
k 0.293769
l 0.423546
m 0.400934
n 0.397526
o 0.849696
p 0.269723
dtype: float64
In [32]: data.plot(kind='bar',color='k',alpha=0.3)
Out[32]: <matplotlib.axes._subplots.AxesSubplot at 0x45247e16d8>
bar1.png
- Horizontal bar plot
In [34]: data.plot(kind='barh',color='k',alpha=0.3)
Out[34]: <matplotlib.axes._subplots.AxesSubplot at 0x452484a240>
barh.png
- Sorted horizontal bar plot (sort() and order() no longer work in pandas 0.23.4; use sort_values() instead)
In [54]: result['Zinc, Zn'].sort_values()
Out[54]:
fgroup
Fats and Oils 0.020
Beverages 0.040
Fruits and Fruit Juices 0.100
Soups, Sauces, and Gravies 0.200
Vegetables and Vegetable Products 0.330
Sweets 0.360
Baby Foods 0.590
Meals, Entrees, and Sidedishes 0.630
Baked Products 0.660
Finfish and Shellfish Products 0.670
Restaurant Foods 0.800
Ethnic Foods 1.045
Cereal Grains and Pasta 1.090
Legumes and Legume Products 1.140
Fast Foods 1.250
Dairy and Egg Products 1.390
Snacks 1.470
Sausages and Luncheon Meats 2.130
Pork Products 2.320
Poultry Products 2.500
Spices and Herbs 2.750
Breakfast Cereals 2.885
Nut and Seed Products 3.290
Lamb, Veal, and Game Products 3.940
Beef Products 5.390
Name: value, dtype: float64
In [55]: result['Zinc, Zn'].sort_values().plot(kind='barh')
Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0xea2e812710>
sort_values_barh.png
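A small sketch of the sort-then-plot pattern, using made-up numbers rather than the nutrient data above (the `result` DataFrame comes from an earlier chapter and is not reproduced here):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, only to illustrate the ordering
zinc = pd.Series({"Beef": 5.39, "Beverages": 0.04, "Snacks": 1.47, "Sweets": 0.36})

ordered = zinc.sort_values()          # ascending; replaces the old sort()/order()
fig, ax = plt.subplots()
ordered.plot(kind="barh", ax=ax)

# Bar lengths in drawing order (bottom of the chart first)
widths = [patch.get_width() for patch in ax.patches]
```

Since `barh` draws the first row at the bottom, an ascending sort puts the longest bar at the top of the chart.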
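- Grouped bar plot
The command from the book collapses everything into a single column: tips.size does not return the column named 'size' but the DataFrame's own size attribute (the total number of elements, 1708 here). Use bracket indexing, tips['size'], instead.
# wrong: everything collapses into one column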
In [2]: tips = pd.read_csv('ch08/tips.csv')
In [3]: party_counts = pd.crosstab(tips.day,tips.size)
In [4]: party_counts
Out[4]:
col_0 1708
day
Fri 19
Sat 87
Sun 76
Thur 62
# correct: bracket indexing
In [5]: party_counts = pd.crosstab(tips['day'],tips['size'])
In [6]: party_counts
Out[6]:
size 1 2 3 4 5 6
day
Fri 1 16 1 1 0 0
Sat 2 53 18 13 1 0
Sun 0 39 15 18 3 1
Thur 1 48 4 5 1 3
In [8]: party_counts.plot(kind='bar')
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f2f46160>
bar3.png
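To see why attribute access goes wrong here, the sketch below uses a tiny made-up frame (not the real tips.csv) with a column called `size`. As far as I can tell, the DataFrame's built-in `size` attribute takes precedence over the column of the same name.

```python
import pandas as pd

# Hypothetical miniature of tips.csv with only the two columns that matter
tips_demo = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun"],
    "size": [2, 2, 3, 4],
})

n_elements = tips_demo.size       # built-in attribute: rows * columns = 4 * 2
size_column = tips_demo["size"]   # bracket indexing reaches the actual column

# Cross-tabulate day against party size using the real column
counts = pd.crosstab(tips_demo["day"], tips_demo["size"])
print(counts)
```

The same trap applies to any column whose name matches an existing DataFrame attribute or method, so bracket indexing is the safer habit.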
- Bar plot normalized to proportions (each row sums to 1)
In [9]: party_pcts = party_counts.div(party_counts.sum(1).astype(float),axis=0)
...:
In [10]: party_pcts.plot(kind='bar',stacked = True)
Out[10]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f3eea320>
bar4.png
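The normalization step can be checked without the CSV by typing in the counts from the table above. This is only a sketch; the `astype(float)` from the session is left out on the assumption that `div` already does true division under Python 3.

```python
import numpy as np
import pandas as pd

# Counts copied from the party_counts table printed above
party_counts = pd.DataFrame(
    [[1, 16, 1, 1, 0, 0],
     [2, 53, 18, 13, 1, 0],
     [0, 39, 15, 18, 3, 1],
     [1, 48, 4, 5, 1, 3]],
    index=pd.Index(["Fri", "Sat", "Sun", "Thur"], name="day"),
    columns=pd.Index([1, 2, 3, 4, 5, 6], name="size"),
)

row_totals = party_counts.sum(axis=1)               # parties per day
party_pcts = party_counts.div(row_totals, axis=0)   # divide each row by its own total
```

`axis=0` tells `div` to align `row_totals` with the row labels, so every row is scaled by its own sum and a stacked bar for each day reaches exactly 1.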
4. Histograms
In [13]: tips['tips_pct'] = tips['tip'] / tips['total_bill']
In [14]: tips['tips_pct'].hist(bins=50)
Out[14]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f682e978>
hist1.png
5. Density plots
Kernel density estimation (KDE)
In [18]: tips['tips_pct'].plot(kind='kde')
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa6dd7b8>
kde1.png
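To make the idea behind KDE concrete, here is a hand-rolled Gaussian version in NumPy. It is a teaching sketch, not what pandas runs: `kind='kde'` relies on SciPy and picks a bandwidth automatically, whereas the function name, the fixed bandwidth and the stand-in data below are all my assumptions.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth   # shape (len(grid), len(samples))
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.RandomState(0)
samples = rng.normal(0.15, 0.05, size=200)   # hypothetical stand-in for tips_pct
grid = np.linspace(-0.2, 0.5, 701)
density = gaussian_kde_sketch(samples, grid, bandwidth=0.02)

# Trapezoid rule: a density estimate should integrate to roughly 1
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
```

A smaller bandwidth gives a bumpier curve that follows individual samples; a larger one smooths the curve and can hide real peaks.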
6. Bimodal normal distribution plot
In [23]: comp1 = np.random.normal(0,1,size=200)
In [24]: comp2 = np.random.normal(10,2,size = 200)
In [25]: values = Series(np.concatenate([comp1,comp2]))
In [27]: values.hist(bins=100,alpha=0.3,color='k',normed = True)
Out[27]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa8f9780>
In [28]: values.plot(kind='kde',style='g--')
Out[28]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa8f9780>
double_normal.png
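One caveat for newer setups: matplotlib 3.1 and later no longer accept `normed`; `density=True` is the replacement. A sketch of the histogram step with that spelling follows. The seed and the explicit axes are my additions, and the KDE overlay is left out because it needs SciPy.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)                 # fixed seed, only for reproducibility
comp1 = rng.normal(0, 1, size=200)             # first peak around 0
comp2 = rng.normal(10, 2, size=200)            # second peak around 10
values = pd.Series(np.concatenate([comp1, comp2]))

fig, ax = plt.subplots()
values.hist(ax=ax, bins=100, alpha=0.3, color="k", density=True)

# With density=True the bar areas (height * width) add up to 1
bar_area = sum(p.get_height() * p.get_width() for p in ax.patches)
```

Normalizing the histogram this way puts it on the same vertical scale as a density curve, which is what makes overlaying the two meaningful.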
7. Scatter plots
In [29]: macro = pd.read_csv('ch08/macrodata.csv')
In [30]: data = macro[['cpi','m1','tbilrate','unemp']]
In [31]: trans_data = np.log(data).diff().dropna()
In [32]: trans_data[-5:]
Out[32]:
cpi m1 tbilrate unemp
198 -0.007904 0.045361 -0.396881 0.105361
199 -0.021979 0.066753 -2.277267 0.139762
200 0.002340 0.010286 0.606136 0.160343
201 0.008419 0.037461 -0.200671 0.127339
202 0.008894 0.012202 -0.405465 0.042560
In [33]: plt.scatter(trans_data['m1'],trans_data['unemp'])
Out[33]: <matplotlib.collections.PathCollection at 0xe2fafd7710>
In [34]: plt.title('changes in log %s vs. log %s' % ('m1','unemp'))
Out[34]: Text(0.5,1,'changes in log m1 vs. log unemp')
scatter.png
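A scatter plot matrix draws every pairwise scatter plot for a group of variables, which helps when looking for patterns.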
In [39]: pd.scatter_matrix(trans_data,diagonal='kde',color = 'k',alpha=0.3)
scatter_matrix.png
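In later pandas releases the top-level `pd.scatter_matrix` is gone and the function lives in `pandas.plotting`. Below is a sketch with that import, using random positive series as a stand-in for macrodata.csv. The data, the seed and the choice of `diagonal='hist'` (which avoids the SciPy dependency of `'kde'`) are my assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.RandomState(0)

# Hypothetical positive series standing in for the four macrodata.csv columns
data = pd.DataFrame(np.exp(rng.randn(50, 4).cumsum(axis=0) * 0.01),
                    columns=["cpi", "m1", "tbilrate", "unemp"])

# Log-difference: approximate period-over-period growth; the first row becomes NaN and is dropped
trans_data = np.log(data).diff().dropna()

axes = scatter_matrix(trans_data, diagonal="hist", color="k", alpha=0.3)
```

The return value is a grid of axes, one per pair of columns, with each variable's own distribution on the diagonal.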
2018.8.20
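Still working through "Python for Data Analysis". It is a great book and I recommend it wholeheartedly! Amazon has a Kindle edition, which is handy for keyword search. Some source details have changed in newer pandas versions, though; the code above is what I debugged and got working.
I actually studied this last week, and already got to use it at work today, yeah~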