Plotting functions in pandas 0.23.4 (Anaconda)

Author: LeeMin_Z | Published: 2018-08-20 23:05 | Read 80 times

    Contents:

    1. Environment setup
    2. Line plots
    3. Bar charts
    4. Histograms
    5. Density plots
    6. Bimodal normal distribution plot
    7. Scatter plots

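    What I learned:

    The 2,000+ page, English-only manual (whose details change often) is hard to read from cover to cover, and there is no need to: search the official PDF documentation whenever you need something. Functions and charts exist to present data better, so what matters more is understanding what each chart type is good at and which parameters are important.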

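    1. Environment setup

    Plain matplotlib code tends to get long; pandas wraps it in plotting functions so that you can write less.

    Yes, really: I was halfway through the book when I found the author saying [This book is out of date! Go read the material on the pandas website!]. jaw-dropped.jpg

    1. Upgrade to the latest version from the official site

    The following uses the Anaconda distribution.

    # The installed version turns out to be old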
    
    lee>conda list pandas
    # packages in environment at C:\Users\****\Anaconda3:
    
    # Name                    Version                   Build  Channel
    pandas                    0.20.3           py36hce827b7_2
    
    # Upgrade it
    
    lee> conda update pandas
    
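    2. Download the latest RN from the official site and find, to your despair, that it runs to 2,573 pages

    I have accepted that I will never finish it. Search by keyword when you need something, and read a little more each time.

    release.png
    3. Note: for interactive plotting, start ipython --pylab from the Anaconda Prompt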
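Outside an interactive session, roughly the same setup can be written as a plain script. This is only a sketch of what the transcripts below appear to rely on: --pylab brings NumPy and matplotlib into the namespace, while Series and DataFrame still have to be imported from pandas. The Agg backend and the versions dictionary are my own additions, so that the script runs without a display and reports what is installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

# Report which versions are actually installed after the upgrade
versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "matplotlib": matplotlib.__version__,
}
for name, version in versions.items():
    print(name, version)
```

Running this right after conda update pandas is a quick way to confirm that the upgrade took effect in the environment you are actually using.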

    2. Line plots

    # Assumes Series and DataFrame were imported first: from pandas import Series, DataFrame
    In [2]: s = Series(np.random.randn(10).cumsum(),index=np.arange(0,100,10))
    
    In [3]: s
    Out[3]:
    0     0.630734
    10   -0.497936
    20    0.499530
    30   -0.242562
    40    0.479425
    50    2.252005
    60    3.065480
    70    1.579776
    80    0.616986
    90    2.368518
    dtype: float64
    
    In [4]: s.plot()
    Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x451e694f28>
    
    plot.png
    In [5]: df = DataFrame(np.random.randn(10,4).cumsum(0),
       ...: columns=['A','B','C','D'],
       ...: index=np.arange(0,100,10))
    
    In [6]: df.plot()
    Out[6]: <matplotlib.axes._subplots.AxesSubplot at 0x451f98d6a0>
    
    zx2.png
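The same DataFrame line plot can also be produced from a script. The following is a minimal sketch, not the book's code: the fixed seed, the title, the axis label and the in-memory PNG buffer are all my own choices for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display (assumption)
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed so the figure is reproducible
df = pd.DataFrame(
    rng.randn(10, 4).cumsum(0),
    columns=["A", "B", "C", "D"],
    index=np.arange(0, 100, 10),
)

# DataFrame.plot() draws one line per column and returns the matplotlib Axes
ax = df.plot(title="Cumulative sums")
ax.set_xlabel("index")

# Save to an in-memory buffer; pass a file name instead to write a PNG to disk
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```

Because plot() hands back an ordinary Axes object, anything matplotlib offers (labels, limits, annotations) can still be applied afterwards.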

    3. Bar charts

    1. Vertical bar chart
    In [29]: data = Series(np.random.rand(16),index=['a', 'b', 'c', 'd', 'e', 'f',
        ...: 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p'])
    
    In [30]: data
    Out[30]:
    a    0.653354
    b    0.388024
    c    0.341464
    d    0.275227
    e    0.968719
    f    0.085227
    g    0.496338
    h    0.276607
    i    0.302645
    j    0.954232
    k    0.293769
    l    0.423546
    m    0.400934
    n    0.397526
    o    0.849696
    p    0.269723
    dtype: float64
    
    In [32]: data.plot(kind='bar',color='k',alpha=0.3)
    Out[32]: <matplotlib.axes._subplots.AxesSubplot at 0x45247e16d8>
    
    bar1.png
    2. Horizontal bar chart
    In [34]: data.plot(kind='barh',color='k',alpha=0.3)
    Out[34]: <matplotlib.axes._subplots.AxesSubplot at 0x452484a240>
    
    barh.png
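To compare the two orientations directly, both charts can be drawn on one figure by passing an explicit Axes through the ax parameter. This is a sketch under my own assumptions (seed, figure size, layout), not something taken from the book.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display (assumption)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
data = pd.Series(rng.rand(16), index=list("abcdefghijklmnop"))

# One row, two columns: vertical bars on the left, horizontal on the right
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
data.plot(kind="bar", ax=axes[0], color="k", alpha=0.3)
data.plot(kind="barh", ax=axes[1], color="k", alpha=0.3)
```

Horizontal bars are usually the better choice when the index labels are long, as with the food group names in the sorted example.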

    2.2 Sorted horizontal bar chart (sort() and order() no longer work in pandas 0.23.4; use sort_values() instead)

    In [54]: result['Zinc, Zn'].sort_values()
    Out[54]:
    fgroup
    Fats and Oils                        0.020
    Beverages                            0.040
    Fruits and Fruit Juices              0.100
    Soups, Sauces, and Gravies           0.200
    Vegetables and Vegetable Products    0.330
    Sweets                               0.360
    Baby Foods                           0.590
    Meals, Entrees, and Sidedishes       0.630
    Baked Products                       0.660
    Finfish and Shellfish Products       0.670
    Restaurant Foods                     0.800
    Ethnic Foods                         1.045
    Cereal Grains and Pasta              1.090
    Legumes and Legume Products          1.140
    Fast Foods                           1.250
    Dairy and Egg Products               1.390
    Snacks                               1.470
    Sausages and Luncheon Meats          2.130
    Pork Products                        2.320
    Poultry Products                     2.500
    Spices and Herbs                     2.750
    Breakfast Cereals                    2.885
    Nut and Seed Products                3.290
    Lamb, Veal, and Game Products        3.940
    Beef Products                        5.390
    Name: value, dtype: float64
    
    In [55]: result['Zinc, Zn'].sort_values().plot(kind='barh')
    Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0xea2e812710>
    
    sort_values_barh.png
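    3. Grouped bar chart

    The command from the book squashes everything into a single column: tips.size is attribute access, and it returns the DataFrame's built-in size property (the total number of elements, 1708 here) rather than the column named 'size'. Select the column with tips['size'] instead.

    # Wrong: everything is squashed into one column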
    
    In [2]: tips = pd.read_csv('ch08/tips.csv')
    
    In [3]: party_counts = pd.crosstab(tips.day,tips.size)
    
    In [4]: party_counts
    Out[4]:
    col_0  1708
    day
    Fri      19
    Sat      87
    Sun      76
    Thur     62
    
    # Correct: select the column with brackets
    
    In [5]: party_counts = pd.crosstab(tips['day'],tips['size'])
    
    In [6]: party_counts
    Out[6]:
    size  1   2   3   4  5  6
    day
    Fri   1  16   1   1  0  0
    Sat   2  53  18  13  1  0
    Sun   0  39  15  18  3  1
    Thur  1  48   4   5  1  3
    
    In [8]: party_counts.plot(kind='bar')
    Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f2f46160>
    
    bar3.png
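The difference between the two ways of referring to the column is easy to reproduce on a tiny table. The DataFrame below is made up for illustration and only borrows the column names from tips.csv.

```python
import pandas as pd

# Made-up miniature of tips.csv with a column called 'size'
df = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun"],
    "size": [2, 3, 2, 4],
})

# Attribute access hits the DataFrame.size property: rows * columns
print(df.size)  # 8

# Bracket access returns the actual column
print(df["size"].tolist())  # [2, 3, 2, 4]

# With bracket access, crosstab gets one column per party size
counts = pd.crosstab(df["day"], df["size"])
print(counts)
```

The same trap exists for any column whose name collides with a DataFrame attribute or method, so bracket access is the safer habit.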

    4. Stacked bar chart normalized to proportions (each row sums to 1)
    In [9]: party_pcts = party_counts.div(party_counts.sum(1).astype(float),axis=0)
       ...:
    
    In [10]: party_pcts.plot(kind='bar',stacked = True)
    Out[10]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f3eea320>
    
    bar4.png
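The normalization step can be checked without tips.csv by typing in the party_counts table printed above. This is a sketch; the variable row_totals is my own name, and I leave out astype(float) because div() already performs true division on Python 3.

```python
import pandas as pd

# The crosstab from the transcript above, entered by hand
party_counts = pd.DataFrame(
    [[1, 16, 1, 1, 0, 0],
     [2, 53, 18, 13, 1, 0],
     [0, 39, 15, 18, 3, 1],
     [1, 48, 4, 5, 1, 3]],
    index=pd.Index(["Fri", "Sat", "Sun", "Thur"], name="day"),
    columns=pd.Index([1, 2, 3, 4, 5, 6], name="size"),
)

# Total number of parties per day
row_totals = party_counts.sum(axis=1)

# axis=0 aligns row_totals with the row index, so each row is divided by its own total
party_pcts = party_counts.div(row_totals, axis=0)
print(party_pcts.sum(axis=1))  # every row should sum to 1
```

The row totals (19, 87, 76, 62) are the same numbers that appeared in the squashed single-column output earlier, which is a handy cross-check.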

    4. Histograms

    In [13]: tips['tips_pct'] = tips['tip'] / tips['total_bill']
    
    In [14]: tips['tips_pct'].hist(bins=50)
    Out[14]: <matplotlib.axes._subplots.AxesSubplot at 0xe2f682e978>
    
    hist1.png

    5. Density plots

    Kernel Density Estimation (KDE). Note that kind='kde' requires SciPy to be installed.

    In [18]: tips['tips_pct'].plot(kind='kde')
    Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa6dd7b8>
    
    kde1.png
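To see what the density curve represents, the estimate can be computed directly with scipy.stats.gaussian_kde, which, as far as I know, is what pandas relies on for kind='kde'. The sample below is synthetic (a stand-in for tips_pct, with made-up mean and spread), and the grid and its margins are my own choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
# Synthetic stand-in for tips['tips_pct']; the parameters are invented
sample = rng.normal(loc=0.16, scale=0.05, size=244)

# Fit a Gaussian kernel density estimate with the default bandwidth
kde = gaussian_kde(sample)

# Evaluate the density on a grid that extends past the data on both sides
grid = np.linspace(sample.min() - 0.2, sample.max() + 0.2, 1000)
density = kde(grid)

# Trapezoidal rule: a probability density should integrate to about 1
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(round(area, 3))
```

Unlike a histogram, the y-axis of a density plot is not a count: only the area under the curve is meaningful.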

    6. Bimodal normal distribution plot

    In [23]: comp1 = np.random.normal(0,1,size=200)
    
    In [24]: comp2 = np.random.normal(10,2,size = 200)
    
    In [25]: values = Series(np.concatenate([comp1,comp2]))
    
    # normed was deprecated in matplotlib 2.1 and removed in 3.1; on newer versions use density=True
    In [27]: values.hist(bins=100,alpha=0.3,color='k',normed = True)
    Out[27]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa8f9780>
    
    In [28]: values.plot(kind='kde',style='g--')
    Out[28]: <matplotlib.axes._subplots.AxesSubplot at 0xe2fa8f9780>
    
    double_normal.png
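On a newer matplotlib the same figure can be drawn with density=True. The sketch below also checks why normalizing matters here: once the histogram has unit area, it shares a scale with the KDE curve. The seed, the explicit Axes and the hist_area check are my additions.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display (assumption)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
comp1 = rng.normal(0, 1, size=200)    # first peak around 0
comp2 = rng.normal(10, 2, size=200)   # second, wider peak around 10
values = pd.Series(np.concatenate([comp1, comp2]))

fig, ax = plt.subplots()
# density=True rescales bar heights so the total bar area is 1
values.hist(bins=100, alpha=0.3, color="k", density=True, ax=ax)
# Overlay the kernel density estimate on the same Axes (needs SciPy)
values.plot(kind="kde", style="g--", ax=ax)

# Sum of bar areas, which should be 1 after normalization
hist_area = sum(p.get_height() * p.get_width() for p in ax.patches)
print(round(hist_area, 6))
```

Without the normalization the bars would be raw counts and the KDE curve would look almost flat along the bottom of the plot.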

    7. Scatter plots

    In [29]: macro = pd.read_csv('ch08/macrodata.csv')
    
    In [30]: data = macro[['cpi','m1','tbilrate','unemp']]
    
    In [31]: trans_data = np.log(data).diff().dropna()
    
    In [32]: trans_data[-5:]
    Out[32]:
              cpi        m1  tbilrate     unemp
    198 -0.007904  0.045361 -0.396881  0.105361
    199 -0.021979  0.066753 -2.277267  0.139762
    200  0.002340  0.010286  0.606136  0.160343
    201  0.008419  0.037461 -0.200671  0.127339
    202  0.008894  0.012202 -0.405465  0.042560
    
    In [33]: plt.scatter(trans_data['m1'],trans_data['unemp'])
    Out[33]: <matplotlib.collections.PathCollection at 0xe2fafd7710>
    
    In [34]: plt.title('changes in log %s vs. log %s' % ('m1','unemp'))
    Out[34]: Text(0.5,1,'changes in log m1 vs. log unemp')
    
    scatter.png
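The np.log(data).diff() step is worth a second look: the difference of logs is the log of the ratio between consecutive values, which is close to the percentage change when changes are small. A small check with made-up numbers (not from macrodata.csv):

```python
import numpy as np
import pandas as pd

# Made-up series: up 10 percent, then down 10 percent
levels = pd.Series([100.0, 110.0, 99.0])

# What the transcript does: difference of consecutive logs
log_diff = np.log(levels).diff().dropna()

# The same quantity written as the log of the ratio to the previous value
log_ratio = np.log(levels / levels.shift(1)).dropna()

# Ordinary percentage change, for comparison
pct_change = levels.pct_change().dropna()

print(log_diff.tolist())
print(pct_change.tolist())
```

The dropna() call is needed because the first row has no previous value to compare against, so diff() leaves it as NaN.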

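    A scatter plot matrix draws pairwise scatter plots for a group of variables, which helps in spotting patterns.

    # pd.scatter_matrix has been deprecated since pandas 0.20; newer code should call pd.plotting.scatter_matrix
    In [39]: pd.scatter_matrix(trans_data,diagonal='kde',color = 'k',alpha=0.3)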
    
    scatter_matrix.png
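For newer pandas versions, the scatter plot matrix lives in pandas.plotting. Here is a minimal sketch with random data standing in for trans_data; only the column names are borrowed from the transcript, and the seed and sample size are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display (assumption)
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.RandomState(0)
# Random stand-in for trans_data, using the same four column names
frame = pd.DataFrame(
    rng.randn(50, 4),
    columns=["cpi", "m1", "tbilrate", "unemp"],
)

# Returns a 2-D array of Axes: scatter plots off the diagonal, KDE curves on it
axes = scatter_matrix(frame, diagonal="kde", color="k", alpha=0.3)
print(axes.shape)  # (4, 4)
```

With n columns the grid has n by n panels, so it is best kept to a handful of variables.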

    2018.8.20

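    As before, this follows the book Python for Data Analysis. It is a great book and I recommend it wholeheartedly! Amazon has a Kindle edition, which is handy for keyword search. Some details of the book's source code have changed in newer pandas versions, though; the code above is what I debugged and got working.

    I actually learned this last week, and today I already got to use it at work. Yeah~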
