pandas Basics: Vectorized Methods

Author: 拙峰朽木 | Published 2017-05-17 23:46

Computing the mean of a column

This requires numpy.
Mean of the test column: numpy.mean(test)
Standard deviation of the test column: numpy.std(test)
Median of the test column: numpy.median(test)
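
As a minimal sketch of the three calls above, the snippet below applies them to a small array. The values in `test` are made up for illustration and do not come from the medal data.

```python
import numpy as np

# Hypothetical sample values standing in for the "test" column.
test = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean_value = np.mean(test)      # arithmetic mean
std_value = np.std(test)        # population standard deviation (ddof=0 by default)
median_value = np.median(test)  # middle value of the sorted data

print(mean_value, std_value, median_value)
```

Note that numpy.std divides by N by default, whereas the pandas method Series.std divides by N - 1, so the two can give different answers on the same data.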

Here is an exercise: compute the average number of bronze medals among all countries that won at least one gold medal.
Data:

import numpy
from pandas import DataFrame, Series

countries = ['Russian Fed.', 'Norway', 'Canada', 'United States',
             'Netherlands', 'Germany', 'Switzerland', 'Belarus',
             'Austria', 'France', 'Poland', 'China', 'Korea',
             'Sweden', 'Czech Republic', 'Slovenia', 'Japan',
             'Finland', 'Great Britain', 'Ukraine', 'Slovakia',
             'Italy', 'Latvia', 'Australia', 'Croatia', 'Kazakhstan']

gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
silver = [11, 5, 10, 7, 7, 6, 3, 0, 8, 4, 1, 4, 3, 7, 4, 2, 4, 3, 1, 0, 0, 2, 2, 2, 1, 0]
bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

# One Series per column; the dict keys become the column names.
olympic_medal_counts = {'country_name': Series(countries),
                        'gold': Series(gold),
                        'silver': Series(silver),
                        'bronze': Series(bronze)}
df = DataFrame(olympic_medal_counts)

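Here is the resulting DataFrame (the columns appear in alphabetical order, as older pandas versions sorted dict keys; newer versions keep insertion order):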

  bronze    country_name  gold  silver
0        9    Russian Fed.    13      11
1       10          Norway    11       5
2        5          Canada    10      10
3       12   United States     9       7
4        9     Netherlands     8       7
5        5         Germany     8       6
6        2     Switzerland     6       3
7        1         Belarus     5       0
8        5         Austria     4       8
9        7          France     4       4
10       1          Poland     4       1
11       2           China     3       4
12       2           Korea     3       3
13       6          Sweden     2       7
14       2  Czech Republic     2       4
15       4        Slovenia     2       2
16       3           Japan     1       4
17       1         Finland     1       3
18       2   Great Britain     1       1
19       1         Ukraine     1       0
20       0        Slovakia     1       0
21       6           Italy     0       2
22       2          Latvia     0       2
23       1       Australia     0       2
24       0         Croatia     0       1
25       1      Kazakhstan     0       0
  • Approach:

1). Select the rows for all countries with more than 0 gold medals (that is, at least one):

at_least_one_gold = df[df.gold > 0]
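
The filter above works in two vectorized steps: the comparison produces a boolean Series, and indexing the DataFrame with that Series keeps only the rows marked True. The sketch below shows this on a three-row subset of the medal data; the variable name `mask` is introduced here for illustration only.

```python
import pandas as pd

# Three rows taken from the medal table, enough to show the mechanism.
small_df = pd.DataFrame({
    'country_name': ['Norway', 'Japan', 'Italy'],
    'gold': [11, 1, 0],
    'bronze': [10, 3, 6],
})

# Element-wise comparison: no explicit loop is needed.
mask = small_df['gold'] > 0

# Boolean indexing keeps the rows where mask is True.
at_least_one_gold_small = small_df[mask]

print(mask.tolist())
print(at_least_one_gold_small)
```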

Output:

    bronze    country_name  gold  silver
0        9    Russian Fed.    13      11
1       10          Norway    11       5
2        5          Canada    10      10
3       12   United States     9       7
4        9     Netherlands     8       7
5        5         Germany     8       6
6        2     Switzerland     6       3
7        1         Belarus     5       0
8        5         Austria     4       8
9        7          France     4       4
10       1          Poland     4       1
11       2           China     3       4
12       2           Korea     3       3
13       6          Sweden     2       7
14       2  Czech Republic     2       4
15       4        Slovenia     2       2
16       3           Japan     1       4
17       1         Finland     1       3
18       2   Great Britain     1       1
19       1         Ukraine     1       0
20       0        Slovakia     1       0

2). Select only the bronze column:

bronze_at_least_one_gold = at_least_one_gold['bronze']

Data:

0      9
1     10
2      5
3     12
4      9
5      5
6      2
7      1
8      5
9      7
10     1
11     2
12     2
13     6
14     2
15     4
16     3
17     1
18     2
19     1
20     0

3). Compute the mean:

avg_bronze_at_least_one_gold = numpy.mean(bronze_at_least_one_gold)

Result: 4.2380952381
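
The three steps can also be collapsed into a single expression. This is one possible shorthand, not the article's own method: it uses `DataFrame.loc` to pick rows and a column at once, then the `Series.mean` method in place of numpy.mean. Only the gold and bronze columns from the data above are needed.

```python
import pandas as pd

gold = [13, 11, 10, 9, 8, 8, 6, 5, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
bronze = [9, 10, 5, 12, 9, 5, 2, 1, 5, 7, 1, 2, 2, 6, 2, 4, 3, 1, 2, 1, 0, 6, 2, 1, 0, 1]

medals = pd.DataFrame({'gold': gold, 'bronze': bronze})

# Rows where gold > 0, column 'bronze', then the mean of that Series.
avg_bronze = medals.loc[medals['gold'] > 0, 'bronze'].mean()

print(avg_bronze)
```

Here 21 countries pass the filter and their bronze medals sum to 89, so the mean is 89 / 21, which matches the result above.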
