groupby("x").count and groupb in pandas

Author: 橘子kire | Published 2020-03-02 22:43

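1. Official documentation
ndarray.size
Number of elements in the array. pandas follows the same idea: Series.size is the number of rows, and DataFrame.size is the number of rows times the number of columns.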

>>> import pandas as pd
>>> s = pd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df.size
4
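
The doc examples above contain no missing values, so they do not show how size treats NaN. The following is a minimal sketch, assuming a small made-up frame (the NaN in col2 is added here for illustration and is not in the original example): size counts every cell, while count() reports only the non-missing cells per column.

```python
import numpy as np
import pandas as pd

# Illustrative frame: same shape as the doc example, but with one NaN added.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3.0, np.nan]})

n_rows, n_cols = df.shape
print(df.size)            # total number of cells, NaN included
print(n_rows * n_cols)    # same number, computed from the shape

non_missing = df.count()  # non-NaN cells per column
print(non_missing)
```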

2. size counts NaN values; count does not:

In [46]:
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
df = pd.DataFrame({'a':[0,0,1,2,2,2], 'b':[1,2,3,4,np.nan,4], 'c':np.random.randn(6)})
df

Out[46]:
   a   b         c
0  0   1  1.067627
1  0   2  0.554691
2  1   3  0.458084
3  2   4  0.426635
4  2 NaN -2.238091
5  2   4  1.256943

In [48]:
# count(): non-NaN values of b in each group
print(df.groupby(['a'])['b'].count())
# size(): rows in each group, NaN included
print(df.groupby(['a'])['b'].size())

a
0    2
1    1
2    2
Name: b, dtype: int64

a
0    2
1    1
2    3
dtype: int64 
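
The same comparison can be written as a self-contained script. This is only a sketch: it reuses columns a and b from the frame above and drops the random column c, which plays no part in the result. It also checks that the gap between size() and count() in each group is exactly the number of NaN values in that group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [0, 0, 1, 2, 2, 2],
    'b': [1, 2, 3, 4, np.nan, 4],
})

grouped_b = df.groupby('a')['b']
counts = grouped_b.count()  # non-NaN values per group
sizes = grouped_b.size()    # rows per group, NaN included

# The difference is the number of missing values in each group.
missing = sizes - counts

# Cross-check: count the NaN values per group directly.
missing_direct = df['b'].isna().groupby(df['a']).sum()

print(counts.tolist())
print(sizes.tolist())
print(missing.tolist())
```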

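3. Even when the data has no NA values, the count() output is far more verbose: size() returns one number per group, while count() repeats the count for every non-grouping column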

In [114]:
grouped = fec_mrbo.groupby(['cand_nm',labels])
grouped.size().unstack(0)
Out[114]:
cand_nm              Obama, Barack   Romney, Mitt
contb_receipt_amt
(0, 1]                       493.0           77.0
(1, 10]                    40070.0         3681.0
(10, 100]                 372280.0        31853.0
(100, 1000]               153991.0        43357.0
(1000, 10000]              22284.0        26186.0
(10000, 100000]                2.0            1.0
(100000, 1000000]              3.0            NaN
(1000000, 10000000]            4.0            NaN

In [115]:
grouped = fec_mrbo.groupby(['cand_nm',labels])
grouped.count().unstack(0)
Out[115]:

                           cmte_id                       cand_id                     contbr_nm                   contbr_city                     contbr_st                 ...        memo_cd                     memo_text                       form_tp                      file_num                       parties
cand_nm              Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  ...  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt  Obama, Barack   Romney, Mitt
contb_receipt_amt
(0, 1]                       493.0           77.0          493.0           77.0          493.0           77.0          493.0           77.0          493.0           77.0  ...           31.0            1.0          138.0           10.0          493.0           77.0          493.0           77.0          493.0           77.0
(1, 10]                    40070.0         3681.0        40070.0         3681.0        40070.0         3681.0        40070.0         3681.0        40070.0         3681.0  ...         4645.0           14.0         4781.0           53.0        40070.0         3681.0        40070.0         3681.0        40070.0         3681.0
(10, 100]                 372280.0        31853.0       372280.0        31853.0       372280.0        31853.0       372276.0        31853.0       372280.0        31853.0  ...        33331.0           74.0        33789.0          236.0       372280.0        31853.0       372280.0        31853.0       372280.0        31853.0
(100, 1000]               153991.0        43357.0       153991.0        43357.0       153991.0        43357.0       153991.0        43355.0       153987.0        43357.0  ...        31674.0          347.0        31897.0          849.0       153991.0        43357.0       153991.0        43357.0       153991.0        43357.0
(1000, 10000]              22284.0        26186.0        22284.0        26186.0        22284.0        26186.0        22284.0        26185.0        22284.0        26186.0  ...        16622.0          640.0        16693.0         2217.0        22284.0        26186.0        22284.0        26186.0        22284.0        26186.0
(10000, 100000]                2.0            1.0            2.0            1.0            2.0            1.0            2.0            1.0            2.0            1.0  ...            0.0            1.0            1.0            1.0            2.0            1.0            2.0            1.0            2.0            1.0
(100000, 1000000]              3.0            NaN            3.0            NaN            3.0            NaN            3.0            NaN            3.0            NaN  ...            3.0            NaN            3.0            NaN            3.0            NaN            3.0            NaN            3.0            NaN
(1000000, 10000000]            4.0            NaN            4.0            NaN            4.0            NaN            4.0            NaN            4.0            NaN  ...            4.0            NaN            4.0            NaN            4.0            NaN            4.0            NaN            4.0            NaN
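
The snippets in this section do not show how fec_mrbo or labels are built, so they cannot be run as they stand. The sketch below reproduces the same contrast on a tiny made-up table. The donations frame, its values, the bins list and the amount_bin name are all hypothetical, and building the labels with pd.cut is an assumption about how the original labels were produced.

```python
import pandas as pd

# Hypothetical stand-in for fec_mrbo: a few made-up donation rows.
donations = pd.DataFrame({
    'cand_nm': ['Obama, Barack', 'Obama, Barack', 'Romney, Mitt', 'Romney, Mitt'],
    'contb_receipt_amt': [0.5, 50.0, 5.0, 500.0],
    'contbr_city': ['Chicago', None, 'Boston', 'Austin'],
})

# Assumed construction of the bucket labels; renamed so the grouping key
# does not share a name with the amount column.
bins = [0, 1, 10, 100, 1000]
labels = pd.cut(donations['contb_receipt_amt'], bins).rename('amount_bin')

grouped = donations.groupby(['cand_nm', labels], observed=True)

sizes = grouped.size()    # Series: one row count per (candidate, bucket)
counts = grouped.count()  # DataFrame: one non-null count per remaining column

print(sizes.unstack(0))   # compact: one column per candidate
print(counts.unstack(0))  # wide: one column per (original column, candidate)
```

If all that is needed is the number of rows per group, size() gives it directly; count() is the better fit when the per-column number of non-missing values matters, as with contbr_city here.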

Article link: https://www.haomeiwen.com/subject/eijwkhtx.html