Python advanced topics: the word-frequency counting problem

Author: 与蟒唯舞 | Published 2017-02-16 10:29

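Given the following list:
[1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
we want to count how many times each element occurs and end up with a result like {8: 2, 9: 2, ...}, that is, {element: number of occurrences, ...}.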

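Method 1:
First use the elements as dictionary keys to build a dictionary whose values all start at 0, then walk through the list and add 1 to the matching key for each occurrence: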

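>>> from random import randint
>>> data = [randint(1, 10) for _ in range(10)]  # Python 3: range replaces Python 2's xrange
>>> data                                         # random, so this is one sample run
[1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
>>> d = dict.fromkeys(data, 0)                   # every distinct element becomes a key with count 0
>>> d
{1: 0, 7: 0, 10: 0, 4: 0, 9: 0, 8: 0, 5: 0}
>>> for x in data:
...     d[x] += 1                                # add 1 for each occurrence
...
>>> d
{1: 1, 7: 1, 10: 2, 4: 1, 9: 2, 8: 2, 5: 1}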
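Method 1 can also be wrapped in a reusable function. The sketch below is one way to do it, not code from the original post; the function names count_with_fromkeys and count_with_get are hypothetical. The second function is a common variant that uses dict.get to supply a default of 0, so no separate initialization pass is needed.

```python
def count_with_fromkeys(items):
    """Method 1 as a function: pre-create every key with 0, then increment."""
    items = list(items)               # the data is read twice, so materialize generators first
    counts = dict.fromkeys(items, 0)  # one key per distinct element, all starting at 0
    for x in items:
        counts[x] += 1
    return counts


def count_with_get(items):
    """Variant: dict.get returns 0 the first time a key is seen, so one pass is enough."""
    counts = {}
    for x in items:
        counts[x] = counts.get(x, 0) + 1
    return counts


data = [1, 7, 10, 4, 9, 10, 9, 8, 5, 8]
print(count_with_fromkeys(data))  # {1: 1, 7: 1, 10: 2, 4: 1, 9: 2, 8: 2, 5: 1}
print(count_with_get(data))       # same result
```

Calling list() first matters for the fromkeys version because it iterates over the data twice; a generator would already be exhausted after dict.fromkeys.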

Method 2:
Use Counter from the collections module. Counter is a dict subclass built for counting hashable objects:

>>> from collections import Counter
>>> c = Counter(data)
>>> c
Counter({10: 2, 9: 2, 8: 2, 1: 1, 7: 1, 4: 1, 5: 1})
>>> isinstance(c, dict)
True
# Counter is a subclass of dict, so a count can be read by its key
>>> c[1]
1
# most_common(n) returns the n most frequent (element, count) pairs; ties keep first-seen order
>>> c.most_common(2)
[(10, 2), (9, 2)]
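Although the examples above count numbers, the title is about word frequency, and Counter handles words the same way once a text has been split into tokens. A minimal sketch, assuming a deliberately simple tokenization rule (lowercase the text, then treat each run of ASCII letters or apostrophes as a word via re.findall); the helper top_words and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the n most frequent words in text as (word, count) pairs.

    Assumption: a "word" is a run of ASCII letters or apostrophes,
    compared case-insensitively.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "The cat sat on the mat. The dog sat too."
print(top_words(sample, 2))  # [('the', 3), ('sat', 2)]
```

This regular expression only suits space-separated languages such as English; Chinese text has no spaces between words and would need a word-segmentation step before counting.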

Reference documentation (output of help(Counter)):

class Counter(__builtin__.dict)
 |  Dict subclass for counting hashable items.  Sometimes called a bag
 |  or multiset.  Elements are stored as dictionary keys and their counts
 |  are stored as dictionary values.
 |
 |  >>> c = Counter('abcdeabcdabcaba')  # count elements from a string
 |
 |  >>> c.most_common(3)                # three most common elements
 |  [('a', 5), ('b', 4), ('c', 3)]
 |  >>> sorted(c)                       # list all unique elements
 |  ['a', 'b', 'c', 'd', 'e']
 |  >>> ''.join(sorted(c.elements()))   # list elements with repetitions
 |  'aaaaabbbbcccdde'
 |  >>> sum(c.values())                 # total of all counts
 |  15
 |
 |  >>> c['a']                          # count of letter 'a'
 |  5
 |  >>> for elem in 'shazam':           # update counts from an iterable
 |  ...     c[elem] += 1                # by adding 1 to each element's count
 |  >>> c['a']                          # now there are seven 'a'
 |  7
 |  >>> del c['b']                      # remove all 'b'
 |  >>> c['b']                          # now there are zero 'b'
 |  0
 |
 |  >>> d = Counter('simsalabim')       # make another counter
 |  >>> c.update(d)                     # add in the second counter
 |  >>> c['a']                          # now there are nine 'a'
 |  9
 |
 |  >>> c.clear()                       # empty the counter
 |  >>> c
 |  Counter()
 |
 |  Note:  If a count is set to zero or reduced to zero, it will remain
 |  in the counter until the entry is deleted or the counter is cleared:
 |
 |  >>> c = Counter('aaabbc')
 |  >>> c['b'] -= 2                     # reduce the count of 'b' by two
 |  >>> c.most_common()                 # 'b' is still in, but its count is zero
 |  [('a', 3), ('c', 1), ('b', 0)]
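To drop entries whose count has fallen to zero, Python 3's Counter supports unary plus, which returns a new Counter containing only the positive counts. A short sketch reusing the example from the note above (run under Python 3, not the Python 2 version whose help text is quoted here):

```python
from collections import Counter

c = Counter('aaabbc')
c['b'] -= 2                  # 'b' drops to 0, but its key stays in the counter
print(c.most_common())       # [('a', 3), ('c', 1), ('b', 0)]

positive = +c                # unary plus builds a new Counter with only counts > 0
print(positive)              # Counter({'a': 3, 'c': 1})
print('b' in positive)       # False
```

The original c is left unchanged, so the zero entry is still there if you need it; only the new Counter omits it.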

Reader comments

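5c39c691b65a: Hello, teacher.

My code:
import collections
print('All digit counts:', collections.Counter(all_nums).most_common())

The output is:

All digit counts: [('0', 10), ('9', 10), ('8', 10), ('2', 7)]

I don't want the counts, only the digits in front of them: 0 9 8 2 (without the counts after them).

zpx = collections.Counter(all_nums).most_common()
print(zpx[0][0])
This prints: 0

How can I print them joined together as 0982?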

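5c39c691b65a: @与蟒唯舞
Teacher, I tested it and got this result:
['0', '9', '8', '2']
How can I go one step further and print this list as 0982?

I added the following code myself to test it:

zpx = collections.Counter(all_nums).most_common()
print(zpx[0][0] + zpx[1][0] + zpx[2][0] + zpx[3][0])

This does print 0982,
but the code looks clumsy, and it has a "bug":
if there are fewer than 4 distinct digits to count (only 1 to 3), it fails with an IndexError.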

5c39c691b65a: @与蟒唯舞 OK, thank you, teacher. I'll try it.

与蟒唯舞: You could try a list comprehension: [x[0] for x in collections.Counter(all_nums).most_common()]
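Building on the reply's list comprehension, str.join can glue the keys into one string. Because it iterates over whatever most_common() returns instead of indexing positions 0 to 3, it also avoids the IndexError when there are fewer than four distinct digits. The commenter's all_nums was not shown, so the list below is a hypothetical sample chosen to reproduce the counts in the comment; str.join also assumes the digits are strings, as they were there.

```python
import collections

# Hypothetical sample reproducing the counts from the comment: the real all_nums was not shown.
all_nums = ['0'] * 10 + ['9'] * 10 + ['8'] * 10 + ['2'] * 7

pairs = collections.Counter(all_nums).most_common()  # [(digit, count), ...], most frequent first
digits = ''.join(x[0] for x in pairs)                # keep only the digit from each pair
print(digits)  # 0982

# With only three distinct digits there is no IndexError; the string is simply shorter.
print(''.join(x[0] for x in collections.Counter('99887').most_common()))  # 987
```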
