Python Word Frequency Count Example

Author: 狼牙战士 | Published: 2018-01-05 17:33

Project Overview

This project implements a simple word frequency count with two Python scripts.


[Image: project screenshot]

The project consists of four files:

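  • file01: the text file whose word frequencies are counted.
  • maptest.py: the first stage of MapReduce, map.
  • file02: the file that stores the intermediate result.
  • reducetest.py: the second stage of MapReduce, reduce.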
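The four files correspond to a map step, a sort step (the intermediate file02, playing the role of MapReduce's shuffle/sort), and a reduce step. As a rough illustration, the same pipeline can be sketched in memory in a single script. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` are hypothetical, and this sketch splits on any whitespace rather than on a single space:

```python
# Minimal in-memory sketch of map -> sort -> reduce.
# Function names are hypothetical; the article uses two scripts and files instead.


def map_phase(lines):
    """Split each line on whitespace and emit every word (like maptest.py)."""
    words = []
    for line in lines:
        words.extend(line.split())
    return words


def shuffle_phase(words):
    """Sort so identical words become adjacent (the role of file02)."""
    return sorted(words)


def reduce_phase(sorted_words):
    """Count runs of identical adjacent words (like reducetest.py)."""
    counts = []
    for word in sorted_words:
        if counts and counts[-1][0] == word:
            counts[-1][1] += 1  # same word as the previous one: extend the run
        else:
            counts.append([word, 1])  # new word: start a new run
    return [(w, c) for w, c in counts]


if __name__ == "__main__":
    sample = ["a buffer against the wind", "the wind and the buffer"]
    for word, count in reduce_phase(shuffle_phase(map_phase(sample))):
        print(f"{word}\t{count}")
```

Keeping the three steps as separate functions mirrors the article's split into separate scripts, so each stage can be inspected on its own.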

Contents of each file:

Contents of file01:

We think that could provide quite a buffer for the hormone replacement franchise
Meanwhile Spiros is determined to buffer his family against this uncertainty despite his deep patriotism
Everyone agrees on the destination: lots more pure equity, the highest-quality buffer against losses
The wind whooshed and whined, a buffer against the lonesome quiet of my strange hotel room
Meanwhile Spiros is determined to buffer his family against this uncertainty despite his deep patriotism

Contents of maptest.py:

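# wordcount map phase
"""
1. Read file01 and collect every word into a list.
2. Sort the list.
3. Write the words to file02, one per line.
"""
words = []
with open("file01", "r", encoding="utf-8") as src:
    for line in src:
        # split() with no argument splits on any whitespace, so blank lines
        # and repeated spaces do not produce empty "words".
        # Punctuation stays attached to words (e.g. "equity,").
        words.extend(line.split())

# Sorting places identical words on adjacent lines for the reduce phase.
words.sort()

with open("file02", "w", encoding="utf-8") as dst:
    for word in words:
        dst.write(word + "\n")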
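As written, this map phase keeps punctuation attached to words (for example `equity,` and `whined,`) and treats `The` and `the` as different words. If that is not wanted, one possible cleanup is to lowercase each token and strip surrounding punctuation before sorting. The helpers `normalize` and `map_words` below are hypothetical and not part of the original project:

```python
import string

# Hypothetical cleanup step (not in the original project): lowercase each
# token and strip leading/trailing punctuation before sorting.


def normalize(token):
    """Return the token lowercased, without surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def map_words(lines):
    """Split lines on whitespace, normalize each token, and return them sorted."""
    words = []
    for line in lines:
        for token in line.split():
            word = normalize(token)
            if word:  # drop tokens that were only punctuation, e.g. "--"
                words.append(word)
    words.sort()
    return words


if __name__ == "__main__":
    sample = ["The wind whooshed and whined, a buffer against the lonesome quiet"]
    for word in map_words(sample):
        print(word)
```

Because `str.strip` only removes characters at the ends of a string, internal punctuation such as the hyphen in `highest-quality` is kept.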

Contents of file02:

Everyone
Meanwhile
Meanwhile
Spiros
Spiros
The
We
a
a
against
against
against
against
agrees
and
buffer
buffer
buffer
buffer
buffer
could
deep
deep
despite
despite
destination:
determined
determined
equity,
family
family
for
franchise
highest-quality
his
his
his
his
hormone
hotel
is
is
lonesome
losses
lots
more
my
of
on
patriotism
patriotism
provide
pure
quiet
quite
replacement
room
strange
that
the
the
the
the
think
this
this
to
to
uncertainty
uncertainty
whined,
whooshed
wind
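Capitalized words such as `Everyone`, `Meanwhile`, and `The` appear at the top of this list because Python's default string comparison orders characters by Unicode code point, and every uppercase ASCII letter (A-Z, 65-90) precedes every lowercase one (a-z, 97-122). A small demonstration, with a case-insensitive sort for comparison:

```python
# Default sort: compares code points, so uppercase words come first.
words = ["the", "We", "against", "The", "Everyone", "a"]

default_order = sorted(words)
print(default_order)
# ['Everyone', 'The', 'We', 'a', 'against', 'the']

# key=str.lower compares lowercased copies; sorted() is stable, so "the"
# (earlier in the input) stays ahead of "The".
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)
# ['a', 'against', 'Everyone', 'the', 'The', 'We']
```

Sorting with `key=str.lower` only makes `the` and `The` adjacent. The reducer compares strings exactly, so they would still be counted separately unless the words themselves are also lowercased.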

Contents of reducetest.py:

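# wordcount reduce phase
# file02 is sorted, so identical words sit on adjacent lines:
# count each run and print "word<TAB>count" whenever the word changes.

cur_word = None
count = 0  # renamed from `sum` to avoid shadowing the built-in sum()

with open("file02", "r", encoding="utf-8") as ff:
    for line in ff:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        if word != cur_word:
            if cur_word is not None:
                print("\t".join([cur_word, str(count)]))
            cur_word = word
            count = 0
        count += 1

# Print the last word; the guard keeps an empty file02 from crashing.
if cur_word is not None:
    print("\t".join([cur_word, str(count)]))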
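reducetest.py depends on file02 being sorted: it only compares each line with the previous word. The same run counting can be expressed with `itertools.groupby`, and `collections.Counter` gives an independent cross-check that does not need sorted input. The function names `reduce_sorted` and `count_unsorted` are hypothetical; since both simply iterate over lines, an open file object such as `open("file02")` should also work as the `lines` argument:

```python
from collections import Counter
from itertools import groupby


def reduce_sorted(lines):
    """Count runs of identical adjacent words in already-sorted input."""
    words = (line.strip() for line in lines)
    # groupby() with no key groups consecutive equal items; blank lines are dropped first.
    return [(word, sum(1 for _ in group)) for word, group in groupby(w for w in words if w)]


def count_unsorted(lines):
    """Cross-check: Counter tallies words regardless of order."""
    return Counter(w for line in lines for w in line.split())


if __name__ == "__main__":
    sorted_lines = ["a\n", "a\n", "against\n", "buffer\n", "buffer\n", "buffer\n"]
    for word, count in reduce_sorted(sorted_lines):
        print(f"{word}\t{count}")
    # Both approaches should agree on sorted input.
    assert dict(reduce_sorted(sorted_lines)) == count_unsorted(sorted_lines)
```

If the input were not sorted, `groupby` would report the same word several times (once per run), which is exactly why the map phase sorts before writing file02.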
