Python Word Frequency Counting Example

Author: 狼牙战士 | Published: 2018-01-05 17:33


Project Overview

A simple word frequency counter built from two Python files, one for each phase of a MapReduce-style job.

[Figure: project screenshot (项目截图.PNG)]

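The project consists of 4 files:

file01: the input text whose word frequencies are counted.
maptest.py: the first MapReduce phase, map.
file02: the file that stores the intermediate result.
reducetest.py: the second MapReduce phase, reduce.

Contents of each file:

Contents of file01: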
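
The data flow between these four files can be sketched in memory, without touching the disk. This is only an illustrative sketch: the function names `map_phase` and `reduce_phase` and the two sample lines are assumptions made for the example and do not appear in the original project.

```python
def map_phase(lines):
    """Split every line into words and return them sorted (the role of maptest.py)."""
    words = []
    for line in lines:
        words.extend(line.split())
    return sorted(words)


def reduce_phase(sorted_words):
    """Count runs of identical adjacent words (the role of reducetest.py)."""
    counts = []
    for word in sorted_words:
        if counts and counts[-1][0] == word:
            # Same word as the previous entry: bump its count.
            counts[-1] = (word, counts[-1][1] + 1)
        else:
            # A new word starts a new run.
            counts.append((word, 1))
    return counts


# Hypothetical two-line input standing in for file01.
sample = ["a buffer against losses", "to buffer his family"]
result = reduce_phase(map_phase(sample))
for word, count in result:
    print(word + "\t" + str(count))
```

Sorting in the map phase is what allows the reduce phase to count each word in a single pass, since identical words end up next to each other.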

We think that could provide quite a buffer for the hormone replacement franchise
Meanwhile Spiros is determined to buffer his family against this uncertainty despite his deep patriotism
Everyone agrees on the destination: lots more pure equity, the highest-quality buffer against losses
The wind whooshed and whined, a buffer against the lonesome quiet of my strange hotel room
Meanwhile Spiros is determined to buffer his family against this uncertainty despite his deep patriotism

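Contents of maptest.py:

# wordcount: map phase
"""
1. Read file01 and append each word to a list.
2. Sort the list.
3. Write the words to file02, one per line.
"""
words = []
with open("file01", "r") as src:
    for line in src:
        # split() with no argument also skips empty tokens
        # caused by blank lines or repeated spaces.
        words.extend(line.split())

# Sorting places identical words next to each other for the reduce phase.
words.sort()
with open("file02", "w") as dst:
    for word in words:
        dst.write(word + "\n")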
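
Because the map phase splits only on whitespace, punctuation stays attached to words and case is preserved, so tokens such as `destination:` and `equity,` are counted as written and `The` is counted separately from `the`. If that is not wanted, tokens could be normalized before sorting. The sketch below is an optional variation, not part of the original script; the helper names `normalize` and `tokenize` are assumptions made for the example.

```python
import string


def normalize(token):
    """Lowercase a token and strip leading/trailing ASCII punctuation."""
    return token.strip(string.punctuation).lower()


def tokenize(line):
    """Split a line on whitespace and drop tokens that become empty."""
    return [word for word in (normalize(t) for t in line.split()) if word]


tokens = tokenize("Everyone agrees on the destination: lots more pure equity, the highest-quality buffer")
print(tokens)
```

Only punctuation at the edges of a token is removed, so a hyphenated word such as `highest-quality` stays in one piece.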

Contents of file02:

Everyone
Meanwhile
Meanwhile
Spiros
Spiros
The
We
a
a
against
against
against
against
agrees
and
buffer
buffer
buffer
buffer
buffer
could
deep
deep
despite
despite
destination:
determined
determined
equity,
family
family
for
franchise
highest-quality
his
his
his
his
hormone
hotel
is
is
lonesome
losses
lots
more
my
of
on
patriotism
patriotism
provide
pure
quiet
quite
replacement
room
strange
that
the
the
the
the
think
this
this
to
to
uncertainty
uncertainty
whined,
whooshed
wind

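Contents of reducetest.py:

# wordcount: reduce phase
# The input must be sorted so that identical words are adjacent,
# which is what the map phase guarantees for file02.

cur_word = None
count = 0  # renamed from "sum" to avoid shadowing the built-in

with open("file02", "r") as src:
    for line in src:
        word = line.strip()
        if cur_word is None:
            cur_word = word
        if cur_word != word:
            # A new word starts: emit the total for the previous one.
            print('\t'.join([cur_word, str(count)]))
            cur_word = word
            count = 0
        count += 1

# Emit the last word; skip this when file02 is empty.
if cur_word is not None:
    print('\t'.join([cur_word, str(count)]))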
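
The same reduce step can also be written with the standard library. The sketch below uses `itertools.groupby`, which groups adjacent equal items and therefore relies on sorted input just like the loop above, and cross-checks the result against `collections.Counter`. The function name `reduce_sorted` and the short sample list are assumptions made for the example, not part of the original project.

```python
from collections import Counter
from itertools import groupby


def reduce_sorted(words):
    """Return (word, count) pairs for an already sorted list of words."""
    return [(word, sum(1 for _ in group)) for word, group in groupby(words)]


# Hypothetical sorted input standing in for the lines of file02.
sorted_words = ["The", "a", "a", "buffer", "buffer", "buffer", "the"]
counts = reduce_sorted(sorted_words)
for word, count in counts:
    print(word + "\t" + str(count))

# Counter does not need sorted input, so it serves as an independent check.
assert dict(counts) == dict(Counter(sorted_words))
```

`Counter` alone would be enough for a file that fits in memory; the sorted, single-pass form mirrors how a reduce step processes a stream of keys.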


Article link: https://www.haomeiwen.com/subject/otpsnxtx.html
