Learning Python: Word Count Statistics


Author: 橘颂betty | Published 2017-11-09 06:48

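Approach

1. Open the file and process each line in it.
2. Add each word to a dictionary, updating its count.
3. Print the results, in this example sorted from highest to lowest frequency.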
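
The three steps can be sketched on a small in-memory example before any file is involved. This is only an illustration written in Python 3 syntax; `count_words` and `sample_lines` are hypothetical names that do not appear in the article's program.

```python
# Sketch of the three steps, using a list of strings in place of a file.
# count_words and sample_lines are hypothetical names used for illustration.

def count_words(lines):
    counts = {}
    for line in lines:                               # step 1: process every line
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1   # step 2: update the count
    return counts

sample_lines = ["the cat saw the dog", "the dog ran"]
counts = count_words(sample_lines)

# step 3: show the words from most to least frequent
for word, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(word, n)
```

The functions defined later in the article split the same work into addWord, processLine, and prettyPrint.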

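Python functions used

dict.items(): returns all key-value pairs as (key, value) tuples (a list in Python 2, a view object in Python 3)
str.strip(): removes the specified characters from both ends of a string (whitespace by default)
list.append(obj): appends a new object to the end of a list
str.split(): splits a string into a list of substrings on the specified separator (any run of whitespace by default)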
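
A quick way to get a feel for these four calls is to try them on small values. The snippet below is a sketch in Python 3 syntax, and all of the sample values are made up for illustration.

```python
# dict.items(): (key, value) pairs; wrap in list() in Python 3 to get a list
d = {'a': 1, 'b': 2}
pairs = list(d.items())
print(pairs)                    # [('a', 1), ('b', 2)]

# str.strip(): with no argument it removes whitespace from both ends
stripped = "  hello \n".strip()
print(stripped)                 # hello

# str.strip(chars): removes any of the given characters from both ends only
stripped_chars = "--hello--".strip("-")
print(stripped_chars)           # hello

# list.append(obj): adds one object to the end of the list, in place
items = []
items.append("x")
print(items)                    # ['x']

# str.split(): with no argument it splits on runs of whitespace
words = "four score  and seven".split()
print(words)                    # ['four', 'score', 'and', 'seven']

# str.split(sep): with a separator, empty fields are kept
fields = "a,b,,c".split(",")
print(fields)                   # ['a', 'b', '', 'c']
```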

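Function definitions

addWord: adds each word to the dictionary
processLine: processes a line, including stripping unwanted characters and splitting the line into words
prettyPrint: formatted display; keeping the output logic in its own function separates it from the counting
main: the main function, serving as the program's entry point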

addWord

def addWord(word,wordCountDict):
    '''Update the word frequency:word is the key,frequency is the value.'''
    if word in wordCountDict:
        wordCountDict[word] += 1
    else:
        wordCountDict[word] = 1
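
To see addWord at work, it can be called on a short list of words. The sketch below uses Python 3 syntax; the `words` list is made up, and the `collections.defaultdict` variant is an alternative from the standard library, not something the article itself uses.

```python
from collections import defaultdict

def addWord(word, wordCountDict):
    '''Same logic as the article's addWord.'''
    if word in wordCountDict:
        wordCountDict[word] += 1
    else:
        wordCountDict[word] = 1

words = ['we', 'the', 'we', 'people', 'we']   # made-up sample input

wcDict = {}
for w in words:
    addWord(w, wcDict)
print(wcDict)                   # {'we': 3, 'the': 1, 'people': 1}

# Alternative: defaultdict(int) supplies 0 for a missing key,
# so the membership test is no longer needed.
altDict = defaultdict(int)
for w in words:
    altDict[w] += 1
print(dict(altDict) == wcDict)  # True
```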

processLine

import string

def processLine(line,wcDict):
    '''Process the line to get lowercase words to add to the dictionary.'''
    line = line.strip()
    wordList = line.split()
    for word in wordList:
        #ignore the '--' that is in file
        if word != '--':
            word = word.lower()
            word = word.strip()
            #get commas,periods and punctuation out as well
            word = word.strip(string.punctuation)
            addWord(word,wcDict)
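
The per-word cleanup in processLine is easiest to understand on individual tokens. In the sketch below (Python 3 syntax), `clean` is a hypothetical helper that repeats the lower/strip steps; it is not part of the article's program.

```python
import string

def clean(word):
    '''Hypothetical helper mirroring the per-word steps in processLine.'''
    return word.lower().strip().strip(string.punctuation)

print(clean('Nation,'))         # nation
print(clean('"Liberty."'))      # liberty

# strip() only touches the two ends, so an inner hyphen survives
print(clean('resting-place'))   # resting-place

# a token made only of punctuation is reduced to the empty string
print(repr(clean('...')))       # ''
```

The last case shows why processLine skips '--' explicitly: without that check, the token would be stripped down to an empty string and counted as a word.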

prettyPrint

def prettyPrint(wcDict):
    '''Print nicely from highest to lowest frequency.'''
    #create a list of tuples,(value,key)

    #valKeyList = [(val,key) for key,val in wcDict.items()]

    valKeyList = []

    for key,val in wcDict.items():
        valKeyList.append((val,key))

    #sort method sorts on list's first element,here the frequency.
    #Reverse to get biggest first

    valKeyList.sort(reverse=True)
    print '%-10s%10s' %('Word','Count')
    print '_'*21
    for val,key in valKeyList:
        print '%-12s     %3d' %(key,val)
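
Because the list holds (count, word) tuples, sorting in reverse also reverses the alphabetical order among words with equal counts. The sketch below (Python 3 syntax, with a made-up `counts` dictionary) shows this, plus one possible alternative that keeps ties in alphabetical order; the alternative is a suggestion, not what the article does.

```python
counts = {'we': 10, 'to': 9, 'here': 9, 'the': 15}   # made-up sample counts

# The article's approach: (value, key) tuples sorted in reverse.
# Ties on the count fall back to the word, also in reverse order.
val_key = [(val, key) for key, val in counts.items()]
val_key.sort(reverse=True)
print(val_key)
# [(15, 'the'), (10, 'we'), (9, 'to'), (9, 'here')]

# Alternative: count descending, then word ascending for ties.
by_count_then_alpha = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_count_then_alpha)
# [('the', 15), ('we', 10), ('here', 9), ('to', 9)]
```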

main

def main():
    wcDict = {}
    fobj = open('gettysburg.txt','r')
    for line in fobj:
        processLine(line,wcDict)
    print 'Length of the dictionary:',len(wcDict)
    prettyPrint(wcDict)
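
main opens the file but never closes it. One common way to handle that is a `with` block, which closes the file when the block ends. The sketch below (Python 3 syntax) writes a small temporary file so that it can run on its own; `count_file`, the file name `sample.txt`, and its contents are all assumptions made for illustration.

```python
import os
import tempfile

def count_file(path):
    '''Hypothetical helper: count whitespace-separated words in a file.'''
    counts = {}
    with open(path, 'r') as fobj:       # closed automatically after the block
        for line in fobj:
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts

# Create a throwaway input file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write('four score and\nseven years ago four\n')
    counts = count_file(path)

print(counts['four'])   # 2
print(len(counts))      # 6
```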

The final code is as follows:

# __author__ == 'xjiao'
# -*- coding:utf-8 -*-

import string

def addWord(word,wordCountDict):
    '''Update the word frequency:word is the key,frequency is the value.'''

    if word in wordCountDict:

        wordCountDict[word] += 1 #word already in wordCountDict: increment its count
    else:
        wordCountDict[word] = 1 #word not seen yet: insert a new entry

def processLine(line,wcDict):
    '''Process the line to get lowercase words to add to the dictionary.'''

    line = line.strip() #strip leading and trailing whitespace from the line
    wordList = line.split()  #split on whitespace into a list of words

    for word in wordList:
        #ignore the '--' that is in file
        if word != '--':
            word = word.lower()
            word = word.strip()
            #get commas,periods and punctuation out as well

            word  = word.strip(string.punctuation)
            addWord(word,wcDict)


def prettyPrint(wcDict):
    '''Print nicely from highest to lowest frequency.'''
    #create a list of tuple,(value,key)

    #valKeyList = [(val,key) for key,val in wcDict.items()]

    valKeyList = []

    for key,val in wcDict.items():
        valKeyList.append((val,key))

    #sort method sorts on list's first element,here the frequency.
    #Reverse to get biggest first

    valKeyList.sort(reverse=True)
    print '%-10s%10s' %('Word','Count')
    print '_'*21
    for val,key in valKeyList:
        print '%-12s     %3d' %(key,val)

def main():
    wcDict = {}
    fobj = open('gettysburg.txt','r')
    for line in fobj:
        processLine(line,wcDict)
    print 'Length of the dictionary:',len(wcDict)
    prettyPrint(wcDict)


if __name__ == "__main__":
    main()
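
The listing above uses Python 2 print statements, so it will not run unchanged under Python 3. As a rough sketch of how the same pipeline might look in Python 3, the version below uses `print()` and `collections.Counter`; it reads from an in-memory `io.StringIO` object instead of gettysburg.txt, and `count_words` and the sample text are assumptions made for illustration.

```python
import io
import string
from collections import Counter

def count_words(fobj):
    '''Hypothetical Python 3 counterpart of processLine plus addWord.'''
    counts = Counter()
    for line in fobj:
        for word in line.split():
            if word != '--':                 # same special case as the article
                word = word.lower().strip(string.punctuation)
                counts[word] += 1
    return counts

# Made-up sample text standing in for the input file.
text = io.StringIO("Four score and seven years ago -- four, FOUR!\n")
counts = count_words(text)

print('Length of the dictionary:', len(counts))
for word, n in counts.most_common(2):        # two most frequent words
    print('%-12s     %3d' % (word, n))
```

Note that `most_common` keeps equal counts in first-seen order, which differs from the reverse tuple sort used in prettyPrint.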

The output is as follows:

D:\python\python.exe E:/python_project/gettysburg.py
Length of the dictionary: 136
Word           Count
_____________________
the               15
that              11
we                10
to                 9
here               9
of                 6
and                6
a                  6
nation             5
it                 5
have               5
in                 4
for                4
dedicated          4
who                3
us                 3
this               3
they               3
so                 3
shall              3
people             3
is                 3
great              3
dead               3
cannot             3
are                3
which              2
what               2
war                2
these              2
rather             2
our                2
or                 2
not                2
new                2
men                2
long               2
living             2
gave               2
from               2
field              2
far                2
devotion           2
dedicate           2
conceived          2
can                2
but                2
be                 2
years              1
world              1
work               1
will               1
whether            1
vain               1
upon               1
unfinished         1
thus               1
those              1
their              1
testing            1
task               1
take               1
struggled          1
should             1
seven              1
sense              1
score              1
say                1
resting-place       1
resolve            1
remember           1
remaining          1
proposition        1
proper             1
power              1
portion            1
perish             1
on                 1
now                1
note               1
nor                1
nobly              1
never              1
might              1
met                1
measure            1
lives              1
live               1
little             1
liberty            1
last               1
larger             1
increased          1
honored            1
highly             1
hallow             1
ground             1
government         1
full               1
freedom            1
four               1
fought             1
forth              1
forget             1
fitting            1
final              1
fathers            1
equal              1
engaged            1
endure             1
earth              1
do                 1
died               1
did                1
detract            1
created            1
continent          1
consecrated        1
consecrate         1
come               1
civil              1
cause              1
by                 1
brought            1
brave              1
birth              1
before             1
battle             1
as                 1
any                1
altogether         1
all                1
ago                1
advanced           1
add                1
above              1

Process finished with exit code 0
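
The row for resting-place is shifted one column to the right of the others. The format string pads the word to 12 characters, and resting-place has 13, so that row comes out one character longer. The check below (Python 3 syntax) reuses the two format strings from prettyPrint; the variable names are made up for illustration.

```python
row_fmt = '%-12s     %3d'        # row format used in prettyPrint
header_fmt = '%-10s%10s'         # header format used in prettyPrint

short_row = row_fmt % ('the', 15)
long_row = row_fmt % ('resting-place', 1)
header = header_fmt % ('Word', 'Count')

print(len('resting-place'))      # 13, one more than the 12-character field
print(len(short_row))            # 20
print(len(long_row))             # 21
print(len(header))               # 20
```

Widening the word field (for example to the length of the longest key) would be one way to keep every row aligned.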


Article link: https://www.haomeiwen.com/subject/tinumxtx.html