Word Count

Author: MA木易YA | Published 2018-11-29 21:11 | Read 15 times

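The splitlines function splits the text into lines and drops the newline characters. The sub function then replaces punctuation in each line with spaces, and a word that ends with '-' at the end of a line is joined with the first word of the next line. Each word is stored in a dictionary the first time it is read, and its count is incremented by 1 on every occurrence. Finally, the dictionary's key/value pairs and its length (the number of distinct words) are printed.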
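The three steps described above can be tried on a short in-memory string before running them on a file. The following is only a minimal sketch: the sample text and the helper name `join_hyphenated` are made up for illustration and are not part of the original script.

```python
import re

# Hypothetical sample text: "holi-" is split across a line break.
sample = "Summer holi-\nday is coming. I like soccer!"

# Step 1: split into lines; the "\n" terminators are dropped.
lines = sample.splitlines()

# Step 2: replace common punctuation with spaces.
cleaned = [re.sub(r'[.?!,"/]', ' ', line) for line in lines]


def join_hyphenated(lines):
    """Join a word ending in '-' at the end of a line with the next line's first word."""
    words, prefix = [], ''
    for line in lines:
        parts = line.split()
        if prefix and parts:
            parts[0] = prefix + parts[0]   # complete the word carried over
            prefix = ''
        if parts and parts[-1].endswith('-'):
            prefix = parts.pop()[:-1]      # hold the first half for the next line
        words.extend(parts)
    return words


# Step 3: rebuild hyphenated words and collect the word list.
words = join_hyphenated(cleaned)
print(lines)
print(words)
```

Here "holi-" and "day" end up as the single word "holiday", which is the behavior the script below aims for.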

words.txt

My Summer Holiday
Summer holiday is coming.I am going to do many things that I want to do. For exampie,first I will jion a soccer club,because I like playing soccer.During the summer holiday, I want to practice more to improve my skills.Second I will go to my grandma's house,because I miss her very much.I want to stay with her for several days.Last I will help my mother do some housework. She was really tired when I was go to school. Except for taking care of me, she also has to work. Therefore, I want to help her in the holidays.What will you do on Summer Holiday?

words_count.py

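import re

def get_word_frequencies(file_name):
    dic = {}
    # Read the whole file and split it into lines (newlines are dropped)
    with open(file_name, 'r') as f:
        txt = f.read().splitlines()

    prefix = ''  # first half of a word hyphenated across a line break
    for line in txt:
        print(line)
        line = re.sub(r'[.?!,"/]', ' ', line)  # punctuation that may appear in English text
        line = re.sub(r' - ', ' ', line)  # replace a standalone '-'
        words = line.split()
        for i, word in enumerate(words):
            # When the last word on a line ends with '-', join it with
            # the first word of the next line to form a single word
            if word[-1] == '-' and i == len(words) - 1:
                prefix += word[:-1]
                break
            if prefix:
                word = prefix + word
                prefix = ''
            dic.setdefault(word.lower(), 0)  # case-insensitive
            dic[word.lower()] += 1
    print(dic)
    print("Number of distinct words:", len(dic))

if __name__ == '__main__':
    get_word_frequencies('words.txt')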
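As a point of comparison, the same count could be sketched with `collections.Counter` from the standard library. This is an alternative technique, not the author's method: the helper name `count_words` is hypothetical, and the tokenization differs from the script above (it keeps only letters and apostrophes, and assumes lines end in "\n").

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of lower-cased words in text (hypothetical helper)."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r'-\n', '', text)
    # Assumption: a word is a run of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


counts = count_words("Summer holi-\nday is coming. I like the summer holiday!")
print(len(counts))            # number of distinct words
print(sum(counts.values()))   # total number of word occurrences
print(counts.most_common(2))  # the two most frequent words
```

The length of the counter is the number of distinct words, while the sum of its values is the total number of words in the text; the two figures are easy to confuse.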

For more code, see my GitHub.


Article link: https://www.haomeiwen.com/subject/gudqcqtx.html
