2018-10-19 Word Frequency Statistics for Romance of the Three Kingdoms

Author: 叛逆闲人 | Published: 2018-10-19 12:52


Using the jieba word segmentation library, complete the following two exercises:

(1) Find the ten most frequent words in the file "threekingdoms.txt".

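import jieba

# Read the novel as raw bytes; jieba decodes bytes itself (UTF-8 first, then GBK).
with open("threekingdoms.txt", "rb") as f:
    txt = f.read()

words = jieba.lcut(txt)  # segment the text into a list of words

# Count every token that is longer than one character.
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort the (word, count) pairs by count, highest first.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the 15 most frequent tokens, which covers the ten the exercise asks for.
for i in range(15):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))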

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\15228\AppData\Local\Temp\jieba.cache
Loading model cost 1.518 seconds.
Prefix dict has been built succesfully.
1586
曹操 953
孔明 836
将军 772
却说 656
玄德 586
关公 510
丞相 491
二人 469
不可 440
荆州 425
玄德曰 390
孔明曰 390
不能 383
如此 378
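
The bare 1586 at the top of the output above is most likely the count for the line-break token "\r\n": the file is read in binary mode, so Windows line endings survive as a two-character token that passes the length check and prints as an empty-looking word. The sketch below is one possible variant, not the original script: it filters out whitespace tokens and uses `collections.Counter` from the standard library. The helper names `top_words` and `segment_file`, and the UTF-8 encoding of the file, are assumptions made for illustration.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens, ignoring whitespace and single characters."""
    counts = Counter(
        tok for tok in tokens
        if tok.strip() and len(tok) > 1  # drop "\r\n", spaces, and one-character tokens
    )
    return counts.most_common(n)


def segment_file(path, encoding="utf-8"):
    """Segment a text file with jieba (imported lazily; the encoding is an assumption)."""
    import jieba
    with open(path, "r", encoding=encoding) as f:
        return jieba.lcut(f.read())


# Hypothetical usage, assuming threekingdoms.txt is UTF-8 encoded:
# for word, count in top_words(segment_file("threekingdoms.txt"), 10):
#     print("{0:<10}{1:>5}".format(word, count))
```

Because whitespace tokens are skipped here, the ranking would start directly with real words, and the counts of the remaining words should be unaffected.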

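(2) Count how many times the names "关羽" (Guan Yu), "曹操" (Cao Cao), "诸葛亮" (Zhuge Liang), "刘备" (Liu Bei), and other characters appear in the file "threekingdoms.txt".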

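import jieba

# Frequent non-name words to drop from the ranking.
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}

# Read raw bytes; jieba decodes bytes itself (UTF-8 first, then GBK).
with open("threekingdoms.txt", "rb") as f:
    txt = f.read()

words = jieba.lcut(txt)

# Count tokens, merging the different names used for the same character.
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"
    elif word == "关公" or word == "云长":
        rword = "关羽"
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"
    elif word == "孟德" or word == "丞相":
        rword = "曹操"
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1

# Remove the excluded words; pop() avoids a KeyError if a word never occurred.
for word in excludes:
    counts.pop(word, None)

# Sort by count, highest first, and print the top five entries.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(5):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))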

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\15228\AppData\Local\Temp\jieba.cache
Loading model cost 0.680 seconds.
Prefix dict has been built succesfully.
1586
曹操 1451
孔明 1383
刘备 1253
关羽 784
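
The if/elif chain in the script above can also be written as a lookup table, which may be easier to extend when more aliases turn up. The following is a minimal sketch under that assumption, not the author's code: the alias pairs and the exclude set are copied from the script, while the names `ALIASES`, `EXCLUDES`, and `count_names` are hypothetical. Unlike the original, it also skips whitespace tokens, so the line-break entry (the bare 1586) would not appear in the result.

```python
from collections import Counter

# Alias table copied from the if/elif chain: variant -> canonical name.
ALIASES = {
    "诸葛亮": "孔明", "孔明曰": "孔明",
    "关公": "关羽", "云长": "关羽",
    "玄德": "刘备", "玄德曰": "刘备",
    "孟德": "曹操", "丞相": "曹操",
}
# Frequent non-name words, copied from the script's excludes set.
EXCLUDES = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此"}


def count_names(tokens, aliases=ALIASES, excludes=EXCLUDES):
    """Count tokens after mapping aliases to one canonical name each."""
    counts = Counter()
    for tok in tokens:
        # Skip single characters and whitespace tokens such as "\r\n".
        if len(tok) == 1 or not tok.strip():
            continue
        name = aliases.get(tok, tok)  # unknown tokens keep their own spelling
        if name not in excludes:
            counts[name] += 1
    return counts


# Hypothetical usage with a token list produced by jieba.lcut():
# for name, count in count_names(words).most_common(5):
#     print("{0:<10}{1:>5}".format(name, count))
```

Note that mapping "丞相" (chancellor) to 曹操 follows the original script; the title is also used for other characters in the novel, so that count is an approximation.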


Article link: https://www.haomeiwen.com/subject/gyyyzftx.html
