jieba is an excellent third-party library for Chinese word segmentation.
Install it with:
pip install jieba
How jieba segments Chinese text: it identifies words by means of a Chinese lexicon. Using the lexicon, jieba determines the association probability between characters; characters with a high probability of belonging together are grouped into words, which form the segmentation result.
jieba's three segmentation modes:
- Precise mode: splits the text into exact words, with no redundant words.
- Full mode: scans out all possible words in the text, with redundancy.
- Search-engine mode: on the basis of precise mode, segments long words again.
Common jieba functions:
- jieba.lcut(s): precise mode; returns the segmentation result as a list.
- jieba.lcut(s, cut_all=True): full mode; returns the result as a list, with redundancy.
- jieba.lcut_for_search(s): search-engine mode; returns the result as a list, with redundancy.
- jieba.add_word(w): adds a new word w to the segmentation dictionary.

In practice, jieba.lcut(s) is all you need for most tasks.
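A minimal sketch of these calls (the sentence is a common teaching example, and "蟒蛇语言" is an arbitrary made-up word; exact output can vary with jieba's dictionary version):

import jieba

s = "中国是一个伟大的国家"
print(jieba.lcut(s))                  # precise mode, e.g. ['中国', '是', '一个', '伟大', '的', '国家']
print(jieba.lcut(s, cut_all=True))    # full mode: all candidate words, with redundancy
print(jieba.lcut_for_search(s))       # search-engine mode: long words segmented again
jieba.add_word("蟒蛇语言")             # register a custom word in the dictionary
print(jieba.lcut("Python是蟒蛇语言"))  # '蟒蛇语言' should now stay as one token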
A word-frequency example on Hamlet:
Text link
#CalHamletV1.py
def getText():
    txt = open("hamlet.txt", "r").read()
    txt = txt.lower()                    # lowercase so "The" and "the" count as one word
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_{|}·~‘’':
        txt = txt.replace(ch, " ")       # replace punctuation with spaces
    return txt

hamletTxt = getText()
words = hamletTxt.split()                # split on whitespace into a list of words
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1   # count occurrences, defaulting to 0
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
'''
This sorts a list of (word, count) pairs by the second element of each
pair, from largest to smallest. In list.sort(), the lambda passed as key
selects which field of each element to sort on; the default order is
ascending, and reverse=True gives descending order.
'''
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))
Counting character appearances in Romance of the Three Kingdoms (三国演义):
Text link
#CalThreekingdomsV1.py
import jieba

txt = open("threekingdoms.txt", "r", encoding="utf-8").read()
# High-frequency words that are not character names
excludes = {"将军", "却说", "荆州", "二人", "不可", "不能", "如此",
            "商议", "如何", "主公", "军士", "左右", "军马", "引兵",
            "大喜", "天下", "次日", "东吴", "于是", "今日", "不敢"}
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:                   # skip single characters
        continue
    elif word == "诸葛亮" or word == "孔明曰":
        rword = "孔明"                   # merge aliases of Zhuge Liang
    elif word == "关公" or word == "云长":
        rword = "关羽"                   # merge aliases of Guan Yu
    elif word == "玄德" or word == "玄德曰":
        rword = "刘备"                   # merge aliases of Liu Bei
    elif word == "孟德" or word == "丞相":
        rword = "曹操"                   # merge aliases of Cao Cao
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1   # count under the merged name (rword, not word)
for word in excludes:
    del counts[word]                     # drop the non-name words
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(20):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))