1. Installation
pip install jieba
pip install snownlp  # use this one; a Python 3 environment is recommended
pip install pypinyin
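After installing, a quick sanity check is to import all three packages in the interpreter (a minimal check; if no errors are raised, the installs succeeded):

>>> import jieba
>>> import snownlp
>>> import pypinyin
>>>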
Word segmentation:
jieba segmentation
# jieba segmentation
>>> import jieba
>>> text = "我说我应该好好学习"
>>> cutafter = list(jieba.cut(text))
Building prefix dict from the default dictionary ...
Dumping model to file cache c:\users\ztdn00\appdata\local\temp\jieba.cache
Loading model cost 5.820 seconds.
Prefix dict has been built succesfully.
>>> print cutafter
[u'\u6211', u'\u8bf4', u'\u6211', u'\u5e94\u8be5', u'\u597d\u597d\u5b66\u4e60']
>>> for t in cutafter:
...     print t
...
我
说
我
应该
好好学习
>>>
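The same calls work under Python 3 as well. A minimal sketch using jieba.lcut, which returns the segments as a list directly (print is a function in Python 3):

# Python 3 version of the jieba example above
import jieba

text = "我说我应该好好学习"
segments = jieba.lcut(text)  # lcut returns a list instead of a generator
print(segments)
# expected: ['我', '说', '我', '应该', '好好学习']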
snownlp segmentation; under Python 3 it segments correctly:
# snownlp segmentation
>>> import snownlp
>>> t = "我说我应该好好学习"
>>> sn = snownlp.SnowNLP(t).words
>>> print(sn)
['我', '说', '我', '应该', '好好', '学习']
>>>
Under Python 2, however, it looks like this:
>>> import snownlp
>>> t = "我说我应该好好学习"
>>> print snownlp.SnowNLP(t).words
['\xce\xd2\xcb\xb5\xce\xd2\xd3\xa6\xb8\xc3\xba\xc3\xba\xc3\xd1\xa7\xcf\xb0']
>>>
As you can see, the segmentation did not succeed: the whole byte string came back as a single unsegmented token.
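The failure comes from passing a byte string: under Python 2 the literal is a str of encoded bytes, and SnowNLP needs unicode text to segment. A minimal sketch of a workaround, assuming the source bytes are UTF-8 (use "gbk" instead if, as on the Windows console above, the bytes are GBK-encoded):

# -*- coding: utf-8 -*-
# Python 2: decode the byte string to unicode before handing it to SnowNLP
import snownlp

t = "我说我应该好好学习"          # a plain str (bytes) under Python 2
u = t.decode("utf-8")            # use "gbk" instead if the bytes are GBK-encoded
print snownlp.SnowNLP(u).words   # segmentation now runs on unicode text
# the console shows the tokens in escaped form, e.g. [u'\u6211', u'\u8bf4', ...]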