This started with a phone interview at a company I was applying to for an internship: they asked whether I had any hands-on experience with Python visualization. I had none at all. Since the positions seemed serious, and the interview was scheduled for the next afternoon, I went with my belief that thorough preparation never hurts and worked at it for four straight hours without getting up. To my surprise, it actually came together.
Below are my code and the final results. (Purely for personal amusement; I will take it down if it infringes on anything.)
# sjy hands-on: sentiment analysis
import numpy as np
from snownlp import SnowNLP
import matplotlib.pyplot as plt

f = open('comment.txt', 'r', encoding='UTF-8')
lines = f.readlines()  # renamed from `list` to avoid shadowing the built-in
f.close()
sentimentslist = []
for line in lines:
    s = SnowNLP(line)
    # print(s.sentiments)
    sentimentslist.append(s.sentiments)
plt.hist(sentimentslist, bins=np.arange(0, 1, 0.01), facecolor='b')
plt.xlabel('Sentiments Probability')
plt.ylabel('Quantity')
plt.title('Analysis of Sentiments')
plt.show()
# The part above is the sentiment analysis. In the resulting plot, scores
# near 1.0 mean mostly positive comments, scores near 0 mostly negative ones.
(Figure: analysis.png, the sentiment histogram)
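Before trusting the plot, you can sanity-check SnowNLP's scoring on a couple of sentences. A minimal check (these example sentences are mine, not from comment.txt):

from snownlp import SnowNLP
# Scores near 1.0 read as positive, near 0 as negative (illustrative inputs).
print(SnowNLP('这部电影太精彩了,强烈推荐!').sentiments)  # should land near 1.0
print(SnowNLP('剧情无聊,完全是浪费时间。').sentiments)    # should land near 0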
# Word cloud
# coding=utf-8
import matplotlib.pyplot as plt
import imageio  # replaces the deprecated scipy.misc.imread
from wordcloud import WordCloud
import jieba, codecs
from collections import Counter
# Stopword removal
# Build the stopword list
def stopwordslist():
    stopwords = [line.strip() for line in open('stopwords.txt', encoding='UTF-8').readlines()]
    return stopwords
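For reference, stopwords.txt is expected to hold one stopword per line; any standard Chinese stopword list works. A few illustrative entries:

的
了
是
和
就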
# Segment a sentence into Chinese words and drop stopwords
def seg_depart(sentence):
    # Segment the line with jieba
    sentence_depart = jieba.cut(sentence.strip())
    # Load the stopword list
    stopwords = stopwordslist()
    # Accumulate the result in outstr
    outstr = ''
    # Drop stopwords
    for word in sentence_depart:
        if word not in stopwords:
            if word != '\t':
                outstr += word
                outstr += " "
    return outstr
# File paths
filename = "comment.txt"
outfilename = "clean.txt"
inputs = open(filename, 'r', encoding='UTF-8')
outputs = open(outfilename, 'w', encoding='UTF-8')
# Write the segmented, stopword-free lines to clean.txt
for line in inputs:
    line_seg = seg_depart(line)
    outputs.write(line_seg + '\n')
outputs.close()
inputs.close()
print("Stopword removal and segmentation done!!!")
textc = codecs.open('clean.txt', 'r', encoding='utf-8').read()
textc_jieba = jieba.cut(textc)
c = Counter(textc_jieba)   # count word frequencies
word = c.most_common(800)  # keep the 800 most frequent words
bg_pic = imageio.imread('bg.png')
wc = WordCloud(
    font_path=r'C:\Users\dell\Desktop\FZMWFont.TTF',  # a font with Chinese glyphs is required
    background_color='white',  # background color
    max_words=2000,            # maximum number of words displayed
    mask=bg_pic,               # mask image that shapes the cloud
    max_font_size=200,         # largest font size
    random_state=20            # number of random color states, i.e. color schemes
)
wc.generate_from_frequencies(dict(word))  # build the cloud from the frequency dict
wc.to_file('resultc.png')
# show
plt.imshow(wc)
plt.axis("off")
plt.figure()
plt.imshow(bg_pic, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
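As an aside, since clean.txt is already space-delimited, WordCloud could also tokenize and count it by itself instead of going through Counter. A rough sketch (the output filename is hypothetical; note that wordcloud's default tokenizer keeps only tokens of two or more characters):

wc.generate(textc)             # let wordcloud split the pre-segmented text on whitespace
wc.to_file('resultc_alt.png')  # hypothetical alternative output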
The very first version of the word cloud came out like this (the most faithful, and most meaningless, visualization of raw first-hand data):
(Figure: result o.png, the first-pass word cloud)
Later, after further trimming and filtering with the stopwords list, the final result looks roughly like this:
(Figure: resultc.png, the final word cloud; bg.png, the mask image)
Wishing everyone few bugs!