Word Frequency Analysis of Post Titles on the 购物狂 Forum

Author: yu008 | Published: 2018-05-14 16:55 | Views: 12

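Working from the post titles I scraped a while ago from the parenting board of the 购物狂 forum, I segment the text with jieba, filter out meaningless stop words, and render the resulting word weights as a word cloud. The goal is to see which topics people currently care about most when it comes to their babies.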

    import jieba.analyse
    from PIL import Image
    import numpy as np
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud,ImageColorGenerator
    
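    # Read the scraped post titles. The encoding is an assumption: use 'gbk'
    # instead if the file was saved with the Chinese Windows default.
    with open('c:/1.txt', 'r', encoding='utf-8') as f:
        text = f.read()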
    
    
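    # Extract the top 200 keywords together with their TextRank weights.
    # Note: these are importance weights, not raw occurrence counts.
    result = jieba.analyse.textrank(text, topK=200, withWeight=True)
    
    # Hand-picked stop words; a set gives fast lookups and ignores duplicates.
    stopword = {
        "推荐", "求助", "请问", "知道", "儿童", "请教", "没有", "问题", "需要", "记录", "大家", "分享",
        "适合", "方法", "重庆", "有没有", "麻麻", "小朋友", "看看", "牌子", "宝妈", "摄影",
        "开始", "地方", "时间", "小儿", "经验", "不吃", "妈妈", "娃儿", "孩子",
        "爸爸", "咨询", "体验", "不能", "时候", "还有", "活动", "起来", "成长", "婴儿", "育儿",
        "母婴", "进来", "父母", "新手", "家长", "亲们", "喜欢", "东西", "出生", "妹妹",
        "帮忙", "小孩", "好用", "照片", "有点", "感觉", "免费", "应该", "准备", "娃娃",
        "妈咪", "没得", "注意", "看到", "支招", "选择", "购物狂", "不会", "出来", "婆子",
        "日记", "参加", "遇到", "辣妈", "生育", "新生儿", "美妈", "情况", "觉得", "发现",
        "台历", "添加", "幼儿", "转让", "座椅", "了解", "归来", "报告", "急求", "跪求",
        "朋友", "纠结", "办法", "经历",
    }
    
    # Keep only the keywords that are not stop words, mapped to their weights.
    keywords = {word: weight for word, weight in result if word not in stopword}
    print(keywords)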
    
    
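    # Load the mask image; words are drawn only where the mask is not pure white.
    image = Image.open('c:/1.jpg')
    graph = np.array(image)
    # A font with Chinese glyphs (here SimHei) is required to render the words.
    wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='white',
                   max_words=50, mask=graph)
    wc.generate_from_frequencies(keywords)
    # Color generator based on the mask image, used by the optional recolor below.
    image_color = ImageColorGenerator(graph)
    # Optional: preview the cloud with matplotlib, plain or recolored.
    # plt.imshow(wc)
    # plt.imshow(wc.recolor(color_func=image_color))
    # plt.axis("off")
    # plt.show()
    # plt.savefig('test.jpg', dpi=600)
    # Save the rendered word cloud to disk.
    wc.to_file('gwk.jpg')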
    
Figure: gwk.jpg (the generated word cloud)
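
The script ranks words by TextRank weight (`jieba.analyse.textrank`), which is a graph-based importance score rather than a raw count. If literal word frequencies are wanted, as the title suggests, a minimal sketch might look like the one below. The helper name `count_frequencies` and its `min_len` parameter are hypothetical and not part of the original script, and the token list is assumed to come from a segmenter such as `jieba.lcut(text)`.

```python
from collections import Counter


def count_frequencies(tokens, stopwords, min_len=2):
    """Count how often each token occurs, skipping stop words and short tokens.

    `tokens` is assumed to be a list of already segmented words,
    for example the output of jieba.lcut(text).
    """
    counts = Counter()
    for token in tokens:
        token = token.strip()
        # Skip whitespace, single characters or punctuation, and stop words.
        if len(token) < min_len or token in stopwords:
            continue
        counts[token] += 1
    return dict(counts)


# Hypothetical sample tokens standing in for segmented post titles.
sample_tokens = ["奶粉", "推荐", "奶粉", "幼儿园", ",", "推荐", "幼儿园", "奶粉"]
frequencies = count_frequencies(sample_tokens, stopwords={"推荐"})
print(frequencies)
```

The returned dict maps each word to a number, the same shape as `keywords`, so it could be passed to `wc.generate_from_frequencies` in its place.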

Article link: https://www.haomeiwen.com/subject/cnsjdftx.html