143. Generating word clouds in Python with the wordcloud package

Author: 陈容喜 | Published 2018-04-02 17:57


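I. Installing the Python wordcloud package under Anaconda

Download the wordcloud wheel that matches your operating system and your Anaconda version.

Download page for Windows: http://www.lfd.uci.edu/~gohlke/pythonlibs/#wordcloud

[Image 1]

My machine runs 64-bit Windows 7 with Anaconda3, so I downloaded:

[Image 2]

Then place the downloaded file in the directory you will run the install command from.

Installing with pip

Open Anaconda Prompt and first change to the directory that contains the .whl file. I put mine in C:\notebook; use whatever folder you saved yours to.

Then enter: pip install wordcloud-1.4.1-cp36-cp36m-win_amd64.whl

In the file name, cp36 stands for CPython 3.6 and win_amd64 for 64-bit Windows, so the wheel has to match your own interpreter and system.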
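To confirm that the install worked, one option is to ask Python whether the module can be found. The sketch below uses only the standard library; the helper name is_installed is made up for this example, and "wordcloud" is assumed to be the import name of the package installed above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "wordcloud" is the import name of the package installed with pip above.
    print("wordcloud installed:", is_installed("wordcloud"))
```

If this prints False inside Anaconda Prompt, the wheel was probably installed into a different Python environment than the one you are running.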

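II. Generating a word cloud with the wordcloud package

I followed the blog post below, which describes how to use the wordcloud package in detail.

Link: http://blog.csdn.net/u010309756/article/details/67637930

The text document to be analyzed is the following: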

How the Word Cloud Generator Works

The layout algorithm for positioning words without overlap is available on GitHub under an open source license as d3-cloud. Note that this is the only the layout algorithm and any code for converting text into words and rendering the final output requires additional development.

As word placement can be quite slow for more than a few hundred words, the layout algorithm can be run asynchronously, with a configurable time step size. This makes it possible to animate words as they are placed without stuttering. It is recommended to always use a time step even without animations as it prevents the browser’s event loop from blocking while placing the words.

The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:

Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.

If the word intersects with any previously placed words, move it one step along an increasing spiral. Repeat until no intersections are found.

The hard part is making it perform efficiently! According to Jonathan Feinberg, Wordle uses a combination of hierarchical bounding boxes and quadtrees to achieve reasonable speeds.

Glyphs in JavaScript

There isn’t a way to retrieve precise glyph shapes via the DOM, except perhaps for SVG fonts. Instead, we draw each word to a hidden canvas element, and retrieve the pixel data.

Retrieving the pixel data separately for each word is expensive, so we draw as many words as possible and then retrieve their pixels in a batch operation.

Sprites and Masks

My initial implementation performed collision detection using sprite masks. Once a word is placed, it doesn't move, so we can copy it to the appropriate position in a larger sprite representing the whole placement area.

The advantage of this is that collision detection only involves comparing a candidate sprite with the relevant area of this larger sprite, rather than comparing with each previous word separately.

Somewhat surprisingly, a simple low-level hack made a tremendous difference: when constructing the sprite I compressed blocks of 32 1-bit pixels into 32-bit integers, thus reducing the number of checks (and memory) by 32 times.

In fact, this turned out to beat my hierarchical bounding box with quadtree implementation on everything I tried it on (even very large areas and font sizes). I think this is primarily because the sprite version only needs to perform a single collision test per candidate area, whereas the bounding box version has to compare with every other previously placed word that overlaps slightly with the candidate area.

Another possibility would be to merge a word’s tree with a single large tree once it is placed. I think this operation would be fairly expensive though compared with the analagous sprite mask operation, which is essentially ORing a whole block.
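The placement loop described in the passage above (start near the middle, step along a growing spiral until nothing intersects) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each word is treated as a plain rectangle instead of its glyph pixels, and the names overlaps and place_words are invented here. It is not the code used by d3-cloud or by the wordcloud package.

```python
import math


def overlaps(a, b):
    """Rectangles are (x, y, w, h); return True if a and b intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_words(sizes, step=0.5, max_steps=10000):
    """Place rectangles of the given (w, h) sizes, most important first.

    Each rectangle starts centered on the origin and is moved along an
    Archimedean spiral until it no longer intersects any placed rectangle.
    """
    placed = []
    for w, h in sizes:
        for i in range(max_steps):
            t = i * step              # angle along the spiral
            cx = t * math.cos(t)      # the radius grows with the angle
            cy = t * math.sin(t)
            rect = (cx - w / 2, cy - h / 2, w, h)
            if not any(overlaps(rect, p) for p in placed):
                placed.append(rect)
                break
        else:
            raise RuntimeError("no free position found")
    return placed


if __name__ == "__main__":
    for rect in place_words([(40, 10), (30, 8), (20, 6)]):
        print(rect)
```

Checking every candidate against every placed rectangle is the slow part the passage refers to; the sprite mask and quadtree ideas it mentions are ways to speed up exactly that check.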

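The code follows.

First, import the required packages:

[Image 3: importing the required packages]

(1) Making a word cloud with a background image

The background image I used:

[Image 4: love.jpg]

Generating the word cloud:

[Image 5]

[Image 6: the generated word cloud]

(2) Without a background image:

[Image 7: without a background image]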
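Both variants, with and without a background image, rely on generate(text), which roughly splits the text into words, drops stop words, counts what is left, and sizes each word by its count. The sketch below imitates only the counting step with the standard library. DEMO_STOPWORDS and both function names are invented for this example, and the real tokenizer in wordcloud does more (for instance it has a much larger STOPWORDS set). The second function shows how precomputed counts could be handed to WordCloud.generate_from_frequencies; it needs wordcloud installed and is not called here.

```python
import re
from collections import Counter

# A tiny illustrative stop-word set, not the STOPWORDS set shipped with wordcloud.
DEMO_STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "it", "as", "for"}


def word_frequencies(text, stopwords=DEMO_STOPWORDS):
    """Count lower-cased words in `text`, skipping stop words."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def cloud_from_frequencies(freqs):
    """Build a word cloud from precomputed counts (requires wordcloud)."""
    # Imported lazily so that the counting part runs without wordcloud.
    from wordcloud import WordCloud
    return WordCloud(background_color="black").generate_from_frequencies(freqs)


if __name__ == "__main__":
    sample = "The layout algorithm itself is simple. The layout is fast."
    print(word_frequencies(sample).most_common(3))
```

Counting the words yourself is mainly useful when you want to filter or reweight them before drawing.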

Source code:


# coding: utf-8

# # Generating word clouds in Python with the wordcloud package

# I followed the blog post below, which describes how to use the wordcloud package in detail.

#

# Link: [生成词云之python中WordCloud包的用法](https://blog.csdn.net/u010309756/article/details/67637930)

# The text document to analyze is the "How the Word Cloud Generator Works" passage quoted in full above.

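# ### Code implementation

# Import the wordcloud and matplotlib modules
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.pyplot as plt

# scipy.misc.imread was removed in SciPy 1.2, so an equivalent
# imread is defined here with NumPy and Pillow instead.
import numpy as np
from PIL import Image


def imread(path):
    # Load an image file into a NumPy array
    return np.array(Image.open(path))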

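# Read a txt file: copy the passage above into a file named word.txt (use your own path)
# If the file is saved as UTF-8, you may need to pass encoding='utf-8' to open()
with open('D:\\Python\\notebook\\word.txt', 'r') as f:
    text = f.read()
print(text)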

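# Make a word cloud using a background image
'''
mask : nd-array or None (default=None)
    If not None, a binary mask that determines where words are drawn. When a
    mask is given, the width and height settings are ignored and the shape of
    the mask is used instead. Pure white (#FFFFFF) areas are left empty and
    words are drawn everywhere else. For example, with
    bg_pic = imread('some_image.png'), the image background must be pure
    white (#FFFFFF), and the cloud takes the shape of the non-white region.
    You can paste the shape you want onto a pure white canvas in an image
    editor such as Photoshop and save it.
background_color : color value (default="black")
    Background color, e.g. background_color='black' gives a black background.
scale : float (default=1)
    Scales the canvas; with 1.5, both width and height are 1.5 times the original.
generate(text)
    Generate the word cloud from text.
'''
# Read the background image
bg_pic = imread('D:\\Python\\notebook\\love.jpg')
# Generate the word cloud
wordcloud = WordCloud(mask=bg_pic, background_color='black', scale=1.5).generate(text)
# Build a color function from the background image. It is not applied below;
# to use it, pass it to wordcloud.recolor(color_func=image_colors).
image_colors = ImageColorGenerator(bg_pic)
# Show the word cloud
plt.imshow(wordcloud)
plt.axis('off')
plt.show()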

# Make a word cloud without a background image
# Generate the word cloud
wordcloud = WordCloud(background_color='black', scale=1.5).generate(text)
# Show the word cloud
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

# Save the image (uncomment to write the file)
# wordcloud.to_file('D:\\Python\\notebook\\test.jpg')
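The mask rule noted in the source comments (pure white is left empty, everything else can hold words) is easy to check on a small grid. The sketch below uses plain nested lists so that it runs without NumPy or wordcloud. The names WHITE, drawable_cells and make_disc_mask are invented for this example, and the code only mirrors the rule as described in this article rather than the package's internal implementation.

```python
WHITE = (255, 255, 255)


def drawable_cells(pixels):
    """Return a grid of booleans, True where words may be drawn.

    `pixels` is a list of rows, each row a list of (R, G, B) tuples.
    Following the mask rule described above, pure white is excluded.
    """
    return [[px != WHITE for px in row] for row in pixels]


def make_disc_mask(size):
    """Build a size x size RGB grid: a black disc on a white canvas."""
    center = (size - 1) / 2
    radius_sq = (size / 2) ** 2
    return [
        [(0, 0, 0) if (x - center) ** 2 + (y - center) ** 2 <= radius_sq else WHITE
         for x in range(size)]
        for y in range(size)
    ]


if __name__ == "__main__":
    cells = drawable_cells(make_disc_mask(9))
    for row in cells:
        print("".join("#" if ok else "." for ok in row))
```

To use such a grid as a real mask, it would have to be converted first, for example with np.array(make_disc_mask(200), dtype=np.uint8), and then passed as mask= to WordCloud in place of the image read from love.jpg.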
