1. Install or load the R package
if (!requireNamespace("wordcloud2", quietly = TRUE))
install.packages("wordcloud2")
library(wordcloud2)
2. Get familiar with plotting using the data sets bundled with wordcloud2
a. Draw a star-shaped (shape = "star") word cloud
wordcloud2(demoFreq, size = 1,shape = 'star')
![](https://img.haomeiwen.com/i17474796/b96f9efda7d94eb3.png)
b. Draw a pentagon-shaped (shape = "pentagon") word cloud
wordcloud2(demoFreq, size = 1,shape = 'pentagon')
![](https://img.haomeiwen.com/i17474796/7baf7b19ab92a70f.png)
Chinese word clouds are supported as well:
wordcloud2(demoFreqC, size = 1,shape = 'pentagon')
![](https://img.haomeiwen.com/i17474796/5e4332451f7fbcfe.png)
3. Custom shapes
a. Draw a dove-shaped word cloud
# Path to the example image that ships with wordcloud2
bird_png = system.file("examples/t.png", package = "wordcloud2")
wordcloud2(
  demoFreqC,
  figPath = bird_png,
  size = 1,
  color = "skyblue"
)
![](https://img.haomeiwen.com/i17474796/bb669b4303d41460.png)
b. Draw a heart-shaped word cloud
wordcloud2(demoFreqC,
           figPath = "1.png",  # relative path to the mask image
           size = 1,
           color = 'random-dark')
![](https://img.haomeiwen.com/i17474796/a7abc0562d650e18.png)
c. Custom colors
wordcloud2(demoFreqC, figPath = "1.png", size = 1,
           color = ifelse(demoFreqC$V1 > 1000, "red", "skyblue")
)
![](https://img.haomeiwen.com/i17474796/127ae11f17475b7f.png)
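The `color` argument in the call above receives a vector with one color per row, which is exactly what `ifelse()` produces. Here is a minimal base-R sketch of that step on its own. The table `toy_freq`, its column names and its counts are made up for illustration and are not part of wordcloud2.

```r
# Hypothetical frequency table; words and counts are invented for illustration.
toy_freq = data.frame(
  word = c("alpha", "beta", "gamma", "delta"),
  freq = c(1500, 800, 1200, 30),
  stringsAsFactors = FALSE
)

# ifelse() is vectorised: it returns one color per row, in row order.
word_colors = ifelse(toy_freq$freq > 1000, "red", "skyblue")
print(data.frame(toy_freq, color = word_colors))

# The vector can then be passed on, for example:
# wordcloud2(toy_freq, size = 1, color = word_colors)
```

Because the vector is aligned with the rows of the data frame, any rule that yields one color per row (several thresholds, a lookup by word, and so on) can be plugged in the same way.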
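The above was only a warm-up. Next we do some simple text mining on the Three Hundred Tang Poems. A look at the first three poems in the collection shows that two of them were written by Du Fu for Li Bai.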
![](https://img.haomeiwen.com/i17474796/e765e3121d6f92a2.png)
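1. A first look at how many poems each author contributed
# Preview the raw poem data
View(poem)
# Extract the "author:title" lines (they start with a digit)
author_title = poem[, 1][grepl("^[0-9]", poem[, 1], perl = TRUE)]
# Extract the author name
authors = gsub("[0-9]*(.{2,3})\\:\\S+", "\\1", author_title, perl = TRUE)
# Build a data frame of poem counts per author
authors_dataframe = as.data.frame(table(authors))
# Preview the author names and the number of poems each one wrote
View(authors_dataframe)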
![](https://img.haomeiwen.com/i17474796/7ff2f9d4f916fc12.png)
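To make the regular expression easier to follow, here is a small sketch that applies it to a few header lines. The sample strings are hypothetical and only imitate the "number + author + colon + title" layout that the pattern expects; the real `poem` data is not shown in this post.

```r
# Hypothetical header lines in the "NNNauthor:title" layout the regex expects.
author_title = c("001杜甫:春望", "002李白:静夜思", "003杜甫:月夜", "004孟浩然:春晓")

# [0-9]* skips the running number, (.{2,3}) captures a 2-3 character name,
# and \\:\\S+ consumes the colon and the title.
authors = gsub("[0-9]*(.{2,3})\\:\\S+", "\\1", author_title, perl = TRUE)
print(authors)

# table() counts poems per author; as.data.frame() gives columns authors / Freq.
authors_dataframe = as.data.frame(table(authors), stringsAsFactors = FALSE)
print(authors_dataframe)
```

Note that the capture group only handles names of two or three characters; a longer or shorter name would not be extracted cleanly with this pattern.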
a. Authors with more than 10 poems
x1 = authors_dataframe[authors_dataframe$Freq > 10, ]
# Label each slice with the author name and share in percent
pielab = paste(x1$authors, "(", round(x1$Freq / sum(x1$Freq) * 100, 2), "%)")
pie(
  x1$Freq,
  labels = pielab,
  col = c("skyblue", "rosybrown1", "aquamarine",
          "gray82", "wheat1", "lightgreen"),
  radius = 1.1,
  cex.lab = 1,
  main = "Authors with more than 10 poems",
  angle = 90,
  border = NA
)
![](https://img.haomeiwen.com/i17474796/044e775efe563573.png)
b. Authors with 5 to 10 poems
# Inclusive bounds, so that authors with exactly 5 or 10 poems are counted
x1 = authors_dataframe[authors_dataframe$Freq >= 5 &
                         authors_dataframe$Freq <= 10, ]
pielab = paste(x1$authors, "(", round(x1$Freq / sum(x1$Freq) * 100, 2), "%)")
pie(
  x1$Freq,
  labels = pielab,
  col = c(
    "skyblue",
    "rosybrown1",
    "aquamarine",
    "gray82",
    "wheat1",
    "lightgreen"
  ),
  radius = 1.1,
  cex.lab = 1,
  main = "Authors with 5 to 10 poems",
  angle = 90,
  border = NA
)
![](https://img.haomeiwen.com/i17474796/7abe3aa1937e7634.png)
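The two pie charts repeat the same filter-and-label steps with different bounds, so they could be folded into one helper. The following is only a sketch: `pie_labels()` and `plot_author_pie()` are hypothetical names introduced here, and the counts in `authors_dataframe` are invented so that the block runs on its own.

```r
# Hypothetical counts; the real table comes from as.data.frame(table(authors)).
authors_dataframe = data.frame(
  authors = c("A", "B", "C", "D"),
  Freq = c(30, 12, 7, 2),
  stringsAsFactors = FALSE
)

# Build "name (xx.xx%)" labels for the authors whose count lies in [lo, hi].
pie_labels = function(df, lo, hi = Inf) {
  sel = df[df$Freq >= lo & df$Freq <= hi, ]
  pct = round(sel$Freq / sum(sel$Freq) * 100, 2)
  data.frame(Freq = sel$Freq,
             label = paste0(sel$authors, " (", pct, "%)"),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper around graphics::pie() that reuses the labels above.
plot_author_pie = function(df, lo, hi = Inf, title = "") {
  d = pie_labels(df, lo, hi)
  pie(d$Freq, labels = d$label, radius = 1.1, border = NA, main = title,
      col = c("skyblue", "rosybrown1", "aquamarine",
              "gray82", "wheat1", "lightgreen"))
}

print(pie_labels(authors_dataframe, lo = 11))
plot_author_pie(authors_dataframe, lo = 11,
                title = "Authors with more than 10 poems")
```

Keeping the label construction in its own function makes the bounds explicit and lets the percentages be checked without drawing anything.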
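2. Mining the poems of Du Fu and Li Bai
a. Preprocess the poems
The text was reshaped with Perl into the layout shown below. Column 1 holds the author and the poem title; column 2 holds the poem text. There are 306 rows in total, that is, 306 poems: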
![](https://img.haomeiwen.com/i17474796/4a92a7f34fef24ab.png)
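The Perl script itself is not shown here. As a rough alternative, the same two-column layout could be produced in R. The sketch below assumes that every header line starts with digits and is followed by at least one verse line; the sample lines in `raw` are hypothetical, and this is not the script that produced the table above.

```r
# Hypothetical raw lines: a header "NNNauthor:title" followed by verse lines.
raw = c("001杜甫:春望", "国破山河在", "城春草木深",
        "002李白:静夜思", "床前明月光", "疑是地上霜")

# Header lines start with a digit; cumsum() numbers the poem each line belongs to.
is_header = grepl("^[0-9]", raw)
poem_id = cumsum(is_header)

# Column 1: author + title. Column 2: all verse lines of that poem pasted together.
poems = data.frame(
  title = raw[is_header],
  text = as.vector(tapply(raw[!is_header], poem_id[!is_header],
                          paste, collapse = "")),
  stringsAsFactors = FALSE
)
print(poems)
```

A header without any verse lines would break the row alignment, so real data would need a check for that case.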
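b. Install and load the word-frequency package
jieba (结巴, literally "stutter"): yes, that really is the name of the package used here for word segmentation and frequency counting. Its R version is jiebaR.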
if (!requireNamespace("jiebaR", quietly = TRUE))
install.packages("jiebaR")
library(jiebaR)
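c. A look at Du Fu's poems
# Select the poem text of the rows whose author is Du Fu (杜甫)
author_DuFU = poems[, 2][grepl("杜甫", poems[, 1], perl = TRUE)]
# Convert to character
author_DuFU = as.character(author_DuFU)
# Create a jiebaR segmentation worker
wk = worker()
# Segment the text and count word frequencies
x1 = freq(segment(author_DuFU, wk))
# Plot: words that occur more than once in red, the rest in green
wordcloud2(
  x1,
  size = 0.4,
  figPath = "dufu.png",
  color = ifelse(x1$freq > 1, 'red', 'green')
)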
![](https://img.haomeiwen.com/i17474796/9dc61c04ce279f1f.png)
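Du Fu's keywords include:
将军 (general), 先帝 (the late emperor), 公孙 (Gongsun), 君臣 (ruler and ministers) ==> he cares about state affairs and history, and worries about the world before anyone else does.
风尘 (wind and dust), 人生 (life), 白首 (white hair), 妻子 (wife and children), 春色 (spring scenery) ==> deeply attached to people, and easily moved to melancholy.
日暮 (dusk), 江湖 (rivers and lakes), 感时 (grieving over the times), 寂寞 (loneliness), 天涯 (the ends of the earth), 涕泪 (tears) ==> somewhat pessimistic, possibly prone to tears, and often lonely.
三峡 (Three Gorges), 临颍 (Linying), 江南 (Jiangnan), 关塞 (frontier passes), 西山 (Western Hills) ==> and yet the whole country is on his mind.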
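Keywords like these can also be read from the frequency table directly instead of from the picture. The sketch below ranks a table by count. The data frame is a hypothetical stand-in with invented counts; it only mirrors the `char` and `freq` columns that `freq()` from jiebaR returns.

```r
# Hypothetical stand-in for the result of freq(segment(...));
# the counts are invented, only the column names char / freq follow jiebaR.
x1 = data.frame(
  char = c("将军", "人生", "日暮", "江南", "寂寞"),
  freq = c(5, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Sort by descending count and keep the most frequent terms.
top_terms = x1[order(-x1$freq), ]
print(head(top_terms, 3))
```

With the real table, raising the number passed to `head()` gives a longer keyword list to interpret.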
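d. Digging into Li Bai's poems
# Select the poem text of the rows whose author is Li Bai (李白)
author_LiBai = poems[, 2][grepl("李白", poems[, 1], perl = TRUE)]
author_LiBai = as.character(author_LiBai)
wk = worker()
# Segment the text and count word frequencies
x1 = freq(segment(author_LiBai, wk))
# Default shape first
wordcloud2(x1, minSize = 40)
# Then the same data drawn inside a custom image mask
wordcloud2(
  x1,
  size = 0.2,
  figPath = "x3.png",
  color = ifelse(x1$freq > 1, 'red', 'green')
)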
![](https://img.haomeiwen.com/i17474796/637b0c812688a255.png)
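Li Bai's keywords include:
我 (I), 欲 (want), 来 (come), 还 (return), 去 (go) ==> he focuses on his own feelings and is very free-spirited, coming and going as he pleases.
不可 (cannot), 不为 (will not do), 不得 (unable to), 之难 (the difficulty of), 难于上青天 (harder than climbing to the blue sky) ==> many things in life cannot be done or attained, so he complains often.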