Fun with wordcloud2


Author: dming1024 | Published 2019-08-22 21:07

    1. Install (if needed) and load the R package

    if (!requireNamespace("wordcloud2", quietly = TRUE))
        install.packages("wordcloud2")
    library(wordcloud2)
    

    2. Getting familiar with the plots using wordcloud2's built-in data

    a. Star-shaped word cloud (shape = "star")

    wordcloud2(demoFreq, size = 1, shape = 'star')
    

    b. Pentagon-shaped word cloud (shape = "pentagon")

    wordcloud2(demoFreq, size = 1, shape = 'pentagon')
    

    Chinese word clouds are supported as well, e.g. with the built-in Chinese data demoFreqC:

    wordcloud2(demoFreqC, size = 1, shape = 'pentagon')
    

    3. Custom shapes

    a. Pigeon-shaped word cloud

    # t.png ships with wordcloud2 in its examples/ folder
    bird_png = system.file("examples/t.png", package = "wordcloud2")
    wordcloud2(
      demoFreqC,
      figPath = bird_png,
      size = 1,
      color = "skyblue"
    )
    

    b. Heart-shaped word cloud

    # "1.png" is your own heart image; the path is relative to the working directory
    wordcloud2(demoFreqC,
               figPath = "1.png",
               size = 1,
               color = 'random-dark')
    

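    c. Custom colors

    # one color per word: red for frequencies above 1000, sky blue otherwise
    # (demoFreqC's second column holds the frequencies)
    wordcloud2(demoFreqC, figPath = "1.png", size = 1,
               color = ifelse(demoFreqC[[2]] > 1000, "red", "skyblue"))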
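The color argument also accepts a vector with one color per word, which is what the ifelse() call builds. A minimal sketch that generalizes the two-color rule to several frequency tiers with base R's cut(); the helper name `tier_colors`, the tier edges and the palette are illustrative assumptions, not part of the post:

```r
# Hypothetical helper: map each word frequency to a color by tier.
# The default breaks and palette are illustrative, not taken from the post.
tier_colors <- function(freq,
                        breaks = c(-Inf, 100, 1000, Inf),
                        palette = c("gray60", "skyblue", "red")) {
  # cut() puts each frequency into one interval: (-Inf,100], (100,1000], (1000,Inf]
  # labels = FALSE returns the interval index, used to pick a palette entry
  tier <- cut(freq, breaks = breaks, labels = FALSE)
  palette[tier]
}

freqs <- c(2304, 1413, 800, 50)  # made-up frequencies
cols <- tier_colors(freqs)
print(cols)
# With wordcloud2 loaded, e.g.:
# wordcloud2(demoFreqC, color = tier_colors(demoFreqC[[2]]))
```

Using cut() keeps the tier boundaries in one place, so adding a tier means adding one break and one color.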
    

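    That was just a warm-up. Now let's do some simple mining of the 300 Tang Poems. Among the first three poems of the collection, two were written by Du Fu to Li Bai.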


    1. A first look at how many poems each author wrote

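    # preview the raw text (poem holds the 300 Tang Poems; loading it is not shown in this post)
    View(poem)
    # extract the "author:title" lines (they start with the poem's number)
    author_title = poem[, 1][grepl("^[0-9]", poem[, 1], perl = TRUE)]
    # extract the author names: the 2-3 characters between the number and the colon
    authors = gsub("[0-9]*(.{2,3})\\:\\S+", "\\1", author_title, perl = TRUE)
    # count poems per author as a data frame (columns: authors, Freq)
    authors_dataframe = as.data.frame(table(authors))
    # preview the author names and how many poems each wrote
    View(authors_dataframe)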
    
    (Figure: the top ten authors)
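The gsub() pattern above assumes every author name is two or three characters long. A sketch of a length-independent alternative (a swapped-in pattern, not the post's): take everything between the leading number and the first colon. The header layout `<number><author>:<title>` is inferred from the regex above, and the sample lines are made-up pinyin stand-ins:

```r
# Made-up stand-ins for poem[, 1]: header lines are "<number><author>:<title>",
# all other lines are poem text
raw_lines <- c("1ZhangJiuling:Ganyu", "some poem text",
               "2LiBai:Chunsi", "3DuFu:Wangyue", "more poem text",
               "4DuFu:MengLiBai")

# Keep only the header lines (they start with a digit)
author_title <- raw_lines[grepl("^[0-9]", raw_lines)]

# The author is whatever sits between the leading digits and the first colon,
# regardless of its length
authors <- sub("^[0-9]+([^:]+):.*$", "\\1", author_title)

# Poems per author, most prolific first
authors_df <- as.data.frame(table(authors), stringsAsFactors = FALSE)
authors_df <- authors_df[order(-authors_df$Freq), ]
print(authors_df)
```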

    a. Authors with more than 10 poems

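    # keep authors with more than 10 poems
    x1 = authors_dataframe[authors_dataframe$Freq > 10, ]
    
    # slice labels: "author (share of these poems, %)"
    pielab = paste0(x1$authors, " (", round(x1$Freq / sum(x1$Freq) * 100, 2), "%)")
    pie(x1$Freq, labels = pielab
        ,col = c("skyblue", "rosybrown1", "aquamarine",
                 "gray82", "wheat1", "lightgreen")
        ,radius = 1.1
        ,cex.lab = 1
        ,main = "Authors with more than 10 poems"
        ,init.angle = 90   # start the first slice at 12 o'clock
        ,border = NA)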
    

    b. Authors with 5 to 10 poems

    # keep authors with 5 to 10 poems (both ends included)
    x1 = authors_dataframe[authors_dataframe$Freq >= 5 &
                             authors_dataframe$Freq <= 10, ]
    pielab = paste0(x1$authors, " (", round(x1$Freq / sum(x1$Freq) * 100, 2), "%)")
    pie(
      x1$Freq,
      labels = pielab,
      col = c(
        "skyblue",
        "rosybrown1",
        "aquamarine",
        "gray82",
        "wheat1",
        "lightgreen"
      ),
      radius = 1.1,
      cex.lab = 1,
      main = "Authors with 5 to 10 poems",
      init.angle = 90,   # start the first slice at 12 o'clock
      border = NA
    )
    
    (Figure: authors with 5 to 10 poems)
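Hand-written bounds for each pie make it easy to leave a count such as exactly 5 or exactly 10 in no band at all. A sketch with base R's cut(), which puts every author into exactly one band; the band edges and the sample counts are illustrative assumptions:

```r
# Made-up poem counts per author (a stand-in for authors_dataframe$Freq)
poem_counts <- c(3, 5, 10, 11, 29)

# Right-closed intervals (0,4], (4,10], (10,Inf]: each count falls in exactly
# one band, so authors with exactly 5 or exactly 10 poems are never dropped
band <- cut(poem_counts, breaks = c(0, 4, 10, Inf), labels = c("1-4", "5-10", ">10"))
print(table(band))
```

Computed on authors_dataframe$Freq instead, each pie would then take one subset, e.g. authors_dataframe[band == "5-10", ].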

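    2. Mining the poems of Du Fu and Li Bai
    a. Preprocess the poems
    Using Perl, the poems were reorganized into two columns: column 1 holds the author plus the poem title, column 2 holds the poem text. There are 306 rows in total, i.e. 306 poems.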
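The Perl script itself is not shown. As a rough R alternative (a swapped-in approach, not the author's), assuming that in the raw text each poem starts with a numbered `<number><author>:<title>` line followed by its text lines (the layout the `^[0-9]` filter above relies on), the two columns can be built with cumsum() and split(). The sample lines are made up:

```r
# Made-up raw lines in the assumed layout: a numbered header, then the poem's text
raw_lines <- c("1LiBai:Jingyesi", "line A1", "line A2",
               "2DuFu:Wangyue", "line B1",
               "3DuFu:Chunwang", "line C1", "line C2")

is_header <- grepl("^[0-9]", raw_lines)
# Running count of headers = which poem each line belongs to
poem_id <- cumsum(is_header)
n_poems <- sum(is_header)

# Group the text lines by poem; explicit levels keep poems that have no text lines
text_by_poem <- split(raw_lines[!is_header],
                      factor(poem_id[!is_header], levels = seq_len(n_poems)))

poems <- data.frame(
  # column 1: author + title (leading number dropped)
  author_title = sub("^[0-9]+", "", raw_lines[is_header]),
  # column 2: the poem's lines joined into one string (no separator, as suits Chinese)
  content = unname(vapply(text_by_poem, paste, character(1), collapse = "")),
  stringsAsFactors = FALSE
)
print(poems)
```

poems[, 1] and poems[, 2] then index the same two columns the code below works with.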

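    b. Install and load the word-frequency package
    The package is jiebaR, the R interface to the jieba (结巴, literally "stutter") Chinese word segmenter. Yes, the package for segmenting Chinese text and counting word frequencies really is called the Stutter package.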

    if (!requireNamespace("jiebaR", quietly = TRUE))
        install.packages("jiebaR")
    library(jiebaR)
    

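    c. A look at Du Fu's poems

    # keep the poem text (column 2) of rows whose column 1 starts with the author 杜甫
    # (an optional leading number is allowed)
    author_DuFU = poems[, 2][grepl("^[0-9]*杜甫", poems[, 1], perl = TRUE)]
    # make sure the text is a character vector (it may have been read in as a factor)
    author_DuFU = as.character(author_DuFU)
    # create a jiebaR segmentation worker
    wk = worker()
    # segment the text and count word frequencies
    x1 = freq(segment(author_DuFU, wk))
    # plot: words appearing more than once in red, the rest in green
    wordcloud2(
      x1,
      size = 0.4,
      figPath = "dufu.png",
      color = ifelse(x1$freq > 1, 'red', 'green')
    )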
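jiebaR's freq() returns a data frame with a `char` column (the word) and a `freq` column (its count), which is why `x1$freq` works in the color rule and why the result can go straight into wordcloud2. A base-R sketch of the same counting step on an already-segmented token vector, handy for reading off the top keywords as text rather than from the picture; the tokens are made up, and this only mimics freq()'s output shape rather than reproducing jiebaR's code:

```r
# Made-up segmented tokens (a stand-in for the output of segment(author_DuFU, wk))
tokens <- c("将军", "江湖", "白首", "将军", "江湖", "将军")

# Count each word, in the same char/freq layout that freq() returns
counts <- table(tokens)
word_freq <- data.frame(char = names(counts),
                        freq = as.integer(counts),
                        stringsAsFactors = FALSE)

# Most frequent words first: the top rows are the keyword candidates
word_freq <- word_freq[order(-word_freq$freq), ]
top_n <- head(word_freq, 2)
print(top_n)

# The coloring rule used above: words seen more than once in red, the rest in green
word_colors <- ifelse(word_freq$freq > 1, "red", "green")
```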
    

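    Du Fu's keywords include 将军 (general), 先帝 (the late emperor), 公孙 (Gongsun), 君臣 (ruler and ministers) ==> he cared about state affairs and history, "worrying before the rest of the world worries". 风尘 (the dust of travel), 人生 (life), 白首 (white hair), 妻子 (wife and children), 春色 (spring scenery) ==> deeply attached to people and easily moved to melancholy. 日暮 (dusk), 江湖 (a wandering life), 感时 (grieving over the times), 寂寞 (loneliness), 天涯 (the ends of the earth), 涕泪 (tears) ==> somewhat pessimistic, perhaps given to weeping, and often lonely. 三峡 (the Three Gorges), 临颍 (Linying), 江南 (Jiangnan), 关塞 (frontier passes), 西山 (the Western Hills) ==> and yet his mind embraced the whole country.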

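    d. Digging into Li Bai's poems

    # anchor the match at the start of column 1, so that Du Fu's poems whose titles
    # mention 李白 (e.g. 梦李白) are not counted as Li Bai's
    author_LiBai = poems[, 2][grepl("^[0-9]*李白", poems[, 1], perl = TRUE)]
    author_LiBai = as.character(author_LiBai)
    wk = worker()
    x1 = freq(segment(author_LiBai, wk))
    # a plain cloud first
    wordcloud2(x1, minSize = 40)
    # then one shaped by x3.png, with the same red/green rule as for Du Fu
    wordcloud2(
      x1,
      size = 0.2,
      figPath = "x3.png",
      color = ifelse(x1$freq > 1, 'red', 'green')
    )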
    

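    Li Bai's keywords include 我 (I), 欲 (to wish), 来 (to come), 还 (to return), 去 (to go) ==> he focused on his own feelings and was free-spirited, coming when he pleased and leaving when he pleased. 不可 (cannot), 不为 (will not do), 不得 (cannot get), 之难 and 难于上青天 ("harder than climbing to the blue sky") ==> many things in life could not be done or attained, so he often complained.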

    Link: https://www.haomeiwen.com/subject/smposctx.html