Histograms You Can Learn in One Go, Round and Smooth

Author: 小洁忘了怎么分身 | Published: 2020-10-03 18:23

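    0. Preface

    It's the holidays. 豆豆 went to Hailing Island, and 花花 went back home to Shandong more than ten days ago. I'm at home working through StatQuest. The first lesson is on histograms, an easy and comfortable start, so I drew the plots along the way.

    A histogram can be overlaid with a kernel density estimate, or with a theoretical distribution curve.

    1. R packages and data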

    rm(list=ls())
    library(ggplot2)
    library(dplyr)
    set.seed(1001)
    dat = data.frame(length1 = rnorm(2000,500,60),
                     length2 = c(rnorm(1000,500,60),rnorm(1000,800,60)),
                     group = rep(c("A","B"),each = 1000))
    head(dat)
    
    ##    length1  length2 group
    ## 1 631.3189 504.8925     A
    ## 2 489.3472 587.9790     A
    ## 3 488.8835 529.2282     A
    ## 4 349.6078 453.0826     A
    ## 5 466.5613 524.2960     A
    ## 6 491.3864 467.0581     A
    

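    This generates two variables. length1 is a single normal sample with mean 500 and standard deviation 60. length2 combines two normal samples of 1000 values each, with means 500 (group A) and 800 (group B) and the same standard deviation.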
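
    A quick sanity check on the simulated data can confirm that the two groups sit where they should. This is only a sketch: it regenerates dat with the same seed so that it runs on its own, and the names group_means and group_sds are introduced here for illustration.

```r
# Rebuild the data exactly as in the block above
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

# Mean and standard deviation of length2 within each group
group_means <- tapply(dat$length2, dat$group, mean)
group_sds <- tapply(dat$length2, dat$group, sd)
print(round(rbind(mean = group_means, sd = group_sds), 1))
```

    The group means should land close to 500 and 800, and both standard deviations close to 60.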

    Overlaying a density curve on a histogram works in both base R and ggplot2.

    1. Base R

    1.1. Histogram + density curve

    hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
    lines(density(dat$length1))  
    
    hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
    lines(density(dat$length2)) 
    

    1.2. Histogram + distribution curve

    dat2 = data.frame(d1 = dnorm(1:1000,500,60),
                      d2 = dnorm(1:1000,500,60),
                      d3 = dnorm(1:1000,800,60),
                      n = 1:1000)
    
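    hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
    lines(dat2$n,dat2$d1)  
    
    # hist() normalizes over all 2000 values, so each component curve is
    # weighted by its mixing proportion (0.5) to match the bar heights
    hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
    lines(dat2$n,0.5*dat2$d2) 
    lines(dat2$n,0.5*dat2$d3)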
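
    For the bimodal data, one alternative to drawing each component separately is a single curve for the equal-weight mixture of the two normal densities. This is a sketch under the assumption that both components contribute exactly half of the sample. mix_density is a helper defined here, not part of the original code, and the sample is redrawn, so the values differ from dat$length2.

```r
set.seed(1001)
length2 <- c(rnorm(1000, 500, 60), rnorm(1000, 800, 60))

# Density of a 50/50 mixture of N(500, 60) and N(800, 60)
mix_density <- function(x) 0.5 * dnorm(x, 500, 60) + 0.5 * dnorm(x, 800, 60)

hist(length2, freq = FALSE, ylim = c(0, 0.007), breaks = 30)
curve(mix_density, from = 250, to = 1050, n = 500, add = TRUE)

# Riemann sum on a grid with step 1: the mixture density encloses area 1
xs <- seq(0, 1300, by = 1)
mix_area <- sum(mix_density(xs))
mix_peak <- max(mix_density(xs))
print(c(area = mix_area, peak = mix_peak))
```

    Each peak of the mixture is half as tall as a single dnorm(x, 500, 60) curve, which is the scale the pooled histogram uses.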
    

    2.ggplot2

    2.1. Histogram + density curve

    ggplot(dat, aes(x = length1)) +
      # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
      geom_histogram(aes(y = after_stat(density)),bins = 30,
                     color = "grey",fill = "grey",alpha = 0.7)+
      geom_density(color = "grey")+
      theme_bw()
    
    # mean of length2 within each group, used for the vertical lines
    mes = group_by(dat,group) %>% summarise(mean = mean(length2)) 
    ggplot(dat, aes(x = length2,group = group)) +
      geom_histogram(aes(y = after_stat(density),fill = group,
                         color = group),alpha = 0.2,bins = 25)+
      geom_density(aes(color = group))+
      geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
      scale_color_manual(values = c('#D0505D','#6194A7'))+
      scale_fill_manual(values = c('#D0505D','#6194A7'))+
      theme_bw()
    

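    The bimodal data can be plotted as two groups, each with its own density line. I also changed the colors and marked the group means with vertical lines. Looks good!

    2.2. Histogram + distribution curve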

    ggplot(dat, aes(x = length1)) +
      geom_histogram(aes(y = after_stat(density)),bins = 30,
                     color = "grey",fill = "grey",alpha = 0.7)+
      # theoretical N(500, 60) curve taken from dat2
      geom_line(color = "grey",data = dat2,aes(x = n,y = d1))+
      theme_bw()+
      xlim(c(300,750))
    
    # group means again, so that this block runs on its own
    mes = group_by(dat,group) %>% summarise(mean = mean(length2)) 
    ggplot(dat) +
      geom_histogram(aes(y = after_stat(density),fill = group,
                         x = length2,
                         color = group),alpha = 0.2,bins = 25)+
      geom_line(data = dat2,aes(x = n,y = d2),color = '#D0505D')+
      geom_line(data = dat2,aes(x = n,y = d3),color = '#6194A7')+
      geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
      scale_color_manual(values = c('#D0505D','#6194A7'))+
      scale_fill_manual(values = c('#D0505D','#6194A7'))+
      theme_bw()+
      xlim(c(260,1000))
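
    The theoretical curve can also be drawn without a precomputed table such as dat2, by letting ggplot2 evaluate dnorm directly with stat_function(). The following is a sketch rather than the approach used above. It assumes ggplot2 3.3.0 or later for after_stat(), and the plot object p and the darker line color are choices made only for this example.

```r
library(ggplot2)

set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60))

p <- ggplot(dat, aes(x = length1)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 color = "grey", fill = "grey", alpha = 0.7) +
  # dnorm is evaluated over the x range of the plot with the given parameters
  stat_function(fun = dnorm, args = list(mean = 500, sd = 60),
                color = "grey40") +
  theme_bw()
print(p)
```

    Because the curve is computed over the plotted x range, no xlim() call is needed to trim it.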
    

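    Looking again, the bimodal distribution-curve plot from base R does not match the ggplot2 one. Bin width is not the cause, since both plots use a density scale. The difference is normalization: hist() normalizes over all 2000 values, so each peak encloses about half of the total area, while ggplot2 with fill = group normalizes each group separately, so each peak has area 1. The unscaled dnorm curves therefore match the ggplot2 bars, whereas on the base R histogram each curve has to be weighted by 0.5. I'm not changing the ggplot2 plots.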
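
    One way to see where the mismatch between the two bimodal plots comes from is to compare bar areas rather than bin widths. The sketch below assumes that normalization is the relevant factor: it measures how much area the lower peak gets when hist() sees the pooled data, and how much it gets when group A is binned on its own. The variable names are illustrative.

```r
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

# Pooled data: total bar area is 1, shared between the two peaks
h_all <- hist(dat$length2, breaks = 30, plot = FALSE)
bar_area <- h_all$density * diff(h_all$breaks)
area_low_peak <- sum(bar_area[h_all$mids < 650])

# Group A alone: its bars enclose an area of 1 by themselves
h_a <- hist(dat$length2[dat$group == "A"], breaks = 30, plot = FALSE)
area_group_a <- sum(h_a$density * diff(h_a$breaks))

print(c(pooled_low_peak = area_low_peak, group_A_alone = area_group_a))
```

    The lower peak holds about half of the area in the pooled histogram and all of it when group A is binned alone, so the per-group bars are roughly twice as tall.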

    Article link: https://www.haomeiwen.com/subject/duwwuktx.html