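0.Preface
School's on break: Doudou has gone to Hailing Island, and Huahua went back home to Shandong more than ten days ago. I've been at home working through StatQuest. Its first episode covers histograms, a simple and comfortable start, so I drew the plots myself along the way.
You can pair a histogram with a density plot, or with a theoretical distribution curve.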
1.R packages and data preparation
rm(list=ls())
library(ggplot2)
library(dplyr)
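set.seed(1001)  # make the simulated data reproducible
dat = data.frame(length1 = rnorm(2000,500,60),  # one normal sample: mean 500, SD 60
                 length2 = c(rnorm(1000,500,60),rnorm(1000,800,60)),  # two normal samples: means 500 and 800
                 group = rep(c("A","B"),each = 1000))  # labels the two halves of length2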
head(dat)
## length1 length2 group
## 1 631.3189 504.8925 A
## 2 489.3472 587.9790 A
## 3 488.8835 529.2282 A
## 4 349.6078 453.0826 A
## 5 466.5613 524.2960 A
## 6 491.3864 467.0581 A
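Two variables are simulated. length1 is a single normal sample (n = 2000, mean 500, SD 60). length2 combines two normal samples of 1000 values each, with means 500 and 800 (SD 60). group records which sample each length2 value came from.
Both base graphics and ggplot2 can overlay a density curve on a histogram.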
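A quick check confirms that the simulated data look as intended: length1 should have a mean near 500 and an SD near 60, and the two groups of length2 should sit near 500 and 800. The following is a minimal sketch that re-creates the data with the same seed and uses dplyr. The object name stats2 is introduced here for illustration.

```r
library(dplyr)

# Re-create the simulated data exactly as above
set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60),
                  length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

# length1 is one normal sample: expect mean near 500, SD near 60
c(mean = mean(dat$length1), sd = sd(dat$length1))

# length2 mixes two samples; group tells them apart
stats2 <- dat %>%
  group_by(group) %>%
  summarise(mean = mean(length2), sd = sd(length2))
stats2
```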
1.Base graphics
1.1.Histogram + density plot
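# freq = FALSE puts the bars on the density scale so a density curve can be overlaid
hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(density(dat$length1))  # kernel density estimate of the same data
hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(density(dat$length2))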
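With freq = FALSE, hist() draws densities rather than counts. Each bar's height is count / (n × bin width), so the bar areas sum to 1, and that is why a density curve fits on the same vertical scale. The following is a small sketch that checks this using plot = FALSE, which returns the histogram object without drawing it. The names h and manual_density are illustrative.

```r
set.seed(1001)
x <- rnorm(2000, 500, 60)

# plot = FALSE returns the breaks, counts and densities without plotting
h <- hist(x, breaks = 30, plot = FALSE)

# density = counts / (n * bin width)
manual_density <- h$counts / (length(x) * diff(h$breaks))
all.equal(h$density, manual_density)

# Total bar area equals 1
sum(h$density * diff(h$breaks))
```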
1.2.Histogram + distribution curve
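# Theoretical normal densities evaluated at x = 1, ..., 1000
dat2 = data.frame(d1 = dnorm(1:1000,500,60),
                  d2 = dnorm(1:1000,500,60),
                  d3 = dnorm(1:1000,800,60),
                  n = 1:1000)
hist(dat$length1,freq=FALSE,ylim = c(0,0.007),breaks = 30)
lines(dat2$n, dat2$d1)  # pass x explicitly instead of relying on the index
hist(dat$length2,freq=FALSE,ylim = c(0,0.007),breaks = 30)
# The pooled histogram is normalised over all 2000 values and each component
# holds half of them, so each normal curve is scaled by 0.5
lines(dat2$n, 0.5 * dat2$d2)
lines(dat2$n, 0.5 * dat2$d3)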
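For the bimodal length2, the theoretical density of the pooled sample is the mixture 0.5 × dnorm(x, 500, 60) + 0.5 × dnorm(x, 800, 60), because each component contributes half of the 2000 observations. The following is a sketch that draws this mixture with base R's curve() over the pooled histogram. It assumes the same means and SD used in the simulation. mix_density is a hypothetical helper name introduced for illustration, and the sample is regenerated, so its values differ from dat$length2.

```r
set.seed(1001)
# Same distribution as dat$length2 (values differ because length1 is not drawn first)
length2 <- c(rnorm(1000, 500, 60), rnorm(1000, 800, 60))

# Hypothetical helper: density of the 50/50 mixture of the two normals
mix_density <- function(x) 0.5 * dnorm(x, 500, 60) + 0.5 * dnorm(x, 800, 60)

hist(length2, freq = FALSE, breaks = 30)
# With add = TRUE, curve() evaluates over the current x-axis range
curve(mix_density, add = TRUE, lwd = 2)
# The two weighted components, drawn separately as dashed lines
curve(0.5 * dnorm(x, 500, 60), add = TRUE, lty = 2)
curve(0.5 * dnorm(x, 800, 60), add = TRUE, lty = 2)

# Sanity check: the mixture density integrates to (about) 1
integrate(mix_density, 0, 1300)$value
```

curve() avoids precomputing a data frame of dnorm() values on a fixed grid such as 1:1000.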
2.ggplot2
2.1.Histogram + density plot
ggplot(dat, aes(x = length1)) +
  # after_stat(density) replaces the deprecated ..density.. notation
  geom_histogram(aes(y = after_stat(density)),color = "grey",fill = "grey",alpha = 0.7)+
  geom_density(color = "grey")+
  theme_bw()
mes = group_by(dat,group) %>% summarise(mean = mean(length2))  # per-group means for the reference lines
ggplot(dat, aes(x = length2,group = group)) +
  # position = "identity" overlays the two groups instead of stacking them
  geom_histogram(aes(y = after_stat(density),fill = group,
                     color = group),alpha = 0.2,bins = 25,position = "identity")+
  geom_density(aes(color = group))+  # geom_density already puts density on the y axis
  geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
  scale_color_manual(values = c('#D0505D','#6194A7'))+
  scale_fill_manual(values = c('#D0505D','#6194A7'))+
  theme_bw()
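The bimodal data can be split by group so that each group gets its own density curve. While I was at it, I also changed the colors and marked each group's mean with a dot-dash line. Looks good!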
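Because fill = group splits the data, ggplot2 computes the density statistic separately within each group. As a result, each group's histogram has a total area of 1 on its own rather than sharing a single normalisation. The following is a sketch that checks this with layer_data() on a stripped-down version of the plot. The data are regenerated, and the names p, ld and area_by_group are illustrative.

```r
library(ggplot2)

set.seed(1001)
dat <- data.frame(length2 = c(rnorm(1000, 500, 60), rnorm(1000, 800, 60)),
                  group = rep(c("A", "B"), each = 1000))

p <- ggplot(dat, aes(x = length2, fill = group)) +
  geom_histogram(aes(y = after_stat(density)), bins = 25,
                 alpha = 0.2, position = "identity")

# layer_data() returns the statistics ggplot2 computed for layer 1
ld <- layer_data(p, 1)

# Bar area (density * bin width), summed within each group
area_by_group <- tapply(ld$density * (ld$xmax - ld$xmin), ld$group, sum)
area_by_group
```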
2.2.Histogram + distribution curve
ggplot(dat, aes(x = length1)) +
  geom_histogram(aes(y = after_stat(density)),color = "grey",fill = "grey",alpha = 0.7)+
  # spell out data = rather than relying on partial matching of dat =
  geom_line(color = "grey",data = dat2,aes(x = n,y = d1))+
  theme_bw()+
  xlim(c(300,750))
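As an alternative to precomputing dat2 with dnorm(1:1000, ...), ggplot2's stat_function() can evaluate the normal density on the fly over the plotted x range. The following is a sketch for length1. It assumes the same mean (500) and SD (60) used in the simulation and regenerates the data.

```r
library(ggplot2)

set.seed(1001)
dat <- data.frame(length1 = rnorm(2000, 500, 60))

p <- ggplot(dat, aes(x = length1)) +
  geom_histogram(aes(y = after_stat(density)),
                 color = "grey", fill = "grey", alpha = 0.7, bins = 30) +
  # dnorm(x, mean = 500, sd = 60) evaluated on n grid points across the x range
  stat_function(fun = dnorm, args = list(mean = 500, sd = 60),
                color = "grey30", n = 200) +
  theme_bw()
p
```

This avoids the fixed 1:1000 grid and the need for xlim() to trim the curve.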
mes = group_by(dat,group) %>% summarise(mean = mean(length2))
ggplot(dat) +
  geom_histogram(aes(y = after_stat(density),fill = group,
                     x = length2,
                     color = group),alpha = 0.2,bins = 25,position = "identity")+
  # density is computed within each group, so the full-height normal curves fit
  geom_line(data = dat2,aes(x = n,y = d2),color = '#D0505D')+
  geom_line(data = dat2,aes(x = n,y = d3),color = '#6194A7')+
  geom_vline(data = mes,aes(xintercept = mean,color = group),lty =4)+
  scale_color_manual(values = c('#D0505D','#6194A7'))+
  scale_fill_manual(values = c('#D0505D','#6194A7'))+
  theme_bw()+
  xlim(c(260,1000))
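At a glance, the base-graphics and ggplot2 histograms of length2 sit on different vertical scales. The cause is not the bin width. hist() normalises the density over all 2000 values of length2, so each mode peaks at only about half the height of a standalone normal density (roughly 0.0033 instead of 0.0066). ggplot2, by contrast, computes the density separately for each fill group, so each group's bars integrate to 1. As a result, a full-height dnorm() curve fits the ggplot2 version directly. Over the pooled base-R histogram, each normal curve has to be multiplied by its mixing weight, 0.5.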
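The factor of 2 between pooled and per-group densities can be checked numerically. The approach uses shared breaks to compare the pooled density (normalised by n = 2000) with the density of group A alone (normalised by n = 1000). In bins that contain only group A values, the same count is divided by twice the n, so the ratio is exactly 2. The following is a sketch that regenerates the two samples; the names a, b, brks and only_a are illustrative.

```r
set.seed(1001)
a <- rnorm(1000, 500, 60)  # component with mean 500
b <- rnorm(1000, 800, 60)  # component with mean 800
pooled <- c(a, b)

# Shared, equally spaced breaks covering all values so the bins line up
brks <- seq(floor(min(pooled)) - 10, ceiling(max(pooled)) + 10, length.out = 41)
h_pooled <- hist(pooled, breaks = brks, plot = FALSE)  # normalised by n = 2000
h_a      <- hist(a,      breaks = brks, plot = FALSE)  # normalised by n = 1000
h_b      <- hist(b,      breaks = brks, plot = FALSE)

# Bins holding group A values but no group B values
only_a <- h_a$counts > 0 & h_b$counts == 0
ratio <- h_a$density[only_a] / h_pooled$density[only_a]
range(ratio)  # every ratio is 2, up to floating-point rounding
```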