前段时间在着手性状测试设备的验收工作,需要对机器测量数据和人工测量数据进行相关性分析,散点图加拟合曲线是一个好的呈现方式,现在将这个方法的R代码记录下来,一是对刚学知识的梳理,二是以后方便查看和分享给更多的志同道合的小白。 作者:刘峻宇
1、读取、整理数据:
##共测试了50尾大虾和50尾小虾,现将数据合并,并绘制BL(body length)的机器和人工测量数据的相关性
require(data.table)
require(ggplot2)
big <- fread(input= "shrimp_big.csv",sep = ",",header = TRUE)
little <- fread(input= "shrimp_little.csv",sep = ",",header = TRUE)
colnames(little) <- c("ID","AL1","BL1","CL1","CW1","CH1","AL2","BL2", "CL2" ,"CW2", "CH2")
shrimp <- rbind(big,little)
data <- as.data.frame(cbind(shrimp$ BL1,shrimp$BL2))
colnames(data) <- c("x","y") #y和x代表BL2和BL1
head(shrimp)
查看数据集结构,左边机器测量,右边人工测量
Call:
ID AL1 BL1 CL1 CW1 CH1 AL2 BL2 CL2 CW2 CH2
1 50.7 100.6 30.2 16.1 19.6 6.4 10.1 2.7 1.1 1.7
2 57.2 112.5 34.4 17.6 21.4 6.6 11.9 3.1 1.5 2.0
3 56.8 102.3 30.5 16.0 19.5 6.4 11.2 3.1 1.5 1.8
4 62.6 116.1 36.0 17.9 21.7 6.7 11.4 3.2 1.7 1.9
5 53.8 106.2 32.2 16.8 20.4 6.3 10.6 3.2 1.7 1.7
6 55.3 114.3 35.6 17.1 20.8 6.5 11.5 3.1 1.5 1.8
2、查看两数据间的线性关系
lm <- lm(y~x + I(x^2),data = shrimp)
summary(lm)
Call:
lm(formula = y ~ x + I(x^2), data = data)
Residuals:
Min 1Q Median 3Q Max
-2.04722 -0.30843 -0.00514 0.29011 1.03779
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.6014832 0.9583339 -3.758 0.000294 ***
x 0.1855287 0.0227781 8.145 1.4e-12 ***
I(x^2) -0.0004773 0.0001266 -3.771 0.000281 ***
Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1
可以看到,拟合曲线的截距和系数都极显著,所以数据间关系为二项式关系
3、绘图
# 编写拟合曲线公式的函数
lm_eqn = function(data){
m=lm(y ~ x + I(x^2), data)
eq <- substitute(italic(y) == a+b %.% italic(x)+b %.% italic(x^2)*","~~italic(r)^2~"="~r2,
list(a = as.numeric(format(coef(m)[1], digits = 3)),
b = as.numeric(format(coef(m)[2], digits = 3)),
c = as.numeric(format(coef(m)[3], digits = 4)),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq))
}
#ggpot 作图
p <- ggplot(data,aes(x=x,y=y)) + geom_point()
p + stat_smooth(method='lm',formula = y~x+ I(x^2),colour='red') +
scale_x_continuous(limits = c(50,130)) +
theme(axis.text=element_text(colour = 'black',size = 12), axis.title=element_text(size = 14)) +
annotate("text", x=50, y=15, label=lm_eqn(data), hjust=0, size=4,parse=TRUE)+
labs(x="机器测量",y="人工测量",title = "BL")+ theme(plot.title = element_text(hjust = 0.5))
4、结果
BL相关性曲线
网友评论