Validation
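In the previous post we showed how to use 100 spatial locations to estimate rainfall across the study region. We did not validate the model, however, even though the dataset had already been split into test and train. This post shows how to assess the model's predictive performance and how it compares with a glm model.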
validation
First, we plot the 367 train points to see how they are distributed.
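Then, using the SPDE functions from the previous post, we extract the spatial effects for the 367 points and assemble them into stack.train. Note that y is set to NA here, while the X covariates still come from the train data.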
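The idea of predicting by setting y to NA can be seen in isolation with a toy, non-spatial model. The sketch below is only an illustration on simulated data: the data frame toy, the held_out indices and the simulated coefficients are made up for this example and are not part of the rainfall analysis. INLA leaves NA responses out of the likelihood but still reports the posterior of the linear predictor for those rows when compute = TRUE is set in control.predictor.

```r
library(INLA)

set.seed(42)
n <- 60

# Simulated data (hypothetical): y depends linearly on x plus small noise
toy <- data.frame(x = runif(n))
toy$y <- 1 + 2 * toy$x + rnorm(n, sd = 0.1)

# Hide the last 10 responses; the covariate x stays available for them
held_out <- 51:60
y_true <- toy$y[held_out]
toy$y[held_out] <- NA

fit_toy <- inla(y ~ x,
                data = toy,
                family = "gaussian",
                control.predictor = list(compute = TRUE))

# Posterior mean of the linear predictor at the rows whose y was NA
pred <- fit_toy$summary.linear.predictor[held_out, "mean"]
cor(y_true, pred)
```

The stack-based workflow in this post follows the same pattern, except that the rows to predict are selected with inla.stack.index() instead of a plain index vector.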
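# Packages used below (assumed to be installed; skip if already loaded)
library(INLA)
library(tidyverse)
# Size of the train set
dim(train)
# Plot the train locations, with point size proportional to rainfall
ggplot() +
  geom_point(data = train, aes(y = lat, x = lon, size = rainfall))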
# Mesh and SPDE and stack
# (Mesh, train_loc and s.index are assumed to come from the previous post)
A.train = inla.spde.make.A(Mesh, loc = train_loc)
## 5.2 train stack
Xm <- model.matrix(~ -1 + altitude, data = train)
X = data.frame(altitude = Xm[, 1])
N = nrow(train)
N
stack.train = inla.stack(tag = "train",
                         data = list(y = NA),      # response unknown: to be predicted
                         A = list(1, 1, A.train),
                         effects = list(
                           Intercept = rep(1, N),
                           X = X,
                           w = s.index))
# join stack
stack_fit = inla.stack(stack.test, stack.train)
## 7. fit model
formula = y ~ -1 + Intercept + altitude + f(w, model = spde)
fit = inla(formula = formula,
           data = inla.stack.data(stack_fit, spde = spde),
           family = "gaussian",
           control.compute = list(dic = TRUE, waic = TRUE),
           control.predictor = list(A = inla.stack.A(stack_fit), compute = TRUE)
)
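Next we fit the INLA model. The formula is the same as in the previous post; once it is written, pass it to inla(). Note that the stack used here is the joined stack built from both test and train.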
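The formula removes the automatic intercept with -1 and adds it back as an explicit Intercept effect, which is the usual convention when the data go through inla.stack(). The base R sketch below uses lm() on simulated data (the toy data frame and its coefficients are invented for illustration) to show that the two ways of writing the intercept describe the same model.

```r
set.seed(1)

# Simulated data (hypothetical values, not the rainfall dataset)
toy <- data.frame(altitude = runif(30, 0, 1000))
toy$y <- 5 + 0.01 * toy$altitude + rnorm(30)

# Explicit column of ones, as in effects = list(Intercept = rep(1, N), ...)
toy$Intercept <- 1

fit_implicit <- lm(y ~ altitude, data = toy)                   # automatic intercept
fit_explicit <- lm(y ~ -1 + Intercept + altitude, data = toy)  # intercept as a covariate

# Both parameterizations give the same coefficient estimates
all.equal(unname(coef(fit_implicit)), unname(coef(fit_explicit)))
```

Writing the intercept explicitly matters in the stack setting because every effect, including the intercept, has to be listed in effects and paired with an entry of A.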
# join stack
stack_fit = inla.stack(stack.test, stack.train)
## 7. fit model
formula = y ~ -1 + Intercept + altitude + f(w, model = spde)
fit_train = inla(formula = formula,
                 data = inla.stack.data(stack_fit, spde = spde),
                 family = "gaussian",
                 control.compute = list(dic = TRUE, waic = TRUE),
                 control.predictor = list(A = inla.stack.A(stack_fit), compute = TRUE)
)
> round(fit_train$summary.fixed, 4)
            mean      sd 0.025quant 0.5quant 0.975quant   mode   kld
Intercept 0.0032 31.6144   -62.0665   0.0023    62.0211 0.0032 0e+00
altitude  0.0129  0.0173    -0.0211   0.0130     0.0467 0.0130 1e-04
predict
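## prediction 367 sites
# Rows of the joined stack that belong to the "train" block
index.train = inla.stack.index(stack_fit, "train")$data
# Posterior mean and sd of the linear predictor at the 367 train locations
post_mean_train = fit_train$summary.linear.predictor[index.train, "mean"]
post_sd_train = fit_train$summary.linear.predictor[index.train, "sd"]
# Observed vs. predicted rainfall
pred_df = tibble(obs = train$rainfall,
                 pre = post_mean_train)
ggplot(data = pred_df, aes(x = obs, y = pre)) +
  geom_point() +
  geom_smooth() +
  labs(title = "INLA-prediction")
cor.test(pred_df$obs, pred_df$pre)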
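The correlation between the predicted and observed values at the 367 locations is 0.84, which suggests that the INLA model predicts reasonably well.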
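A correlation coefficient says nothing about whether the reported uncertainty is sensible. The code above computes post_sd_train but does not use it; one possible use, sketched here under a normal approximation, is to check how often the observations fall inside an approximate 95% interval around the posterior mean. The helper name interval_coverage and the toy numbers are hypothetical. Keep in mind that the sd in summary.linear.predictor describes the latent mean and excludes the Gaussian observation noise, so coverage of raw observations will typically fall below the nominal level.

```r
# Share of observations inside mean +/- z * sd (normal approximation, an assumption)
interval_coverage <- function(obs, post_mean, post_sd, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  lower <- post_mean - z * post_sd
  upper <- post_mean + z * post_sd
  mean(obs >= lower & obs <= upper)
}

# Toy check with made-up numbers: two of the four observations are covered
obs_toy  <- c(1, 2, 3, 10)
mean_toy <- c(1.1, 2.5, 2.9, 4)
sd_toy   <- rep(0.2, 4)
coverage_toy <- interval_coverage(obs_toy, mean_toy, sd_toy)
coverage_toy
```

With the objects from this post, the call would be interval_coverage(train$rainfall, post_mean_train, post_sd_train). The 0.025quant and 0.975quant columns of summary.linear.predictor could be used instead of the normal approximation.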
glm
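For comparison, we fit a glm (a generalized linear model; with the default Gaussian family this is an ordinary linear regression) of rainfall on altitude, and predict at the same 367 points.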
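## GLM
# Fit on the test data; glm() defaults to family = gaussian
fit_glm = glm(rainfall ~ altitude, data = test)
summary(fit_glm)
# predict
# Blank out the response so that only altitude is used for prediction
newdata = train %>% mutate(rainfall = NA)
pre_glm = predict(fit_glm, newdata)
# Observed vs. predicted rainfall
pred_df2 = tibble(obs = train$rainfall,
                  pre = as.numeric(pre_glm))
ggplot(data = pred_df2, aes(x = obs, y = pre)) +
  geom_point() +
  geom_smooth() +
  labs(title = "GLM-prediction")
cor.test(pred_df2$obs, pred_df2$pre)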
As the plot and test show, the glm predictions are poor: the correlation coefficient is only 0.198.
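Correlation ignores bias and scale, so it can help to put error measures on the original rainfall scale next to it. The sketch below is one way to tabulate this for several models at once; compare_models, rmse and mae are hypothetical helper names, and the toy vectors are invented purely to exercise the function.

```r
# Root mean squared error and mean absolute error
rmse <- function(obs, pre) sqrt(mean((obs - pre)^2))
mae  <- function(obs, pre) mean(abs(obs - pre))

# One row per model: correlation, RMSE and MAE against the same observations
compare_models <- function(obs, preds) {
  data.frame(
    model = names(preds),
    cor   = vapply(preds, function(p) cor(obs, p), numeric(1)),
    rmse  = vapply(preds, function(p) rmse(obs, p), numeric(1)),
    mae   = vapply(preds, function(p) mae(obs, p), numeric(1)),
    row.names = NULL
  )
}

# Toy example with made-up predictions from two hypothetical models
obs_toy <- c(10, 20, 30, 40)
res <- compare_models(obs_toy,
                      list(a = c(12, 18, 33, 39),
                           b = c(24, 26, 25, 27)))
res
```

With the objects from this post, the call would be compare_models(train$rainfall, list(INLA = post_mean_train, GLM = as.numeric(pre_glm))).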