美文网首页R统计编程R语言做图R plot
跟着Nature Ecology&Evolution学数据分析:

跟着Nature Ecology&Evolution学数据分析:

作者: 小明的数据分析笔记本 | 来源:发表于2021-04-03 17:43 被阅读0次

    之前好多人在公众号留言问这个 方差分解 的内容,但是之前自己也没有听说过。最近看到有人分享了公众号推文 一种简单易行的方差分解方法。看了这个推文我目前理解的是 方差分解的主要作用是 量化回归模型Y=b0+b1x1+b2x2+…中x1, x2, x3…对Y贡献的相对大小,以及不同X所属的因素类别(如生物因素,非生物因素)对Y的贡献大小。

    这篇推文以已经发表的论文中的数据为例子进行了介绍,论文是

    image.png

    这篇论文关于方差分解的内容数据代码是公开的,下载链接是

    https://figshare.com/s/053837c4fa852f035448

    image.png

    我看了这些代码,有的地方还看不明白,但是利用数据能够跑通流程,今天先记录一下,后面抽时间再看,有什么新的理解再来记录

    首先是读入数据
    datatotal<-read.table("datasetmultifunctionality.txt", header=T, sep="\t")
    colnames(datatotal)
    
    接下来的代码是对数据进行转化

    有的是常规的标准化

    有的是log转化

    常规的标准化开头提到的推文里介绍了方差分解必须用标准化后的数据,但是有的log转化是什么意思呢?

    #####logtransformation moments
    datatotal[,c(12,13,16,17)]<-log(datatotal[,c(12,13,16,17)])
    datatotal[,14]<-log(datatotal[,14]-min(datatotal[,14])+1)
    datatotal[,15]<-log(datatotal[,15]-min(datatotal[,15])+1)
    datatotal[,18]<-log(datatotal[,18]-min(datatotal[,18])+1)
    datatotal[,19]<-log(datatotal[,19]-min(datatotal[,19])+1)
    
    #####Zscorring environmental variables
    datatotal$ELEVATION<-(datatotal$ELEVATION-mean(datatotal$ELEVATION))/sd(datatotal$ELEVATION)
    datatotal$LAT<-(datatotal$LAT-mean(datatotal$LAT))/sd(datatotal$LAT)
    datatotal$SINLONG<-(datatotal$SINLONG-mean(datatotal$SINLONG))/sd(datatotal$SINLONG)
    datatotal$COSLONG<-(datatotal$COSLONG-mean(datatotal$COSLONG))/sd(datatotal$COSLONG)
    datatotal$SLO<-(datatotal$SLO-mean(datatotal$SLO))/sd(datatotal$SLO)
    datatotal$ARIDITY<-(datatotal$ARIDITY-mean(datatotal$ARIDITY))/sd(datatotal$ARIDITY)
    datatotal$SAND<-(datatotal$SAND-mean(datatotal$SAND))/sd(datatotal$SAND)
    datatotal$PH<-(datatotal$PH-mean(datatotal$PH))/sd(datatotal$PH)
    datatotal$SR<-(datatotal$SR-mean(datatotal$SR))/sd(datatotal$SR)
    
    #####Zscorring moments
    datatotal$CWM_logH<-(datatotal$CWM_logH-mean(datatotal$CWM_logH))/sd(datatotal$CWM_logH)
    datatotal$CWV_logH<-(datatotal$CWV_logH-mean(datatotal$CWV_logH))/sd(datatotal$CWV_logH)
    datatotal$CWS_logH<-(datatotal$CWS_logH-mean(datatotal$CWS_logH))/sd(datatotal$CWS_logH)
    datatotal$CWK_logH<-(datatotal$CWK_logH-mean(datatotal$CWK_logH))/sd(datatotal$CWK_logH)
    datatotal$CWM_logSLA<-(datatotal$CWM_logSLA-mean(datatotal$CWM_logSLA))/sd(datatotal$CWM_logSLA)
    datatotal$CWV_logSLA<-(datatotal$CWV_logSLA-mean(datatotal$CWV_logSLA))/sd(datatotal$CWV_logSLA)
    datatotal$CWS_logSLA<-(datatotal$CWS_logSLA-mean(datatotal$CWS_logSLA))/sd(datatotal$CWS_logSLA)
    datatotal$CWK_logSLA<-(datatotal$CWK_logSLA-mean(datatotal$CWK_logSLA))/sd(datatotal$CWK_logSLA)
    
    #####Zscorring ecosystem functions
    
    datatotal$BGL<-(datatotal$BGL-mean(datatotal$BGL))/sd(datatotal$BGL)
    datatotal$FOS<-(datatotal$FOS-mean(datatotal$FOS))/sd(datatotal$FOS)
    datatotal$AMP<-(datatotal$AMP-mean(datatotal$AMP))/sd(datatotal$AMP)
    datatotal$NTR<-(datatotal$NTR-mean(datatotal$NTR))/sd(datatotal$NTR)
    datatotal$I.NDVI<-(datatotal$I.NDVI-mean(datatotal$I.NDVI))/sd(datatotal$I.NDVI)
    
    
    #####Calculating indices of multifunctionality (M5: 5 functions)
    colnames(datatotal)
    M5<-rowMeans(datatotal[,c(20,21,22,23,24)])
    datatotal<-cbind(datatotal,M5)
    
    
    #####Log-transfromation of multifunctionality
    logM5<-log(datatotal$M5-min(datatotal$M5)+1)
    datatotal<-cbind(datatotal,logM5)
    
    
    加载 MuMIn这个包做模型选择

    代码是

    library(MuMIn)
    mod12<-lm(logM5 ~ LAT + SINLONG + COSLONG +   
                ARIDITY + SLO + SAND + PH + I(PH^2) + ELEVATION+
                CWM_logSLA + I(CWM_logSLA^2)+ CWV_logSLA + I(CWV_logSLA^2) +  CWS_logSLA + CWK_logSLA + I(CWK_logSLA^2) +
                CWM_logH + I(CWM_logH^2)+ CWV_logH + I(CWV_logH^2) +  CWS_logH + CWK_logH + I(CWK_logH^2) +
                SR
              , data=datatotal)
    # 这一步要好长时间
    dd12<-dredge(mod12, subset = ~ LAT & SINLONG & COSLONG & ARIDITY & SLO & SAND & PH &SR & ELEVATION &   
                   dc(CWM_logSLA,I(CWM_logSLA^2)) & dc(CWV_logSLA,I(CWV_logSLA^2)) & dc(CWK_logSLA,I(CWK_logSLA^2)) 
                 & dc(CWM_logH,I(CWM_logH^2)) & dc(CWV_logH,I(CWV_logH^2)) & dc(CWK_logH,I(CWK_logH^2)), 
                 options(na.action = "na.fail"))
    
    subset(dd12,delta<2)
    de12<-model.avg(dd12, subset = delta < 2)
    summary(de12)
    
    image.png
    image.png

    这一步得到的数据就是论文中 的figure4a

    image.png

    下期推文介绍如何利用得到的数据画图

    这里遇到的问题是:

    • 1、 模型里有的变量会用I()函数包起来,这个函数起到什么作用呢?
    • 2、模型选择那一步用到了dc()函数,这个函数又起到什么作用呢?

    今天的内容就到这里了

    欢迎大家关注我的公众号
    小明的数据分析笔记本

    小明的数据分析笔记本 公众号 主要分享:1、R语言和python做数据分析和数据可视化的简单小例子;2、园艺植物相关转录组学、基因组学、群体遗传学文献阅读笔记;3、生物信息学入门学习资料及自己的学习笔记!

    今天的内容主要参考

    • 公众号 二傻统计 的推文 一种简单易行的方差分解方法

    相关文章

      网友评论

        本文标题:跟着Nature Ecology&Evolution学数据分析:

        本文链接:https://www.haomeiwen.com/subject/bpshkltx.html