A Brief Guide to Using Common Models

Author: Bio_Learner | Published 2019-10-18 10:14

    XGBoost

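    The second half of "Getting Started with xgboost in Practice (Theory Part)" covers the parameters that need attention and the basic usage.
    A simple introduction to xgboost and how to apply it
    A step-by-step guide to writing a practical XGBoost program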

    NMF

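    NMF (non-negative matrix factorization)
    In NMF, the cophenetic correlation coefficient can be used to judge the optimal number of clusters. This comes from a sentence in the literature: "We select values of k where the magnitude of the cophenetic correlation coefficient begins to fall."
    Examples:
    How to automatically determine the optimal rank with NMF in R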
    Cophenetic correlation: Wikis
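
To make the quoted rule concrete, here is a minimal base-R sketch of how a cophenetic correlation coefficient can be computed from a consensus matrix, and of one possible reading of "begins to fall". The helper name `cophenetic_corr`, the toy matrices, and the per-rank coefficients are all hypothetical and are not taken from the linked articles. Average linkage is assumed here; the linked material may use a different setup.

```r
# Cophenetic correlation of a consensus matrix: the Pearson correlation between
# the pairwise distances (1 - consensus) and the cophenetic distances of the
# dendrogram built from them (average linkage is an assumption of this sketch).
cophenetic_corr <- function(consensus) {
  d  <- as.dist(1 - consensus)
  hc <- hclust(d, method = "average")
  cor(as.vector(d), as.vector(cophenetic(hc)))
}

# Hypothetical 4-sample consensus matrices (made up, not real data).
crisp <- matrix(c(1, 1, 0, 0,
                  1, 1, 0, 0,
                  0, 0, 1, 1,
                  0, 0, 1, 1), nrow = 4, byrow = TRUE)
fuzzy <- crisp
fuzzy[1, 3] <- fuzzy[3, 1] <- 0.5  # samples 1 and 3 co-cluster half of the time

coph_crisp <- cophenetic_corr(crisp)  # perfectly stable assignments
coph_fuzzy <- cophenetic_corr(fuzzy)  # less stable assignments

# One reading of "begins to fall": keep the last rank before the first decrease.
# These coefficients for ranks 2..5 are invented for illustration.
coph_by_rank <- c("2" = 0.98, "3" = 0.99, "4" = 0.95, "5" = 0.90)
first_drop <- which(diff(coph_by_rank) < 0)[1]
pick <- if (is.na(first_drop)) length(coph_by_rank) else first_drop
best_rank <- as.integer(names(coph_by_rank)[pick])
```

With real data the consensus matrices would come from repeated NMF runs at each rank rather than being written by hand. The selection rule is a heuristic, so it is worth looking at the full curve of coefficients instead of relying only on the automatic pick.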

    Consensus Clustering

    Consensus Clustering
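    The explanation at the link above is good. On how to determine the optimal K, this link has a discussion:
    how to choose optimal K in Consensus clustering
    There, one answer mentions that the most commonly used method is: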

    PAC has been shown to outperform other K-estimating methods in this paper and this paper
    Dr. Yasin Şenbabaoğlu has kindly provided the R implementation of PAC. You can use the results in ConsensusClusterPlus as input to get optimal K based on minimum PAC. The code is from here.

    ########################################################
    library(ConsensusClusterPlus) # Bioconductor package that provides ConsensusClusterPlus()
    seed = 11111
    set.seed(seed) # also makes the simulated data below reproducible
    d = matrix(rnorm(200000,0,1),ncol=200) # 200 samples in columns, 1000 genes in rows
    colnames(d) = paste("Samp",1:200,sep="")
    rownames(d) = paste("Gene",1:1000,sep="")
    d = sweep(d,1, apply(d,1,median,na.rm=TRUE)) # median-center each gene (row)
    maxK = 6 # maximum number of clusters to try
    results = ConsensusClusterPlus(d,maxK=maxK,reps=50,pItem=0.8,pFeature=1,title="test_run",
                                   innerLinkage="complete",seed=seed,plot="pdf")
    
    # Note that we implement consensus clustering with innerLinkage="complete". 
    # We advise against using innerLinkage="average" which is the default value in this package as average linkage is not robust to outliers.
    
    ############## PAC implementation ##############
    Kvec = 2:maxK
    x1 = 0.1; x2 = 0.9 # threshold defining the intermediate sub-interval
    PAC = rep(NA,length(Kvec)) 
    names(PAC) = paste("K=",Kvec,sep="") # from 2 to maxK
    for(i in Kvec){
      M = results[[i]]$consensusMatrix
      Fn = ecdf(M[lower.tri(M)])
      PAC[i-1] = Fn(x2) - Fn(x1)
    }#end for i
    # The optimal K
    optK = Kvec[which.min(PAC)]
    ########################################################
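
For readers who want to see what the PAC score measures without installing ConsensusClusterPlus, the following is a rough base-R sketch under stated assumptions. It builds a consensus matrix by repeated subsampling and complete-linkage hierarchical clustering, then applies the same PAC formula as the code above. The helper names `consensus_matrix` and `pac`, the toy data, and the settings (50 subsamples, 80% of items) are hypothetical. This is a simplification and is not the package's actual implementation.

```r
# Hypothetical toy data: two well-separated groups of 20 samples each (rows).
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 20), ncol = 2))

# Consensus matrix for one K: for each pair of items, the fraction of
# subsamples containing both items in which they landed in the same cluster.
consensus_matrix <- function(x, k, reps = 50, p_item = 0.8) {
  n <- nrow(x)
  together <- matrix(0, n, n)  # times a pair was clustered together
  sampled  <- matrix(0, n, n)  # times a pair was drawn together
  for (r in seq_len(reps)) {
    idx <- sample(n, size = floor(p_item * n))
    hc  <- hclust(dist(x[idx, , drop = FALSE]), method = "complete")
    cl  <- cutree(hc, k = k)
    sampled[idx, idx]  <- sampled[idx, idx] + 1
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  }
  ifelse(sampled > 0, together / sampled, 0)  # pairs never drawn together get 0
}

# PAC: share of item pairs whose consensus value lies in (x1, x2],
# that is, pairs that are neither clearly together nor clearly apart.
pac <- function(M, x1 = 0.1, x2 = 0.9) {
  Fn <- ecdf(M[lower.tri(M)])
  Fn(x2) - Fn(x1)
}

Kvec <- 2:4
PAC <- sapply(Kvec, function(k) pac(consensus_matrix(x, k)))
names(PAC) <- paste0("K=", Kvec)
optK <- Kvec[which.min(PAC)]  # the K with the fewest ambiguous pairs
```

Because the two toy groups are far apart, every subsample splits them the same way at K = 2, so no pair is ambiguous and the PAC for K = 2 is zero. Larger K forces an arbitrary split inside a group, which tends to push consensus values into the intermediate range.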
    

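    Other introductions:
    An algorithm in R for identifying the number of clusters and their members
    Determining the optimal K with ConsensusClusterPlus in R
    Consensus clustering with ConsensusClusterPlus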
