Data Analysis: Tuning with K-folds + Repeats

Author: 生信学习者2 | Published: 2021-08-20 15:31

    Introduction

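    To obtain robust parameters, model parameters are usually tuned. Two common approaches are K-fold CV and N repeats of it. K-fold CV splits the samples into folds within a single run and trains the model on them. The repeated approach runs K-fold CV N times, collects the minimum-error lambda (lambda.min) from each run into a set, picks the best lambda from that set by a summary such as the median or the minimum, and finally builds a new model with that best lambda. More notes are shared at https://zouhua.top/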

    Code

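    library(glmnet)
    data(QuickStartExample)
    # glmnet >= 4.1 ships the example as a list; older versions load x and y directly
    if (exists("QuickStartExample")) {
      x <- QuickStartExample$x
      y <- QuickStartExample$y
    }

    n_repeats <- 10
    df_lambdas_min <- numeric(n_repeats)
    # 10-fold CV, repeated 10 times; each run draws a new random fold assignment
    for (i in seq_len(n_repeats)) {
      cvfit <- cv.glmnet(x = x,
                         y = y,
                         nfolds = 10,
                         alpha = 1,             # lasso penalty
                         nlambda = 100,
                         type.measure = "mse")  # y is continuous here; "auc" requires family = "binomial"
      # keep the lambda with the lowest CV error from this run
      df_lambdas_min[i] <- cvfit$lambda.min
    }
    print(df_lambdas_min)

    # Alternative for a binary outcome: cvr.glmnet() from ipflasso repeats the CV
    # itself (ncv runs), so it would replace the loop above rather than sit inside it.
    # dat_table and dat_target are placeholders for your own predictors and labels.
    # require(ipflasso)
    # cvfit <- cvr.glmnet(X = dat_table,
    #                     Y = dat_target,
    #                     family = "binomial",
    #                     nfolds = 10,
    #                     alpha = 1,
    #                     ncv = 10,
    #                     nlambda = 100,
    #                     type.measure = "auc")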
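
    The introduction also describes a last step that the loop stops short of: summarising the collected lambda.min values and building a new model with the chosen lambda. A minimal sketch of that step follows. It assumes simulated data in place of QuickStartExample so that it runs on any glmnet version, a fixed seed, and the median as the summary; the names n_repeats, lambdas_min, best_lambda and final_fit are illustrative only.

```r
library(glmnet)

# Assumption: simulated regression data stands in for the real dataset
set.seed(123)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- as.numeric(x[, 1] - 2 * x[, 2] + rnorm(n))

# Repeat 10-fold CV and collect lambda.min from every run
n_repeats <- 10
lambdas_min <- numeric(n_repeats)
for (i in seq_len(n_repeats)) {
  cvfit <- cv.glmnet(x, y, nfolds = 10, alpha = 1,
                     nlambda = 100, type.measure = "mse")
  lambdas_min[i] <- cvfit$lambda.min
}

# Summarise the set; min(lambdas_min) is the other option named in the text
best_lambda <- median(lambdas_min)

# Fit the full lasso path once, then read the coefficients at best_lambda
final_fit <- glmnet(x, y, alpha = 1)
final_coef <- coef(final_fit, s = best_lambda)

print(best_lambda)
print(final_coef)
```

    Fitting the whole path and extracting coefficients at s = best_lambda is one design choice; passing lambda = best_lambda directly to glmnet() is another, although the glmnet documentation advises against supplying a single lambda value.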
    

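    Notes: choose the number of folds with the sample size in mind. Each fold should contain at least 8 samples, so 10 folds require a sample size of at least 80. Two situations to watch for:

    1. Sample size per condition is less than 8


    2. Sample size per fold is less than 10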
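
    These two checks can be run before calling cv.glmnet. The sketch below is only an illustration: check_fold_sizes is a hypothetical helper, not part of glmnet, and its thresholds (8 per condition, 10 per fold) are simply the numbers from the notes above.

```r
# Hypothetical helper: report whether the labels y support nfolds-fold CV
# under the thresholds from the notes (assumed, not taken from glmnet's API)
check_fold_sizes <- function(y, nfolds = 10,
                             min_per_class = 8, min_per_fold = 10) {
  class_counts <- table(y)           # samples per condition
  per_fold <- length(y) / nfolds     # average samples per fold
  list(
    class_counts = class_counts,
    per_fold = per_fold,
    class_ok = all(class_counts >= min_per_class),
    fold_ok = per_fold >= min_per_fold
  )
}

# Example: 50 samples, one condition with only 6 samples
y_small <- factor(c(rep("case", 6), rep("control", 44)))
res <- check_fold_sizes(y_small, nfolds = 10)
print(res$class_ok)   # FALSE: "case" has fewer than 8 samples
print(res$fold_ok)    # FALSE: 50 / 10 = 5 samples per fold

# Largest nfolds that still leaves at least 10 samples per fold
max_folds <- floor(length(y_small) / 10)
print(max_folds)      # 5
```

    When the per-fold check fails, lowering nfolds (here to 5 or fewer) is the simplest remedy; a condition with too few samples cannot be fixed by changing the fold count.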


    References

    1. Circulating Protein Biomarkers for Use in Pancreatic Ductal Adenocarcinoma Identification

    2. An Introduction to glmnet

    3. Repeating cv.glmnet

    If any of the referenced articles raises an infringement concern, please contact me. Thank you.
