Randomly splitting a dataset into groups with sample() and exporting the results

Author: 灵活胖子的进步之路 | Published 2020-10-09 06:53
    # read.csv() is part of base R (package utils), so no extra package is needed
    testdata <- read.csv("test.csv", header = TRUE, sep = ",")
    
    str(testdata)
    
    #'data.frame':  168 obs. of  3 variables:
    #$ id      : int  1 1 2 2 3 3 4 4 5 5 ...
    #$ outcome : int  1 0 1 0 1 0 1 0 1 0 ...
    #$ exposure: int  1 1 1 1 1 1 1 0 1 0 ...
    
    set.seed(2020)  # fix the random seed so the split is reproducible
    
    
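    # Draw a label (1 or 2) for every row, sampling with replacement:
    # on average 70% of rows go to the development set and 30% to the validation set
    ind <- sample(2, nrow(testdata), replace = TRUE, prob = c(0.7, 0.3))
    devData <- testdata[ind == 1, ]  # development (model-building) set
    vadData <- testdata[ind == 2, ]  # validation set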
    
    str(devData)
    
    #'data.frame':  121 obs. of  3 variables:
    
    str(vadData)
    
    #'data.frame':  47 obs. of  3 variables:
    
    # Export the development and validation sets to separate CSV files
    
    write.csv(devData, file = "devData.csv", row.names = FALSE)
    
    write.csv(vadData, file = "vadData.csv", row.names = FALSE)
    
    # Alternatively, keep everything in one data frame with the group label as a column
    
    mixdata <- cbind(ind, testdata)
    
    str(mixdata)
    
    #'data.frame':  168 obs. of  4 variables:
    #$ ind     : int  1 1 1 1 1 1 1 1 1 1 ...
    #$ id      : int  1 1 2 2 3 3 4 4 5 5 ...
    #$ outcome : int  1 0 1 0 1 0 1 0 1 0 ...
    #$ exposure: int  1 1 1 1 1 1 1 0 1 0 ...
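Because sample(2, ..., prob = c(0.7, 0.3)) labels each row independently, the 70/30 ratio holds only on average: in the run above the two sets came out at 121 and 47 rows (about 72% and 28%). If an exact proportion is wanted, one common alternative, not the method used in this post, is to sample row indices without replacement. The sketch below assumes a made-up stand-in for test.csv with the same three columns; its values are hypothetical.

```r
# Hypothetical stand-in for test.csv: 84 ids with two rows each (168 rows),
# mimicking the structure shown by str(testdata). All values are made up.
set.seed(2020)
testdata <- data.frame(
  id       = rep(1:84, each = 2),
  outcome  = rep(c(1L, 0L), times = 84),
  exposure = rbinom(168, size = 1, prob = 0.5)
)

# Approach in the post: one independent label per row, so the group
# sizes are themselves random and only approximately 70/30.
ind <- sample(2, nrow(testdata), replace = TRUE, prob = c(0.7, 0.3))
table(ind)

# Alternative (not from the post): pick exactly round(0.7 * n) row indices
# without replacement; the remaining rows form the validation set.
n       <- nrow(testdata)
dev_idx <- sample(seq_len(n), size = round(0.7 * n))
devData <- testdata[dev_idx, ]
vadData <- testdata[-dev_idx, ]

c(dev = nrow(devData), vad = nrow(vadData))
```

With 168 rows this always gives 118 development rows and 50 validation rows, since round(0.7 * 168) is 118.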
    
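The str() output shows each id value appearing twice, with outcome 1 and then 0, which looks like a 1:1 matched design, although the post does not say so. If that is the case, a row-level split can put the two members of a pair into different sets. A minimal sketch, assuming pairs are defined by id, draws the 70/30 label once per id and copies it to both rows; the data frame is again a hypothetical stand-in.

```r
# Hypothetical stand-in data: 84 ids, two rows per id (values are made up).
set.seed(2020)
testdata <- data.frame(
  id       = rep(1:84, each = 2),
  outcome  = rep(c(1L, 0L), times = 84),
  exposure = rbinom(168, size = 1, prob = 0.5)
)

# Draw one group label per id instead of per row...
ids    <- unique(testdata$id)
id_grp <- sample(2, length(ids), replace = TRUE, prob = c(0.7, 0.3))

# ...then give every row the label of its id, so rows sharing an id
# always end up in the same set.
ind <- id_grp[match(testdata$id, ids)]

devData <- testdata[ind == 1, ]  # development set (whole pairs only)
vadData <- testdata[ind == 2, ]  # validation set (whole pairs only)
```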

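Since the combined data frame keeps the label in its ind column, the groups can also be separated and written out later from mixdata alone. The sketch below uses base R's split() on a hypothetical stand-in dataset; the output file names (group1.csv, group2.csv) and the use of tempdir() are illustrative choices, not from the post.

```r
# Hypothetical stand-in data and a row-level split, as in the post.
set.seed(2020)
testdata <- data.frame(
  id       = rep(1:84, each = 2),
  outcome  = rep(c(1L, 0L), times = 84),
  exposure = rbinom(168, size = 1, prob = 0.5)
)
ind     <- sample(2, nrow(testdata), replace = TRUE, prob = c(0.7, 0.3))
mixdata <- cbind(ind, testdata)

# split() returns a list of data frames named by the values of ind ("1", "2");
# mixdata[, -1] drops the ind column itself from the output.
groups <- split(mixdata[, -1], mixdata$ind)

# Write each group to its own CSV file in a temporary directory.
out_dir <- tempdir()
for (g in names(groups)) {
  write.csv(groups[[g]], file.path(out_dir, paste0("group", g, ".csv")),
            row.names = FALSE)
}

# Read one file back to check that the rows survived the round trip.
back <- read.csv(file.path(out_dir, "group1.csv"))
```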
          Article link: https://www.haomeiwen.com/subject/pqxkpktx.html