Using the sample function to randomly split a dataset into groups and export them

Author: 灵活胖子的进步之路 | Published 2020-10-09 06:53
library(foreign)  # not needed for read.csv(); only required for formats such as SPSS or Stata files


testdata <- read.csv("test.csv", header = TRUE, sep = ",")

str(testdata)

#'data.frame':  168 obs. of  3 variables:
#$ id      : int  1 1 2 2 3 3 4 4 5 5 ...
#$ outcome : int  1 0 1 0 1 0 1 0 1 0 ...
#$ exposure: int  1 1 1 1 1 1 1 0 1 0 ...

set.seed(2020)  # fix the random seed so the split is reproducible


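# Draw a group label (1 or 2) for every row, sampling the labels with replacement:
# probability 0.7 for the development set and 0.3 for the validation set.
# Each row is assigned independently, so the split is only approximately 70/30.
ind <- sample(2, nrow(testdata), replace = TRUE, prob = c(0.7, 0.3))
devData <- testdata[ind == 1, ]  # development (model-building) set
vadData <- testdata[ind == 2, ]  # validation set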

str(devData)

#'data.frame':  121 obs. of  3 variables:

str(vadData)

#'data.frame':   47 obs. of  3 variables:
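Because `sample(2, ..., prob = ...)` assigns each row independently, the group sizes shown above (121 and 47 of 168) are close to 70/30 but not exact. If a fixed-size split is wanted, one option is to draw row numbers without replacement instead. The following is a minimal sketch of that alternative, not the method used in this post; the small `testdata` built here is a hypothetical stand-in for `test.csv`, and the names `n_dev`, `dev_rows`, `devData_exact` and `vadData_exact` are illustrative.

```r
# Hypothetical stand-in for test.csv so the sketch runs on its own
testdata <- data.frame(
  id       = rep(1:84, each = 2),
  outcome  = rep(c(1L, 0L), times = 84),
  exposure = rep(c(1L, 1L, 1L, 0L), times = 42)
)

set.seed(2020)                        # reproducible draw
n     <- nrow(testdata)               # 168 rows
n_dev <- round(0.7 * n)               # target size of the development set

# Draw n_dev distinct row numbers (sampling without replacement)
dev_rows <- sample(n, size = n_dev)

devData_exact <- testdata[dev_rows, ]   # development set
vadData_exact <- testdata[-dev_rows, ]  # validation set: all remaining rows

nrow(devData_exact)
nrow(vadData_exact)
```

With this approach the two sets never overlap and their sizes are fixed in advance, at the cost of the sizes no longer being random.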

# Export the development set and the validation set to separate CSV files

write.csv(devData, file = "devData.csv", row.names = FALSE)

write.csv(vadData, file = "vadData.csv", row.names = FALSE)
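To confirm that an exported file matches the data frame it came from, the file can be read back and compared. The sketch below assumes a tiny hypothetical `devData` and writes to a temporary directory rather than the working directory; `out_file` and `devCheck` are illustrative names.

```r
# Hypothetical miniature development set, only for this round-trip check
devData <- data.frame(
  id       = c(1L, 2L, 3L),
  outcome  = c(1L, 0L, 1L),
  exposure = c(1L, 1L, 0L)
)

# Write to a temporary location so the working directory is left untouched
out_file <- file.path(tempdir(), "devData.csv")
write.csv(devData, file = out_file, row.names = FALSE)

# Read the file back and compare it with the original data frame
devCheck <- read.csv(out_file, header = TRUE)
same_data <- isTRUE(all.equal(devCheck, devData))
same_data
```

Writing with `row.names = FALSE`, as the post does, is what keeps the round trip clean: otherwise the row names come back as an extra first column.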

# Alternatively, bind the group labels to the original data in a single data frame

mixdata <- cbind(ind, testdata)

str(mixdata)

#'data.frame':  168 obs. of  4 variables:
#$ ind     : int  1 1 1 1 1 1 1 1 1 1 ...
#$ id      : int  1 1 2 2 3 3 4 4 5 5 ...
#$ outcome : int  1 0 1 0 1 0 1 0 1 0 ...
#$ exposure: int  1 1 1 1 1 1 1 0 1 0 ...
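Once the group label is stored as a column, the realised group sizes can be tabulated and the two subsets recovered from the combined data frame at any time. This is a minimal sketch under the same assumptions as the post (seed 2020, 168 rows, probabilities 0.7 and 0.3); the `testdata` built here is a hypothetical stand-in for `test.csv`, and `group_sizes` and `groups` are illustrative names.

```r
# Hypothetical stand-in for test.csv so the sketch runs on its own
testdata <- data.frame(
  id       = rep(1:84, each = 2),
  outcome  = rep(c(1L, 0L), times = 84),
  exposure = rep(c(1L, 1L, 1L, 0L), times = 42)
)

set.seed(2020)
ind <- sample(2, nrow(testdata), replace = TRUE, prob = c(0.7, 0.3))
mixdata <- cbind(ind, testdata)

# Count the rows per group and express the counts as proportions
group_sizes <- table(mixdata$ind)
group_sizes
prop.table(group_sizes)

# Recover the subsets from the combined data frame: a list with elements "1" and "2"
groups <- split(mixdata, mixdata$ind)
str(groups[["1"]])  # development set, still carrying the ind column
```

Keeping a single labelled data frame means only one file needs to be exported, and the split can be reproduced later without rerunning `sample`.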
