- Determine the data type of "a", TRUE, and 3. Hint: use typeof() and put the value to check inside the parentheses
> typeof("a")
[1] "character"
> typeof(TRUE)
[1] "logical"
> typeof(3)
[1] "double"
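The result for 3 may look surprising: a plain numeric literal is stored as a double, and an integer has to be requested with the L suffix. A minimal sketch of the difference (the literal 3L and the names in the vector are illustrative additions, not part of the exercise):

```r
# typeof() reports the internal storage type of a value
types <- c(
  chr = typeof("a"),   # character string
  lgl = typeof(TRUE),  # logical
  dbl = typeof(3),     # plain numeric literals are doubles
  int = typeof(3L)     # the L suffix requests an integer
)
print(types)

# class() is what most functions dispatch on; for a double it is "numeric"
print(class(3))
```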
- Exercise 2: Generating vectors
2.1 Generate an arbitrary vector
> c(1,3,6,10)
[1] 1 3 6 10
2.2 Generate all multiples of 4 between 1 and 30. Expected answer: 4, 8, 12, 16, 20, 24, 28
> seq(from=4, to=28, by=4)
[1] 4 8 12 16 20 24 28
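The call above hard-codes 28 as the upper bound. As a sketch of two alternatives that work directly from the stated range of 1 to 30 (the variable names m1, m2, and n are made up for this example):

```r
# Option 1: seq() stops at the last value that does not exceed `to`
m1 <- seq(from = 4, to = 30, by = 4)

# Option 2: keep the numbers in 1:30 whose remainder modulo 4 is zero
n <- 1:30
m2 <- n[n %% 4 == 0]

print(m1)
print(m2)
```

The second form follows the wording of the exercise more literally and does not require knowing the first multiple in advance.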
2.3 Generate sample4, sample8, sample12, … sample28. Hint: use paste0 and try adapting the code from the previous step
> paste0(rep("sample", times=7), seq(from=4, to=28, by=4))
[1] "sample4" "sample8" "sample12" "sample16" "sample20" "sample24"
[7] "sample28"
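The rep() call is not strictly needed, because paste0() recycles a length-one string against the longer vector. A shorter sketch under that assumption (multiples and labels are illustrative names):

```r
multiples <- seq(from = 4, to = 28, by = 4)

# "sample" has length 1 and is recycled against the 7 numbers
labels <- paste0("sample", multiples)
print(labels)
```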
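How would you pick out the values below 7 from 50 numbers?
Put the 50 numbers into a vector and assign it to x
x<7 tests every element and returns 50 logical values
Keep the elements for which the result is TRUE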
> x <- runif(50, min = 1, max = 60)
> x
[1] 13.245162 14.490830 36.147008 34.917460 5.546798 3.096894 38.924934
[8] 55.788297 36.287453 34.093144 32.035636 59.120618 30.950868 41.284497
[15] 36.490932 15.093252 16.231790 44.029268 27.701679 11.332479 45.055198
[22] 7.194271 52.008152 37.264053 33.872413 20.397862 27.734755 30.526017
[29] 11.671115 32.248206 5.441269 17.387600 13.549272 17.802638 53.810552
[36] 27.327884 47.019108 52.956523 25.374328 4.764700 20.793762 43.699831
[43] 20.919305 38.194433 50.596259 51.511768 24.090198 23.449139 53.831280
[50] 39.014630
> x[x<7]
[1] 5.546798 3.096894 5.441269 4.764700
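The three steps can also be written out one at a time. This is only a sketch: the seed is an arbitrary choice added so the draw is reproducible, so the numbers will differ from the output above.

```r
set.seed(1)  # arbitrary seed, only to make the random draw repeatable
x <- runif(50, min = 1, max = 60)

is_small <- x < 7      # step 2: one logical value per element of x
small <- x[is_small]   # step 3: keep the elements where is_small is TRUE

print(length(is_small))  # number of logical values, one per element
print(sum(is_small))     # how many elements are below 7
print(which(is_small))   # positions of those elements in x
```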
Exercise 3: Subsetting vectors
3.1 Combine the gene names "ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT", "ANF", "BAD", "BCF", "BARC7", "BALV" into a vector and assign it to x
> x <- c("ACTR3B","ANLN","BAG1","BCL2","BIRC5","RAB","ABCT",
+ "ANF","BAD","BCF","BARC7","BALV")
> x
[1] "ACTR3B" "ANLN" "BAG1" "BCL2" "BIRC5" "RAB" "ABCT"
[8] "ANF" "BAD" "BCF" "BARC7" "BALV"
3.2 Use a function to compute the length of the vector
> length(x)
[1] 12
3.3 Use vector subsetting to select gene names 1, 3, 5, 7, 9, and 11.
> x[seq(from=1, to=11, by=2)]
[1] "ACTR3B" "BAG1" "BIRC5" "ABCT" "BAD" "BARC7"
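3.4 Use vector subsetting to select gene names 1 to 7 and 10 to 15. (x has only 12 elements, so positions 13 to 15 come back as NA.)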
> x[c(1:7,10:15)]
[1] "ACTR3B" "ANLN" "BAG1" "BCL2" "BIRC5" "RAB" "ABCT"
[8] "BCF" "BARC7" "BALV" NA NA NA
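Indexing past the end of a vector does not raise an error in R; it silently returns NA. If the NA entries are unwanted, one possible guard is to drop the out-of-range positions first. A sketch, where idx, valid_idx, and subset_x are illustrative names:

```r
x <- c("ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT",
       "ANF", "BAD", "BCF", "BARC7", "BALV")

idx <- c(1:7, 10:15)

# Keep only the positions that actually exist in x
valid_idx <- idx[idx <= length(x)]
subset_x <- x[valid_idx]

print(sum(is.na(x[idx])))  # number of NA entries without the guard
print(subset_x)
```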
3.5 Use vector subsetting to select the gene names that also appear in c("ANLN", "BCL2", "TP53"). Hint: %in%
> x[x %in% c("ANLN", "BCL2","TP53")]
[1] "ANLN" "BCL2"
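"TP53" is missing from the result because it is not in x; %in% simply returns FALSE for elements with no match. A small sketch that makes the intermediate logical vector visible and also lists the names that were asked for but not found (wanted, hits, and not_found are illustrative names):

```r
x <- c("ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT",
       "ANF", "BAD", "BCF", "BARC7", "BALV")
wanted <- c("ANLN", "BCL2", "TP53")

in_wanted <- x %in% wanted   # one TRUE/FALSE per element of x
hits <- x[in_wanted]         # gene names present in both vectors

# The reverse question: which requested names are absent from x?
not_found <- wanted[!wanted %in% x]

print(hits)
print(not_found)
```

For unique values, intersect(x, wanted) gives the same names as hits.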
3.6 Change the 6th gene name to "a"
> x[6] <- "a"
> x
[1] "ACTR3B" "ANLN" "BAG1" "BCL2" "BIRC5" "a" "ABCT"
[8] "ANF" "BAD" "BCF" "BARC7" "BALV"
3.7 Generate 100 random numbers with rnorm(n=100,mean=0,sd=18), then set every value below -2 to -2 and every value above 2 to 2
> x <- rnorm(n=100,mean=0,sd=18)
> x
[1] 19.8391787 -0.3014686 2.9121954 36.4457050 -12.6664966
[6] 17.2942629 32.2287310 -19.1549729 0.3174578 -7.0183553
[11] -8.8349895 -18.8229177 -16.1318028 22.8489689 10.6891371
[16] 13.9614177 28.0326668 -6.5772323 14.6980161 -1.0914260
[21] -9.0248097 16.6691291 0.6648784 -19.1916031 -4.2922144
[26] 26.9140220 21.0988538 -26.2387298 1.7110121 15.2579693
[31] -29.2385615 25.3541404 -9.7516865 5.0159650 -3.4915094
[36] 28.3708473 -26.5598574 -2.6029477 -17.1576566 7.3177692
[41] 40.1267196 -27.2609461 -1.1107336 -2.6508742 27.7486752
[46] -17.6734020 8.9384071 30.5450619 -4.6932536 -12.7067145
[51] -2.9012131 9.0237929 -18.2437141 29.0655402 0.1015557
[56] -52.2881831 -19.9289667 27.8562048 -17.5829463 -1.8270621
[61] 0.7677045 -28.7409243 8.8374127 7.5888606 33.7302702
[66] 18.6212578 1.4725856 -1.4854277 10.9093218 -15.9735626
[71] 1.8975850 6.3517405 9.9070805 -20.4179574 26.3223277
[76] 12.6381008 45.1280007 -34.0204886 -10.6166302 -30.8610413
[81] -7.5779622 5.5825448 30.6462705 -7.9809265 -21.5747475
[86] -5.5328565 11.1789756 3.2742394 23.7312168 -5.3803676
[91] -29.6679914 17.1269725 -20.0362132 11.1053966 9.2428869
[96] 6.6502639 31.0300944 -3.7106022 -23.6555125 1.1425337
> x[x< -2] <- -2
> x[x> 2] <- 2
> x
[1] 2.0000000 -0.3014686 2.0000000 2.0000000 -2.0000000 2.0000000
[7] 2.0000000 -2.0000000 0.3174578 -2.0000000 -2.0000000 -2.0000000
[13] -2.0000000 2.0000000 2.0000000 2.0000000 2.0000000 -2.0000000
[19] 2.0000000 -1.0914260 -2.0000000 2.0000000 0.6648784 -2.0000000
[25] -2.0000000 2.0000000 2.0000000 -2.0000000 1.7110121 2.0000000
[31] -2.0000000 2.0000000 -2.0000000 2.0000000 -2.0000000 2.0000000
[37] -2.0000000 -2.0000000 -2.0000000 2.0000000 2.0000000 -2.0000000
[43] -1.1107336 -2.0000000 2.0000000 -2.0000000 2.0000000 2.0000000
[49] -2.0000000 -2.0000000 -2.0000000 2.0000000 -2.0000000 2.0000000
[55] 0.1015557 -2.0000000 -2.0000000 2.0000000 -2.0000000 -1.8270621
[61] 0.7677045 -2.0000000 2.0000000 2.0000000 2.0000000 2.0000000
[67] 1.4725856 -1.4854277 2.0000000 -2.0000000 1.8975850 2.0000000
[73] 2.0000000 -2.0000000 2.0000000 2.0000000 2.0000000 -2.0000000
[79] -2.0000000 -2.0000000 -2.0000000 2.0000000 2.0000000 -2.0000000
[85] -2.0000000 -2.0000000 2.0000000 2.0000000 2.0000000 -2.0000000
[91] -2.0000000 2.0000000 -2.0000000 2.0000000 2.0000000 2.0000000
[97] 2.0000000 -2.0000000 -2.0000000 1.1425337
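The same clamping can be done in a single expression with pmax() and pmin(), which compare element by element. A sketch for comparison with the two-step assignment (the seed and the names clamped and y are additions for this example):

```r
set.seed(42)  # arbitrary seed so the example is repeatable
x <- rnorm(n = 100, mean = 0, sd = 18)

# pmax() raises everything below -2 to -2; pmin() then caps everything above 2
clamped <- pmin(pmax(x, -2), 2)

# The two-step assignment from the exercise, applied to a copy
y <- x
y[y < -2] <- -2
y[y > 2] <- 2

print(all(clamped == y))
print(range(clamped))
```

The one-line form leaves x untouched, which helps when the original values are needed later.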
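Exercise 4: Creating and subsetting data frames
4.1 Create the data frame shown in the figure (hint: the last three columns come from rnorm())
[Figure: the target data frame, image.png]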
> gene <- paste0("gene",1:15) # "gene" is recycled to match the 15 numbers
> gene
[1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6" "gene7"
[8] "gene8" "gene9" "gene10" "gene11" "gene12" "gene13" "gene14"
[15] "gene15"
> gene <- paste0(rep("gene",times=15),1:15)
> gene
[1] "gene1" "gene2" "gene3" "gene4" "gene5" "gene6" "gene7"
[8] "gene8" "gene9" "gene10" "gene11" "gene12" "gene13" "gene14"
[15] "gene15"
> s1 <- rnorm(15,mean = 0, sd=1)
> s1
[1] 0.68501477 3.26641452 0.56060046 -0.06901730 -0.97244294
[6] -0.54658659 -1.68869233 -1.57237270 -0.40498716 0.31928642
[11] 0.04042768 -0.39000956 -1.81922223 0.65918071 0.45962167
> s2 <- rnorm(15, mean = 0, sd=1)
> s2
[1] 1.6166263 -1.8561905 -0.2868239 1.7503219 0.1164136 1.3842532
[7] 0.5742209 0.1364908 0.9142160 -1.8008263 -0.3398806 0.6062646
[13] 1.3411303 0.7672873 0.1937257
> s3 <- rnorm(15, mean = 0, sd=1)
> s3
[1] 1.14056669 0.01386480 -1.10530591 -0.02516264 -0.16367334
[6] 0.37005975 -0.38082454 0.65295237 2.06134181 -1.79664494
[11] 0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
> my_dataframe <- data.frame(gene, s1, s2, s3)
> my_dataframe
gene s1 s2 s3
1 gene1 0.68501477 1.6166263 1.14056669
2 gene2 3.26641452 -1.8561905 0.01386480
3 gene3 0.56060046 -0.2868239 -1.10530591
4 gene4 -0.06901730 1.7503219 -0.02516264
5 gene5 -0.97244294 0.1164136 -0.16367334
6 gene6 -0.54658659 1.3842532 0.37005975
7 gene7 -1.68869233 0.5742209 -0.38082454
8 gene8 -1.57237270 0.1364908 0.65295237
9 gene9 -0.40498716 0.9142160 2.06134181
10 gene10 0.31928642 -1.8008263 -1.79664494
11 gene11 0.04042768 -0.3398806 0.58407712
12 gene12 -0.39000956 0.6062646 -0.72275312
13 gene13 -1.81922223 1.3411303 -0.62916466
14 gene14 0.65918071 0.7672873 -1.81620605
15 gene15 0.45962167 0.1937257 -0.25928910
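Because rnorm() draws new values on every run, the table above cannot be reproduced exactly. A sketch of a repeatable version, assuming an arbitrary seed and naming the columns inside data.frame() directly (stringsAsFactors = FALSE is set explicitly so gene stays a character column on older R versions too):

```r
set.seed(123)  # arbitrary seed; any fixed value makes the draw repeatable

my_dataframe <- data.frame(
  gene = paste0("gene", 1:15),
  s1 = rnorm(15, mean = 0, sd = 1),
  s2 = rnorm(15, mean = 0, sd = 1),
  s3 = rnorm(15, mean = 0, sd = 1),
  stringsAsFactors = FALSE
)

str(my_dataframe)
```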
4.2 Extract the first column (two ways)
> my_dataframe$gene
[1] gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10
[11] gene11 gene12 gene13 gene14 gene15
15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
> my_dataframe[,1]
[1] gene1 gene2 gene3 gene4 gene5 gene6 gene7 gene8 gene9 gene10
[11] gene11 gene12 gene13 gene14 gene15
15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
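The "15 Levels:" line shows that gene was stored as a factor, which was the data.frame() default before R 4.0.0 (the session log later in this post reports R 3.6.0). Both forms above return a plain vector. A sketch of two further forms on a small made-up data frame, one of which keeps the data frame structure:

```r
# Small illustrative data frame (values are made up)
df <- data.frame(
  gene = paste0("gene", 1:3),
  s1 = c(0.5, -1.2, 2.0),
  stringsAsFactors = FALSE
)

v1 <- df$gene                  # vector, by name
v2 <- df[, 1]                  # vector, by position
v3 <- df[["gene"]]             # vector, name given as a string
d1 <- df[, 1, drop = FALSE]    # still a one-column data frame

print(class(v1))
print(class(d1))
```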
4.3 Extract the second row
> my_dataframe[2,]
gene s1 s2 s3
2 gene2 3.266415 -1.85619 0.0138648
4.4 Extract the value in row 3, column 4
> my_dataframe[3,4]
[1] -1.105306
4.5 Extract the row names and column names
> colnames(my_dataframe)
[1] "gene" "s1" "s2" "s3"
> row.names(my_dataframe)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
[15] "15"
4.6 Compute the mean of the second column
> mean(my_dataframe[,2])
[1] -0.09818564
> mean(my_dataframe$s1)
[1] -0.09818564
4.7 Extract columns s1 and s3 by column name
> my_dataframe$s1
[1] 0.68501477 3.26641452 0.56060046 -0.06901730 -0.97244294
[6] -0.54658659 -1.68869233 -1.57237270 -0.40498716 0.31928642
[11] 0.04042768 -0.39000956 -1.81922223 0.65918071 0.45962167
> my_dataframe$s3
[1] 1.14056669 0.01386480 -1.10530591 -0.02516264 -0.16367334
[6] 0.37005975 -0.38082454 0.65295237 2.06134181 -1.79664494
[11] 0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
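The two calls above return s1 and s3 as separate vectors. To pull both columns out together as one data frame, a character vector of names can be used as the column index. A sketch on a small made-up data frame:

```r
# Small illustrative data frame (values are made up)
df <- data.frame(
  gene = paste0("gene", 1:3),
  s1 = c(0.5, -1.2, 2.0),
  s2 = c(1.1, 0.3, -0.7),
  s3 = c(-0.4, 0.9, 0.2)
)

# Index columns by a vector of names; the result is a data frame
sub <- df[, c("s1", "s3")]
print(sub)
```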
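4.8 Select the rows where column s3 is greater than 0. Note that the two commands below return only the matching s3 values; to get whole rows, put the condition in the row position, as in my_dataframe[my_dataframe$s3>0, ]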
> my_dataframe$s3[my_dataframe$s3>0]
[1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
> my_dataframe[,4][my_dataframe[,4]>0]
[1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
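To make the difference between filtering values and filtering rows concrete, here is a sketch on a small made-up data frame. The logical condition goes before the comma, so it selects rows and every column is kept:

```r
# Small illustrative data frame (values are made up)
df <- data.frame(
  gene = paste0("gene", 1:4),
  s1 = c(0.5, -1.2, 2.0, 0.1),
  s3 = c(-0.4, 0.9, 0.2, -1.5)
)

values_only <- df$s3[df$s3 > 0]   # just the s3 values above 0
pos_rows <- df[df$s3 > 0, ]       # the full rows, all columns kept

print(values_only)
print(pos_rows)
```

subset(df, s3 > 0) is an equivalent base R shorthand for the row version.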
Exercise 5: Install any two R packages
> BiocManager::install()
Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)
installation path not writeable, unable to update packages: boot, cluster,
foreign, KernSmooth, mgcv, nlme
Update old packages: 'AnnotationDbi', 'backports', 'BiocManager',
'BiocParallel', 'biomaRt', 'blob', 'callr', 'car', 'carData',
'checkmate', 'clipr', 'cowplot', 'curl', 'data.table', 'devtools',
'digest', 'doParallel', 'dplyr', 'edgeR', 'effects', 'ellipsis',
'FactoMineR', 'farver', 'fgsea', 'foreach', 'GenomicRanges', 'ggforce',
'ggplot2', 'ggplotify', 'ggpubr', 'ggraph', 'ggsignif', 'git2r',
'haven', 'hexbin', 'Hmisc', 'hms', 'htmlTable', 'htmltools',
'htmlwidgets', 'httpuv', 'httr', 'IRanges', 'iterators', 'knitr',
'lambda.r', 'later', 'lava', 'limma', 'maptools', 'markdown',
'matrixStats', 'mclust', 'openssl', 'openxlsx', 'pillar', 'pkgbuild',
'pkgconfig', 'processx', 'prodlim', 'promises', 'purrr', 'quantreg',
'R6', 'Rcpp', 'RcppArmadillo', 'RcppEigen', 'rlang', 'robust',
'RSQLite', 'rvcheck', 'S4Vectors', 'shiny', 'sp',
'SummarizedExperiment', 'survival', 'sys', 'tidyr', 'usethis', 'vctrs',
'whisker', 'xfun', 'xml2', 'zip'
Update all/some/none? [a/s/n]:
n
> library(BiocManager)
Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for
help
Bioconductor version '3.9' is out-of-date; the current release version
'3.10' is available with R version '3.6'; see
https://bioconductor.org/install
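Calling BiocManager::install() with no arguments only offers to update what is already installed; it does not add a new package. A hedged sketch of how two packages could be installed from each source: the package names are just examples taken from the update list above, missing_packages() is a hypothetical helper written for this illustration, and the installs are wrapped in interactive() so nothing is downloaded when the code is run as a script.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

cran_pkgs <- c("tidyr", "dplyr")   # example CRAN packages (assumed choice)
bioc_pkgs <- c("limma", "edgeR")   # example Bioconductor packages (assumed choice)

# Only touch the network in an interactive session
if (interactive()) {
  to_install <- missing_packages(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)

  to_install_bioc <- missing_packages(bioc_pkgs)
  if (length(to_install_bioc) > 0) {
    BiocManager::install(to_install_bioc, update = FALSE)
  }
}
```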
Exercise 6: Reading and exporting files
6.1 Read complete_set.txt (already saved in the working directory)
> read.table("complete_set.txt")
V1 V2 V3
1 geneA geneB geneC
2 -0.635020187971398 -0.49728008811353 0.514896730700242
3 0.91605661780324 -0.545381308500589 1.20238322656491
4 0.805995294157758 -0.315914513323816 0.27825197143441
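In the output above, the names geneA, geneB, geneC appear as data row 1 and the columns are called V1, V2, V3, which suggests that the file's first line is a header that read.table() did not treat as one. A sketch of the effect of header = TRUE, using a small temporary file with made-up contents in place of complete_set.txt:

```r
# Write a tiny stand-in file (contents are made up for illustration)
tmp <- tempfile(fileext = ".txt")
writeLines(c("geneA geneB geneC",
             "-0.63 -0.49 0.51",
             "0.91 -0.54 1.20"), tmp)

no_header <- read.table(tmp)                   # header line becomes row 1
with_header <- read.table(tmp, header = TRUE)  # header line becomes column names

print(dim(no_header))
print(dim(with_header))
print(colnames(with_header))
print(sapply(with_header, class))  # numeric columns once the header is handled
```

Without header = TRUE the column names end up mixed in with the numbers, so the columns cannot be read as numeric.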
6.2 Check how many rows and columns there are
> nrow(read.table("complete_set.txt"))
[1] 51
> ncol(read.table("complete_set.txt"))
[1] 20
6.3 Get the row names and column names
> row.names(read.table("complete_set.txt"))
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
[15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
[29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
[43] "43" "44" "45" "46" "47" "48" "49" "50" "51"
> colnames(read.table("complete_set.txt"))
[1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
[12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20"
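Each command above reads the file from disk again. Reading it once into a variable avoids the repeated work and makes the later steps shorter. A sketch using a temporary stand-in file (dat and the file contents are made up for illustration):

```r
# Create a tiny stand-in file so the example runs on its own
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(geneA = c(1.5, 2.5), geneB = c(3, 4)),
            file = tmp, quote = FALSE, row.names = FALSE)

dat <- read.table(tmp, header = TRUE)  # read once, reuse below

print(dim(dat))        # rows and columns in one call
print(nrow(dat))
print(ncol(dat))
print(rownames(dat))
print(colnames(dat))
```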
6.4 Export to CSV format, then read it back in
> write.table(read.table("complete_set.txt"), file="complete_set1.csv")
> read.table("complete_set1.csv")
V1 V2 V3
1 geneA geneB geneC
2 -0.635020187971398 -0.49728008811353 0.514896730700242
3 0.91605661780324 -0.545381308500589 1.20238322656491
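Note that write.table() separates fields with spaces by default, so the file written above carries a .csv extension without actually being comma-separated. A sketch of a round trip through a real CSV with write.csv() and read.csv(), using a small made-up data frame and a temporary file:

```r
# Small illustrative data frame (values are made up)
dat <- data.frame(geneA = c(-0.64, 0.92), geneB = c(-0.50, -0.55))

csv_path <- tempfile(fileext = ".csv")
write.csv(dat, file = csv_path, row.names = FALSE)  # comma-separated, no row-name column

dat2 <- read.csv(csv_path)

print(readLines(csv_path))    # raw file contents, fields separated by commas
print(all.equal(dat, dat2))   # compares the data before and after the round trip
```

Passing row.names = FALSE keeps the row numbers from being written as an extra unnamed column.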
6.5 Save as Rdata, then load it again
> x=read.table("complete_set1.csv")
> save(x,file = "complete_set1.Rdata")
> load("complete_set1.Rdata")
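load() restores objects under the names they were saved with, so it can silently overwrite a variable of the same name. The sketch below shows that behaviour and, as an alternative, saveRDS()/readRDS(), which store a single object and let the caller choose the name on reading (file paths are temporary and the data is made up):

```r
x <- data.frame(a = 1:3)

# save()/load(): the object comes back under its original name
rdata_path <- tempfile(fileext = ".Rdata")
save(x, file = rdata_path)
rm(x)
loaded_names <- load(rdata_path)  # returns the names of the restored objects
print(loaded_names)

# saveRDS()/readRDS(): one object, assigned to any name on reading
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, file = rds_path)
y <- readRDS(rds_path)
print(identical(x, y))
```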
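Exercise 7: tidyr_dplyr
7.1 Gather the first four columns of the iris data frame, then restore the original layout
library(tidyr)  # gather(), spread()
library(dplyr)  # %>%, group_by(), mutate()
test <- iris
head(test)
# Long format: one row per Species, measurement name (s_p), and value (exp)
iris_g <- gather(test, s_p, exp, -Species)
head(iris_g)
# Number the rows within each measurement so spread() can line the values up again
iris_g %>%
  group_by(s_p) %>%
  mutate(id=1:n()) %>%
  spread(s_p, exp)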
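Recent tidyr releases describe gather() and spread() as superseded by pivot_longer() and pivot_wider(). As a sketch of the same round trip with the newer functions, assuming tidyr 1.0.0 or later is installed (iris_id, iris_long, and iris_wide are illustrative names; the id column is added up front so every row can be matched again when widening):

```r
library(tidyr)

# Add a row id before reshaping so each measurement can be traced to its row
iris_id <- cbind(id = seq_len(nrow(iris)), iris)

# Long format: the four measurement columns become name/value pairs
iris_long <- pivot_longer(iris_id, cols = -c(id, Species),
                          names_to = "s_p", values_to = "exp")

# Back to wide format: one column per measurement name
iris_wide <- pivot_wider(iris_long, names_from = s_p, values_from = exp)

print(dim(iris_long))
print(dim(iris_wide))
```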
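7.2 Split the third column into two columns (using the decimal point as the separator), then unite them again
head(test)
iris_s <- separate(test,Petal.Length,c("Petal", "Length"),sep = "[.]")
head(iris_s)
# Values with no decimal part (e.g. 5) get NA in Length, with a warning
iris_u <- unite(iris_s, "Petal.Length", Petal, Length, sep = ".")
head(iris_u)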
7.3 Load test.Rdata and sort the deg data frame by p-value in ascending order
load("test.Rdata")
head(deg)
head(arrange(deg,P.Value))
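test.Rdata is not included here, so as a sketch the same sort is shown on a made-up stand-in for deg, using base R's order() instead of arrange(). Only the column names probe_id and P.Value are taken from the exercise; the values are invented.

```r
# Made-up stand-in for the deg data frame
deg <- data.frame(
  probe_id = c("p1", "p2", "p3"),
  P.Value = c(0.03, 0.001, 0.2)
)

# order() returns the row positions that sort P.Value from small to large
deg_sorted <- deg[order(deg$P.Value), ]
print(deg_sorted)
```

Adding decreasing = TRUE to order() would put the largest p-values first instead.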
7.4 Join the two data frames on the probe_id column
merge(deg,ids,by="probe_id")
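By default merge() keeps only the rows whose probe_id appears in both data frames. A sketch on made-up stand-ins for deg and ids (the symbol column and all values are invented for illustration) showing the default and the all.x = TRUE variant, which keeps every row of the first data frame:

```r
# Made-up stand-ins for deg and ids
deg <- data.frame(probe_id = c("p1", "p2", "p3"),
                  P.Value = c(0.03, 0.001, 0.2))
ids <- data.frame(probe_id = c("p2", "p3", "p4"),
                  symbol = c("BCL2", "BAG1", "ANLN"))

inner <- merge(deg, ids, by = "probe_id")                # probes found in both
left <- merge(deg, ids, by = "probe_id", all.x = TRUE)   # all probes from deg

print(inner)
print(left)
```

In the all.x = TRUE result, probes without an entry in ids get NA in the columns that come from ids.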