2019-11-20 Bioinformatics R Exercises

Author: __一蓑烟雨__ | Published 2019-11-20 21:38
    Exercise 1: Determine the data types of "a", TRUE, and 3. Hint: use typeof() and put the value to check inside the parentheses
    > typeof("a")
    [1] "character"
    > typeof(TRUE)
    [1] "logical"
    > typeof(3)
    [1] "double"
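
The result "double" for 3 can be surprising. A minimal sketch of why: R stores a bare numeric literal as a double, and the L suffix asks for an integer instead. The vector name `types` is only illustrative and is not part of the original exercise.

```r
# typeof() reports the internal storage type of a value
types <- c(
  string  = typeof("a"),   # "character"
  logical = typeof(TRUE),  # "logical"
  number  = typeof(3),     # "double": bare numeric literals are doubles
  integer = typeof(3L)     # "integer": the L suffix requests an integer
)
print(types)
```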
    
    Exercise 2: Vector generation
    2.1 Generate an arbitrary vector
    > c(1,3,6,10)
    [1]  1  3  6 10
    

    2.2 Generate all multiples of 4 between 1 and 30. Expected answer: 4, 8, 12, 16, 20, 24, 28

    > seq(from=4, to=28, by=4)
    [1]  4  8 12 16 20 24 28
    

    2.3 Generate sample4, sample8, sample12, ..., sample28. Hint: use paste0 and try adapting the previous code

    > paste0(rep("sample", times=7), seq(from=4, to=28, by=4))
    [1] "sample4"  "sample8"  "sample12" "sample16" "sample20" "sample24"
    [7] "sample28"
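
The rep() call in the answer above works, but it is not required. A short sketch, assuming the same sequence as in 2.2: paste0() recycles a length-1 string across the longer vector. The names `multiples` and `sample_names` are illustrative.

```r
# paste0() recycles the single string "sample" across the numeric vector,
# so there is no need to repeat it seven times with rep()
multiples <- seq(from = 4, to = 28, by = 4)
sample_names <- paste0("sample", multiples)
print(sample_names)
```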
    

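    Think about it: how would you select the values below 7 from 50 numbers?
    Put the 50 numbers into a vector and assign it to x
    Test x<7, which returns 50 logical values
    Keep the elements whose result is TRUE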

    
    > x <- runif(50, min = 1, max = 60)
    > x
     [1] 13.245162 14.490830 36.147008 34.917460  5.546798  3.096894 38.924934
     [8] 55.788297 36.287453 34.093144 32.035636 59.120618 30.950868 41.284497
    [15] 36.490932 15.093252 16.231790 44.029268 27.701679 11.332479 45.055198
    [22]  7.194271 52.008152 37.264053 33.872413 20.397862 27.734755 30.526017
    [29] 11.671115 32.248206  5.441269 17.387600 13.549272 17.802638 53.810552
    [36] 27.327884 47.019108 52.956523 25.374328  4.764700 20.793762 43.699831
    [43] 20.919305 38.194433 50.596259 51.511768 24.090198 23.449139 53.831280
    [50] 39.014630
    > x[x<7]
    [1] 5.546798 3.096894 5.441269 4.764700
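
The three steps can also be written out one at a time. This is a sketch only: the seed is an assumption added so the run is repeatable, and the intermediate names `is_small` and `small` do not appear in the original.

```r
set.seed(42)                       # assumption: fixed seed for a repeatable run
x <- runif(50, min = 1, max = 60)  # step 1: 50 numbers in a vector
is_small <- x < 7                  # step 2: one logical value per element
small <- x[is_small]               # step 3: keep the positions that are TRUE

length(is_small)  # 50, one logical per element of x
sum(is_small)     # TRUE counts as 1, so this is the number of matches
which(is_small)   # positions of the matching elements
```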
    

    Exercise 3: Vector subsetting
    3.1 Combine the gene names "ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT", "ANF", "BAD", "BCF", "BARC7", "BALV" into a vector and assign it to x

    > x <- c("ACTR3B","ANLN","BAG1","BCL2","BIRC5","RAB","ABCT",
    +        "ANF","BAD","BCF","BARC7","BALV")
    > x
     [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "RAB"    "ABCT"  
     [8] "ANF"    "BAD"    "BCF"    "BARC7"  "BALV"  
    

    3.2 Use a function to compute the length of the vector

    > length(x)
    [1] 12
    

    3.3 Using vector subsetting, select gene names 1, 3, 5, 7, 9, and 11.

    > x[seq(from=1, to=11, by=2)]
    [1] "ACTR3B" "BAG1"   "BIRC5"  "ABCT"   "BAD"    "BARC7"
    

    3.4 Using vector subsetting, select gene names 1 to 7 and 10 to 15.

    > x[c(1:7,10:15)]
     [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "RAB"    "ABCT"  
     [8] "BCF"    "BARC7"  "BALV"   NA       NA       NA  
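
The three NA values appear because the vector has only 12 elements, so positions 13 to 15 do not exist. A small sketch of one way to drop out-of-range positions before subsetting; the names `genes`, `idx`, and `valid_idx` are illustrative.

```r
genes <- c("ACTR3B", "ANLN", "BAG1", "BCL2", "BIRC5", "RAB", "ABCT",
           "ANF", "BAD", "BCF", "BARC7", "BALV")
idx <- c(1:7, 10:15)

picked <- genes[idx]                     # positions 13 to 15 give NA
valid_idx <- idx[idx <= length(genes)]   # keep only positions that exist
picked_valid <- genes[valid_idx]

sum(is.na(picked))   # number of out-of-range positions requested
print(picked_valid)
```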
    

    3.5 Using vector subsetting, select the gene names that appear in c("ANLN", "BCL2", "TP53"). Hint: %in%

    > x[x %in% c("ANLN", "BCL2","TP53")]
    [1] "ANLN" "BCL2"
    

    3.6 Change the sixth gene name to "a"

    > x[6] <- "a"
    > x
     [1] "ACTR3B" "ANLN"   "BAG1"   "BCL2"   "BIRC5"  "a"      "ABCT"  
     [8] "ANF"    "BAD"    "BCF"    "BARC7"  "BALV" 
    

    3.7 Generate 100 random numbers with rnorm(n=100,mean=0,sd=18), then set every value below -2 to -2 and every value above 2 to 2

    > x <- rnorm(n=100,mean=0,sd=18)
    > x
      [1]  19.8391787  -0.3014686   2.9121954  36.4457050 -12.6664966
      [6]  17.2942629  32.2287310 -19.1549729   0.3174578  -7.0183553
     [11]  -8.8349895 -18.8229177 -16.1318028  22.8489689  10.6891371
     [16]  13.9614177  28.0326668  -6.5772323  14.6980161  -1.0914260
     [21]  -9.0248097  16.6691291   0.6648784 -19.1916031  -4.2922144
     [26]  26.9140220  21.0988538 -26.2387298   1.7110121  15.2579693
     [31] -29.2385615  25.3541404  -9.7516865   5.0159650  -3.4915094
     [36]  28.3708473 -26.5598574  -2.6029477 -17.1576566   7.3177692
     [41]  40.1267196 -27.2609461  -1.1107336  -2.6508742  27.7486752
     [46] -17.6734020   8.9384071  30.5450619  -4.6932536 -12.7067145
     [51]  -2.9012131   9.0237929 -18.2437141  29.0655402   0.1015557
     [56] -52.2881831 -19.9289667  27.8562048 -17.5829463  -1.8270621
     [61]   0.7677045 -28.7409243   8.8374127   7.5888606  33.7302702
     [66]  18.6212578   1.4725856  -1.4854277  10.9093218 -15.9735626
     [71]   1.8975850   6.3517405   9.9070805 -20.4179574  26.3223277
     [76]  12.6381008  45.1280007 -34.0204886 -10.6166302 -30.8610413
     [81]  -7.5779622   5.5825448  30.6462705  -7.9809265 -21.5747475
     [86]  -5.5328565  11.1789756   3.2742394  23.7312168  -5.3803676
     [91] -29.6679914  17.1269725 -20.0362132  11.1053966   9.2428869
     [96]   6.6502639  31.0300944  -3.7106022 -23.6555125   1.1425337
    > x[x< -2] <- -2
    > x[x> 2] <- 2
    > x
      [1]  2.0000000 -0.3014686  2.0000000  2.0000000 -2.0000000  2.0000000
      [7]  2.0000000 -2.0000000  0.3174578 -2.0000000 -2.0000000 -2.0000000
     [13] -2.0000000  2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
     [19]  2.0000000 -1.0914260 -2.0000000  2.0000000  0.6648784 -2.0000000
     [25] -2.0000000  2.0000000  2.0000000 -2.0000000  1.7110121  2.0000000
     [31] -2.0000000  2.0000000 -2.0000000  2.0000000 -2.0000000  2.0000000
     [37] -2.0000000 -2.0000000 -2.0000000  2.0000000  2.0000000 -2.0000000
     [43] -1.1107336 -2.0000000  2.0000000 -2.0000000  2.0000000  2.0000000
     [49] -2.0000000 -2.0000000 -2.0000000  2.0000000 -2.0000000  2.0000000
     [55]  0.1015557 -2.0000000 -2.0000000  2.0000000 -2.0000000 -1.8270621
     [61]  0.7677045 -2.0000000  2.0000000  2.0000000  2.0000000  2.0000000
     [67]  1.4725856 -1.4854277  2.0000000 -2.0000000  1.8975850  2.0000000
     [73]  2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
     [79] -2.0000000 -2.0000000 -2.0000000  2.0000000  2.0000000 -2.0000000
     [85] -2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000 -2.0000000
     [91] -2.0000000  2.0000000 -2.0000000  2.0000000  2.0000000  2.0000000
     [97]  2.0000000 -2.0000000 -2.0000000  1.1425337
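
The two assignments above can also be expressed as a single clamp with pmax() and pmin(), which leaves the original vector untouched. This is a sketch under the assumption of a fixed seed; `clamped` and `y` are illustrative names.

```r
set.seed(1)                             # assumption: fixed seed for a repeatable run
x <- rnorm(n = 100, mean = 0, sd = 18)

# pmax() raises everything below -2 to -2, pmin() lowers everything above 2 to 2
clamped <- pmin(pmax(x, -2), 2)

# The two-step version from the exercise, applied to a copy for comparison
y <- x
y[y < -2] <- -2
y[y > 2] <- 2

all.equal(clamped, y)
range(clamped)
```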
    

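    Exercise 4: Creating and subsetting data frames

    4.1 Create the data frame shown in the image (hint: the last three columns are generated with rnorm())

    [Image: image.png]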
    > gene <- paste0("gene",1:15) # vector recycling
    > gene
     [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7" 
     [8] "gene8"  "gene9"  "gene10" "gene11" "gene12" "gene13" "gene14"
    [15] "gene15"
    > gene <- paste0(rep("gene",times=15),1:15)
    > gene
     [1] "gene1"  "gene2"  "gene3"  "gene4"  "gene5"  "gene6"  "gene7" 
     [8] "gene8"  "gene9"  "gene10" "gene11" "gene12" "gene13" "gene14"
    [15] "gene15"
    > s1 <- rnorm(15,mean = 0, sd=1)
    > s1
     [1]  0.68501477  3.26641452  0.56060046 -0.06901730 -0.97244294
     [6] -0.54658659 -1.68869233 -1.57237270 -0.40498716  0.31928642
    [11]  0.04042768 -0.39000956 -1.81922223  0.65918071  0.45962167
    > s2 <- rnorm(15, mean = 0, sd=1)
    > s2
     [1]  1.6166263 -1.8561905 -0.2868239  1.7503219  0.1164136  1.3842532
     [7]  0.5742209  0.1364908  0.9142160 -1.8008263 -0.3398806  0.6062646
    [13]  1.3411303  0.7672873  0.1937257
    > s3 <- rnorm(15, mean = 0, sd=1)
    > s3
     [1]  1.14056669  0.01386480 -1.10530591 -0.02516264 -0.16367334
     [6]  0.37005975 -0.38082454  0.65295237  2.06134181 -1.79664494
    [11]  0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
    > my_dataframe <- data.frame(gene, s1, s2, s3)
    > my_dataframe
         gene          s1         s2          s3
    1   gene1  0.68501477  1.6166263  1.14056669
    2   gene2  3.26641452 -1.8561905  0.01386480
    3   gene3  0.56060046 -0.2868239 -1.10530591
    4   gene4 -0.06901730  1.7503219 -0.02516264
    5   gene5 -0.97244294  0.1164136 -0.16367334
    6   gene6 -0.54658659  1.3842532  0.37005975
    7   gene7 -1.68869233  0.5742209 -0.38082454
    8   gene8 -1.57237270  0.1364908  0.65295237
    9   gene9 -0.40498716  0.9142160  2.06134181
    10 gene10  0.31928642 -1.8008263 -1.79664494
    11 gene11  0.04042768 -0.3398806  0.58407712
    12 gene12 -0.39000956  0.6062646 -0.72275312
    13 gene13 -1.81922223  1.3411303 -0.62916466
    14 gene14  0.65918071  0.7672873 -1.81620605
    15 gene15  0.45962167  0.1937257 -0.25928910
    

    4.2 Extract the first column (two methods)

    > my_dataframe$gene
     [1] gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9  gene10
    [11] gene11 gene12 gene13 gene14 gene15
    15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
    > my_dataframe[,1]
     [1] gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9  gene10
    [11] gene11 gene12 gene13 gene14 gene15
    15 Levels: gene1 gene10 gene11 gene12 gene13 gene14 gene15 gene2 ... gene9
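
The "15 Levels" line shows that the gene column is a factor. In R 3.6, the version used in this post, data.frame() converts strings to factors by default; from R 4.0.0 the default is stringsAsFactors = FALSE. A sketch that sets the argument explicitly so the result does not depend on the R version; the three-row data frames are made up for illustration.

```r
gene <- paste0("gene", 1:3)
s1 <- c(0.1, 0.2, 0.3)   # made-up values for illustration

df_factor <- data.frame(gene, s1, stringsAsFactors = TRUE)
df_char   <- data.frame(gene, s1, stringsAsFactors = FALSE)

class(df_factor$gene)  # "factor"
class(df_char$gene)    # "character"

# [[ ]] is a third way to extract a single column
identical(df_char$gene, df_char[["gene"]])
```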
    

    4.3 Extract the second row

    > my_dataframe[2,]
       gene       s1       s2        s3
    2 gene2 3.266415 -1.85619 0.0138648
    

    4.4 Extract the value at row 3, column 4

    > my_dataframe[3,4]
    [1] -1.105306
    

    4.5 Extract the row names and column names

    > colnames(my_dataframe)
    [1] "gene" "s1"   "s2"   "s3"  
    > row.names(my_dataframe)
     [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
    [15] "15"
    

    4.6 Compute the mean of the second column

    > mean(my_dataframe[,2])
    [1] -0.09818564
    > mean(my_dataframe$s1)
    [1] -0.09818564
    

    4.7 Extract columns s1 and s3 by column name

    > my_dataframe$s1
     [1]  0.68501477  3.26641452  0.56060046 -0.06901730 -0.97244294
     [6] -0.54658659 -1.68869233 -1.57237270 -0.40498716  0.31928642
    [11]  0.04042768 -0.39000956 -1.81922223  0.65918071  0.45962167
    > my_dataframe$s3
     [1]  1.14056669  0.01386480 -1.10530591 -0.02516264 -0.16367334
     [6]  0.37005975 -0.38082454  0.65295237  2.06134181 -1.79664494
    [11]  0.58407712 -0.72275312 -0.62916466 -1.81620605 -0.25928910
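
The `$` calls above return the two columns as separate vectors. If the goal is a single data frame holding both columns, indexing with a character vector of names is one option. A sketch with a made-up three-row data frame; `df` and `sub` are illustrative names.

```r
# Made-up data standing in for my_dataframe
df <- data.frame(
  gene = paste0("gene", 1:3),
  s1 = c(0.7, 3.3, 0.6),
  s2 = c(1.6, -1.9, -0.3),
  s3 = c(1.1, 0.0, -1.1)
)

# A character vector in the column position selects columns by name
sub <- df[, c("s1", "s3")]
print(sub)
```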
    

    4.8 Filter the rows where s3 is greater than 0

    > my_dataframe$s3[my_dataframe$s3>0]
    [1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
    > my_dataframe[,4][my_dataframe[,4]>0]
    [1] 1.1405667 0.0138648 0.3700597 0.6529524 2.0613418 0.5840771
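
Both commands above return only the s3 values, not the full rows. To keep every column for the matching rows, the logical test goes in the row position of the brackets. A sketch on a made-up four-row data frame; `df`, `positive_rows`, and `positive_rows2` are illustrative names.

```r
# Made-up data standing in for my_dataframe
df <- data.frame(
  gene = paste0("gene", 1:4),
  s3 = c(1.2, -0.5, 0.3, -1.1)
)

# Logical vector in the row position, empty column position: whole rows are kept
positive_rows <- df[df$s3 > 0, ]

# subset() expresses the same filter without repeating the data frame name
positive_rows2 <- subset(df, s3 > 0)

print(positive_rows)
```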
    

    Exercise 5: Install any two R packages

    > BiocManager::install()
    Bioconductor version 3.9 (BiocManager 1.30.4), R 3.6.0 (2019-04-26)
    installation path not writeable, unable to update packages: boot, cluster,
      foreign, KernSmooth, mgcv, nlme
    Update old packages: 'AnnotationDbi', 'backports', 'BiocManager',
      'BiocParallel', 'biomaRt', 'blob', 'callr', 'car', 'carData',
      'checkmate', 'clipr', 'cowplot', 'curl', 'data.table', 'devtools',
      'digest', 'doParallel', 'dplyr', 'edgeR', 'effects', 'ellipsis',
      'FactoMineR', 'farver', 'fgsea', 'foreach', 'GenomicRanges', 'ggforce',
      'ggplot2', 'ggplotify', 'ggpubr', 'ggraph', 'ggsignif', 'git2r',
      'haven', 'hexbin', 'Hmisc', 'hms', 'htmlTable', 'htmltools',
      'htmlwidgets', 'httpuv', 'httr', 'IRanges', 'iterators', 'knitr',
      'lambda.r', 'later', 'lava', 'limma', 'maptools', 'markdown',
      'matrixStats', 'mclust', 'openssl', 'openxlsx', 'pillar', 'pkgbuild',
      'pkgconfig', 'processx', 'prodlim', 'promises', 'purrr', 'quantreg',
      'R6', 'Rcpp', 'RcppArmadillo', 'RcppEigen', 'rlang', 'robust',
      'RSQLite', 'rvcheck', 'S4Vectors', 'shiny', 'sp',
      'SummarizedExperiment', 'survival', 'sys', 'tidyr', 'usethis', 'vctrs',
      'whisker', 'xfun', 'xml2', 'zip'
    Update all/some/none? [a/s/n]: 
    n
    > library(BiocManager)
    Bioconductor version 3.9 (BiocManager 1.30.4), ?BiocManager::install for
      help
    Bioconductor version '3.9' is out-of-date; the current release version
      '3.10' is available with R version '3.6'; see
      https://bioconductor.org/install
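
Calling BiocManager::install() with no package names, as above, only offers to update packages that are already installed. A hedged sketch of installing two specific packages: `missing_packages` is a hypothetical helper, stringr and limma are just example package names, and the install calls are left commented out because they need network access.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

cran_pkgs <- missing_packages(c("stringr"))  # example CRAN package
bioc_pkgs <- missing_packages(c("limma"))    # example Bioconductor package

# Uncomment to install; CRAN packages use install.packages(),
# Bioconductor packages use BiocManager::install()
# if (length(cran_pkgs) > 0) install.packages(cran_pkgs)
# if (length(bioc_pkgs) > 0) BiocManager::install(bioc_pkgs)
```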
    

    Exercise 6: Reading and exporting files
    6.1 Read complete_set.txt (already saved in the working directory)

    > read.table("complete_set.txt")
                         V1                   V2                 V3
    1                 geneA                geneB              geneC
    2    -0.635020187971398    -0.49728008811353  0.514896730700242
    3      0.91605661780324   -0.545381308500589   1.20238322656491
    4     0.805995294157758   -0.315914513323816   0.27825197143441
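
In the output above the names geneA, geneB, geneC land in row 1 and the columns are called V1, V2, V3, which suggests the file has a header line that was read as data. A sketch of the difference header = TRUE makes, using a small temporary file as a stand-in; the file contents and the tab separator are assumptions, not the real complete_set.txt.

```r
# Assumption: a tiny tab-separated file with a header line stands in for complete_set.txt
path <- tempfile(fileext = ".txt")
writeLines(c("geneA\tgeneB\tgeneC",
             "-0.63\t-0.49\t0.51",
             "0.91\t-0.54\t1.20"), path)

no_header   <- read.table(path)                 # header line becomes the first data row
with_header <- read.table(path, header = TRUE)  # header line becomes the column names

colnames(no_header)
colnames(with_header)
sapply(with_header, class)   # columns are numeric once the header is handled
```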
    

    6.2 Check how many rows and columns there are

    > nrow(read.table("complete_set.txt"))
    [1] 51
    > ncol(read.table("complete_set.txt"))
    [1] 20
    

    6.3 Get the row names and column names

    > row.names(read.table("complete_set.txt"))
     [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
    [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
    [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
    [43] "43" "44" "45" "46" "47" "48" "49" "50" "51"
    > colnames(read.table("complete_set.txt"))
     [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
    [12] "V12" "V13" "V14" "V15" "V16" "V17" "V18" "V19" "V20"
    

    6.4 Export to CSV format, then read it back

    > write.table(read.table("complete_set.txt"), file="complete_set1.csv")
    > read.table("complete_set1.csv")
                         V1                   V2                 V3
    1                 geneA                geneB              geneC
    2    -0.635020187971398    -0.49728008811353  0.514896730700242
    3      0.91605661780324   -0.545381308500589   1.20238322656491
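
write.table() with its default settings separates fields with spaces, so the file above has a .csv name but is not comma-separated. A sketch of a real CSV round trip with write.csv() and read.csv(); the two-column data frame and the temporary path are made up for illustration.

```r
# Made-up data standing in for the contents of complete_set.txt
df <- data.frame(geneA = c(-0.63, 0.91), geneB = c(-0.49, -0.54))

csv_path <- tempfile(fileext = ".csv")
write.csv(df, file = csv_path, row.names = FALSE)  # comma-separated, no row-name column
df_back <- read.csv(csv_path)                      # header = TRUE is the default here

readLines(csv_path)[1]   # the header line, with fields separated by commas
all.equal(df, df_back)
```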
    

    6.5 Save as Rdata, then load it

    > x=read.table("complete_set1.csv")
    > save(x,file = "complete_set1.Rdata")
    > load("complete_set1.Rdata")
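
One detail worth seeing in action: load() restores objects under the names they were saved with and returns those names, so its result is not the data itself. A sketch using a made-up data frame and a temporary file.

```r
x <- data.frame(a = 1:3)                 # made-up object to save
rdata_path <- tempfile(fileext = ".Rdata")

save(x, file = rdata_path)
rm(x)                                    # remove it to show that load() brings it back

restored <- load(rdata_path)             # returns the names of the restored objects
print(restored)
print(x)                                 # x exists again under its original name
```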
    

    Exercise 7: tidyr and dplyr

    7.1 Gather the first four columns of the iris data frame, then restore them

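    library(tidyr)  # gather(), spread()
    library(dplyr)  # %>%, group_by(), mutate(), n()
    test <- iris
    head(iris)
    # Gather the four measurement columns into key/value pairs, keeping Species
    iris_g <- gather(test, s_p, exp, -Species)
    head(iris_g)
    # Number the rows within each measurement so spread() can match them up again
    iris_g %>% 
      group_by(s_p) %>% 
      mutate(id=1:n()) %>% 
      spread(s_p, exp)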
    

    7.2 Split the third column into two columns (using the decimal point as the separator), then merge them back

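    iris_s <- separate(test,Petal.Length,c("Petal", "Length"),sep = "[.]")
    head(iris_s)
    # Merge back with unite(); na.rm = TRUE drops the NA left by whole numbers such as 4
    iris_u <- unite(iris_s, "Petal.Length", Petal, Length, sep = ".", na.rm = TRUE)
    head(iris_u)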
    

    7.3 Load test.Rdata and sort the deg data frame by p-value in ascending order

    load("test.Rdata")
    head(deg)
    head(arrange(deg,P.Value))
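
For comparison, the same ascending sort can be done in base R with order(). A sketch on a made-up stand-in for deg: only the P.Value column name comes from the code above, and the other column and all values are assumptions.

```r
# Made-up stand-in for the deg data frame in test.Rdata
deg <- data.frame(
  probe_id = c("p1", "p2", "p3"),
  P.Value = c(0.04, 0.001, 0.2)
)

# order() returns the row positions that sort P.Value from smallest to largest
deg_sorted <- deg[order(deg$P.Value), ]
print(deg_sorted)
```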
    

    7.4 Join the two data frames on the probe_id column

    merge(deg,ids,by="probe_id")
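
By default merge() keeps only the rows whose probe_id appears in both data frames. A sketch of that default next to a left join, using made-up stand-ins for deg and ids; the `symbol` column and all values are assumptions for illustration.

```r
# Made-up stand-ins for deg and ids
deg <- data.frame(probe_id = c("p1", "p2", "p3"), P.Value = c(0.04, 0.001, 0.2))
ids <- data.frame(probe_id = c("p2", "p3", "p4"), symbol = c("ANLN", "BCL2", "TP53"))

inner <- merge(deg, ids, by = "probe_id")                # only probes present in both
left  <- merge(deg, ids, by = "probe_id", all.x = TRUE)  # every row of deg, NA where ids has no match

nrow(inner)
nrow(left)
```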
    

Article link: https://www.haomeiwen.com/subject/uhjaictx.html