Steven's Beginner R Homework

Author: Steven潘 | Source: published 2019-03-30 22:09
    My first Jianshu note, starting with introductory R exercises.

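    Today I worked through Jimmy's beginner R exercises. I have not finished all of them and plan to complete them in two sessions. The exercises come from http://www.bio-info-trainee.com/3793.html. Besides studying Jimmy's Bilibili videos and the book R in Action (《R语言实战》), I added a little of my own exploration. Compared with just completing the assigned work, poking around at the edge of error messages may help reinforce what I remember. The road in bioinformatics is long and following the right people matters most, so many thanks to Jimmy for his warm support. As a newcomer I intend to keep working hard and not be put off by setbacks!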

    Here is my homework:


    Working directory

    > getwd()    # returns the current working directory
    [1] "E:/My_Program/R_Start"
    

    Vectors

    character <- c("abc","def","ghi")
    numeric <- c(1,-2,3)
    logical <- c(FALSE,TRUE,TRUE)
    complex <- c(1+2i,2i)
    num1 <- 2:4
    num2 <- seq(2.5,3.5, by=0.5)# arithmetic sequence
    num3 <- rep(c(1,3), each=2) # repeat each element in turn
    num4 <- rep(1:2, times=2)   # repeat the whole vector
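    As a quick check on what seq() and rep() return, here is a minimal sketch that rebuilds the helper vectors and prints them. It only uses base R; the comments show the values that follow from the definitions above.

```r
# Rebuild the numeric helpers from the vector section and inspect them.
num2 <- seq(2.5, 3.5, by = 0.5)   # arithmetic sequence
num3 <- rep(c(1, 3), each = 2)    # repeat each element in turn
num4 <- rep(1:2, times = 2)       # repeat the whole vector

print(num2)  # 2.5 3.0 3.5
print(num3)  # 1 1 3 3
print(num4)  # 1 2 1 2

# The colon operator yields integers, while c(1, -2, 3) is double ("numeric").
print(class(2:4))          # "integer"
print(class(c(1, -2, 3)))  # "numeric"
```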
    

    Matrices

    matrix_a <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
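    The byrow argument decides whether the values fill the matrix row by row or column by column. A small sketch of the difference, where by_row and by_col are illustrative names that do not appear in the exercise:

```r
# Same data, two fill orders.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3)   # byrow defaults to FALSE

print(by_row)
print(by_col)

# First row under each fill order.
print(by_row[1, ])  # 1 2 3
print(by_col[1, ])  # 1 3 5
```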
    

    Arrays

    dim1 <- c("A1", "A2")
    dim2 <- c("B1", "B2", "B3")
    dim3 <- c("C1", "C2", "C3", "C4")
    z <- array(1:24, c(2,3,4), dimnames=list(dim1, dim2, dim3))  
    # dimnames is a list holding the labels for each dimension
    
    Without labels:
    > z <- array(1:24, c(2,3,4))
    > z
    , , 1
    
         [,1] [,2] [,3]
    [1,]    1    3    5
    [2,]    2    4    6
    
    , , 2
    
         [,1] [,2] [,3]
    [1,]    7    9   11
    [2,]    8   10   12
    
    , , 3
    
         [,1] [,2] [,3]
    [1,]   13   15   17
    [2,]   14   16   18
    
    , , 4
    
         [,1] [,2] [,3]
    [1,]   19   21   23
    [2,]   20   22   24
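    With dimnames in place, cells can be addressed by label as well as by position. A minimal sketch, assuming the same labels as above (z_named is an illustrative name):

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z_named <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

# Position and label indexing address the same cell.
print(z_named[2, 3, 4])           # 24
print(z_named["A2", "B3", "C4"])  # 24

# One slice along the third dimension is a labelled 2 x 3 matrix.
print(z_named[, , "C1"])
```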
    

    Data frames

    > col1 <- c(1,2,3)
    > col2 <- c("a","b","c")
    > df <- data.frame(col1,col2)    # build a data frame from equal-length vectors as columns; the vectors may differ in type
    > df
      col1 col2
    1    1    a
    2    2    b
    3    3    c
    
    • Several ways to slice a data frame
    > df$col1          # $ extracts the column as a vector
    [1] 1 2 3
    > df_col1 <- df$col1
    > str(df_col1)
     num [1:3] 1 2 3
    
    > df[1]            # slicing with [ ] returns a data frame instead
      col1
    1    1
    2    2
    3    3
    > df["col1"]
      col1
    1    1
    2    2
    3    3
    
    > df_1 <- df[1]
    > str(df_1)
    'data.frame':   3 obs. of  1 variable:
     $ col1: num  1 2 3
    
    > df_col1 <- df["col1"]
    > str(df_col1)
    'data.frame':   3 obs. of  1 variable:
     $ col1: num  1 2 3
    
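    > df[,1]             # slice by column with [ ,y]; the first column comes back as a vector
    [1] 1 2 3
    > str(df[,1])
     num [1:3] 1 2 3
    > df[,2]             # the second column holds strings; in R before 4.0.0 data.frame() converts it to a factor by default
    [1] a b c
    Levels: a b c        # to keep a character vector, pass stringsAsFactors=FALSE when creating the data frame (the default since R 4.0.0)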
    > str(df[,2])
     Factor w/ 3 levels "a","b","c": 1 2 3
     
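    > df[1,1]            # [x,y] takes the element in row x, column y
    [1] 1
    > df[1,2]
    [1] a
    Levels: a b c        # the string also comes back as a factor
    
    > str(df[2,])        # a row slice that spans several columns is still a data frame, so each column keeps its own type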
    'data.frame':   1 obs. of  2 variables:
     $ col1: num 2
     $ col2: Factor w/ 3 levels "a","b","c": 2
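    Whether the string column comes back as a factor depends on the stringsAsFactors setting, whose default changed in R 4.0.0. The sketch below sets it explicitly both ways so the result does not depend on the R version; df_fac and df_chr are illustrative names.

```r
col1 <- c(1, 2, 3)
col2 <- c("a", "b", "c")

# Explicit settings, so the outcome is the same on old and new R versions.
df_fac <- data.frame(col1, col2, stringsAsFactors = TRUE)
df_chr <- data.frame(col1, col2, stringsAsFactors = FALSE)

print(class(df_fac[, 2]))  # "factor"
print(class(df_chr[, 2]))  # "character"

# drop = FALSE keeps a single selected column as a data frame.
print(class(df_chr[, 2, drop = FALSE]))  # "data.frame"
```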
     
    
    A data frame obtained by row slicing can be sliced further:
    > df[1,][2]            # returns a data frame
      col2
    1    a
    > df[1,][,2]           # returns a factor
    [1] a
    Levels: a b c
    > df[1,]$col2          # returns a factor
    [1] a
    Levels: a b c
    > df[1,][[2]]          # returns a factor
    [1] a
    Levels: a b c
    
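    In a data frame, [[ ]] and single-index [ ] are not the same: [[ ]] extracts the column itself (like $), while [ ] keeps a one-column data frame
    > df[[1]]            # [[ ]] returns the column as a vector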
    [1] 1 2 3
    > str(df[[1]])
     num [1:3] 1 2 3
     
    > df[[1]][2]         # then take the second element of the first column (row 2, column 1)
    [1] 2
    > str(df[[1]][2])
     num 2
    > df[[1,2]]          # a [[row, column]] pair also picks out one element; here it is a factor
    [1] a
    Levels: a b c
    > str(df[[1,2]])
     Factor w/ 3 levels "a","b","c": 1
     
    > df[["col1"]]
    [1] 1 2 3
    > df[["col2"]]       # this is a factor too
    [1] a b c
    Levels: a b c
    > df[[2]]            # likewise, indexing by position and by name give the same result
    [1] a b c
    Levels: a b c
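    To make the difference between the three extraction operators concrete, here is a small sketch. df_demo is an illustrative stand-in for the data frame above, built with stringsAsFactors = FALSE so the result does not depend on the R version.

```r
df_demo <- data.frame(col1 = c(1, 2, 3),
                      col2 = c("a", "b", "c"),
                      stringsAsFactors = FALSE)

# [ ] with one index keeps the data frame wrapper; [[ ]] and $ unwrap the column.
print(is.data.frame(df_demo[1]))    # TRUE
print(is.data.frame(df_demo[[1]]))  # FALSE
print(identical(df_demo[[1]], df_demo$col1))       # TRUE
print(identical(df_demo[["col2"]], df_demo[, 2]))  # TRUE

# [[row, column]] picks a single element: row 2, column 1.
print(df_demo[[2, 1]])  # 2
```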
    
    That was a lot of playing around and a bit off topic.
    Now for the actual exercise: create a data frame and slice it.
    > o <- 1:4
    > p <- c("a","b","c","d")
    > q <- 11:14
    > r <- c(TRUE,TRUE,FALSE,TRUE)
    > frame1 <- data.frame(o,p,q,r,stringsAsFactors = FALSE)
    > frame1
      o p  q     r
    1 1 a 11  TRUE
    2 2 b 12  TRUE
    3 3 c 13 FALSE
    4 4 d 14  TRUE
    > frame2 <- frame1[c(1,3),][,2:4]
    > frame2
      p  q     r
    1 a 11  TRUE
    3 c 13 FALSE
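    The two-step slice above can also be written as a single [rows, columns] call. A minimal sketch comparing the forms; two_step, one_step and by_name are illustrative names, not part of the exercise.

```r
frame1 <- data.frame(o = 1:4,
                     p = c("a", "b", "c", "d"),
                     q = 11:14,
                     r = c(TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

two_step <- frame1[c(1, 3), ][, 2:4]          # rows first, then columns
one_step <- frame1[c(1, 3), 2:4]              # rows and columns together
by_name  <- frame1[c(1, 3), c("p", "q", "r")] # columns selected by name

# all.equal() compares the contents of the three results.
print(isTRUE(all.equal(two_step, one_step)))
print(isTRUE(all.equal(one_step, by_name)))
print(one_step)
```

    Selecting columns by name is usually more robust than by position, since it keeps working if columns are later reordered.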
    

    Next exercise

    # read sample.csv
    > df <- read.csv("sample.csv")
    > dim(df)                 # number of rows and columns
    [1] 768  12
    > colnames(df)            # column names
     [1] "Accession"           "Title"              
     [3] "Sample.Type"         "Taxonomy"           
     [5] "Channels"            "Platform"           
     [7] "Series"              "Supplementary.Types"
     [9] "Supplementary.Links" "SRA.Accession"      
    [11] "Contact"             "Release.Date"    
    > str(df)
    'data.frame':   768 obs. of  12 variables:       # 768 rows, 12 columns
     $ Accession          : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Title              : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
     $ Sample.Type        : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
     $ Taxonomy           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
     $ Channels           : int  1 1 1 1 1 1 1 1 1 1 ...
     $ Platform           : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
     $ Series             : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
     $ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
     $ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ SRA.Accession      : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ Contact            : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
     $ Release.Date       : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
    
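    # read SraRunTable.txt
    > df1 <- read.table("SraRunTable.txt",header = TRUE, sep="\t", fill= TRUE)
    > # header says whether the first line of the file holds the column names; fill=TRUE pads rows that have too few fields with blank fields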
    > str(df1)
    'data.frame':   768 obs. of  31 variables:
     $ BioSample         : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
     $ Experiment        : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ MBases            : int  16 16 8 8 11 7 18 5 11 15 ...
     $ MBytes            : int  8 8 4 4 5 4 9 3 6 8 ...
     $ Run               : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ SRA_Sample        : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
     $ Sample_Name       : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Assay_Type        : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
     $ AssemblyName      : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
     $ AvgSpotLen        : int  43 43 43 43 43 43 43 43 43 43 ...
     $ BioProject        : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
     $ Center_Name       : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
     $ Consent           : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
     $ DATASTORE_filetype: Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
     $ DATASTORE_provider: Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
     $ InsertSize        : int  0 0 0 0 0 0 0 0 0 0 ...
     $ Instrument        : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibraryLayout     : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibrarySelection  : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibrarySource     : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
     $ LoadDate          : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
     $ Organism          : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
     $ Platform          : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
     $ ReleaseDate       : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
     $ SRA_Study         : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
     $ age               : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
     $ cell_type         : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
     $ marker_genes      : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
     $ source_name       : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
     $ strain            : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
     $ tissue            : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
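    The effect of header and fill is easier to see on a tiny input. The sketch below feeds read.table() a made-up tab-separated string through its text argument instead of a file; the column names and values are hypothetical and are not taken from SraRunTable.txt.

```r
# Hypothetical tab-separated text: the last data line is missing its third field.
txt <- "Run\tMBases\tNote\nSRR1\t16\tok\nSRR2\t8\n"

demo <- read.table(text = txt, header = TRUE, sep = "\t", fill = TRUE,
                   stringsAsFactors = FALSE)

print(dim(demo))       # 2 3: the header line became column names, not a data row
print(colnames(demo))  # "Run" "MBases" "Note"
print(demo)            # the short row is padded rather than causing an error
```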
     
     # Merge
    > df2 <- merge(df,df1,by.x="Accession",by.y="Sample_Name")    # by.x and by.y name the matching key column in each data frame
    > str(df2)
    'data.frame':   768 obs. of  42 variables:
     $ Accession          : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ Title              : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
     $ Sample.Type        : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
     $ Taxonomy           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
     $ Channels           : int  1 1 1 1 1 1 1 1 1 1 ...
     $ Platform.x         : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
     $ Series             : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
     $ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
     $ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ SRA.Accession      : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ Contact            : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
     $ Release.Date       : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
     $ BioSample          : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
     $ Experiment         : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
     $ MBases             : int  16 16 8 8 11 7 18 5 11 15 ...
     $ MBytes             : int  8 8 4 4 5 4 9 3 6 8 ...
     $ Run                : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
     $ SRA_Sample         : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
     $ Assay_Type         : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
     $ AssemblyName       : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
     $ AvgSpotLen         : int  43 43 43 43 43 43 43 43 43 43 ...
     $ BioProject         : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
     $ Center_Name        : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
     $ Consent            : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
     $ DATASTORE_filetype : Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
     $ DATASTORE_provider : Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
     $ InsertSize         : int  0 0 0 0 0 0 0 0 0 0 ...
     $ Instrument         : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibraryLayout      : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibrarySelection   : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
     $ LibrarySource      : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
     $ LoadDate           : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
     $ Organism           : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
     $ Platform.y         : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
     $ ReleaseDate        : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
     $ SRA_Study          : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
     $ age                : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
     $ cell_type          : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
     $ marker_genes       : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
     $ source_name        : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
     $ strain             : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
     $ tissue             : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
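    The merged table has 42 columns because the key column is kept once (12 + 31 - 1), and the one non-key name shared by both inputs, Platform, is split into Platform.x and Platform.y. A toy sketch of the same behaviour, using made-up accessions and values (left, right and merged are illustrative names):

```r
# Two small hypothetical tables whose key columns have different names.
left <- data.frame(Accession = c("GSM1", "GSM2", "GSM3"),
                   Platform  = c("GPL1", "GPL1", "GPL1"),
                   stringsAsFactors = FALSE)
right <- data.frame(Sample_Name = c("GSM3", "GSM1", "GSM2"),
                    Platform    = c("ILLUMINA", "ILLUMINA", "ILLUMINA"),
                    MBases      = c(8L, 16L, 11L),
                    stringsAsFactors = FALSE)

merged <- merge(left, right, by.x = "Accession", by.y = "Sample_Name")

# The key appears once, under the by.x name; the clashing column gets suffixes.
print(colnames(merged))  # "Accession" "Platform.x" "Platform.y" "MBases"
# Rows are matched on the key regardless of their order in the inputs.
print(merged)
```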
    

    R is a fundamental skill and I want a solid footing, so I do not cover too much at a time. That is all for today; to be continued next time.
