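My first Jianshu note starts with some introductory R exercises~
Today I worked on Jimmy's beginner R exercises. I haven't finished all of them yet and plan to complete them in two sessions. The exercises come from http://www.bio-info-trainee.com/3793.html. Besides following Jimmy's Bilibili videos and the book "R in Action" (《R语言实战》), I added a little exploration of my own. Compared with just completing the assigned work, poking around at the edge of error messages probably helps me remember things better. The road through bioinformatics is long and following the right people matters most, so many thanks to Jimmy for his warm guidance~~ As a newbie, I will keep working hard and not be put off by setbacks!
Here is my homework:
Working directory
> getwd() # returns the current working directory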
[1] "E:/My_Program/R_Start"
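Vectors
character <- c("abc","def","ghi") # character vector
numeric <- c(1,-2,3) # numeric (double) vector
logical <- c(F,T,T) # logical vector; F and T are shorthand for FALSE and TRUE
complex <- c(1+2i,2i) # complex vector
num1 <- 2:4 # integer sequence 2, 3, 4
num2 <- seq(2.5,3.5, by=0.5) # arithmetic sequence
num3 <- rep(c(1,3), each=2) # repeat each element in turn
num4 <- rep(1:2, times=2) # repeat the whole vector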
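The difference between each and times is easiest to see from the values themselves. The short sketch below reuses the same calls as above and notes the results in comments; it is only a quick check, not part of the original exercise.

```r
# Same calls as in the notes, with the resulting values shown as comments
num2 <- seq(2.5, 3.5, by = 0.5)  # 2.5 3.0 3.5
num3 <- rep(c(1, 3), each = 2)   # 1 1 3 3  (each element repeated in place)
num4 <- rep(1:2, times = 2)      # 1 2 1 2  (whole vector repeated)

# 1:2 is an integer vector, while c(1, 3) is stored as double
class(num4)  # "integer"
class(num3)  # "numeric"
```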
Matrices
matrix_a <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE)
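To make the effect of byrow concrete, here is a minimal sketch comparing the two fill orders. The names m_row and m_col are made up for this illustration.

```r
# byrow = TRUE fills the matrix row by row;
# the default (byrow = FALSE) fills it column by column
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_col <- matrix(1:6, nrow = 2, ncol = 3)

m_row[1, ]   # 1 2 3
m_col[1, ]   # 1 3 5
m_row[2, 3]  # 6 (row 2, column 3)
```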
Arrays
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(1:24, c(2,3,4), dimnames=list(dim1, dim2, dim3))
# dimnames is a list holding the labels for each dimension
Without labels:
> z <- array(1:24, c(2,3,4))
> z
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
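With dimnames in place, elements can be looked up by label as well as by position. The sketch below rebuilds the labelled array under the hypothetical name z_named (so it does not clash with the unlabelled z above) and shows that both forms reach the same element.

```r
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z_named <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

# Label-based and position-based indexing point at the same element
z_named["A2", "B3", "C1"]  # 6
z_named[2, 3, 1]           # 6
z_named["A1", "B1", "C4"]  # 19

# Leaving an index empty keeps that whole dimension: this is a 2 x 3 matrix
dim(z_named[, , "C2"])     # 2 3
```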
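Data frames
> col1 <- c(1,2,3)
> col2 <- c("a","b","c")
> df <- data.frame(col1,col2) # build a data frame from equal-length vectors used as columns; the vectors may have different types
> df
  col1 col2
1    1    a
2    2    b
3    3    c
- Several ways to slice a data frame
> df$col1 # $ extracts a column; the result is a vector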
[1] 1 2 3
> df_col1 <- df$col1
> str(df_col1)
num [1:3] 1 2 3
> df[1] # slicing with [] instead returns a data frame
  col1
1    1
2    2
3    3
> df["col1"]
  col1
1    1
2    2
3    3
> df_1 <- df[1]
> str(df_1)
'data.frame': 3 obs. of 1 variable:
$ col1: num 1 2 3
> df_col1 <- df["col1"]
> str(df_col1)
'data.frame': 3 obs. of 1 variable:
$ col1: num 1 2 3
> df[,1] # [ ,y] slices by column. The first column comes out as a vector
[1] 1 2 3
> str(df[,1])
num [1:3] 1 2 3
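> df[,2] # the second column was built from strings, so it comes out as a factor (the default behaviour in R before 4.0.0)
[1] a b c
Levels: a b c
# To keep it as a character vector, pass stringsAsFactors=F when creating the data frame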
> str(df[,2])
Factor w/ 3 levels "a","b","c": 1 2 3
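> df[1,1] # [x,y] picks the element in row x, column y
[1] 1
> df[1,2]
[1] a
Levels: a b c # a single string element also comes back as a factor
> str(df[2,]) # slicing by row still gives a data frame, because the columns have different types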
'data.frame': 1 obs. of 2 variables:
$ col1: num 2
$ col2: Factor w/ 3 levels "a","b","c": 2
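Whether the string column becomes a factor depends on the stringsAsFactors argument, whose default changed from TRUE to FALSE in R 4.0.0. The sketch below sets it explicitly both ways so the result does not depend on the R version; df_fac and df_chr are names made up for this illustration.

```r
col1 <- c(1, 2, 3)
col2 <- c("a", "b", "c")

# Explicitly request factors (the old default before R 4.0.0)
df_fac <- data.frame(col1, col2, stringsAsFactors = TRUE)
# Explicitly keep strings as character vectors
df_chr <- data.frame(col1, col2, stringsAsFactors = FALSE)

class(df_fac$col2)  # "factor"
class(df_chr$col2)  # "character"

# A factor can be converted back to plain strings when needed
as.character(df_fac$col2)  # "a" "b" "c"
```

Spelling matters here: the argument is stringsAsFactors with a lowercase s.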
A data frame obtained by slicing out a row can be sliced further
> df[1,][2] # gives a data frame
  col2
1    a
> df[1,][,2] # gives a factor
[1] a
Levels: a b c
> df[1,]$col2 # gives a factor
[1] a
Levels: a b c
> df[1,][[2]] # gives a factor
[1] a
Levels: a b c
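In a data frame, [[]] and [] are not interchangeable: [[]] pulls out the column itself (a vector or factor), while [] with a single index returns a data frame
> df[[1]] # [[]] extracts the column, so the result is a vector, just like $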
[1] 1 2 3
> str(df[[1]])
num [1:3] 1 2 3
> df[[1]][2] # index again to get the 2nd element of column 1, i.e. row 2, column 1
[1] 2
> str(df[[1]][2])
num 2
> df[[1,2]] # this form also picks a single element (row 1, column 2); here it is a factor
[1] a
Levels: a b c
> str(df[[1,2]])
Factor w/ 3 levels "a","b","c": 1
> df[["col1"]]
[1] 1 2 3
> df[["col2"]] # this is a factor too
[1] a b c
Levels: a b c
> df[[2]] # likewise, indexing by position and by name give the same result
[1] a b c
Levels: a b c
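The slicing forms tried above can be summarised by checking the class of each result. This is a small sketch on a fresh data frame called demo (a name made up here so that df is left alone); drop = FALSE is a standard argument of [ that was not used in the notes above.

```r
demo <- data.frame(col1 = c(1, 2, 3), col2 = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

class(demo[1])       # "data.frame"  (single bracket keeps the data frame)
class(demo[[1]])     # "numeric"     (double bracket extracts the column)
class(demo$col1)     # "numeric"     ($ also extracts the column)
class(demo[, 1])     # "numeric"     (one column is dropped to a vector)
class(demo[, 1, drop = FALSE])  # "data.frame" (drop = FALSE prevents that)

# Two equivalent ways to reach row 2, column 1
demo[[1]][2]  # 2
demo[[2, 1]]  # 2
```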
After all that playing around I have drifted a bit off topic, ahem
Now for the actual exercise: create a data frame and slice it
> o <- 1:4
> p <- c("a","b","c","d")
> q <- 11:14
> r <- c(T,T,F,T)
> frame1 <- data.frame(o,p,q,r,stringsAsFactors = F)
> frame1
  o p  q     r
1 1 a 11  TRUE
2 2 b 12  TRUE
3 3 c 13 FALSE
4 4 d 14  TRUE
> frame2 <- frame1[c(1,3),][,2:4]
> frame2
  p  q     r
1 a 11  TRUE
3 c 13 FALSE
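The chained slice above takes rows first and columns second. As a sketch of an alternative, both selections can go into a single pair of brackets; frame2_chained and frame2_direct are names made up for the comparison.

```r
frame1 <- data.frame(o = 1:4, p = c("a", "b", "c", "d"), q = 11:14,
                     r = c(TRUE, TRUE, FALSE, TRUE),
                     stringsAsFactors = FALSE)

# Rows 1 and 3, then columns 2 to 4 (as in the exercise)
frame2_chained <- frame1[c(1, 3), ][, 2:4]
# The same selection in one step
frame2_direct <- frame1[c(1, 3), 2:4]
# Columns can also be chosen by name instead of position
frame2_named <- frame1[c(1, 3), c("p", "q", "r")]

# Compare the results
all.equal(frame2_chained, frame2_direct)
all.equal(frame2_direct, frame2_named)
```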
Next exercise
# read in sample.csv
> df <- read.csv("sample.csv")
> dim(df) # number of rows and columns
[1] 768 12
> colnames(df) # column names
[1] "Accession" "Title"
[3] "Sample.Type" "Taxonomy"
[5] "Channels" "Platform"
[7] "Series" "Supplementary.Types"
[9] "Supplementary.Links" "SRA.Accession"
[11] "Contact" "Release.Date"
> str(df)
'data.frame': 768 obs. of 12 variables: # 768 rows, 12 columns
$ Accession : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Title : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
$ Sample.Type : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
$ Taxonomy : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Channels : int 1 1 1 1 1 1 1 1 1 1 ...
$ Platform : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
$ Series : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
$ SRA.Accession : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ Contact : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
$ Release.Date : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
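Every text column above was read as a factor. If plain strings are preferred, stringsAsFactors = FALSE can be passed to read.csv. The sketch below uses a tiny inline table through the text argument so it runs without sample.csv; the accession values in it are invented for illustration only.

```r
# A made-up three-column table standing in for sample.csv
csv_text <- paste("Accession,Title,Channels",
                  "GSM0000001,cell_A1,1",
                  "GSM0000002,cell_A2,1",
                  sep = "\n")

csv_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(csv_demo)              # 2 3
class(csv_demo$Accession)  # "character"
class(csv_demo$Channels)   # "integer"
```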
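# read in SraRunTable.txt
> df1 <- read.table("SraRunTable.txt",header = TRUE, sep="\t", fill= TRUE)
> # header says whether the first line of the file holds the column names; fill = TRUE pads rows that have fewer fields than the others with blank fields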
> str(df1)
'data.frame': 768 obs. of 31 variables:
$ BioSample : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
$ Experiment : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ MBases : int 16 16 8 8 11 7 18 5 11 15 ...
$ MBytes : int 8 8 4 4 5 4 9 3 6 8 ...
$ Run : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SRA_Sample : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
$ Sample_Name : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Assay_Type : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ AssemblyName : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
$ AvgSpotLen : int 43 43 43 43 43 43 43 43 43 43 ...
$ BioProject : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
$ Center_Name : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
$ Consent : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_filetype: Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_provider: Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
$ InsertSize : int 0 0 0 0 0 0 0 0 0 0 ...
$ Instrument : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
$ LibraryLayout : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySelection : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySource : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
$ LoadDate : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
$ Organism : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Platform : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
$ ReleaseDate : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
$ SRA_Study : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
$ age : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
$ cell_type : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
$ marker_genes : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
$ source_name : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
$ strain : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
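To illustrate what header and fill do in read.table, here is a minimal sketch on an inline tab-separated table whose last row is missing a field. The run IDs and values are invented, and tab is a made-up name.

```r
# Three columns in the header, but the second data row has only two fields
txt <- "Run\tMBases\tage\nSRR0000001\t16\t14 weeks\nSRR0000002\t8"

tab <- read.table(text = txt, header = TRUE, sep = "\t", fill = TRUE,
                  stringsAsFactors = FALSE)

names(tab)  # "Run" "MBases" "age"  (taken from the first line)
nrow(tab)   # 2
tab$age     # the short row gets a blank field instead of causing an error
```

Without fill = TRUE, read.table stops with an error on rows that have the wrong number of fields.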
# merge the two tables
> df2 <- merge(df,df1,by.x="Accession",by.y="Sample_Name") # by.x and by.y name the key column in each table that links them
> str(df2)
'data.frame': 768 obs. of 42 variables:
$ Accession : Factor w/ 768 levels "GSM3025845","GSM3025846",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Title : Factor w/ 768 levels "SS2_15_0048_A1",..: 1 12 18 19 20 21 22 23 24 2 ...
$ Sample.Type : Factor w/ 1 level "SRA": 1 1 1 1 1 1 1 1 1 1 ...
$ Taxonomy : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Channels : int 1 1 1 1 1 1 1 1 1 1 ...
$ Platform.x : Factor w/ 1 level "GPL13112": 1 1 1 1 1 1 1 1 1 1 ...
$ Series : Factor w/ 1 level "GSE111229": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Types: Factor w/ 1 level "SRA Run Selector": 1 1 1 1 1 1 1 1 1 1 ...
$ Supplementary.Links: Factor w/ 768 levels "https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRX3749901",..: 2 3 4 5 6 7 8 9 10 1 ...
$ SRA.Accession : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ Contact : Factor w/ 1 level "Kristian Pietras": 1 1 1 1 1 1 1 1 1 1 ...
$ Release.Date : Factor w/ 1 level "Nov 23, 2018": 1 1 1 1 1 1 1 1 1 1 ...
$ BioSample : Factor w/ 768 levels "SAMN08619908",..: 5 4 3 2 1 12 11 14 13 7 ...
$ Experiment : Factor w/ 768 levels "SRX3749901","SRX3749902",..: 2 3 4 5 6 7 8 9 10 1 ...
$ MBases : int 16 16 8 8 11 7 18 5 11 15 ...
$ MBytes : int 8 8 4 4 5 4 9 3 6 8 ...
$ Run : Factor w/ 768 levels "SRR6790711","SRR6790712",..: 1 2 3 4 5 6 7 8 9 10 ...
$ SRA_Sample : Factor w/ 768 levels "SRS3006136","SRS3006137",..: 3 13 2 1 14 5 15 7 6 4 ...
$ Assay_Type : Factor w/ 1 level "RNA-Seq": 1 1 1 1 1 1 1 1 1 1 ...
$ AssemblyName : Factor w/ 1 level "GCF_000001635.20": 1 1 1 1 1 1 1 1 1 1 ...
$ AvgSpotLen : int 43 43 43 43 43 43 43 43 43 43 ...
$ BioProject : Factor w/ 1 level "PRJNA436229": 1 1 1 1 1 1 1 1 1 1 ...
$ Center_Name : Factor w/ 1 level "GEO": 1 1 1 1 1 1 1 1 1 1 ...
$ Consent : Factor w/ 1 level "public": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_filetype : Factor w/ 1 level "sra": 1 1 1 1 1 1 1 1 1 1 ...
$ DATASTORE_provider : Factor w/ 1 level "ncbi": 1 1 1 1 1 1 1 1 1 1 ...
$ InsertSize : int 0 0 0 0 0 0 0 0 0 0 ...
$ Instrument : Factor w/ 1 level "Illumina HiSeq 2000": 1 1 1 1 1 1 1 1 1 1 ...
$ LibraryLayout : Factor w/ 1 level "SINGLE": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySelection : Factor w/ 1 level "cDNA": 1 1 1 1 1 1 1 1 1 1 ...
$ LibrarySource : Factor w/ 1 level "TRANSCRIPTOMIC": 1 1 1 1 1 1 1 1 1 1 ...
$ LoadDate : Factor w/ 1 level "2018-03-01": 1 1 1 1 1 1 1 1 1 1 ...
$ Organism : Factor w/ 1 level "Mus musculus": 1 1 1 1 1 1 1 1 1 1 ...
$ Platform.y : Factor w/ 1 level "ILLUMINA": 1 1 1 1 1 1 1 1 1 1 ...
$ ReleaseDate : Factor w/ 1 level "2018-11-23": 1 1 1 1 1 1 1 1 1 1 ...
$ SRA_Study : Factor w/ 1 level "SRP133642": 1 1 1 1 1 1 1 1 1 1 ...
$ age : Factor w/ 1 level "14 weeks": 1 1 1 1 1 1 1 1 1 1 ...
$ cell_type : Factor w/ 1 level "cancer-associated fibroblasts (CAFs)": 1 1 1 1 1 1 1 1 1 1 ...
$ marker_genes : Factor w/ 1 level "EpCAM-, CD45-, CD31-, NG2-": 1 1 1 1 1 1 1 1 1 1 ...
$ source_name : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
$ strain : Factor w/ 1 level "FVB/N-Tg(MMTVPyVT)634Mul/J": 1 1 1 1 1 1 1 1 1 1 ...
$ tissue : Factor w/ 1 level "Mammary tumor fibroblast": 1 1 1 1 1 1 1 1 1 1 ...
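The column count of 42 follows from 12 + 31 - 1, since the key column appears only once, and the Platform column that exists in both tables is renamed with .x and .y suffixes. The sketch below reproduces both effects on two tiny made-up tables (left and right are hypothetical names, and the IDs are invented).

```r
left <- data.frame(Accession = c("GSM1", "GSM2", "GSM3"),
                   Platform = c("GPL1", "GPL1", "GPL1"),
                   stringsAsFactors = FALSE)
right <- data.frame(Sample_Name = c("GSM3", "GSM1", "GSM2"),
                    Platform = "ILLUMINA",
                    MBases = c(8L, 16L, 16L),
                    stringsAsFactors = FALSE)

# Key columns have different names, so they are given via by.x and by.y
merged <- merge(left, right, by.x = "Accession", by.y = "Sample_Name")

names(merged)   # "Accession" "Platform.x" "Platform.y" "MBases"
merged$MBases   # 16 16 8  (rows are matched on the key, not on position)
ncol(merged) == ncol(left) + ncol(right) - 1  # TRUE
```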
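R is a fundamental skill and I want to build it on solid ground, so I don't cover too much in one sitting. That's all for today, to be continued next time~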