R Beginner Homework (1) -- 2019-05-01

Author: Bio小盼 | Published 2019-05-01 20:39


R Beginner Homework (1)

Open RStudio and report its working directory.

> getwd()
[1] "C:/Users/ZPY/Desktop/生信培训/u盘资料/3天课程资料/1.R/01-get_start"

getwd() prints the current working directory.
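As a small sketch of the related functions, setwd() changes the working directory and getwd() confirms the change. Using tempdir() as the target is only an illustrative choice, and the original directory is restored at the end.

```r
old_wd <- getwd()      # remember the current working directory
setwd(tempdir())       # switch to R's per-session temporary directory (illustrative target)
new_wd <- getwd()      # confirm the change
print(new_wd)
setwd(old_wd)          # restore the original working directory
```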

Create 6 vectors based on different atomic types (focus on character, numeric, and logical).

1. Character (string)

> x<- c("a","b","test")
> x
[1] "a"    "b"    "test"
> class(x)
[1] "character"

2. Numeric (1:15 produces integers, so class() reports "integer")

> x2 <- c (1:15)
> x2
 [1]  1  2  3  4  5  6  7  8  9 10
[11] 11 12 13 14 15
> class(x2)
[1] "integer"

3. Logical

> x3 <- c(T,T,F,T)
> x3 
[1]  TRUE  TRUE FALSE  TRUE
> class(x3)
[1] "logical"
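The task asks for six vectors, and R has six atomic vector types, of which the post covers three. A minimal sketch with one vector of each type (the values are arbitrary) uses typeof() to show the storage type. Note that class() reports "numeric" for a double vector, while typeof() reports "double".

```r
v_chr <- c("a", "b", "test")    # character
v_dbl <- c(1.5, 2, 3.25)        # double; class() reports "numeric"
v_int <- c(1L, 2L, 3L)          # integer; the L suffix (or 1:3) makes integers
v_lgl <- c(TRUE, FALSE, TRUE)   # logical
v_cpx <- c(1+2i, 3-1i)          # complex
v_raw <- as.raw(c(1, 255))      # raw bytes

# storage type of each vector
types <- vapply(list(v_chr, v_dbl, v_int, v_lgl, v_cpx, v_raw), typeof, character(1))
print(types)
```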

Create some data structures, such as a matrix, array, data frame, and list (focus on data frames and matrices).
1. Create a data frame

> df <- data.frame(gene=paste0("gene",1:5),a1 = rnorm(n=5),a2 = rnorm(n=5),a3 = rnorm(n=5),a4 = rnorm(n=5),a5 = rnorm(n=5))
> df
   gene         a1          a2
1 gene1 -1.2618589  1.23975969
2 gene2 -0.4130272  0.55415193
3 gene3  1.3418602  1.49528004
4 gene4  0.6431766 -0.92223528
5 gene5  0.9204888  0.04323589
          a3          a4
1  1.9925937  1.13877165
2  1.9097372  0.57711783
3 -0.5818669  0.60345433
4  0.2551079  0.09098584
5  0.2774576 -1.24120023
            a5
1  1.575937779
2 -0.008430767
3 -0.738294543
4 -0.032767262
5 -0.082232013

2. Create a matrix

> m <- matrix(1:15,ncol  =3)
> m
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
> rownames(m) <- paste0(rep("gene",5),1:5)
> colnames(m) <- c("a1","a2","a3")
> m
      a1 a2 a3
gene1  1  6 11
gene2  2  7 12
gene3  3  8 13
gene4  4  9 14
gene5  5 10 15
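The task also mentions arrays and lists, which the post skips. A brief sketch with arbitrary contents: array() generalises a matrix to more dimensions and fills values column by column, and list() holds elements of different types and lengths.

```r
# 2 rows x 3 columns x 2 layers; filled column by column, layer by layer
arr <- array(1:12, dim = c(2, 3, 2))
arr[2, 3, 1]    # row 2, column 3 of the first layer

# a named list mixing a string, a numeric vector and a logical value
lst <- list(gene = "gene1", counts = c(10, 20, 30), expressed = TRUE)
lst$counts      # extract one element by name
str(lst)
```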

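Slice the data frame you created: for example, first take rows 1 and 3, then columns 4 and 6.
1. Take rows 1 and 3 of df, then columns 4 to 6 (reading the task as a range)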

> df[c(1,3),]
   gene        a1      a2
1 gene1 -1.261859 1.23976
3 gene3  1.341860 1.49528
          a3        a4         a5
1  1.9925937 1.1387717  1.5759378
3 -0.5818669 0.6034543 -0.7382945
> df[,4:6]
          a3          a4
1  1.9925937  1.13877165
2  1.9097372  0.57711783
3 -0.5818669  0.60345433
4  0.2551079  0.09098584
5  0.2774576 -1.24120023
            a5
1  1.575937779
2 -0.008430767
3 -0.738294543
4 -0.032767262
5 -0.082232013
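The task wording can be read either as columns 4 and 6 or as columns 4 through 6, and the post slices rows and columns in two separate steps. Both selections can be done in a single [row, column] call. The sketch rebuilds a data frame shaped like df; set.seed(1) is an added assumption so the random values are reproducible.

```r
set.seed(1)  # added only for reproducibility
df <- data.frame(gene = paste0("gene", 1:5),
                 a1 = rnorm(5), a2 = rnorm(5), a3 = rnorm(5),
                 a4 = rnorm(5), a5 = rnorm(5))

sub_range <- df[c(1, 3), 4:6]            # rows 1 and 3, columns 4 through 6 (a3, a4, a5)
sub_pair  <- df[c(1, 3), c(4, 6)]        # rows 1 and 3, only columns 4 and 6 (a3, a5)
sub_named <- df[c(1, 3), c("a3", "a5")]  # same as sub_pair, selected by column name
print(sub_range)
```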

Use the data() function to load the built-in dataset rivers and describe it.

> data("rivers") # load rivers
> ?rivers # lengths (in miles) of 141 "major" rivers in North America
> rivers
  [1]  735  320  325  392  524  450
  [7] 1459  135  465  600  330  336
 [13]  280  315  870  906  202  329
 [19]  290 1000  600  505 1450  840
 [25] 1243  890  350  407  286  280
 [31]  525  720  390  250  327  230
 [37]  265  850  210  630  260  230
 [43]  360  730  600  306  390  420
 [49]  291  710  340  217  281  352
 [55]  259  250  470  680  570  350
 [61]  300  560  900  625  332 2348
 [67] 1171 3710 2315 2533  780  280
 [73]  410  460  260  255  431  350
 [79]  760  618  338  981 1306  500
 [85]  696  605  250  411 1054  735
 [91]  233  435  490  310  460  383
 [97]  375 1270  545  445 1885  380
[103]  300  380  377  425  276  210
[109]  800  420  350  360  538 1100
[115] 1205  314  237  610  360  540
[121] 1038  424  310  300  444  301
[127]  268  620  215  652  900  525
[133]  246  360  529  500  720  270
[139]  430  671 1770
> length(rivers)
[1] 141
> unique(rivers) # remove duplicate values
  [1]  735  320  325  392  524  450
  [7] 1459  135  465  600  330  336
 [13]  280  315  870  906  202  329
 [19]  290 1000  505 1450  840 1243
 [25]  890  350  407  286  525  720
 [31]  390  250  327  230  265  850
 [37]  210  630  260  360  730  306
 [43]  420  291  710  340  217  281
 [49]  352  259  470  680  570  300
 [55]  560  900  625  332 2348 1171
 [61] 3710 2315 2533  780  410  460
 [67]  255  431  760  618  338  981
 [73] 1306  500  696  605  411 1054
 [79]  233  435  490  310  383  375
 [85] 1270  545  445 1885  380  377
 [91]  425  276  800  538 1100 1205
 [97]  314  237  610  540 1038  424
[103]  444  301  268  620  215  652
[109]  246  529  270  430  671 1770
> length(unique(rivers)) # number of distinct values
[1] 114
> table(rivers) # frequency of each value
rivers
 135  202  210  215  217  230  233 
   1    1    2    1    1    2    1 
 237  246  250  255  259  260  265 
   1    1    3    1    1    2    1 
 268  270  276  280  281  286  290 
   1    1    1    3    1    1    1 
 291  300  301  306  310  314  315 
   1    3    1    1    2    1    1 
 320  325  327  329  330  332  336 
   1    1    1    1    1    1    1 
 338  340  350  352  360  375  377 
   1    1    4    1    4    1    1 
 380  383  390  392  407  410  411 
   2    1    2    1    1    1    1 
 420  424  425  430  431  435  444 
   2    1    1    1    1    1    1 
 445  450  460  465  470  490  500 
   1    1    2    1    1    1    2 
 505  524  525  529  538  540  545 
   1    1    2    1    1    1    1 
 560  570  600  605  610  618  620 
   1    1    3    1    1    1    1 
 625  630  652  671  680  696  710 
   1    1    1    1    1    1    1 
 720  730  735  760  780  800  840 
   2    1    2    1    1    1    1 
 850  870  890  900  906  981 1000 
   1    1    1    2    1    1    1 
1038 1054 1100 1171 1205 1243 1270 
   1    1    1    1    1    1    1 
1306 1450 1459 1770 1885 2315 2348 
   1    1    1    1    1    1    1 
2533 3710 
   1    1 
> sort(rivers) # sort in ascending order
  [1]  135  202  210  210  215  217
  [7]  230  230  233  237  246  250
 [13]  250  250  255  259  260  260
 [19]  265  268  270  276  280  280
 [25]  280  281  286  290  291  300
 [31]  300  300  301  306  310  310
 [37]  314  315  320  325  327  329
 [43]  330  332  336  338  340  350
 [49]  350  350  350  352  360  360
 [55]  360  360  375  377  380  380
 [61]  383  390  390  392  407  410
 [67]  411  420  420  424  425  430
 [73]  431  435  444  445  450  460
 [79]  460  465  470  490  500  500
 [85]  505  524  525  525  529  538
 [91]  540  545  560  570  600  600
 [97]  600  605  610  618  620  625
[103]  630  652  671  680  696  710
[109]  720  720  730  735  735  760
[115]  780  800  840  850  870  890
[121]  900  900  906  981 1000 1038
[127] 1054 1100 1171 1205 1243 1270
[133] 1306 1450 1459 1770 1885 2315
[139] 2348 2533 3710
> median(rivers) # median
[1] 425
> range(rivers) # minimum and maximum
[1]  135 3710
> which.min(rivers) # index of the minimum
[1] 8
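Several of the descriptive steps above can be checked together. A short sketch using only base R: summary() reports the minimum, quartiles, median, mean and maximum, and which.max() is the counterpart of which.min(). The histogram is optional and is only drawn in an interactive session.

```r
data("rivers")                       # from the built-in datasets package
rng      <- range(rivers)            # c(min, max)
longest  <- which.max(rivers)        # index of the longest river
shortest <- which.min(rivers)        # index of the shortest river
n_unique <- length(unique(rivers))   # number of distinct lengths
print(summary(rivers))               # min, quartiles, median, mean, max
if (interactive()) hist(rivers, breaks = 30, xlab = "Length (miles)", main = "rivers")
```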

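Download the RunInfo Table from https://www.ncbi.nlm.nih.gov/sra?term=SRP133642, read it into R, and explore the data frame: how many columns it has and what type of elements each column holds.
1. Download steps: open the link -> Send results to Run selector -> RunInfo Table
2. Read the file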

> sra <- read.table (SraRUNTable.txt)
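Error in read.table(SraRUNTable.txt) : object 'SraRUNTable.txt' not found # the file name was not quoted, so R looked for an R object with that name
> sra <- read.table (file="SraRunTable.txt")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 44 elements # read.table() splits on whitespace by default, but this file is tab-delimited, so specify sep = '\t'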
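> df1 <- read.table(file = "SraRunTable.txt",header = T,sep = '\t') # header = T uses the first line as column names
> View(df1)
> dim(df1) # number of rows and columns
> nrow(df1) # number of rows
> ncol(df1) # number of columns
> colnames(df1) # column names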
 [1] "BioSample"         
 [2] "Experiment"        
 [3] "MBases"            
 [4] "MBytes"            
 [5] "Run"               
 [6] "SRA_Sample"        
 [7] "Sample_Name"       
 [8] "Assay_Type"        
 [9] "AssemblyName"      
[10] "AvgSpotLen"        
[11] "BioProject"        
[12] "Center_Name"       
[13] "Consent"           
[14] "DATASTORE_filetype"
[15] "DATASTORE_provider"
[16] "InsertSize"        
[17] "Instrument"        
[18] "LibraryLayout"     
[19] "LibrarySelection"  
[20] "LibrarySource"     
[21] "LoadDate"          
[22] "Organism"          
[23] "Platform"          
[24] "ReleaseDate"       
[25] "SRA_Study"         
[26] "age"               
[27] "cell_type"         
[28] "marker_genes"      
[29] "source_name"       
[30] "strain"            
[31] "tissue"     
> for (i in colnames(df1)) paste(i,class(df1[,i])) %>% print() # print each column's name and class
Error in paste(i, class(df1[, i])) %>% print() : 
  could not find function "%>%"

%>% is the pipe operator from the magrittr package: it passes the result of the previous step as the first argument of the next function, so intermediate results do not need to be assigned to variables. It is not part of base R, so the package has to be installed and loaded first.

Solution to the error: https://stackoverflow.com/questions/30248583/error-could-not-find-function
Then:

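> install.packages("magrittr")
> library(magrittr) # load the package so that %>% is available
> for (i in colnames(df1)) paste(i,class(df1[,i])) %>% print()

For each column of df1, this prints the column name together with its class. %>% pipes the string built by paste() into print(); the explicit print() is needed because values are not auto-printed inside a for loop. paste() converts its arguments to character and joins them with a space. Reference: https://blog.csdn.net/neweastsun/article/details/51792237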
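A self-contained sketch of both steps above, reading a tab-delimited table and listing each column's class, without needing the downloaded file or magrittr. The inline text is a hypothetical stand-in for SraRunTable.txt: the column names Run, Instrument and MBases come from the real table, but the values are made up.

```r
tsv_text <- paste("Run\tInstrument\tMBases",
                  "run1\tIllumina HiSeq 2500\t100",
                  "run2\tIllumina HiSeq 2500\t120",
                  sep = "\n")

# Default whitespace separator: "Illumina HiSeq 2500" splits into three
# fields, so read.table() stops with an error (captured here as a message).
bad <- tryCatch(read.table(text = tsv_text, header = TRUE),
                error = function(e) conditionMessage(e))

# sep = "\t" keeps each tab-separated field intact.
runs <- read.table(text = tsv_text, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE)

# Base-R ways to list each column's class:
for (i in colnames(runs)) print(paste(i, class(runs[, i])))
col_classes <- vapply(runs, function(col) class(col)[1], character(1))
str(runs)
```

The vapply() version returns a named character vector, which is easier to reuse than printed lines.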

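Download the sample information (sample.csv) for https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111229, read it into R, and explore the data frame: how many columns it has and what type of elements each column holds.
1. Download the GEO sample information
GEO home page: https://www.ncbi.nlm.nih.gov/geo/ -> click Samples -> search for GSE111229 -> Export
2. Read it into R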

> df2<- read.table("sample.csv",header = T)
> View(df2)
> dim(df2)
[1] 20  6
> library(magrittr)
> for (i in colnames(df2)) paste(i,class(df2[,i])) %>% print()
[1] "Accession.Title.Sample character"
[1] "Type.Taxonomy.Channels.Platform.Series.Supplementary character"
[1] "Types.Supplementary character"
[1] "Links.SRA character"
[1] "Accession.Contact.Release character"
[1] "Date character"

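My result differed from my classmates'. The cause: this is a CSV file, whose fields are separated by commas, so read.table() needs sep = "," to read it (read.csv() uses a comma by default):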

> df2<- read.table("sample.csv",sep = ",")
> df3<- read.csv("sample.csv")
> dim(df2)
[1] 20 12
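To see that read.table() with sep = "," and read.csv() agree on a simple CSV, here is a sketch on inline text. The text is a hypothetical stand-in for sample.csv: the column names loosely follow the GEO export, and the values are made up. header = TRUE is passed explicitly to read.table() because, unlike read.csv(), its default is FALSE.

```r
csv_text <- paste("Accession,Title,Channels",
                  "GSM_A,cell A,1",
                  "GSM_B,cell B,1",
                  sep = "\n")

via_table <- read.table(text = csv_text, header = TRUE, sep = ",",
                        stringsAsFactors = FALSE)
via_csv   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(via_csv)                      # 2 rows, 3 columns
all.equal(via_table, via_csv)     # the two readers agree on this simple file
```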

The row count was still wrong: the downloaded file itself had only 20 rows, because I had exported only the current page. Re-download and choose All search results:

> df2=read.csv(file="sample.csv")
> View(df2)
> dim(df2)
[1] 768  12
> library(magrittr)
Warning message:
package ‘magrittr’ was built under R version 3.5.3 
> for (i in colnames(df2)) paste(i,class(df2[,i])) %>% print()
[1] "Accession factor"
[1] "Title factor"
[1] "Sample.Type factor"
[1] "Taxonomy factor"
[1] "Channels integer"
[1] "Platform factor"
[1] "Series factor"
[1] "Supplementary.Types factor"
[1] "Supplementary.Links factor"
[1] "SRA.Accession factor"
[1] "Contact factor"
[1] "Release.Date factor"
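Most text columns above come back as factor because, before R 4.0.0, read.csv() and read.table() defaulted to stringsAsFactors = TRUE; from R 4.0.0 on, the default is FALSE. A minimal sketch on made-up inline CSV text shows the difference by setting the argument explicitly.

```r
csv_text <- paste("Accession,Title,Channels",
                  "GSM_A,cell A,1",
                  "GSM_B,cell B,1",
                  sep = "\n")

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)   # default before R 4.0.0
as_chr    <- read.csv(text = csv_text, stringsAsFactors = FALSE)  # default from R 4.0.0 on

sapply(as_factor, class)   # text columns become factors
sapply(as_chr, class)      # text columns stay character
```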

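Join the two tables from the previous steps (the RunInfo Table file and the sample information sample.csv) using the merge() function.
General idea: find the content the two tables have in common and merge on it.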

rm(list = ls())
options(stringsAsFactors = F)
df1 <- read.table(file = "SraRunTable.txt",header = T,sep = '\t')
df2 <- read.csv(file = "sample.csv")
for (i in colnames(df1)) {if (i %in% colnames(df2)) print(i)} # print the column names shared by both tables
df1[1,"Platform"]  
df2[1,"Platform"] # compare the first value of the shared Platform column in each table

I could not follow the more advanced solutions beyond this point; see https://www.jianshu.com/p/c07e67e2c757
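A hedged sketch of the merge step on toy tables rather than the real downloads. It assumes, without checking the actual files, that the Sample_Name column of the run table holds the same GSM accessions as the Accession column of sample.csv; if the key columns turn out to be different, change by.x and by.y accordingly. All table contents are hypothetical. intersect() is a shorter way to list shared column names than the loop above, and here it shows why matching on column names alone is not enough.

```r
# Toy stand-ins for df1 (run table) and df2 (GEO samples); values are made up
run_info <- data.frame(Run = c("run1", "run2", "run3"),
                       Sample_Name = c("GSM_A", "GSM_B", "GSM_C"),
                       stringsAsFactors = FALSE)
samples <- data.frame(Accession = c("GSM_B", "GSM_A", "GSM_D"),
                      Title = c("cell B", "cell A", "cell D"),
                      stringsAsFactors = FALSE)

intersect(colnames(run_info), colnames(samples))   # no shared names here

# Inner join: keep only runs whose Sample_Name appears in samples$Accession
merged <- merge(run_info, samples, by.x = "Sample_Name", by.y = "Accession")

# Left join: keep every run, filling Title with NA when there is no match
merged_all <- merge(run_info, samples, by.x = "Sample_Name", by.y = "Accession",
                    all.x = TRUE)
print(merged)
```

merge() sorts the result by the key column by default, so rows come back in GSM order rather than in the original run order.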
