Where to find R packages

Google search
The RStudio cheatsheet site, or install from Bioconductor or GitHub
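
The install routes mentioned above can be sketched roughly as follows. The package names (`tidyr`, `limma`) and the repository `tidyverse/tidyr` are only examples chosen for illustration, and using the `remotes` package for GitHub is one common option, not the only one. All three routes need network access.

```r
# CRAN: the default repository used by install.packages()
install.packages("tidyr")

# Bioconductor: packages are installed through the BiocManager package
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("limma")  # "limma" is only an example package name

# GitHub: install a development version with the remotes package
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("tidyverse/tidyr")  # example repository
```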

Using the tidyr package

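Purpose: turn the target data into a tidy data frame. "Tidy" means each variable occupies one column, and each case (observation) occupies one row.
(1) Reshape a data frame
(2) Handle missing values in a data frame
(3) Derive other tables from an existing table
(4) Split and combine rows or columns
key-value: a "key-value pair" expresses a correspondence. Both the "key" and the "value" are column names, e.g. geneid paired with expr.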
Reshape Data
(1) gather():
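▪ gather(data, col1, col2 (the columns to gather), key = name of the new key column, value = name of the new value column)
e.g. the following commands give the same result: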

     gather(a,X1999,X2000,key = "year",value = "cases")
     gather(a, "year","cases",X1999,X2000)

To gather many columns at once, exclude the columns you want to keep instead:

     gather(a,"year","cases",-country)  # -country: gather every column except country
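
A small runnable sketch of the three gather() calls above. The data frame `a` is made up for illustration (the notes do not show its contents); it is assumed to have a `country` column plus the two year columns `X1999` and `X2000`.

```r
library(tidyr)

# Hypothetical example data: one row per country, one column per year
a <- data.frame(
  country = c("A", "B", "C"),
  X1999 = c(745, 37737, 212258),
  X2000 = c(2666, 80488, 213766)
)

# Name the columns to gather, with key/value passed as named arguments
long1 <- gather(a, X1999, X2000, key = "year", value = "cases")

# Same call with key and value given positionally
long2 <- gather(a, "year", "cases", X1999, X2000)

# Gather everything except country
long3 <- gather(a, "year", "cases", -country)

# All three should describe the same long table
print(long1)
print(all.equal(long1, long2))
print(all.equal(long1, long3))
```

In recent tidyr versions gather() is superseded by pivot_longer(), but it still works as shown.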

(2) spread(): moves the unique values of a key column into the column names, spreading the values of a value column across the new columns
▪ spread(data, key (the column whose values become the new column names), value (the column whose values fill the new columns))
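
As a minimal sketch of spread(), the example below turns a long table back into a wide one. The data frame `long` and its values are invented for illustration.

```r
library(tidyr)

# Hypothetical long table: one row per country-year pair
long <- data.frame(
  country = rep(c("A", "B"), each = 2),
  year = rep(c("X1999", "X2000"), times = 2),
  cases = c(745, 2666, 37737, 80488),
  stringsAsFactors = FALSE
)

# The unique values of year become column names; cases fills the cells
wide <- spread(long, key = year, value = cases)
print(wide)
```

Here spread() is the inverse of gather(): each country ends up on one row with one column per year.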
(3) Handle Missing Values
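▪ Drop whole rows that contain NA
drop_na(): removes every row that has a missing value; drop_na(data, column names) checks only the listed columns
▪ Fill
fill(): fills each NA with the value from the previous row; fill(data, names of the columns that contain NA)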
▪ Replace NA with a specific value
replace_na(): fills NA with a given value; replace_na(data, list(column name = value to fill in))

     replace_na(X,list(year=2001))
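
The three missing-value helpers can be compared side by side on one small table. The data frame `X` below is made up for illustration; only the column name `year` and the fill value 2001 come from the notes.

```r
library(tidyr)

# Hypothetical data with gaps in two columns
X <- data.frame(
  country = c("A", "B", "C", "D"),
  year = c(1999, NA, 2000, NA),
  cases = c(10, 20, NA, 40)
)

# Drop every row that has an NA in any column
dropped_any <- drop_na(X)

# Drop only the rows where year is NA
dropped_year <- drop_na(X, year)

# Carry the last non-missing year downward
filled <- fill(X, year)

# Replace missing years with a fixed value
replaced <- replace_na(X, list(year = 2001))

print(dropped_any)
print(dropped_year)
print(filled)
print(replaced)
```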

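(4) Expand Tables
▪ Add the missing combinations as explicit rows
complete(): e.g. complete(X, nesting(X1 column names), fill = list(X2 = 5)); the new rows get NA unless fill supplies a value for that column
▪ List every possible combination of the values in the chosen columns
expand(): expand(data, col1, col2, ...), where the listed columns are the ones whose values are combined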
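
A minimal sketch of complete() and expand(), assuming a made-up data frame `X` with columns `x1`, `x2` and `x3` (these names and values are not from the notes). It also shows how nesting() restricts the result to combinations that already occur in the data.

```r
library(tidyr)

# Hypothetical data: the combination (x1 = "B", x2 = 2) is missing
X <- data.frame(
  x1 = c("A", "A", "B"),
  x2 = c(1, 2, 1),
  x3 = c(30, 40, 50),
  stringsAsFactors = FALSE
)

# Add a row for every x1-x2 combination; the new row gets x3 = 5
completed <- complete(X, x1, x2, fill = list(x3 = 5))

# List every possible combination of x1 and x2 (other columns are dropped)
all_combos <- expand(X, x1, x2)

# With nesting(), only the combinations already present in X are listed
seen_combos <- expand(X, nesting(x1, x2))

print(completed)
print(all_combos)
print(seen_combos)
```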

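(5) Split cells
separate(): splits one column into several columns
separate_rows(): splits a cell into several rows
unite(): merges several columns back into one (the reverse of separate())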
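
The three functions can be tried on one small table. The data frame `df`, its `rate` column and the "/" separator are assumptions made for this sketch.

```r
library(tidyr)

# Hypothetical data: two numbers packed into one cell, separated by "/"
df <- data.frame(
  country = c("A", "B"),
  rate = c("745/19987071", "37737/172006362"),
  stringsAsFactors = FALSE
)

# Split rate into two new columns
sep_cols <- separate(df, rate, into = c("cases", "pop"), sep = "/")

# Split rate so that each piece gets its own row
sep_rows <- separate_rows(df, rate, sep = "/")

# Paste cases and pop back into a single rate column
united <- unite(sep_cols, cases, pop, col = "rate", sep = "/")

print(sep_cols)
print(sep_rows)
print(united)
```

Recent tidyr versions also offer separate_wider_delim() and separate_longer_delim() as newer alternatives, but the functions above still work.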