Learning R packages
Installing and loading R packages
1. Mirror settings
See https://mp.weixin.qq.com/s/XvKb5FjAGM6gYsxTw3tcWw (from 生信星球).
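As a minimal sketch of what a mirror setup can look like, mirrors can be set for the current session with options(). The two URLs below are assumptions picked for illustration (the Tsinghua CRAN mirror and the USTC Bioconductor mirror), not necessarily the ones the linked post recommends; substitute whichever mirror is closest to you.

```r
# Assumed mirrors, for illustration only; replace with a mirror near you.
options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror = "https://mirrors.ustc.edu.cn/bioc/")

# Confirm what the session will use.
getOption("repos")
getOption("BioC_mirror")
```

These settings last only for the current session; placing the same two options() lines in ~/.Rprofile makes them persistent.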
2. Installation
R packages are installed with install.packages("package") or BiocManager::install("package"), depending on whether the package is hosted on CRAN or on Bioconductor.
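The two install routes can be sketched as follows. The helper name install_if_missing is hypothetical, and "limma" is only an example of a Bioconductor package; the Bioconductor lines are left commented out because they download from the network.

```r
# Hypothetical helper: install a CRAN package only when it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# dplyr lives on CRAN.
install_if_missing("dplyr")

# Bioconductor packages go through BiocManager, which itself lives on CRAN:
# install_if_missing("BiocManager")
# BiocManager::install("limma")  # "limma" is just an example package
```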
3. Loading
library(package)
require(package)
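The practical difference between the two shows up when a package is missing: library() stops with an error, while require() returns FALSE with a warning. A small sketch, where "notARealPackage123" is a made-up name used only to trigger the failure case:

```r
# library() stops with an error if the package is not installed.
library(dplyr)

# require() returns FALSE (with a warning) instead of stopping.
# "notARealPackage123" is a made-up name used only to show the failure case.
ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
ok
```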
The five basic dplyr functions
The example data is a reduced version of the built-in iris dataset:
test <- iris[c(1:2, 51:52, 101:102), ]
1. mutate(): add a new column
For example, a new column named "new" can be added whose values are Sepal.Length * Sepal.Width.
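A call along these lines produces the column described above. The variable name with_new is an assumption for illustration; mutate() returns a new data frame and leaves test unchanged.

```r
library(dplyr)

# Reduced iris data, as defined earlier in this post.
test <- iris[c(1:2, 51:52, 101:102), ]

# Add a column "new" holding the product of two existing columns.
with_new <- mutate(test, new = Sepal.Length * Sepal.Width)
with_new
```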
2. select(): select columns
(1) By column number
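Selecting by column number might look like this (a sketch; the original screenshot is not available, so the exact columns chosen here are an assumption):

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Keep only the first column.
select(test, 1)

# Keep the first and fifth columns.
select(test, c(1, 5))
```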
(2) By column name
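Selecting by name can use bare column names, or a character vector wrapped in all_of(). The columns and the variable name vars below are assumptions for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Bare column names.
select(test, Sepal.Length)
select(test, Petal.Length, Petal.Width)

# Column names stored in a character vector.
vars <- c("Petal.Length", "Petal.Width")
select(test, all_of(vars))
```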
3. filter(): filter rows
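Row filtering takes one or more logical conditions on the columns. The conditions below are assumptions chosen to fit the example data, not necessarily those in the original screenshot.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Rows of a single species.
setosa_rows <- filter(test, Species == "setosa")

# Combine conditions with & (both must hold).
long_setosa <- filter(test, Species == "setosa" & Sepal.Length > 5)

# Match any value in a set with %in%.
two_species <- filter(test, Species %in% c("setosa", "versicolor"))
```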
4. arrange(): sort the whole table by one or several columns
arrange(test, Sepal.Length)
Sorts in ascending order by default.
arrange(test, desc(Sepal.Length))
Use desc() to sort in descending order.
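Sorting by several columns works the same way: later columns break ties in earlier ones. A small sketch, with the variable name by_two assumed for illustration:

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Species in descending order first, then Sepal.Length ascending within each species.
by_two <- arrange(test, desc(Species), Sepal.Length)
by_two
```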
5. summarise(): summarise
summarise(test, mean(Sepal.Length), sd(Sepal.Length))
This computes the mean and standard deviation of Sepal.Length. To compute them for each group instead, group by Species first:
summarise(group_by(test, Species), mean(Sepal.Length), sd(Sepal.Length))
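Naming the summary columns makes the result easier to use afterwards. A sketch of the same two summaries with named outputs; the names mean_len, sd_len, overall, and by_species are assumptions for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# One row summarising the whole table.
overall <- summarise(test,
                     mean_len = mean(Sepal.Length),
                     sd_len = sd(Sepal.Length))

# One row per species.
by_species <- summarise(group_by(test, Species),
                        mean_len = mean(Sepal.Length),
                        sd_len = sd(Sepal.Length))
by_species
```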
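Two handy dplyr skills
1. The pipe operator %>% (Cmd/Ctrl + Shift + M in RStudio)
(Loading dplyr, or most other tidyverse packages, makes the pipe available.)
Note: tidyverse is a collection of R packages for data processing and visualisation, of which ggplot2 and dplyr are the best known.
The %>% operator passes the object on its left to the function on its right, as that function's first argument.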
test %>% group_by(Species) %>% summarise(mean(Sepal.Length),sd(Sepal.Length))
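To make the equivalence concrete, the piped call can be compared with the nested call it replaces. The variable names nested and piped are assumptions for illustration.

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# Nested form: read from the inside out.
nested <- summarise(group_by(test, Species),
                    mean(Sepal.Length), sd(Sepal.Length))

# Piped form: read from left to right.
piped <- test %>%
  group_by(Species) %>%
  summarise(mean(Sepal.Length), sd(Sepal.Length))

# Both forms give the same table.
all.equal(as.data.frame(nested), as.data.frame(piped))
```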
2. count(): count the unique values of a column
count(test,Species)
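On the example data, each species appears twice, so the count column n is 2 in every row. A short sketch (the variable name counts is an assumption):

```r
library(dplyr)

test <- iris[c(1:2, 51:52, 101:102), ]

# One row per distinct Species value, with the number of rows in column n.
counts <- count(test, Species)
counts
```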
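Handling relational data with dplyr
That is, joining two tables. Note: do not introduce factors.
options(stringsAsFactors = FALSE)
This stops character strings from being converted to factors (already the default for data.frame() since R 4.0.0).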
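The image that defined test1 and test2 did not survive, so the two tables below are assumed stand-ins, chosen only so that the join calls in this section have something to run on. The original values may differ.

```r
# Assumed stand-in tables sharing a key column "x"; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"),
                    z = c("A", "B", "C", "D"),
                    stringsAsFactors = FALSE)

test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"),
                    y = c(1, 2, 3, 4, 5, 6),
                    stringsAsFactors = FALSE)

test1
test2
```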
1. Inner join, inner_join: take the intersection
inner_join(test1, test2, by = "x")
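A runnable sketch of the inner join, using assumed stand-in tables for test1 and test2 (not the original data): only keys present in both tables survive.

```r
library(dplyr)

# Assumed stand-in tables; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6)

# Keys b, e, f occur in both tables, so three rows are returned.
inner <- inner_join(test1, test2, by = "x")
inner
```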
2. Left join, left_join
left_join(test1, test2, by = 'x')
left_join(test2, test1, by = 'x')
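A left join keeps every row of the first table and fills NA where the second table has no match, which is why swapping the arguments changes the result. A sketch with assumed stand-in tables (not the original data):

```r
library(dplyr)

# Assumed stand-in tables; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6)

# All 4 rows of test1 are kept; key "x" has no match, so its y is NA.
left_12 <- left_join(test1, test2, by = "x")

# All 6 rows of test2 are kept; keys a, c, d have no match, so their z is NA.
left_21 <- left_join(test2, test1, by = "x")
```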
3. Full join, full_join
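A full join keeps the keys of both tables, with NA wherever one side has no match. A sketch with assumed stand-in tables (not the original data):

```r
library(dplyr)

# Assumed stand-in tables; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6)

# Union of the keys: a, b, c, d, e, f, x gives seven rows.
full <- full_join(test1, test2, by = "x")
full
```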
4. Semi join, semi_join: return all records of table x that have a match in table y
semi_join(x, y): keeps all observations in x that match an observation in y.
semi_join(x = test1, y = test2, by = 'x')
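The result of semi_join resembles that of inner_join in that both keep only the keys common to the two tables, but semi_join returns only the records of the first table that have a match in the second; no columns from the second table are added.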
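The contrast with inner_join can be checked directly. A sketch with assumed stand-in tables (not the original data):

```r
library(dplyr)

# Assumed stand-in tables; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6)

# Same rows as the inner join, but only the columns of test1.
semi <- semi_join(x = test1, y = test2, by = "x")
inner <- inner_join(test1, test2, by = "x")

names(semi)   # columns of test1 only
names(inner)  # columns of both tables
```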
5. Anti join, anti_join: return all records of table x that have no match in table y
anti_join(x, y): drops all observations in x that match an observation in y.
anti_join(x = test2, y = test1, by = 'x')
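A runnable sketch of the anti join, again with assumed stand-in tables (not the original data): only the rows of the first argument whose key is absent from the second are returned.

```r
library(dplyr)

# Assumed stand-in tables; not the original data.
test1 <- data.frame(x = c("b", "e", "f", "x"), z = c("A", "B", "C", "D"))
test2 <- data.frame(x = c("a", "b", "c", "d", "e", "f"), y = 1:6)

# Keys a, c, d exist in test2 but not in test1.
anti <- anti_join(x = test2, y = test1, by = "x")
anti
```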
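6. Simple binding
bind_cols() and bind_rows() are the dplyr counterparts of cbind() and rbind() in base R. Note that bind_cols() requires the two data frames to have the same number of rows, whereas bind_rows() matches columns by name and fills any column missing from one table with NA.
- cbind: binds by column, that is, stacks all columns side by side; cbind() on an m-column matrix and an n-column matrix gives m+n columns. Precondition: in cbind(a, b), matrices a and b must have the same number of rows.
- rbind: binds by row, that is, stacks rows on top of each other; rbind() on an m-row matrix and an n-row matrix gives m+n rows. Precondition: in rbind(a, b), matrices a and b must have the same number of columns.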
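A minimal sketch of the two dplyr binding functions. The tables tbl_a, tbl_b, and tbl_c are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical tables for illustration.
tbl_a <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))
tbl_b <- data.frame(x = c(5, 6), y = c(50, 60))
tbl_c <- data.frame(z = c(100, 200, 300, 400))

# Stack rows: 4 + 2 rows, same two columns.
rows_ab <- bind_rows(tbl_a, tbl_b)

# Stack columns: both tables have 4 rows, so this gives 4 rows and 3 columns.
cols_ac <- bind_cols(tbl_a, tbl_c)

# Columns are matched by name; columns missing from one table are filled with NA.
rows_ac <- bind_rows(tbl_a, tbl_c)
```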