An efficient data-processing combination
The tidyverse package bundles exactly this combination: tidyr + dplyr + purrr, as shown below:
> library(tidyverse)
── Attaching packages ──────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 2.2.1 ✔ purrr 0.2.4
✔ tibble 1.4.1 ✔ dplyr 0.7.4
✔ tidyr 0.7.2 ✔ stringr 1.2.0
✔ readr 1.1.1 ✔ forcats 0.2.0
── Conflicts ─────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Study notes
- Learning resources
- purrr tutorial
- Chapter 21 of R for Data Science (earlier chapters of the book also mention the purrr tutorial listed above)
- Notes
library(tidyverse)
# Build a tibble with four columns of random normal values
df <- tibble(
a = rnorm(10),
b = rnorm(10),
c = rnorm(10),
d = rnorm(10)
)
# Compute the median of each column
median(df$a)
median(df$b)
# Repeating this for every column is tedious, so write a loop instead
output <- vector("double", ncol(df))
for (i in seq_along(df)){
# seq_along() is safer than 1:length(): it handles zero-length input correctly
output[[i]] <- median(df[[i]])
}
output
## Exercise
mean_loop <- function(df){
output <- vector("double", ncol(df))
for (i in seq_along(df)){
output[[i]] <- mean(df[[i]])
}
print(output)
}
mean_loop(mtcars)
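Both loops above follow the same pattern: allocate an output vector, apply one function to each column, store the result. As a minimal sketch, assuming purrr is installed, the same work can be written with map_dbl(). The small data frame here is hypothetical and uses fixed values (not the random df above) so the results are reproducible.

```r
library(purrr)

# Hypothetical example data with fixed values
scores <- data.frame(a = c(1, 2, 3), b = c(4, 5, 60), c = c(7, 8, 9))

# map_dbl() applies the function to every column and
# returns a named double vector, one element per column
col_medians <- map_dbl(scores, median)
col_means <- map_dbl(scores, mean)

col_medians
col_means
```

Because map_dbl() fails with an error if the function returns anything other than a single number, it gives roughly the same type safety as the pre-allocated vector("double", ...) in the loop version.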
# vapply() is a safe alternative to sapply()
# because you supply an additional argument that defines the type.
# The only problem with vapply() is that it’s a lot of typing:
# vapply(df, is.numeric, logical(1)) is equivalent to map_lgl(df, is.numeric).
# One advantage of vapply() over purrr’s map functions is that it can also produce matrices
# — the map functions only ever produce vectors.
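The comments above can be checked with a short sketch, assuming purrr is installed. The data frames mixed and nums are hypothetical examples, not part of the original notes. The first half compares vapply() with map_lgl(); the second half shows vapply() returning a matrix when the template value has length greater than one.

```r
library(purrr)

# Hypothetical data: one numeric column and one character column
mixed <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# Both calls test each column and return a named logical vector
is_num_vapply <- vapply(mixed, is.numeric, logical(1))
is_num_map <- map_lgl(mixed, is.numeric)
all.equal(is_num_vapply, is_num_map)

# With a length-2 template, vapply() returns a matrix:
# one column per input column, one row per element of range()
nums <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
ranges <- vapply(nums, range, numeric(2))
ranges
```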
My WeChat official account
If you really need to reach me, send an email to mengyuanshen@126.com;
you can also follow my WeChat official account: 沈梦圆 (PandaBiotrainee)