R for Data Science, Chapter 18 (Pipes): Study Notes
Reference: R for Data Science
library(magrittr)
Piping alternatives
- Intermediate steps
R will share columns across data frames, where possible.
diamonds <- ggplot2::diamonds
diamonds2 <- diamonds %>%
  dplyr::mutate(price_per_carat = price / carat)
pryr::object_size(diamonds)
#> Registered S3 method overwritten by 'pryr':
#> method from
#> print.bytes Rcpp
#> 3.46 MB
pryr::object_size(diamonds2)
#> 3.89 MB
pryr::object_size(diamonds, diamonds2)
#> 3.89 MB
# Once a column is modified, that column is no longer shared between the data frames
diamonds$carat[1] <- NA
pryr::object_size(diamonds)
#> 3.46 MB
pryr::object_size(diamonds2)
#> 3.89 MB
pryr::object_size(diamonds, diamonds2)
#> 4.32 MB
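pryr::object_size() reports the memory occupied by the given objects. It accepts several objects at once and counts the components they share only once.
object.size() accepts only a single object.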
- Function composition
bop(
  scoop(
    hop(foo_foo, through = forest),
    up = field_mice
  ),
  on = head
)
The Dagwood sandwich problem:
The disadvantage is that you have to read from inside-out, from right-to-left, and that the arguments end up spread far apart.
- Use the pipe
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mice) %>%
  bop(on = head)
# Under the hood, this is essentially equivalent to:
my_pipe <- function(.) {
  . <- hop(., through = forest)
  . <- scoop(., up = field_mice)
  bop(., on = head)
}
my_pipe(foo_foo)
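As a runnable check that the three styles compute the same thing, here is a minimal sketch that swaps the nursery-rhyme functions for base R ones. sqrt(), mean() and round() are stand-ins chosen for illustration, and the vector x is made up.

```r
library(magrittr)

# Hypothetical input; any numeric vector would do
x <- c(1, 4, 9, 16)

# Intermediate steps
roots <- sqrt(x)
avg <- mean(roots)
stepwise <- round(avg, 1)

# Function composition: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Pipe: read from left to right
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(stepwise, nested) && identical(nested, piped)
#> [1] TRUE
```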
- Two kinds of functions that do not work with the pipe
(1) Functions that use the current environment, such as assign(), load() and get()
assign("x", 10); x
# [1] 10
"x" %>% assign(100); x
# [1] 10
env <- environment()
"x" %>% assign(100, envir = env); x
# [1] 100
(2) Functions that use lazy evaluation, such as most condition-handling functions: tryCatch(), try(), suppressMessages() and suppressWarnings()
tryCatch(stop("!"), error = function(e) "An error")
#> [1] "An error"
stop("!") %>%
  tryCatch(error = function(e) "An error")
#> Error in eval(lhs, parent, parent): !
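Both failure modes follow from the function-like expansion of the pipe shown earlier, and can be reproduced without magrittr at all. The sketch below uses two made-up helpers, set_in_function() and eager_try_catch(), that mimic that expansion. They are illustrations only, not magrittr internals, and the exact behaviour of %>% may differ between magrittr versions.

```r
# (1) assign() inside a function binds the name in that function's
#     environment, not in the caller's
x <- 10
set_in_function <- function(.) {
  assign(., 100)             # creates a local "x"
  get(., inherits = FALSE)   # read the local binding back
}
local_value <- set_in_function("x")
local_value
#> [1] 100
x
#> [1] 10

# (2) Evaluating the input before calling tryCatch() raises the error
#     outside the handler
eager_try_catch <- function(lhs) {
  . <- lhs                                     # the error is raised here
  tryCatch(., error = function(e) "An error")  # never reached
}
outcome <- try(eager_try_catch(stop("!")), silent = TRUE)
inherits(outcome, "try-error")
#> [1] TRUE
```

Passing envir explicitly, as in the assign() example above, is what moves the binding back to the intended environment.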
When not to use the pipe
Knowing when not to use the pipe is just as important.
Pipes are most useful for rewriting a fairly short linear sequence of operations. Reach for another tool when:
- Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names. That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent.
- You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe.
- You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
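A minimal sketch of the first two points, using the built-in mtcars data set. The threshold wt > 3, the object names, and the labels lookup table are made up for illustration.

```r
library(magrittr)

# Named intermediate objects make each stage easy to inspect
heavy_cars <- mtcars %>% subset(wt > 3)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = heavy_cars, FUN = mean)

# Two inputs being combined: a plain function call, not a pipe
labels <- data.frame(cyl = c(4, 6, 8), label = c("four", "six", "eight"))
labelled <- merge(mpg_by_cyl, labels, by = "cyl")

names(labelled)
#> [1] "cyl"   "mpg"   "label"
```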
Other tools from magrittr
When working with more complex pipes, it’s sometimes useful to call a function for its side-effects. Maybe you want to print out the current object, or plot it, or save it to disk. Many times, such functions don’t return anything, effectively terminating the pipe.
- %T>%
%T>% works like %>% except that it returns the left-hand side instead of the right-hand side. It’s called “tee” because it’s like a literal T-shaped pipe.
library(magrittr)
rnorm(100) %>%
  matrix(ncol = 2) %>%
  plot() %>%
  str()
# NULL
rnorm(100) %>%
  matrix(ncol = 2) %T>%
  plot() %>%
  str()
# num [1:50, 1:2] -0.351 -1.751 0.666 0.516 -0.686 ...
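The same contrast can be seen without a graphics device. In this sketch, cat() stands in for plot() as a function called only for its side effect (it returns NULL), and the input vector is made up.

```r
library(magrittr)

# Without the tee, sum() receives cat()'s return value, NULL
without_tee <- c(3, 1, 2) %>%
  sort() %>%
  cat("\n") %>%
  sum()
without_tee
#> [1] 0

# With the tee, sum() receives the sorted vector
with_tee <- c(3, 1, 2) %>%
  sort() %T>%
  cat("\n") %>%
  sum()
with_tee
#> [1] 6
```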
- %$%
It “explodes” out the variables in a data frame so that you can refer to them explicitly (handy for passing the variables to a function directly).
mtcars %$%
  cor(disp, mpg)
#> [1] -0.8475514
# with() exposes the variables in the same way
with(mtcars, cor(disp, mpg))
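A small sketch with a made-up data frame, chosen so that the result is easy to verify by hand, showing that %$% and with() agree:

```r
library(magrittr)

# Hypothetical data for illustration
scores <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

exploded <- scores %$% sum(x * y)      # 1*2 + 2*4 + 3*6
via_with <- with(scores, sum(x * y))

exploded
#> [1] 28
identical(exploded, via_with)
#> [1] TRUE
```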
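- %<>%
Assigns the result of the pipeline back to the left-hand side, so no explicit reassignment is needed. This code:
mtcars <- mtcars %>%
  transform(cyl = cyl * 2)
# can be written as:
mtcars %<>% transform(cyl = cyl * 2)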
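A minimal sketch on a made-up vector, which avoids overwriting the built-in mtcars:

```r
library(magrittr)

values <- c(1, 4, 9)
values %<>% sqrt()   # same as: values <- values %>% sqrt()
values
#> [1] 1 2 3
```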