[R] The magrittr package: pipe operations (R for Data Science)

Author: 半为花间酒 | Published: 2020-04-28 09:56

    Notes from working through Chapter 18 (Pipes) of R for Data Science
    Reference: R for Data Science

    library(magrittr)
    

    Piping alternatives

    - Intermediate steps

    Saving every intermediate step as its own data frame costs less memory than you might expect, because R shares columns across data frames where possible.

    diamonds <- ggplot2::diamonds
    diamonds2 <- diamonds %>% 
      dplyr::mutate(price_per_carat = price / carat)
    
    pryr::object_size(diamonds)
    #> Registered S3 method overwritten by 'pryr':
    #>   method      from
    #>   print.bytes Rcpp
    #> 3.46 MB
    pryr::object_size(diamonds2)
    #> 3.89 MB
    pryr::object_size(diamonds, diamonds2)
    #> 3.89 MB
    
    # Once a column is modified, it is no longer shared between the two data frames
    diamonds$carat[1] <- NA
    pryr::object_size(diamonds)
    #> 3.46 MB
    pryr::object_size(diamonds2)
    #> 3.89 MB
    pryr::object_size(diamonds, diamonds2)
    #> 4.32 MB
    

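    pryr::object_size() reports how much memory the given objects occupy; it accepts several objects at once and counts shared components only once.
    Base R's object.size() takes a single object.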
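A base-R-only sketch of why the multi-object form matters (the toy data frames `df1` and `df2` are made up for illustration): `object.size()` measures each object in isolation, so adding up the sizes of two data frames that contain the same column counts that column twice, which can overstate the memory actually used when the column is shared. Exact byte counts depend on platform and R version, so none are shown.

```r
# Toy data: df2 is derived from df1 and reuses its column x
df1 <- data.frame(x = runif(1e6))
df2 <- transform(df1, y = x * 2)

size1 <- utils::object.size(df1)   # object.size() takes one object per call
size2 <- utils::object.size(df2)

# Summing the two ignores sharing: x is counted in size1 and again in size2
total_bytes <- as.numeric(size1) + as.numeric(size2)
format(size2, units = "Mb")

# With pryr installed, pryr::object_size(df1, df2) would count x only once
```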

    - Function composition

    bop(
      scoop(
        hop(foo_foo, through = forest),
        up = field_mice
      ), 
      on = head
    )
    

    The Dagwood sandwich problem:
    You have to read the code from the inside out, from right to left, and each function's arguments end up far away from the function name.
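The same trade-off with real base R functions instead of the book's hypothetical `hop()`, `scoop()` and `bop()`: a short, made-up computation written as nested calls, as intermediate steps, and as a pipe. All three give the same value; they differ only in how easy they are to read.

```r
library(magrittr)

x <- c(-4, 9, -16)

# Nested calls: read from the innermost call outwards,
# and round()'s argument 2 ends up far from the name round
nested <- round(sqrt(sum(abs(x))), 2)

# Intermediate steps: readable, but every step needs a name
x_abs <- abs(x)
x_sum <- sum(x_abs)
stepwise <- round(sqrt(x_sum), 2)

# Pipe: read top to bottom, each argument stays next to its function
piped <- x %>%
  abs() %>%
  sum() %>%
  sqrt() %>%
  round(2)

piped
#> [1] 5.39
```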

    - Use the pipe

    foo_foo %>%
      hop(through = forest) %>%
      scoop(up = field_mice) %>%
      bop(on = head)
    
    # Roughly what the pipe does behind the scenes:
    my_pipe <- function(.) {
      . <- hop(., through = forest)
      . <- scoop(., up = field_mice)
      bop(., on = head)
    }
    my_pipe(foo_foo)
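To make the comparison runnable, the sketch below defines hypothetical stand-ins for `hop()`, `scoop()` and `bop()` (the book never defines them) and uses strings instead of the objects `forest` and `field_mice`. Each stub just appends a description of the action, which is enough to check that the pipe and the hand-written `my_pipe()` produce the same result.

```r
library(magrittr)

# Hypothetical stubs: each appends a description of the action to a character vector
hop   <- function(x, through) c(x, paste("hop through", through))
scoop <- function(x, up)      c(x, paste("scoop up", up))
bop   <- function(x, on)      c(x, paste("bop on", on))

foo_foo <- "little bunny"

piped <- foo_foo %>%
  hop(through = "forest") %>%
  scoop(up = "field mice") %>%
  bop(on = "head")

# The step-by-step rewrite, reusing "." as the running value
my_pipe <- function(.) {
  . <- hop(., through = "forest")
  . <- scoop(., up = "field mice")
  bop(., on = "head")
}

identical(piped, my_pipe(foo_foo))
#> [1] TRUE
```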
    
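    • Two situations where the pipe doesn't work (the outputs below come from magrittr versions before 2.0; magrittr 2.0 switched the pipe to lazy evaluation, so some results may differ there)

    (1) Functions that use the current environment, such as assign(), get() and load(). %>% evaluates the right-hand side in a temporary environment, so assign() creates x there and the x you can see is left unchanged. To make it work, pass the environment explicitly.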

    assign("x", 10); x
    # [1] 10
    
    "x" %>% assign(100); x
    # [1] 10
    
    env <- environment()
    "x" %>% assign(100, envir = env); x
    # [1] 100
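Why the plain piped `assign()` leaves `x` unchanged: without an `envir` argument, `assign()` writes into whatever environment it is called from. In this sketch, ordinary functions (the names `assign_local` and `assign_caller` are made up) stand in for the temporary environment the pipe evaluated its right-hand side in.

```r
x <- 10

# assign() without envir writes into the environment it is called from,
# here the function's own short-lived local environment
assign_local <- function() assign("x", 100)
assign_local()
x
#> [1] 10

# Passing the caller's environment explicitly makes the assignment visible
assign_caller <- function() assign("x", 100, envir = parent.frame())
assign_caller()
x
#> [1] 100
```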
    

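    (2) Functions that use lazy evaluation, which includes most functions that capture conditions such as errors:
    tryCatch(), try(), suppressMessages() and suppressWarnings(). R computes a function argument only when the function uses it, but the pipe computes each step in turn, so the error is raised before tryCatch() gets a chance to catch it.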

    tryCatch(stop("!"), error = function(e) "An error")
    #> [1] "An error"
    
    stop("!") %>% 
      tryCatch(error = function(e) "An error")
    #> Error in eval(lhs, parent, parent): !
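The difference comes down to when the left-hand side is evaluated. In this sketch, `eager_pipe()` and `lazy_pipe()` are hypothetical helpers, not magrittr's implementation: the eager one computes the left-hand side before calling the next function (which is why the piped `tryCatch()` above fails), while the lazy one passes the unevaluated argument on, so `tryCatch()` evaluates it inside its own handler.

```r
# Eager: force the left-hand side first, then call f with the value
eager_pipe <- function(lhs, f, ...) {
  value <- lhs          # any error in lhs is raised here, outside f
  f(value, ...)
}

# Lazy: hand the unevaluated argument (a promise) straight on to f
lazy_pipe <- function(lhs, f, ...) f(lhs, ...)

handler <- function(e) "An error"

lazy_result <- lazy_pipe(stop("!"), tryCatch, error = handler)
lazy_result
#> [1] "An error"

# try() only keeps the script running; the error escaped tryCatch()
eager_result <- try(eager_pipe(stop("!"), tryCatch, error = handler), silent = TRUE)
class(eager_result)
#> [1] "try-error"
```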
    

    When not to use the pipe

    Knowing when not to use the pipe matters just as much.

    Pipes are most useful for rewriting a fairly short linear sequence of operations. Reach for another tool when:

    • Your pipes are longer than (say) ten steps. In that case, create intermediate objects with meaningful names. That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent.

    • You have multiple inputs or outputs. If there isn’t one primary object being transformed, but two or more objects being combined together, don’t use the pipe.

    • You are starting to think about a directed graph with a complex dependency structure. Pipes are fundamentally linear and expressing complex relationships with them will typically yield confusing code.
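A small sketch of the first two points, using a made-up lookup table `cyl_labels` alongside the built-in `mtcars`: two inputs are combined with an ordinary `merge()` call rather than a pipe, and each later step gets a descriptive name so it can be inspected on its own while debugging.

```r
# Made-up lookup table: a second input, so there is no single object to pipe
cyl_labels <- data.frame(cyl = c(4, 6, 8),
                         size = c("small", "medium", "large"))

car_mpg <- data.frame(model = rownames(mtcars), cyl = mtcars$cyl, mpg = mtcars$mpg)

# Two inputs are combined here; a plain call reads more clearly than a pipe
cars_labelled <- merge(car_mpg, cyl_labels, by = "cyl")

# Named intermediate results can be checked one at a time
large_cars <- subset(cars_labelled, size == "large")
mean_mpg_large <- mean(large_cars$mpg)
mean_mpg_large
#> [1] 15.1
```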

    Other tools from magrittr

    When working with more complex pipes, it’s sometimes useful to call a function for its side-effects. Maybe you want to print out the current object, or plot it, or save it to disk. Many times, such functions don’t return anything, effectively terminating the pipe.

    • %T>%
      %T>% works like %>% except that it returns the left-hand side instead of the right-hand side. It’s called “tee” because it’s like a literal T-shaped pipe.
    library(magrittr)
    
    rnorm(100) %>%
      matrix(ncol = 2) %>%
      plot() %>%
      str()
    #  NULL
    
    rnorm(100) %>%
      matrix(ncol = 2) %T>%
      plot() %>% 
      str()
    # num [1:50, 1:2] -0.351 -1.751 0.666 0.516 -0.686 ...
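A version of the tee example that needs no graphics device: `plot()` is swapped for `cat()`, another function called only for its side effect. `cat()` returns `NULL`, so with `%>%` the rest of the chain receives `NULL`, while `%T>%` passes the sorted vector through.

```r
library(magrittr)

x <- c(3, 1, 2)

# cat() prints and then returns NULL, so sum() receives NULL
lost <- x %>% sort() %>% cat("\n") %>% sum()
#> 1 2 3
lost
#> [1] 0

# %T>% runs cat() for its output but passes sort()'s result on to sum()
kept <- x %>% sort() %T>% cat("\n") %>% sum()
#> 1 2 3
kept
#> [1] 6
```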
    
    • %$%
      It “explodes” out the variables in a data frame so that you can refer to them explicitly by name. This helps with functions that take individual vectors rather than a data frame, such as cor().
    mtcars %$%
      cor(disp, mpg)
    #> [1] -0.8475514
    
    # Base R equivalent: with() evaluates the expression inside the data frame
    with(mtcars, cor(disp, mpg))
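`%$%` also fits at the end of a longer chain. In this sketch the rows are filtered with `%>%` first (the 4-cylinder subset is just an illustrative choice), then `%$%` exposes the columns to `cor()`; the base R version with `with()` gives the identical result.

```r
library(magrittr)

# Filter with %>%, then use %$% so cor() can see the columns by name
cor_4cyl <- mtcars %>%
  subset(cyl == 4) %$%
  cor(disp, mpg)

# Same computation in base R
cor_4cyl_base <- with(subset(mtcars, cyl == 4), cor(disp, mpg))

identical(cor_4cyl, cor_4cyl_base)
#> [1] TRUE
```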
    
    • %<>%
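      Pipes an object forward and assigns the result back to the same name, so you don't have to write the name twice.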
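    # Usual form: pipe, then assign back explicitly
    mtcars <- mtcars %>% 
      transform(cyl = cyl * 2)
    
    # Same thing with %<>% (running both would double cyl twice)
    mtcars %<>% transform(cyl = cyl * 2)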
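A quick check that the two forms are equivalent, using two separate made-up vectors so that neither assignment is applied twice to the same data.

```r
library(magrittr)

a <- c(4, 9, 16)
b <- c(4, 9, 16)

a <- a %>% sqrt()   # pipe, then assign back explicitly
b %<>% sqrt()       # pipe and assign back in one step

identical(a, b)
#> [1] TRUE
b
#> [1] 2 3 4
```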
    
