Advanced-R. Subsetting

Author: MJades | Published: 2020-02-13 17:03
    1. The one important difference between $ and [[ is that $ does (left-to-right) partial matching.
      Set the warnPartialMatchDollar option so that R emits a warning whenever $ falls back to partial matching.
    x <- list(abc = 1)
    options(warnPartialMatchDollar = TRUE)
    x$a
    #> Warning in x$a: partial match of 'a' to 'abc'
    #> [1] 1
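
For contrast, here is a minimal sketch of how [[ treats the same name. The list x is the same small illustrative object; the exact argument is part of base R's [[.

```r
x <- list(abc = 1)

# `$` partially matches "a" to "abc"
x$a
#> [1] 1

# `[[` requires an exact match by default, so nothing is found
x[["a"]]
#> NULL

# Partial matching with `[[` has to be requested explicitly
x[["a", exact = FALSE]]
#> [1] 1
```

Because [[ returns NULL rather than silently matching a longer name, it is usually the safer choice in non-interactive code.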
    
    1. Subsetting with nothing can be useful with assignment because it preserves the structure of the original object. Compare the following two expressions. In the first, mtcars remains a data frame because you are only changing the contents of mtcars, not mtcars itself. In the second, mtcars becomes a list because you are changing the object it is bound to.
    mtcars[] <- lapply(mtcars, as.integer)
    is.data.frame(mtcars)
    #> [1] TRUE
    
    mtcars <- lapply(mtcars, as.integer)
    is.data.frame(mtcars)
    #> [1] FALSE
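
The same distinction applies to other structured objects. As a small sketch with a hypothetical matrix m (not part of the original notes):

```r
m <- matrix(1:6, nrow = 2)

# Empty subsetting replaces the contents but keeps the dim attribute
m1 <- m
m1[] <- 0L
dim(m1)
#> [1] 2 3

# Plain assignment rebinds the name to a new object, so the matrix is gone
m2 <- m
m2 <- 0L
dim(m2)
#> NULL
```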
    
    1. Lookup tables
      Character matching is a powerful way to create lookup tables. Say you want to convert abbreviations:
    x <- c("m", "f", "u", "f", "f", "m", "m")
    lookup <- c(m = "Male", f = "Female", u = NA)
    lookup[x]
    #>        m        f        u        f        f        m        m 
    #>   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"
    

    Note that if you don’t want names in the result, use unname() to remove them.

    unname(lookup[x])
    #> [1] "Male"   "Female" NA       "Female" "Female" "Male"   "Male"
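
One thing to watch for: a key that is missing from the lookup table also yields NA, which looks the same as a key that is deliberately mapped to NA. The sketch below uses a hypothetical code "z" that is not in the table and shows one way to detect such keys.

```r
lookup <- c(m = "Male", f = "Female", u = NA)
x <- c("m", "f", "z")  # "z" is a hypothetical code missing from the table

result <- unname(lookup[x])
result
#> [1] "Male"   "Female" NA

# Keys that are not present in the lookup table
missing_keys <- x[!x %in% names(lookup)]
missing_keys
#> [1] "z"
```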
    
    1. match
      A lookup can also be built by combining match() with integer subsetting: match(needles, haystack) returns the position at which each needle is first found in the haystack.
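    grades <- c(1, 2, 2, 3, 1)
    info <- data.frame(
      grade = 3:1,
      desc = c("Excellent", "Good", "Poor"),
      fail = c(FALSE, FALSE, TRUE)
    )
    id <- match(grades, info$grade)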
    id
    #> [1] 3 2 2 1 3
    info[id, ]
    #>     grade      desc  fail
    #> 3       1      Poor  TRUE
    #> 2       2      Good FALSE
    #> 2.1     2      Good FALSE
    #> 1       3 Excellent FALSE
    #> 3.1     1      Poor  TRUE
    

    If you’re matching on multiple columns, you’ll need to first collapse them into a single column (with e.g. interaction()). Typically, however, you’re better off switching to a function designed specifically for joining multiple tables like merge(), or dplyr::left_join().
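
A rough sketch of both approaches for a two-column key follows. The data frames scores and info2 and their columns are made up for illustration.

```r
scores <- data.frame(year = c(2019, 2019, 2020), grade = c(2, 1, 1))
info2 <- data.frame(
  year  = c(2019, 2019, 2020, 2020),
  grade = c(1, 2, 1, 2),
  desc  = c("Poor", "Good", "Fair", "Great"),
  stringsAsFactors = FALSE
)

# Option 1: collapse both key columns into one key, then match() as before
key_scores <- interaction(scores$year, scores$grade)
key_info   <- interaction(info2$year, info2$grade)
id <- match(key_scores, key_info)
info2$desc[id]
#> [1] "Good" "Poor" "Fair"

# Option 2: let merge() do the join on both columns
joined <- merge(scores, info2, by = c("year", "grade"))
joined$desc
#> [1] "Poor" "Good" "Fair"
```

Note that the match() version keeps the row order of scores, while merge() sorts the result by the key columns by default.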

    1. setdiff
      If you only know the columns you don’t want, use set operations to work out which columns to keep:
    df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
    df[setdiff(names(df), "z")]
    #>   x y
    #> 1 1 3
    #> 2 2 2
    #> 3 3 1
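
The same pattern extends to several unwanted columns, and it quietly ignores names that do not exist in the data frame. In this sketch, "w" is a hypothetical name that is not a column of df.

```r
df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])

drop <- c("z", "w")  # "w" is hypothetical and not a column of df
keep <- setdiff(names(df), drop)
keep
#> [1] "x" "y"

names(df[keep])
#> [1] "x" "y"
```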
    
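    1. xor() (exclusive or)
      xor(x, y) on logical vectors corresponds to setdiff(union(x, y), intersect(x, y)) on the equivalent integer positions:
    x1 <- 1:10 %% 2 == 0
    x2 <- which(x1)
    y1 <- 1:10 %% 5 == 0
    y2 <- which(y1)
    xor(x1, y1)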
    #>  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
    setdiff(union(x2, y2), intersect(x2, y2))
    #> [1] 2 4 6 8 5
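
The other Boolean operators have set counterparts as well. A short sketch using the same kind of vectors (logical x1 and y1, with x2 and y2 holding the positions of their TRUE values):

```r
x1 <- 1:10 %% 2 == 0
x2 <- which(x1)
y1 <- 1:10 %% 5 == 0
y2 <- which(y1)

# AND corresponds to intersect()
which(x1 & y1)
#> [1] 10
intersect(x2, y2)
#> [1] 10

# OR corresponds to union() (same elements, possibly in a different order)
which(x1 | y1)
#> [1]  2  4  5  6  8 10
union(x2, y2)
#> [1]  2  4  6  8 10  5

# AND NOT corresponds to setdiff()
which(x1 & !y1)
#> [1] 2 4 6 8
setdiff(x2, y2)
#> [1] 2 4 6 8
```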
    
    1. When first learning subsetting, a common mistake is to use x[which(y)] instead of x[y]. Here the which() achieves nothing: it switches from logical to integer subsetting but the result is exactly the same. In more general cases, there are two important differences.

    When the logical vector contains NA, logical subsetting replaces these values with NA while which() simply drops these values. It’s not uncommon to use which() for this side-effect, but I don’t recommend it: nothing about the name “which” implies the removal of missing values.

    x[-which(y)] is not equivalent to x[!y]: if y is all FALSE, which(y) will be integer(0) and -integer(0) is still integer(0), so you’ll get no values, instead of all values.

    In general, avoid switching from logical to integer subsetting unless you want, for example, the first or last TRUE value.
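
Both differences can be seen in a small sketch. The vectors x, y, and none are made up for illustration.

```r
x <- c(10, 20, 30)
y <- c(TRUE, NA, FALSE)

# Logical subsetting keeps the NA position as NA
x[y]
#> [1] 10 NA

# which() silently drops the NA
x[which(y)]
#> [1] 10

none <- c(FALSE, FALSE, FALSE)

# Negating the logical vector keeps every element
x[!none]
#> [1] 10 20 30

# which(none) is integer(0), and negating it selects nothing
x[-which(none)]
#> numeric(0)
```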
