Advanced-R. Subsetting

Author: MJades | Published: 2020-02-13 17:03

The one important difference between $ and [[ is that $ does (left-to-right) partial matching.
To be warned whenever $ partially matches a name, set the global option warnPartialMatchDollar:

options(warnPartialMatchDollar = TRUE)
x <- list(abc = 1)
x$a
#> Warning in x$a: partial match of 'a' to 'abc'
#> [1] 1
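
By contrast, [[ matches names exactly unless told otherwise. A minimal sketch, using a hypothetical one-element list (the same shape the warning above implies):

```r
x <- list(abc = 1)        # hypothetical example list

x[["a"]]                  # exact matching by default: no element is named "a"
#> NULL
x[["a", exact = FALSE]]   # explicitly opt in to partial matching
#> [1] 1
```

Getting NULL rather than a silent partial match is usually the safer behaviour in scripts, which is one reason to prefer [[ when the name comes from a variable.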

Subsetting with nothing can be useful with assignment because it preserves the structure of the original object. Compare the following two expressions. In the first, mtcars remains a data frame because you are only changing the contents of mtcars, not mtcars itself. In the second, mtcars becomes a list because you are changing the object it is bound to.

mtcars[] <- lapply(mtcars, as.integer)
is.data.frame(mtcars)
#> [1] TRUE

mtcars <- lapply(mtcars, as.integer)
is.data.frame(mtcars)
#> [1] FALSE
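
The same idea applies to any object with attributes, not only data frames. A small sketch with a hypothetical matrix m, showing that assigning into m[] keeps the dimensions while plain assignment rebinds the name:

```r
m <- matrix(1:6, nrow = 2)   # hypothetical 2 x 3 matrix

m_keep <- m
m_keep[] <- 0L               # replaces the contents, keeps the dim attribute
dim(m_keep)
#> [1] 2 3

m_lost <- m
m_lost <- 0L                 # rebinds the name to a new length-1 vector
dim(m_lost)
#> NULL
```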

Lookup tables
Character matching is a powerful way to create lookup tables. Say you want to convert abbreviations:

x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)
lookup[x]
#>        m        f        u        f        f        m        m 
#>   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"

Note that if you don’t want names in the result, use unname() to remove them.

unname(lookup[x])
#> [1] "Male"   "Female" NA       "Female" "Female" "Male"   "Male"
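
One thing to watch for: a code that is missing from the lookup table does not raise an error, it quietly becomes NA. A short sketch, where "x" is a made-up code that the table does not contain:

```r
lookup <- c(m = "Male", f = "Female", u = NA)
codes <- c("m", "x", "f")    # "x" is a hypothetical code absent from lookup

unname(lookup[codes])
#> [1] "Male"   NA       "Female"
```

If unknown codes should be treated as a mistake, checking all(codes %in% names(lookup)) before the lookup is one way to catch them.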

match
When a lookup table has several columns of information, combine match() with integer subsetting: match(needles, haystack) returns the position at which each needle is found in the haystack.

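grades <- c(1, 2, 2, 3, 1)

info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(FALSE, FALSE, TRUE)
)

id <- match(grades, info$grade)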
id
#> [1] 3 2 2 1 3
info[id, ]
#>     grade      desc  fail
#> 3       1      Poor  TRUE
#> 2       2      Good FALSE
#> 2.1     2      Good FALSE
#> 1       3 Excellent FALSE
#> 3.1     1      Poor  TRUE

If you’re matching on multiple columns, you’ll need to first collapse them into a single column (with e.g. interaction()). Typically, however, you’re better off switching to a function designed specifically for joining multiple tables like merge(), or dplyr::left_join().
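
As a rough illustration of both routes, here is a sketch with two hypothetical data frames, keys and tbl, that share the key columns a and b. All names and values are invented for the example.

```r
# Hypothetical data: rows to look up, and the table to look them up in
keys <- data.frame(a = c("x", "y", "x"), b = c(1, 2, 2))
tbl  <- data.frame(a = c("x", "x", "y"), b = c(1, 2, 2), val = c(10, 20, 30))

# Route 1: collapse both key columns into one factor, then match as usual
id <- match(interaction(keys$a, keys$b), interaction(tbl$a, tbl$b))
id
#> [1] 1 3 2
tbl$val[id]
#> [1] 10 30 20

# Route 2: let merge() do the join on both columns
# (note that merge() may reorder rows, unlike the match() approach)
joined <- merge(keys, tbl, by = c("a", "b"))
```

The match() route keeps the row order of keys, while merge() handles the bookkeeping for you at the cost of possibly reordering the result.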

setdiff
If you only know the columns you don’t want, use set operations to work out which columns to keep:

df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
df[setdiff(names(df), "z")]
#>   x y
#> 1 1 3
#> 2 2 2
#> 3 3 1

xor() (exclusive or)

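x1 <- 1:10 %% 2 == 0   # logical: even positions
x2 <- which(x1)        # the same set as integer positions
y1 <- 1:10 %% 5 == 0   # logical: multiples of 5
y2 <- which(y1)
xor(x1, y1)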
#>  [1] FALSE  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE
setdiff(union(x2, y2), intersect(x2, y2))
#> [1] 2 4 6 8 5

When first learning subsetting, a common mistake is to use x[which(y)] instead of x[y]. Here the which() achieves nothing: it switches from logical to integer subsetting but the result is exactly the same. In more general cases, there are two important differences.

When the logical vector contains NA, logical subsetting replaces these values with NA while which() simply drops these values. It’s not uncommon to use which() for this side-effect, but I don’t recommend it: nothing about the name “which” implies the removal of missing values.

x[-which(y)] is not equivalent to x[!y]: if y is all FALSE, which(y) will be integer(0) and -integer(0) is still integer(0), so you’ll get no values, instead of all values.

In general, avoid switching from logical to integer subsetting unless you want, for example, the first or last TRUE value.
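
Both differences are easy to see on a small example. The vectors below are hypothetical and chosen only to trigger each case:

```r
x <- c(10, 20, 30, 40)
y <- c(TRUE, NA, FALSE, TRUE)

x[y]            # logical subsetting keeps a slot for the NA
#> [1] 10 NA 40
x[which(y)]     # which() silently drops it
#> [1] 10 40

y_none <- c(FALSE, FALSE, FALSE, FALSE)
x[!y_none]              # every value, as intended
#> [1] 10 20 30 40
x[-which(y_none)]       # -integer(0) selects nothing
#> numeric(0)

# A legitimate use of which(): position of the first and last TRUE
which(y)[1]
#> [1] 1
tail(which(y), 1)
#> [1] 4
```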
