Notes on Reading the purrr Package

Author: ADO_AI | Published 2021-08-07 20:24

1. Using purrr's lmap functions to build a data frame with dummy variables

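# Turn a one-column list (or data frame) into one 0/1 indicator column per level.
disjoin <- function(x, sep = "_") {
  name <- names(x)
  x <- as.factor(x[[1]])
  # One 0/1 vector per factor level
  out <- lapply(levels(x), function(level) {
    as.numeric(x == level)
  })
  names(out) <- paste(name, levels(x), sep = sep)
  out
}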
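Before mapping it over a data frame, it can help to see what disjoin() returns on its own. The sketch below repeats the definition so it runs by itself and calls it on a small made-up one-element list (the name g and its values are only for illustration); it uses base R only.

```r
# Same definition as above, repeated so this block is self-contained.
disjoin <- function(x, sep = "_") {
  name <- names(x)
  x <- as.factor(x[[1]])
  out <- lapply(levels(x), function(level) {
    as.numeric(x == level)
  })
  names(out) <- paste(name, levels(x), sep = sep)
  out
}

# Hypothetical input: a length-1 list, which is what lmap() hands to .f.
one_column <- list(g = c("a", "b", "a"))
dummies <- disjoin(one_column)

names(dummies)  # "g_a" "g_b"
dummies$g_a     # 1 0 1
dummies$g_b     # 0 1 0
```

The function takes a list and returns a list, which is the contract the lmap family expects.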

Now, we are ready to map disjoin() on each categorical variable of a data frame:

library(purrr)  # also provides the %>% pipe

iris %>% lmap_if(is.factor, disjoin)
mtcars %>% lmap_at(c("cyl", "vs", "am"), disjoin)

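2. The difference between lmap and map: when lmap visits each element of a list, it still passes that element to .f as a list of length 1 (.x[i]), not as the bare vector (.x[[i]]). From the package documentation: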

lmap(), lmap_at() and lmap_if() are similar to map(), map_at() and map_if(), with the difference that they operate exclusively on functions that take and return a list (or data frame). Thus, instead of mapping the elements of a list (as in .x[[i]]), they apply a function .f to each subset of size 1 of that list (as in .x[i]). We call those elements list-elements.
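A small sketch of the .x[[i]] versus .x[i] distinction, using a made-up list x. The helper passed to lmap() is my own illustration, written to return a named list because lmap() requires .f to return a list.

```r
library(purrr)

# Hypothetical example list.
x <- list(a = 1:3, b = letters[1:2])

# map() hands .f the bare element x[[i]].
seen_by_map <- map(x, class)

# lmap() hands .f the length-1 sub-list x[i]; .f must return a list.
seen_by_lmap <- lmap(x, function(el) {
  set_names(list(class(el)), names(el))
})

seen_by_map$a   # "integer"
seen_by_lmap$a  # "list"
```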

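3. map can take a string or a number in place of .f and use it as an index, but the index is applied inside each element of the list: map(list, 'a') amounts to taking list[[n]][['a']] for every n, not list['a'].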


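By contrast, map_at(list, 'a', .f) or map_at(list, c(4, 5), .f) applies .f only to the elements selected by list['a'] or list[c(4, 5)] and leaves the others unchanged. So map with an index works one level down, inside each element, while map_at selects among the elements of the list itself.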
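The two behaviors side by side, as a minimal sketch. The lists records and scores are invented for the example.

```r
library(purrr)

# Hypothetical nested list: each element has fields a and b.
records <- list(list(a = 1, b = "x"), list(a = 2, b = "y"))

# A string (or number) in place of .f extracts that field from every element.
extracted_a <- map(records, "a")      # same as map(records, function(el) el[["a"]])
extracted_b <- map_chr(records, 2)    # second field of each element

# map_at() instead picks elements of the list itself and applies .f to them.
scores <- list(a = 1:3, b = 4:6, c = 7:9)
summed <- map_at(scores, "a", sum)    # only element "a" is replaced by its sum
```

Here extracted_a has one entry per record, while summed keeps all three elements of scores and changes only a.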

The functions map_if() and map_at() take .x as input, apply the function .f to some of the
elements of .x, and return a list of the same length as the input.
The input to map_at and map_if is an atomic vector or a list, and the return value is a list of the same length as the input.

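For a data frame, map_at(df, c(4, 5), is.numeric) tests columns 4 and 5 and produces a single logical value for each (note that .f is passed as is.numeric, without parentheses). After the call, what was a data frame of length 5 (5 columns; a data frame is a special kind of list) has become a plain list of length 5.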
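To make this concrete, here is a sketch with a made-up five-column data frame (the column names and values are assumptions, not the author's data).

```r
library(purrr)

# Hypothetical five-column data frame.
df <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  x    = c(1.5, 2.5, 3.5),
  flag = c(TRUE, FALSE, TRUE),
  z    = c(10, 20, 30),
  stringsAsFactors = FALSE
)

res <- map_at(df, c(4, 5), is.numeric)

is.data.frame(res)  # FALSE: the result is a plain list
length(res)         # 5: same length as the input
res[[4]]            # FALSE (flag is logical)
res[[5]]            # TRUE  (z is numeric)
res[[1]]            # the untouched id column
```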


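4. The modify family: the output always has the same type as the input (if the input is a data frame, the output is a data frame). Because of this coercion, you have to make sure the output is meaningful when using the modify functions, that is, that the .f passed to modify is appropriate for the input.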

Unlike map() and its variants which always return a fixed object type (list for map(), integer vector for map_int(), etc), the modify() family always returns the same type as the input object. modify() is a shortcut for x[[i]] <- f(x[[i]]); return(x).

Details
Since the transformation can alter the structure of the input; it’s your responsibility to ensure that the transformation produces a valid output. For example, if you’re modifying a data frame, .f must preserve the length of the input.

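Take modify_at as an example. Its effect is similar to map_at above, but whereas map_at(df, c(4, 5), is.numeric) turns the 4th and 5th elements of the original list into single logical values, modify_at expands each single logical value into a full-length column so that the output is still a data frame.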
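A sketch of modify_at() keeping the data frame intact. I deliberately use a length-preserving .f (as.character), which is the case the documentation asks for; the data frame is made up for the example.

```r
library(purrr)

# Hypothetical data frame.
df <- data.frame(id = 1:3, x = c(1.5, 2.5, 3.5), y = c(10, 20, 30))

# Convert columns 2 and 3 to character; .f returns a vector of the same length.
out <- modify_at(df, c(2, 3), as.character)

is.data.frame(out)  # TRUE: same type as the input
dim(out)            # 3 3
class(out$x)        # "character"
class(out$id)       # "integer" (untouched)
```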

Another look at the difference between map_if and modify_if
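The same predicate and the same .f, applied with both functions to the built-in iris data. This is only a sketch of the return types, not a reproduction of the original screenshot.

```r
library(purrr)

# Convert factor columns to character with each function.
as_list <- map_if(iris, is.factor, as.character)
as_df   <- modify_if(iris, is.factor, as.character)

is.data.frame(as_list)  # FALSE: map_if() returns a plain list
is.data.frame(as_df)    # TRUE:  modify_if() returns a data frame
class(as_df$Species)    # "character"
```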


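5. map applies a function to a list or vector. In the anonymous-function shorthand, ~ stands in for function(x), and .x plays the role that x plays inside function(x).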

With an anonymous function:

1:10 %>% map(function(x) rnorm(10, x))

Or a formula

1:10 %>% map(~ rnorm(10, .x))
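Because rnorm() is random, the two calls above are hard to compare directly. A deterministic sketch of the same equivalence, using squaring as a stand-in function; as_mapper() is the purrr function that turns a formula into an ordinary function.

```r
library(purrr)

# Explicit anonymous function.
by_function <- map_dbl(1:3, function(x) x^2)

# Formula shorthand: .x refers to the current element.
by_formula <- map_dbl(1:3, ~ .x^2)

# as_mapper() shows that the formula becomes a regular function.
square <- as_mapper(~ .x^2)
is.function(square)  # TRUE
square(4)            # 16
```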

6. Why map processes a data frame column by column while pmap processes it row by row

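First, a data frame is in essence a list of equal-length vectors bound together as columns.
map(.x, .f, ...) applies .f to each element (vector) of the list .x in turn, so on a data frame it applies .f column by column.
map2(.x, .y, .f) applies .f to the elements at the same position in .x and .y in parallel; .f must accept two inputs.
pmap(.l, .f) applies .f to the elements at the same position across any number of lists in .l, so on a data frame it works row by row.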
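The three functions on one small data frame, as a sketch. The data frame and its column names a, b, c are invented; the typed variants (map_dbl and so on) are used only to get plain numeric vectors back.

```r
library(purrr)

# Hypothetical data frame.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# map: one call per column.
col_sums <- map_dbl(df, sum)                               # 6 15 24

# map2: walks two vectors in parallel; .f takes two inputs.
pair_sums <- map2_dbl(df$a, df$b, function(x, y) x + y)    # 5 7 9

# pmap: walks all columns in parallel, so each call sees one row.
# Arguments are matched to columns by name.
row_sums <- pmap_dbl(df, function(a, b, c) a + b + c)      # 12 15 18
```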

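This row-wise behavior means pmap can do more than apply functions such as sum or mean across the rows of a data frame for statistical summaries. More importantly, it can treat a[1], b[1], c[1], d[1], e[1] as the arguments that .f needs.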
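One way to picture "rows as arguments" is a parameter table whose column names match the argument names of an existing function. The sketch below uses rnorm(n, mean, sd) from base R; the table params is made up for illustration.

```r
library(purrr)

# Hypothetical parameter table: each row is one set of arguments for rnorm().
params <- data.frame(
  n    = c(1, 2, 3),
  mean = c(0, 10, 100),
  sd   = c(1, 1, 1)
)

set.seed(1)
# Row i becomes the call rnorm(n = n[i], mean = mean[i], sd = sd[i]).
draws <- pmap(params, rnorm)

lengths(draws)  # 1 2 3: one vector per row, sized by that row's n
```

Matching by name is what makes this work, so the column names have to agree with the argument names of .f.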



Article link: https://www.haomeiwen.com/subject/vjvgvltx.html
