The apply family: born for loops


Author: 芋圆学徒 | Published: 2021-09-28 20:03

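    apply() is a very R-flavored function and a good replacement for verbose for loops. As one blog post puts it, R's for and while loops run in R itself, whereas vectorized operations are backed by underlying C functions, so doing these computations with the apply() family is good value. In practice the gain is mostly shorter, clearer code: apply() itself still loops at the R level, so the speed-up over a well-written for loop is often small. The family works on data frames, lists, vectors and more, and any function can be passed in as the function to apply.
    Author: 面面的徐爷
    Link: https://www.jianshu.com/p/8e04245bfe6d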
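
To make the comparison with a for loop concrete, here is a minimal sketch. The list `scores` is hypothetical data invented for illustration; both versions compute the mean of each element.

```r
# Hypothetical data: three numeric vectors in a named list.
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# Explicit for loop: preallocate the result, then fill one slot per iteration.
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# The same computation as a single sapply() call.
means_apply <- sapply(scores, mean)

# Compare the two results.
all.equal(means_loop, means_apply)
```

The apply-style call needs no preallocation and no index variable, which is where most of the practical benefit comes from.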

    I. The apply() family tree

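    The apply family exists for looping and branches into eight members according to the input and output data types; the first three are the best known:
    apply: operates on the rows or columns of a matrix
    lapply: takes a list, applies a function to each element, and returns a list
    sapply: takes a list, applies a function to each element, and returns a vector or matrix where the results can be simplified
    vapply
    mapply
    tapply
    rapply
    eapply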
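
As a quick orientation, the following sketch shows what the three best-known members return for the same kind of task. The objects `m` and `l` are hypothetical examples, not data from the article.

```r
m <- matrix(1:6, nrow = 2)          # hypothetical 2 x 3 matrix
l <- list(p = 1:3, q = 4:6)         # hypothetical list

row_totals <- apply(m, 1, sum)      # one value per row of the matrix
as_list    <- lapply(l, sum)        # always a list
as_vector  <- sapply(l, sum)        # simplified to a named vector

# Inspect the structure of each result.
str(row_totals)
str(as_list)
str(as_vector)
```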


    II. Meet the members

    1. The apply() function

    apply() can replace a for loop over the rows or columns of a matrix or array, and returns a vector, matrix, or list.
    Arguments
    apply(X, MARGIN, FUN, ...)

    X :an array, including a matrix.


    MARGIN :a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.


    FUN :the function to be applied: see ‘Details’. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.


    ... :optional arguments to FUN.
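
The effect of MARGIN and of the `...` argument can be sketched as follows. The matrix `m` is a hypothetical example containing one missing value.

```r
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)   # hypothetical data with one NA

# MARGIN = 1: one result per row; na.rm is forwarded to sum() through `...`.
apply(m, 1, sum, na.rm = TRUE)

# MARGIN = 2: one result per column.
apply(m, 2, sum, na.rm = TRUE)

# MARGIN = c(1, 2): FUN sees each cell separately, so the shape is preserved.
apply(m, c(1, 2), function(v) v * 10)
```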


    Examples:
    1. Math functions: sum, mean, quantile, etc.
    2. Custom functions


    1. Math functions: sum, mean, quantile, etc.

    #### Sum every row and every column of a matrix
    > x <- cbind(x1 = 3, x2 = c(4:1, 2:5))
    > dimnames(x)[[1]] <- letters[1:8]
    # Take a quick look at the data
    > x
      x1 x2
    a  3  4
    b  3  3
    c  3  2
    d  3  1
    e  3  2
    f  3  3
    g  3  4
    h  3  5
    # 20% trimmed mean of each column
    > apply(x, 2, mean, trim = .2)
    x1 x2 
     3  3 
    # Sum the rows and the columns of x
    > col.sums <- apply(x, 2, sum)
    > row.sums <- apply(x, 1, sum)
    > rbind(cbind(x, Rtot = row.sums), Ctot = c(col.sums, sum(col.sums)))
         x1 x2 Rtot
    a     3  4    7
    b     3  3    6
    c     3  2    5
    d     3  1    4
    e     3  2    5
    f     3  3    6
    g     3  4    7
    h     3  5    8
    Ctot 24 24   48
    
    ## Sort the columns of a matrix
    > apply(x, 2, sort)
         x1 x2
    [1,]  3  1
    [2,]  3  2
    [3,]  3  2
    [4,]  3  3
    [5,]  3  3
    [6,]  3  4
    [7,]  3  4
    [8,]  3  5
    

    2. Custom functions

    > ##- function with extra args:
    > cave <- function(x, c1, c2) c(mean(x[c1]), mean(x[c2]))
    > apply(x, 1, cave,  c1 = "x1", c2 = c("x1","x2"))
          row
             a b   c d   e f   g h
      [1,] 3.0 3 3.0 3 3.0 3 3.0 3
      [2,] 3.5 3 2.5 2 2.5 3 3.5 4
    > 
    > ma <- matrix(c(1:4, 1, 6:8), nrow = 2)
    > ma
         [,1] [,2] [,3] [,4]
    [1,]    1    3    1    7
    [2,]    2    4    6    8
    > apply(ma, 1, table)         #--> a list of length 2
    [[1]]
    
    1 3 7 
    2 1 1 
    
    [[2]]
    
    2 4 6 8 
    1 1 1 1 
    
    > apply(ma, 1, stats::quantile) # 5 x n matrix with rownames
         [,1] [,2]
    0%      1  2.0
    25%     1  3.5
    50%     2  5.0
    75%     4  6.5
    100%    7  8.0
    

    2. The lapply() function

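    lapply() is one of the most basic looping functions. It iterates over a list or data.frame and returns a list with the same length as X; the leading 'l' in the name signals the list return type. lapply() has no simplify argument: to get a simplified vector or matrix instead of a list, call sapply(), which wraps lapply() and simplifies the result by default.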

    Arguments
    lapply(X, FUN, ...)


    X :a vector (atomic or list) or an expression object. Other objects (including classed objects) will be coerced by base::as.list.


    FUN :the function to be applied to each element of X: see ‘Details’. In the case of functions like +, %*%, the function name must be backquoted or quoted.


    ... :optional arguments to FUN.


    simplify (an argument of sapply(), not of lapply()) :logical or character string; should the result be simplified to a vector, matrix or higher dimensional array if possible? For sapply it must be named and not abbreviated. The default value, TRUE, returns a vector or matrix if appropriate, whereas if simplify = "array" the result may be an array of “rank” (=length(dim(.))) one higher than the result of FUN(X[[i]]).


    Examples

    > require(stats); require(graphics)
    > 
    > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
    > # compute the list mean for each list element
    > lapply(x, mean)
    $a
    [1] 5.5
    
    $beta
    [1] 4.535125
    
    $logic
    [1] 0.5
    
    > # median and quartiles for each list element
    > lapply(x, quantile, probs = 1:3/4)
    $a
     25%  50%  75% 
    3.25 5.50 7.75 
    
    $beta
          25%       50%       75% 
    0.2516074 1.0000000 5.0536690 
    
    $logic
    25% 50% 75% 
    0.0 0.5 1.0 
    
    

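    Applied directly to a matrix or data frame, lapply() may not do what you expect. When the input is a matrix, lapply() treats it as a plain vector, applies the function to every single element, and puts each result into its own list element; when the input is a data frame, lapply() works on each column.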

     
    # Create a matrix
    > x <- cbind(x1=3, x2=c(2:1,4:5))
    > x; class(x)
         x1 x2
    [1,]  3  2
    [2,]  3  1
    [3,]  3  4
    [4,]  3  5
    [1] "matrix" "array"    # R >= 4.0.0; older versions print only "matrix"
     
    # Sum
    > lapply(x, sum)
    [[1]]
    [1] 3
     
    [[2]]
    [1] 3
     
    [[3]]
    [1] 3
     
    [[4]]
    [1] 3
     
    [[5]]
    [1] 2
     
    [[6]]
    [1] 1
     
    [[7]]
    [1] 4
     
    [[8]]
    [1] 5
    

    lapply() loops over every single value in the matrix rather than grouping by row or column.

    Now sum the columns of a data frame instead:

    > lapply(data.frame(x), sum)
    $x1
    [1] 12
    
    $x2
    [1] 12
    

    lapply() automatically splits the data frame into columns and then runs the computation on each one.
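
If the goal is one sum per matrix column, a few alternatives exist. The sketch below reuses the matrix `x` defined earlier in this section and is one possible approach, not the only one.

```r
x <- cbind(x1 = 3, x2 = c(2:1, 4:5))   # same matrix as in the example above

apply(x, 2, sum)                       # column sums via apply()
colSums(x)                             # dedicated base R function
lapply(as.data.frame(x), sum)          # convert first, then lapply() works by column
```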

    3. The sapply() function

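    sapply() does the same job as lapply() and can be thought of as lapply() with a simplified result: where possible it returns a vector or matrix, which is easier to read. It is called the same way as lapply() but has two extra arguments, simplify and USE.NAMES. simplify = TRUE (the default) turns the result into a vector, matrix, or array where possible; with simplify = FALSE, sapply() returns a list just as lapply() does. With USE.NAMES = TRUE, if X is a character vector without names, its values are used as the names of the result.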

    Arguments
    sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)


    Examples

    > x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
    > sapply(x, quantile)
             a        beta logic
    0%    1.00  0.04978707   0.0
    25%   3.25  0.25160736   0.0
    50%   5.50  1.00000000   0.5
    75%   7.75  5.05366896   1.0
    100% 10.00 20.08553692   1.0
    > i39 <- sapply(3:9, seq) # list of vectors
    > sapply(i39, fivenum)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]  1.0  1.0    1  1.0  1.0  1.0    1
    [2,]  1.5  1.5    2  2.0  2.5  2.5    3
    [3,]  2.0  2.5    3  3.5  4.0  4.5    5
    [4,]  2.5  3.5    4  5.0  5.5  6.5    7
    [5,]  3.0  4.0    5  6.0  7.0  8.0    9
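
The two extra arguments of sapply() can be seen in a small sketch. The character vector `words` is a hypothetical input chosen for illustration.

```r
words <- c("apply", "lapply", "sapply")    # hypothetical character input

sapply(words, nchar)                       # named integer vector
sapply(words, nchar, USE.NAMES = FALSE)    # same values, no names
sapply(words, nchar, simplify = FALSE)     # a named list instead of a vector
```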
    

    4. The vapply() function

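    vapply() is similar to sapply() but adds the FUN.VALUE argument, a template that fixes the type and length of every value FUN returns (its names also become the row names of the result). A mismatch raises an error, which makes programs more robust.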

    Arguments
    vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)


    X :a vector (atomic or list) or an expression object. Other objects (including classed objects) will be coerced by base::as.list.


    FUN :the function to be applied to each element of X: see ‘Details’. In the case of functions like +, %*%, the function name must be backquoted or quoted.


    ... :optional arguments to FUN.


    simplify :logical or character string; should the result be simplified to a vector, matrix or higher dimensional array if possible? For sapply it must be named and not abbreviated. The default value, TRUE, returns a vector or matrix if appropriate, whereas if simplify = "array" the result may be an array of “rank” (=length(dim(.))) one higher than the result of FUN(X[[i]]).


    USE.NAMES :logical; if TRUE and if X is character, use X as names for the result unless it had names already. Since this argument follows ... its name cannot be abbreviated.
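    The arguments above are described on the same help page as sapply(); note that vapply() itself has no simplify argument, since the shape of its result is set by FUN.VALUE.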


    FUN.VALUE :a (generalized) vector; a template for the return value from FUN.
    This argument is specific to vapply(); names given in the template become the row names of the result.

    
    > i39 <- sapply(3:9, seq)     # list of vectors: 1:3, 1:4, ..., 1:9
    > sapply(i39, fivenum)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]  1.0  1.0    1  1.0  1.0  1.0    1
    [2,]  1.5  1.5    2  2.0  2.5  2.5    3
    [3,]  2.0  2.5    3  3.5  4.0  4.5    5
    [4,]  2.5  3.5    4  5.0  5.5  6.5    7
    [5,]  3.0  4.0    5  6.0  7.0  8.0    9
    > vapply(i39, fivenum, c(Min. = 0, "1st Qu." = 0, Median = 0, "3rd Qu." = 0, Max. = 0))
    # row names are taken from the names in FUN.VALUE
            [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    Min.     1.0  1.0    1  1.0  1.0  1.0    1
    1st Qu.  1.5  1.5    2  2.0  2.5  2.5    3
    Median   2.0  2.5    3  3.5  4.0  4.5    5
    3rd Qu.  2.5  3.5    4  5.0  5.5  6.5    7
    Max.     3.0  4.0    5  6.0  7.0  8.0    9
    >
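
Beyond naming rows, FUN.VALUE also acts as a type check. The following is a minimal sketch of that behaviour; the list `x` and the error message text returned by the handler are made up for illustration.

```r
x <- list(a = 1:3, b = letters[1:2])   # hypothetical list with mixed element types

# sapply() quietly changes its return type with the data.
class(sapply(x, function(e) e[1]))     # character here, because of coercion

# vapply() insists that every result matches the template (one numeric value).
res <- tryCatch(
  vapply(x, function(e) e[1], FUN.VALUE = numeric(1)),
  error = function(err) "vapply stopped: result did not match FUN.VALUE"
)
res

# When every result fits the template, vapply() behaves like sapply().
ok <- vapply(list(a = 1:3, b = 4:6), function(e) mean(e), FUN.VALUE = numeric(1))
ok
```

Failing early like this is often preferable inside functions and scripts, where a silently changed return type would surface much later.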
    

    5. The tapply() function

    tapply() performs grouped computation: the INDEX argument splits the data X into groups, much like a group by operation.
    Arguments
    tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

    > # Group by species, iris$Species
    > tapply(iris$Petal.Length,iris$Species,mean)
        setosa versicolor  virginica 
         1.462      4.260      5.552
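
The group by analogy extends to more than one grouping factor. The sketch below uses the built-in mtcars data set, which is an assumption of this example and does not appear in the original article.

```r
# Mean miles per gallon for each cylinder count (built-in mtcars data).
one_way <- tapply(mtcars$mpg, mtcars$cyl, mean)
one_way

# Two grouping factors give a two-way table; empty combinations are NA.
two_way <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, gear = mtcars$gear), mean)
two_way

# The one-factor case gives the same numbers as split() followed by sapply().
via_split <- sapply(split(mtcars$mpg, mtcars$cyl), mean)
via_split
```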
    

    6. The mapply() function

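    mapply() is another variant of sapply(), roughly a multivariate sapply(), but its arguments are ordered differently. The first argument is the function FUN, and the second, '...', accepts several vectors or lists whose elements are passed to FUN in parallel.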

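    Arguments:
    mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

    FUN: the function to call
    ...: the vectors or lists to iterate over in parallel
    MoreArgs: a list of additional arguments passed unchanged to every call of FUN
    SIMPLIFY: whether to simplify the result to a vector, matrix, or array; with "array" the result is simplified to an array where possible
    USE.NAMES: if TRUE and the first ... argument has names, use them as the names of the result; if it is a character vector without names, use its values as the names; FALSE sets no names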

    > set.seed(1)
     
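    # Four draws per group
    > n<-rep(4,4)
     
    # m is the mean, v is the standard deviation (the third argument of rnorm is sd)
    > m<-v<-c(1,10,100,1000)
     
    # Generate 4 groups of data, one group per column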
    > mapply(rnorm,n,m,v)
              [,1]      [,2]      [,3]       [,4]
    [1,] 0.3735462 13.295078 157.57814   378.7594
    [2,] 1.1836433  1.795316  69.46116 -1214.6999
    [3,] 0.1643714 14.874291 251.17812  2124.9309
    [4,] 2.5952808 17.383247 138.98432   955.0664
    

    Because mapply() accepts several inputs at once, there is no need to combine the data into a data.frame first; a single call computes the result.
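
The roles of `...`, MoreArgs, and SIMPLIFY can be sketched with small inputs. The anonymous functions and the `scale` argument are hypothetical names chosen for illustration.

```r
# Pair up elements of two vectors: rep(1, 3), rep(2, 2), rep(3, 1).
paired <- mapply(rep, 1:3, 3:1)
paired

# MoreArgs holds arguments that stay fixed for every call.
scaled <- mapply(function(x, y, scale) (x + y) * scale,
                 1:3, 4:6,
                 MoreArgs = list(scale = 10))
scaled

# SIMPLIFY = FALSE always returns a list, even if results have equal length.
kept <- mapply(function(x, y) c(x, y), 1:2, 3:4, SIMPLIFY = FALSE)
kept
```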

    7. The rapply() function

    rapply() is a recursive version of lapply(). It works only on lists, visiting every element recursively and descending into any element that is itself a list.

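    Function definition:

    rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)
    Arguments
    object: a list
    f: the function to call
    classes: the classes to match; "ANY" matches every class
    deflt: the default value for elements whose class does not match (not used with how = "replace")
    how: one of three modes. With "replace", each matching element of the original list is replaced by the result of f and the rest are left unchanged; with "list", a new list is built in which matching elements hold the result of f and non-matching elements are set to deflt; with "unlist", the same is done as for "list", followed by unlist(recursive = TRUE)
    ...: further optional arguments passed to f
    For example, filter a list and sort all data of class numeric in ascending order.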

    > x=list(a=12,b=1:4,c=c('b','a'))
    > y=pi
    > z=data.frame(a=rnorm(10),b=1:10)
    > a <- list(x=x,y=y,z=z)
    
    # Sort, replacing the values in the original list
    > rapply(a,sort, classes='numeric',how='replace')
    $x
    $x$a
    [1] 12
    $x$b
    [1] 1 2 3 4
    $x$c
    [1] "b" "a"
    
    $y
    [1] 3.141593
    
    $z
    $z$a
    [1] -0.8356286 -0.8204684 -0.6264538 -0.3053884  0.1836433  0.3295078
    [7]  0.4874291  0.5757814  0.7383247  1.5952808
    $z$b
    [1]  1  2  3  4  5  6  7  8  9 10
    
    > class(a$z$b)
    [1] "integer"
    

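    The result shows that, among the vectors with more than one element, only $z$a was reordered. Checking the type of $z$b shows that it is integer, which is not the same class as numeric, so it was not sorted.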

    Next, operate on the character data: append the string '++++' to every character string and set all non-character data to NA.

    > rapply(a,function(x) paste(x,'++++'),classes="character",deflt=NA, how = "list")
    $x
    $x$a
    [1] NA
    $x$b
    [1] NA
    $x$c
    [1] "b ++++" "a ++++"
     
    $y
    [1] NA
     
    $z
    $z$a
    [1] NA
    $z$b
    [1] NA
    

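    Only x$c is a character vector, so only its elements had the new string appended. With rapply(), filtering the contents of a list in this way is straightforward.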
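
The remaining mode, how = "unlist", and matching on more than one class can be sketched as follows. The list `nested` is a hypothetical example.

```r
nested <- list(a = 1.5, b = list(c = 1:3, d = "text"))   # hypothetical nested list

# how = "unlist" (the default): apply f to matching leaves and flatten the result.
flat <- rapply(nested, function(v) v * 2, classes = c("numeric", "integer"))
flat

# how = "replace": same structure, non-matching leaves are left untouched.
replaced <- rapply(nested, function(v) v * 2,
                   classes = c("numeric", "integer"), how = "replace")
str(replaced)
```

Listing both "numeric" and "integer" in classes is one way to cover whole numbers and decimals together, given that the two are distinct classes.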

    8. The eapply() function

    eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)
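
eapply() applies FUN to the named values in an environment and returns the results as a list. A minimal sketch, using a hypothetical environment `env` and made-up object names:

```r
# A hypothetical environment holding a few objects.
env <- new.env()
assign("x", 1:10, envir = env)
assign("y", c(2.5, 3.5), envir = env)
assign(".hidden", 100, envir = env)

# Apply mean() to every visible object; the result is a named list.
res <- eapply(env, mean)
res[order(names(res))]          # environments are unordered, so sort by name

# all.names = TRUE also visits names that start with a dot.
all_res <- eapply(env, mean, all.names = TRUE)
sort(names(all_res))
```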

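    Summary
    The apply family is large, but the first three members usually cover everyday looping needs. Used sensibly, the family gets the job done more concisely and efficiently.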

    References

    http://blog.fens.me/r-apply/
    https://www.jianshu.com/p/8e04245bfe6d


          Article link: https://www.haomeiwen.com/subject/ryaqxltx.html