tidyr总结篇-续

作者: 医科研 | 来源:发表于2019-07-07 17:47 被阅读17次

    作者:白介素2

    tidyr总结篇

    gather(data,key="“,value=”") ## key是变量,value是值 gather的意义是重新塑造数据的变量,原有数据的变量并不是真正的变量 这时候变量不是变量,变量还是变量。
    举例说明: 神奇的gather
    参数1:data 参数2:key变量名,参数3:value变量名 参数4:gather的变量指定 其中-表示除外某向量,全部gather

    Sys.setlocale('LC_ALL','C')
    ## [1] "C"
    library(tidyverse)
    ## Registered S3 methods overwritten by 'ggplot2':
    ##   method         from 
    ##   [.quosures     rlang
    ##   c.quosures     rlang
    ##   print.quosures rlang
    ## Registered S3 method overwritten by 'rvest':
    ##   method            from
    ##   read_xml.response xml2
    ## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
    ## <U+221A> ggplot2 3.1.0       <U+221A> purrr   0.3.0  
    ## <U+221A> tibble  2.0.1       <U+221A> dplyr   0.8.0.1
    ## <U+221A> tidyr   0.8.2       <U+221A> stringr 1.4.0  
    ## <U+221A> readr   1.3.1       <U+221A> forcats 0.4.0
    ## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()
    stocks <- tibble(
      time = as.Date('2009-01-01') + 0:9,
      X = rnorm(10, 0, 1),
      Y = rnorm(10, 0, 2),
      Z = rnorm(10, 0, 4)
    )
    stocks
    ## # A tibble: 10 x 4
    ##    time             X      Y      Z
    ##    <date>       <dbl>  <dbl>  <dbl>
    ##  1 2009-01-01 -0.497  -1.20   5.93 
    ##  2 2009-01-02  1.22    1.58  -4.43 
    ##  3 2009-01-03  1.68   -2.50   8.03 
    ##  4 2009-01-04  1.58    0.744 -2.00 
    ##  5 2009-01-05  0.775   1.87  -3.14 
    ##  6 2009-01-06  0.0405  0.629  4.31 
    ##  7 2009-01-07 -1.42   -1.36   9.63 
    ##  8 2009-01-08  1.18    5.21  -0.231
    ##  9 2009-01-09 -0.581  -1.02  -0.680
    ## 10 2009-01-10  0.768   0.900  6.43
    

    gather起stocks中的,X,Y,Z. 新命名一个key,命名一个value, 除去time不变化

    gather(stocks, stock, price, -time)
    ## # A tibble: 30 x 3
    ##    time       stock   price
    ##    <date>     <chr>   <dbl>
    ##  1 2009-01-01 X     -0.497 
    ##  2 2009-01-02 X      1.22  
    ##  3 2009-01-03 X      1.68  
    ##  4 2009-01-04 X      1.58  
    ##  5 2009-01-05 X      0.775 
    ##  6 2009-01-06 X      0.0405
    ##  7 2009-01-07 X     -1.42  
    ##  8 2009-01-08 X      1.18  
    ##  9 2009-01-09 X     -0.581 
    ## 10 2009-01-10 X      0.768 
    ## # ... with 20 more rows
    stocks %>% gather(stock, price, -time)##保留time不变化
    ## # A tibble: 30 x 3
    ##    time       stock   price
    ##    <date>     <chr>   <dbl>
    ##  1 2009-01-01 X     -0.497 
    ##  2 2009-01-02 X      1.22  
    ##  3 2009-01-03 X      1.68  
    ##  4 2009-01-04 X      1.58  
    ##  5 2009-01-05 X      0.775 
    ##  6 2009-01-06 X      0.0405
    ##  7 2009-01-07 X     -1.42  
    ##  8 2009-01-08 X      1.18  
    ##  9 2009-01-09 X     -0.581 
    ## 10 2009-01-10 X      0.768 
    ## # ... with 20 more rows
    ##
    mini_iris <- iris[c(1, 51, 101), ]
    mini_iris
    ##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    ## 1            5.1         3.5          1.4         0.2     setosa
    ## 51           7.0         3.2          4.7         1.4 versicolor
    ## 101          6.3         3.3          6.0         2.5  virginica
    gather(mini_iris,key = "flower_att",value = "value",Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
    ##       Species   flower_att value
    ## 1      setosa Sepal.Length   5.1
    ## 2  versicolor Sepal.Length   7.0
    ## 3   virginica Sepal.Length   6.3
    ## 4      setosa  Sepal.Width   3.5
    ## 5  versicolor  Sepal.Width   3.2
    ## 6   virginica  Sepal.Width   3.3
    ## 7      setosa Petal.Length   1.4
    ## 8  versicolor Petal.Length   4.7
    ## 9   virginica Petal.Length   6.0
    ## 10     setosa  Petal.Width   0.2
    ## 11 versicolor  Petal.Width   1.4
    ## 12  virginica  Petal.Width   2.5
    gather(mini_iris,key = "flower_att",value = "value",Sepal.Length:Petal.Width)
    ##       Species   flower_att value
    ## 1      setosa Sepal.Length   5.1
    ## 2  versicolor Sepal.Length   7.0
    ## 3   virginica Sepal.Length   6.3
    ## 4      setosa  Sepal.Width   3.5
    ## 5  versicolor  Sepal.Width   3.2
    ## 6   virginica  Sepal.Width   3.3
    ## 7      setosa Petal.Length   1.4
    ## 8  versicolor Petal.Length   4.7
    ## 9   virginica Petal.Length   6.0
    ## 10     setosa  Petal.Width   0.2
    ## 11 versicolor  Petal.Width   1.4
    ## 12  virginica  Petal.Width   2.5
    

    -表示不gather的变量

    gather(mini_iris,key = "flow_att",value = "value",-Species)
    ##       Species     flow_att value
    ## 1      setosa Sepal.Length   5.1
    ## 2  versicolor Sepal.Length   7.0
    ## 3   virginica Sepal.Length   6.3
    ## 4      setosa  Sepal.Width   3.5
    ## 5  versicolor  Sepal.Width   3.2
    ## 6   virginica  Sepal.Width   3.3
    ## 7      setosa Petal.Length   1.4
    ## 8  versicolor Petal.Length   4.7
    ## 9   virginica Petal.Length   6.0
    ## 10     setosa  Petal.Width   0.2
    ## 11 versicolor  Petal.Width   1.4
    ## 12  virginica  Petal.Width   2.5
    

    省略掉key, value

    gather(mini_iris,flow_att,value,-Species)##得到的结果相同
    ##       Species     flow_att value
    ## 1      setosa Sepal.Length   5.1
    ## 2  versicolor Sepal.Length   7.0
    ## 3   virginica Sepal.Length   6.3
    ## 4      setosa  Sepal.Width   3.5
    ## 5  versicolor  Sepal.Width   3.2
    ## 6   virginica  Sepal.Width   3.3
    ## 7      setosa Petal.Length   1.4
    ## 8  versicolor Petal.Length   4.7
    ## 9   virginica Petal.Length   6.0
    ## 10     setosa  Petal.Width   0.2
    ## 11 versicolor  Petal.Width   1.4
    ## 12  virginica  Petal.Width   2.5
    

    在管道中演示一套

    注意group_by与slice联用时,slice切割的是总分组的数目 如果group分了3组,那slice切割1的话 就是显示13,如果切割1:2的话,那就是23,显示6个观测 下面举例说明

    • 展示分组中的序列1,包含3个species

    注意slice与group_by的联用

    library(dplyr)
    mini_iris <-
      iris %>%
      group_by(Species) %>%
      slice(1)
    mini_iris %>% gather(key = flower_att, value = measurement, -Species)
    ## # A tibble: 12 x 3
    ## # Groups:   Species [3]
    ##    Species    flower_att   measurement
    ##    <fct>      <chr>              <dbl>
    ##  1 setosa     Sepal.Length         5.1
    ##  2 versicolor Sepal.Length         7  
    ##  3 virginica  Sepal.Length         6.3
    ##  4 setosa     Sepal.Width          3.5
    ##  5 versicolor Sepal.Width          3.2
    ##  6 virginica  Sepal.Width          3.3
    ##  7 setosa     Petal.Length         1.4
    ##  8 versicolor Petal.Length         4.7
    ##  9 virginica  Petal.Length         6  
    ## 10 setosa     Petal.Width          0.2
    ## 11 versicolor Petal.Width          1.4
    ## 12 virginica  Petal.Width          2.5
    

    再来举个例子 - mtcars数据集中的cyl分组为4-6-8 - 切割slice 1:2,即显示2组,4-6-8

    by_cyl <- group_by(mtcars, cyl)
    ##
    slice(by_cyl, 1:2)
    ## # A tibble: 6 x 11
    ## # Groups:   cyl [3]
    ##     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
    ##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    ## 1  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
    ## 2  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
    ## 3  21       6  160    110  3.9   2.62  16.5     0     1     4     4
    ## 4  21       6  160    110  3.9   2.88  17.0     0     1     4     4
    ## 5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
    ## 6  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
    

    spread函数

    这是一个与gather互为逆向操作的函数 函数参数有:参数1:data, 参数2:key,参数3:value,达到按key与value展开的效果

    library(dplyr)
    stocks <- data.frame(
      time = as.Date('2009-01-01') + 0:9,
      X = rnorm(10, 0, 1),
      Y = rnorm(10, 0, 2),
      Z = rnorm(10, 0, 4)
    )
    stocks
    ##          time          X          Y          Z
    ## 1  2009-01-01 -0.9981963  0.6149012  -1.880305
    ## 2  2009-01-02  0.9763906  1.2292060  -2.244749
    ## 3  2009-01-03  1.3475060 -1.6466510   4.497477
    ## 4  2009-01-04  0.6845907 -2.8694272 -10.145486
    ## 5  2009-01-05 -0.3132428  0.2366398  -1.401196
    ## 6  2009-01-06  1.0542915 -0.5094071   1.311380
    ## 7  2009-01-07 -2.5360015  1.5011045   1.188158
    ## 8  2009-01-08 -0.2878114  1.6744369  -3.015077
    ## 9  2009-01-09 -0.3004896 -2.8344579   4.376036
    ## 10 2009-01-10 -0.1714464  0.8319891  -2.288022
    

    先key value, gather一下

    stocksm <- stocks %>% gather(stock, price, -time)
    stocksm
    ##          time stock       price
    ## 1  2009-01-01     X  -0.9981963
    ## 2  2009-01-02     X   0.9763906
    ## 3  2009-01-03     X   1.3475060
    ## 4  2009-01-04     X   0.6845907
    ## 5  2009-01-05     X  -0.3132428
    ## 6  2009-01-06     X   1.0542915
    ## 7  2009-01-07     X  -2.5360015
    ## 8  2009-01-08     X  -0.2878114
    ## 9  2009-01-09     X  -0.3004896
    ## 10 2009-01-10     X  -0.1714464
    ## 11 2009-01-01     Y   0.6149012
    ## 12 2009-01-02     Y   1.2292060
    ## 13 2009-01-03     Y  -1.6466510
    ## 14 2009-01-04     Y  -2.8694272
    ## 15 2009-01-05     Y   0.2366398
    ## 16 2009-01-06     Y  -0.5094071
    ## 17 2009-01-07     Y   1.5011045
    ## 18 2009-01-08     Y   1.6744369
    ## 19 2009-01-09     Y  -2.8344579
    ## 20 2009-01-10     Y   0.8319891
    ## 21 2009-01-01     Z  -1.8803050
    ## 22 2009-01-02     Z  -2.2447486
    ## 23 2009-01-03     Z   4.4974771
    ## 24 2009-01-04     Z -10.1454861
    ## 25 2009-01-05     Z  -1.4011960
    ## 26 2009-01-06     Z   1.3113796
    ## 27 2009-01-07     Z   1.1881581
    ## 28 2009-01-08     Z  -3.0150769
    ## 29 2009-01-09     Z   4.3760358
    ## 30 2009-01-10     Z  -2.2880217
    

    spread展开数据

    按stock, price展开

    stocksm %>% spread(stock, price)
    ##          time          X          Y          Z
    ## 1  2009-01-01 -0.9981963  0.6149012  -1.880305
    ## 2  2009-01-02  0.9763906  1.2292060  -2.244749
    ## 3  2009-01-03  1.3475060 -1.6466510   4.497477
    ## 4  2009-01-04  0.6845907 -2.8694272 -10.145486
    ## 5  2009-01-05 -0.3132428  0.2366398  -1.401196
    ## 6  2009-01-06  1.0542915 -0.5094071   1.311380
    ## 7  2009-01-07 -2.5360015  1.5011045   1.188158
    ## 8  2009-01-08 -0.2878114  1.6744369  -3.015077
    ## 9  2009-01-09 -0.3004896 -2.8344579   4.376036
    ## 10 2009-01-10 -0.1714464  0.8319891  -2.288022
    

    按time, price展开

    stocksm %>% spread(time, price)
    ##   stock 2009-01-01 2009-01-02 2009-01-03  2009-01-04 2009-01-05 2009-01-06
    ## 1     X -0.9981963  0.9763906   1.347506   0.6845907 -0.3132428  1.0542915
    ## 2     Y  0.6149012  1.2292060  -1.646651  -2.8694272  0.2366398 -0.5094071
    ## 3     Z -1.8803050 -2.2447486   4.497477 -10.1454861 -1.4011960  1.3113796
    ##   2009-01-07 2009-01-08 2009-01-09 2009-01-10
    ## 1  -2.536001 -0.2878114 -0.3004896 -0.1714464
    ## 2   1.501104  1.6744369 -2.8344579  0.8319891
    ## 3   1.188158 -3.0150769  4.3760358 -2.2880217
    

    说明一下gather-spread的互补性质

    stocks
    ##          time          X          Y          Z
    ## 1  2009-01-01 -0.9981963  0.6149012  -1.880305
    ## 2  2009-01-02  0.9763906  1.2292060  -2.244749
    ## 3  2009-01-03  1.3475060 -1.6466510   4.497477
    ## 4  2009-01-04  0.6845907 -2.8694272 -10.145486
    ## 5  2009-01-05 -0.3132428  0.2366398  -1.401196
    ## 6  2009-01-06  1.0542915 -0.5094071   1.311380
    ## 7  2009-01-07 -2.5360015  1.5011045   1.188158
    ## 8  2009-01-08 -0.2878114  1.6744369  -3.015077
    ## 9  2009-01-09 -0.3004896 -2.8344579   4.376036
    ## 10 2009-01-10 -0.1714464  0.8319891  -2.288022
    stocks %>% 
      gather(key=stock,value = price,-time) %>% ##先聚合
      spread(key = stock,value = price) %>% ## 又展开还原
      identical(stocks) ## 判断与原来的stocks是否完全一样
    ## [1] TRUE
    

    总结一下 gather与spread,可以自如的将数据变换为宽数据或窄数据 gather的数据格式非常适用用于ggplot2的导入,用于可视化 说到这里了我们就绘制一下吧,当然关于可视化的内容暂时不展开讲。

    牛刀小试

    library(ggplot2)
    p<-stocks %>% 
      gather(key = stock,value = price,-time) %>% 
      as_tibble() %>% ##直接导入到ggplot2进行可视化
      ggplot2::ggplot(aes(x=stock,y=price,fill=stock))+
        geom_boxplot()
    p 
    
    image.png

    改改颜色

    p+scale_fill_brewer(palette="Dark2")
    
    image.png

    放上自己喜欢的颜色

    p+scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))
    
    image.png

    tidyr::unite函数

    能够方便的实现将多列粘贴到一起的功能 参数1:data数据框,参数2:新列名,参数3:sep分隔符,参数4:remove=T移除原列 下面举例说明,这个功能好用,但用起来比较简单

    粘贴vs与am列

    library(dplyr)
    unite_(mtcars, "vs_am", c("vs","am"))
    ##                      mpg cyl  disp  hp drat    wt  qsec vs_am gear carb
    ## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46   0_1    4    4
    ## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02   0_1    4    4
    ## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61   1_1    4    1
    ## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44   1_0    3    1
    ## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02   0_0    3    2
    ## Valiant             18.1   6 225.0 105 2.76 3.460 20.22   1_0    3    1
    ## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84   0_0    3    4
    ## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00   1_0    4    2
    ## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90   1_0    4    2
    ## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30   1_0    4    4
    ## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90   1_0    4    4
    ## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40   0_0    3    3
    ## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60   0_0    3    3
    ## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00   0_0    3    3
    ## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98   0_0    3    4
    ## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82   0_0    3    4
    ## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42   0_0    3    4
    ## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47   1_1    4    1
    ## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52   1_1    4    2
    ## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90   1_1    4    1
    ## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01   1_0    3    1
    ## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87   0_0    3    2
    ## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30   0_0    3    2
    ## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41   0_0    3    4
    ## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05   0_0    3    2
    ## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90   1_1    4    1
    ## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70   0_1    5    2
    ## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90   1_1    5    2
    ## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50   0_1    5    4
    ## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50   0_1    5    6
    ## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60   0_1    5    8
    ## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60   1_1    4    2
    

    粘贴再分割是可逆的操作

    mtcars %>%
      unite(vs_am, vs, am) %>%
      separate(vs_am, c("vs", "am"))
    ##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    ## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    ## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    ## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    ## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    ## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    ## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    ## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    ## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    ## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    ## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    ## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    ## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    ## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    ## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    ## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    ## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    ## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    ## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    ## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    ## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    ## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    ## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    ## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    ## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    ## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    ## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    ## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    ## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    ## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    ## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    ## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    ## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    

    separate函数-逆向的unite操作

    参数1:data,参数2: 要拆分的列,参数3:拆分成的新变量,参数4:Sep分割模式 这个函数能做到将1列拆解为多列,用法与unite非常相似 这里不重复过多,举几个简单示例说明即可

    library(dplyr)
    df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
    df
    ##      x
    ## 1 <NA>
    ## 2  a.b
    ## 3  a.d
    ## 4  b.c
    

    分割x为A-B

    df %>% separate(x, c("A", "B"))
    ##      A    B
    ## 1 <NA> <NA>
    ## 2    a    b
    ## 3    a    d
    ## 4    b    c
    

    如果你想只保留第二个变量

    df %>% separate(x, c(NA, "B"))
    ##      B
    ## 1 <NA>
    ## 2    b
    ## 3    d
    ## 4    c
    

    一个比较难办的问题是,如果需要裂解的列,裂解出来并不是相同的长度怎么办? separate函数提供了几个参数 extra与fill参数来控制裂解的方式 extra用于控制裂解碎片过多,warn:警告信息但扔掉多余,drop:扔掉但并不警告,merge:不扔掉,多余的merge起来 fill,warn警告但从左侧开始填充,right,右侧填充NA,left左侧填充NA

    df <- data.frame(x = c("a", "a b", "a b c", NA))
    df
    ##       x
    ## 1     a
    ## 2   a b
    ## 3 a b c
    ## 4  <NA>
    

    这样的方式会有warning

    df %>% separate(x, c("a", "b"))
    ## Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [3].
    ## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
    ##      a    b
    ## 1    a <NA>
    ## 2    a    b
    ## 3    a    b
    ## 4 <NA> <NA>
    

    扔掉多余信息,右侧填充NA

    df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")
    ##      a    b
    ## 1    a <NA>
    ## 2    a    b
    ## 3    a    b
    ## 4 <NA> <NA>
    

    merge多余的,并从左侧开始填充

    df
    ##       x
    ## 1     a
    ## 2   a b
    ## 3 a b c
    ## 4  <NA>
    df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
    ##      a    b
    ## 1 <NA>    a
    ## 2    a    b
    ## 3    a  b c
    ## 4 <NA> <NA>
    

    同上

    df <- data.frame(x = c("x: 123", "y: error: 7"))
    df
    ##             x
    ## 1      x: 123
    ## 2 y: error: 7
    df %>% separate(x, c("key", "value"), ": ", extra = "merge")
    ##   key    value
    ## 1   x      123
    ## 2   y error: 7
    

    相关阅读:

    tidyr总结篇(本文)
    dplyr总结篇
    R语言中的连接dplyr中的join系列与merge函数
    R语言中创建函数参数的问题
    R语言with/within函数添加数据框到环境变量
    R语言简单for循环(二)
    R语言for循环01-批量完成相关系数计算
    R语言-相关系数计算(一)
    认真聊一聊R语言中的paste/paste0函数
    R语言aggregate函数

    今天的内容就到这里,我是白介素2,下期再见。

    转载请注明出处。

    相关文章

      网友评论

        本文标题:tidyr总结篇-续

        本文链接:https://www.haomeiwen.com/subject/achihctx.html