[15] R for Data Science: Arranging Rows with arrange()

Author: 灰常不错 | Published 2020-11-02 22:46

    arrange() works much like filter(), except that instead of selecting rows it changes their order. It takes a data frame and a set of column names to sort by.

    Summary

    1. Sorting by several columns in turn
    2. Sorting in descending order with desc()
    3. How missing values are sorted

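    Sorting by several columns in turn

    If you supply more than one column name, each additional column is used to break ties in the values of the preceding columns:

    library(nycflights13)  # provides the flights data frame
    library(tidyverse)

    arrange(flights, year, month, day)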
    # A tibble: 336,776 x 19
        year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
       <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
     1  2013     1     1      517            515         2      830            819        11
     2  2013     1     1      533            529         4      850            830        20
     3  2013     1     1      542            540         2      923            850        33
     4  2013     1     1      544            545        -1     1004           1022       -18
     5  2013     1     1      554            600        -6      812            837       -25
     6  2013     1     1      554            558        -4      740            728        12
     7  2013     1     1      555            600        -5      913            854        19
     8  2013     1     1      557            600        -3      709            723       -14
     9  2013     1     1      557            600        -3      838            846        -8
    10  2013     1     1      558            600        -2      753            745         8
    # ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
    #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
    #   minute <dbl>, time_hour <dttm>
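
    Because the first rows of flights all share the same year, month, and day, the tie-breaking is hard to see in the output above. A minimal sketch on a small made-up tibble (the `toy` data and its values are assumptions for illustration, not part of flights) makes it visible:

```r
library(dplyr)

# Hypothetical toy data, not taken from flights
toy <- tibble(
  month = c(2, 1, 1, 2),
  day   = c(3, 9, 4, 1)
)

# Rows are ordered by month first; day only decides the order
# among rows that have the same month
sorted <- arrange(toy, month, day)
print(sorted)
```

    Both month 1 rows come first, and within each month the rows are ordered by day.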
    

    Sorting in descending order with desc()

    Wrap a column in desc() to sort by that column in descending order:

    arrange(flights, desc(arr_delay))
    # A tibble: 336,776 x 19
        year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
       <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
     1  2013     1     9      641            900      1301     1242           1530      1272
     2  2013     6    15     1432           1935      1137     1607           2120      1127
     3  2013     1    10     1121           1635      1126     1239           1810      1109
     4  2013     9    20     1139           1845      1014     1457           2210      1007
     5  2013     7    22      845           1600      1005     1044           1815       989
     6  2013     4    10     1100           1900       960     1342           2211       931
     7  2013     3    17     2321            810       911      135           1020       915
     8  2013     7    22     2257            759       898      121           1026       895
     9  2013    12     5      756           1700       896     1058           2020       878
    10  2013     5     3     1133           2055       878     1250           2215       875
    # ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
    #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
    #   minute <dbl>, time_hour <dttm>
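
    desc() applies only to the column it wraps, so ascending and descending keys can be mixed in one call. A small sketch, assuming a made-up tibble `toy` with hypothetical `carrier` and `delay` columns:

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy <- tibble(
  carrier = c("AA", "UA", "AA", "UA"),
  delay   = c(10, 5, 30, 20)
)

# carrier ascending, then delay descending within each carrier
sorted <- arrange(toy, carrier, desc(delay))
print(sorted)
```

    Here the largest delay for each carrier appears first within that carrier's rows.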
    

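    How missing values are sorted

    Missing values are always sorted to the end, whether the order is ascending or descending:

    df <- tibble(x = c(5, 2, NA))
    arrange(df, x)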
    # A tibble: 3 x 1
          x
      <dbl>
    1     2
    2     5
    3    NA
    
    arrange(df, desc(x))
    # A tibble: 3 x 1
          x
      <dbl>
    1     5
    2     2
    3    NA
    

    Exercises

    (1) How can you use arrange() to sort all missing values to the start?

    arrange(flights, desc(is.na(dep_time)))
    # A tibble: 336,776 x 19
        year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
       <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
     1  2013     1     1       NA           1630        NA       NA           1815        NA
     2  2013     1     1       NA           1935        NA       NA           2240        NA
     3  2013     1     1       NA           1500        NA       NA           1825        NA
     4  2013     1     1       NA            600        NA       NA            901        NA
     5  2013     1     2       NA           1540        NA       NA           1747        NA
     6  2013     1     2       NA           1620        NA       NA           1746        NA
     7  2013     1     2       NA           1355        NA       NA           1459        NA
     8  2013     1     2       NA           1420        NA       NA           1644        NA
     9  2013     1     2       NA           1321        NA       NA           1536        NA
    10  2013     1     2       NA           1545        NA       NA           1910        NA
    # ... with 336,766 more rows, and 10 more variables: carrier <chr>, flight <int>,
    #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
    #   minute <dbl>, time_hour <dttm>
    

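    (2) Sort flights to find the flight with the longest delay. Find the flight that left earliest (read here as the flight that departed furthest ahead of schedule).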

    head(arrange(flights, desc(dep_delay)), 1)
    # A tibble: 1 x 19
       year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
    1  2013     1     9      641            900      1301     1242           1530      1272
    
    head(arrange(flights, dep_delay), 1)
    # A tibble: 1 x 19
       year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
    1  2013    12     7     2040           2123       -43       40           2352        48
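
    As an alternative to combining head() with arrange(), dplyr 1.0.0 and later also provide slice_max() and slice_min(). The sketch below uses a made-up tibble (`toy`, with hypothetical flight numbers and delays) rather than flights:

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy <- tibble(
  flight    = c(101, 202, 303),
  dep_delay = c(15, -7, 120)
)

# Row with the largest departure delay
most_delayed <- slice_max(toy, dep_delay, n = 1)

# Row with the smallest (most negative) departure delay
earliest <- slice_min(toy, dep_delay, n = 1)

print(most_delayed)
print(earliest)
```

    Note that both functions keep ties by default (with_ties = TRUE), so they may return more than one row where head(..., 1) returns exactly one.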
    

    (3) Sort flights to find the fastest flight.

    head(arrange(flights, desc(distance / air_time)), 1)
    # A tibble: 1 x 19
       year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
    1  2013     5    25     1709           1700         9     1923           1937       -14
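
    The ratio distance / air_time is in miles per minute, because flights records distance in miles and air_time in minutes. To make the speed visible in the result, it can be stored in a column first. A sketch on made-up rows (the `toy` tibble and the `speed_mph` column name are assumptions for illustration):

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy <- tibble(
  flight   = c(1, 2, 3),
  distance = c(762, 1000, 200),  # miles
  air_time = c(65, 150, 40)      # minutes
)

# Convert miles per minute to miles per hour, then sort fastest first
with_speed <- toy %>%
  mutate(speed_mph = distance / air_time * 60) %>%
  arrange(desc(speed_mph))

print(with_speed)
```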
    

    (4) Which flight had the longest air time? Which had the shortest?

    head(arrange(flights, desc(air_time)), 1)
    # A tibble: 1 x 19
       year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
    1  2013     3    17     1337           1335         2     1937           1836        61
    
    head(arrange(flights, air_time), 1)
    # A tibble: 1 x 19
       year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay
      <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>     <dbl>
    1  2013     1    16     1355           1315        40     1442           1411        31
    


    Article link: https://www.haomeiwen.com/subject/gtkvvktx.html