美文网首页R语言学习R语言R
R语言学习笔记(14)-常用包

R语言学习笔记(14)-常用包

作者: Akuooo | 来源:发表于2021-01-27 15:48 被阅读0次

    reshape包;tidyr,dplyr包;链式操作符%>%

    参考:
    https://www.bilibili.com/video/BV19x411X7C6?p=38

    一、reshape2包对数据格式进行转换

    > x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5),
    +                 data = 1:5)
    > y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5),
    +                 data = 1:5)
    > x
      k1 k2 data
    1 NA  1    1
    2 NA NA    2
    3  3 NA    3
    4  4  4    4
    5  5  5    5
    > y
      k1 k2 data
    1 NA NA    1
    2  2 NA    2
    3 NA  3    3
    4  4  4    4
    5  5  5    5
    

    如上两个数据框,无法直接用
    rbind和cbind进行合并,会乱

    1. merge()处理
      可以根据一个或多个公有的向量进行合并
    #根据k1进行合并
    > merge(x,y,by = "k1")
      k1 k2.x data.x k2.y data.y
    1  4    4      4    4      4
    2  5    5      5    5      5
    3 NA    1      1   NA      1
    4 NA    1      1    3      3
    5 NA   NA      2   NA      1
    6 NA   NA      2    3      3
    #incomparable = T,表示丢掉NA
    >> merge(x, y, by = "k2",incomparables = NA)
    > merge(x, y, by = "k2",incomparables = NA)
      k2 k1.x data.x k1.y data.y
    1  4    4      4    4      4
    2  5    5      5    5      5
    #根据k1,k2进行合并
    > merge(x, y, by = c("k1","k2"))
      k1 k2 data.x data.y
    1  4  4      4      4
    2  5  5      5      5
    3 NA NA      2      1
    
    1. reshape包
      reshape包重构整个数据的万能数据包,最新的是reshape2
    > install.packages("reshape2")
    > library(reshape2)
    #与Excel中数据透视表功能类似
    > help(package = "reshape2")
    
    help_reshape2.png

    (1)melt对宽数据进行处理,得到长数据

    > head(airquality)
      Ozone Solar.R Wind Temp Month Day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6
    #将列名首字母改为小写
    > names(airquality) <- tolower(names(airquality))
    > head(airquality)
      ozone solar.r wind temp month day
    1    41     190  7.4   67     5   1
    2    36     118  8.0   72     5   2
    3    12     149 12.6   74     5   3
    4    18     313 11.5   62     5   4
    5    NA      NA 14.3   56     5   5
    6    28      NA 14.9   66     5   6
    #使用melt函数处理数据
    > melt(airquality)
    > aql <- melt(airquality)
    #融合后,每一行都是标识符和变量的组合,不能有重复项。数据变为三列。
    #其中variable是因子类型
    #这也就是宽数据变为长数据的过程
    > head(aql)
      variable value
    1    ozone    41
    2    ozone    36
    3    ozone    12
    4    ozone    18
    5    ozone    NA
    6    ozone    28
    > head(aql,50)
       variable value
    1     ozone    41
    2     ozone    36
    3     ozone    12
    4     ozone    18
    5     ozone    NA
    6     ozone    28
    7     ozone    23
    8     ozone    19
    9     ozone     8
    10    ozone    NA
    11    ozone     7
    12    ozone    16
    13    ozone    11
    14    ozone    14
    15    ozone    18
    16    ozone    14
    17    ozone    34
    18    ozone     6
    19    ozone    30
    20    ozone    11
    21    ozone     1
    22    ozone    11
    23    ozone     4
    24    ozone    32
    25    ozone    NA
    26    ozone    NA
    27    ozone    NA
    28    ozone    23
    29    ozone    45
    30    ozone   115
    31    ozone    37
    32    ozone    NA
    33    ozone    NA
    34    ozone    NA
    35    ozone    NA
    36    ozone    NA
    37    ozone    NA
    38    ozone    29
    39    ozone    NA
    40    ozone    71
    41    ozone    39
    42    ozone    NA
    43    ozone    NA
    44    ozone    23
    45    ozone    NA
    46    ozone    NA
    47    ozone    21
    48    ozone    37
    49    ozone    20
    50    ozone    12
    #需要设置,month和day是用来当做ID,其余四个作为变量值。
    #ID就是用来区分不同行数之间的变量
    #重要!需要区分哪部分作为行的观测值,哪部分作为列的观测值
    > aql <- melt(airquality,id.vars = c("month","day"))
    > head(aql,50)
       month day variable value
    1      5   1    ozone    41
    2      5   2    ozone    36
    3      5   3    ozone    12
    4      5   4    ozone    18
    5      5   5    ozone    NA
    6      5   6    ozone    28
    7      5   7    ozone    23
    8      5   8    ozone    19
    9      5   9    ozone     8
    10     5  10    ozone    NA
    11     5  11    ozone     7
    12     5  12    ozone    16
    13     5  13    ozone    11
    14     5  14    ozone    14
    15     5  15    ozone    18
    16     5  16    ozone    14
    17     5  17    ozone    34
    18     5  18    ozone     6
    19     5  19    ozone    30
    20     5  20    ozone    11
    21     5  21    ozone     1
    22     5  22    ozone    11
    23     5  23    ozone     4
    24     5  24    ozone    32
    25     5  25    ozone    NA
    26     5  26    ozone    NA
    27     5  27    ozone    NA
    28     5  28    ozone    23
    29     5  29    ozone    45
    30     5  30    ozone   115
    31     5  31    ozone    37
    32     6   1    ozone    NA
    33     6   2    ozone    NA
    34     6   3    ozone    NA
    35     6   4    ozone    NA
    36     6   5    ozone    NA
    37     6   6    ozone    NA
    38     6   7    ozone    29
    39     6   8    ozone    NA
    40     6   9    ozone    71
    41     6  10    ozone    39
    42     6  11    ozone    NA
    43     6  12    ozone    NA
    44     6  13    ozone    23
    45     6  14    ozone    NA
    46     6  15    ozone    NA
    47     6  16    ozone    21
    48     6  17    ozone    37
    49     6  18    ozone    20
    50     6  19    ozone    12
    

    (2)cast将长数据变为宽数据
    reshape2将cast分为了几种
    ①dcast:处理数据框,读取melt的结果,根据提供的公式进行数据融合。

    cast.png
    参数:formula,融合后的数据格式。
    "~"在R找那个表示相关联,说明而这有关系,但不一定是相等
    #重录数据
    > aqw <- dcast(aql,month ~ variable, fun.aggregate = mean,na.rm = TRUE)
    > head(aqw)
      month    ozone  solar.r      wind     temp
    1     5 23.61538 181.2963 11.622581 65.54839
    2     6 29.44444 190.1667 10.266667 79.10000
    3     7 59.11538 216.4839  8.941935 83.90323
    4     8 59.96154 171.8571  8.793548 83.96774
    5     9 31.44828 167.4333 10.180000 76.90000
    #fun.aggregate也可以设置为sum等其他函数
    > aqw <- dcast(aql,month ~ variable, fun.aggregate = sum,na.rm = TRUE)
    > head(aqw)
      month ozone solar.r  wind temp
    1     5   614    4895 360.3 2032
    2     6   265    5705 308.0 2373
    3     7  1537    6711 277.2 2601
    4     8  1559    4812 272.6 2603
    5     9   912    5023 305.4 2307
    

    ②acast:返回向量,矩阵或数组

    二、tidyr&dplyr数据转换

    这两个包相对于reshape2,操作更加简便。
    安装:

    > install.packages(c("tidyr","dplyr"))
    > library(tidyr)
    > library(dplyr)
    
    1. tidyr包(Tidy Messy Data)

    Overview

    The goal of tidyr is to help you create tidy data. Tidy data is data where:

    1. Every column is variable.
    2. Every row is an observation.
    3. Every cell is a single value.
      Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in [vignette("tidy-data")](https://tidyr.tidyverse.org/articles/tidy-data.html)
      (1)gather(),将宽数据转化为长数据,类似reshape2的melt()
      (2)spread(),将长数据转化为宽数据,类似reshape2的cast()
      (3)unit(),将多列合并为一列
      (4)separate(),将一列分为多列
      以mtcars数据集作为演示
    #mtcars每一列是变量,每一行是观测值
    #取部分数据进行演示
    > tdata <- mtcars[1:10,1:3]
    #汽车名是以行名存在的,对数据进行处理,将行名添加到数据中
    > tdata <- data.frame(names = rownames(tdata),tdata)
                                  names  mpg cyl  disp
    Mazda RX4                 Mazda RX4 21.0   6 160.0
    Mazda RX4 Wag         Mazda RX4 Wag 21.0   6 160.0
    Datsun 710               Datsun 710 22.8   4 108.0
    Hornet 4 Drive       Hornet 4 Drive 21.4   6 258.0
    Hornet Sportabout Hornet Sportabout 18.7   8 360.0
    Valiant                     Valiant 18.1   6 225.0
    Duster 360               Duster 360 14.3   8 360.0
    Merc 240D                 Merc 240D 24.4   4 146.7
    Merc 230                   Merc 230 22.8   4 140.8
    Merc 280                   Merc 280 19.2   6 167.6
    

    (1)gather()函数
    优点:固定列不变,其他列进行转换


    gather().png
    > gather(tdata, key = "Key",value = "Value",cyl,disp,mpg)
    > gather(tdata, key = "Key",value = "Value",cyl,disp,mpg)
                   names  Key Value
    1          Mazda RX4  cyl   6.0
    2      Mazda RX4 Wag  cyl   6.0
    3         Datsun 710  cyl   4.0
    4     Hornet 4 Drive  cyl   6.0
    5  Hornet Sportabout  cyl   8.0
    6            Valiant  cyl   6.0
    7         Duster 360  cyl   8.0
    8          Merc 240D  cyl   4.0
    9           Merc 230  cyl   4.0
    10          Merc 280  cyl   6.0
    11         Mazda RX4 disp 160.0
    12     Mazda RX4 Wag disp 160.0
    13        Datsun 710 disp 108.0
    14    Hornet 4 Drive disp 258.0
    15 Hornet Sportabout disp 360.0
    16           Valiant disp 225.0
    17        Duster 360 disp 360.0
    18         Merc 240D disp 146.7
    19          Merc 230 disp 140.8
    20          Merc 280 disp 167.6
    21         Mazda RX4  mpg  21.0
    22     Mazda RX4 Wag  mpg  21.0
    23        Datsun 710  mpg  22.8
    24    Hornet 4 Drive  mpg  21.4
    25 Hornet Sportabout  mpg  18.7
    26           Valiant  mpg  18.1
    27        Duster 360  mpg  14.3
    28         Merc 240D  mpg  24.4
    29          Merc 230  mpg  22.8
    30          Merc 280  mpg  19.2
    # : 表示将某些列聚集到同一列中
    > gather(tdata, key = "Key",value = "Value",cyl:disp,mpg)
    # - 减去不需要的列
    > gather(tdata, key = "Key",value = "Value",cyl,-disp)
                   names  mpg  disp Key Value
    1          Mazda RX4 21.0 160.0 cyl     6
    2      Mazda RX4 Wag 21.0 160.0 cyl     6
    3         Datsun 710 22.8 108.0 cyl     4
    4     Hornet 4 Drive 21.4 258.0 cyl     6
    5  Hornet Sportabout 18.7 360.0 cyl     8
    6            Valiant 18.1 225.0 cyl     6
    7         Duster 360 14.3 360.0 cyl     8
    8          Merc 240D 24.4 146.7 cyl     4
    9           Merc 230 22.8 140.8 cyl     4
    10          Merc 280 19.2 167.6 cyl     6
    #若敲列的名字容易敲错,也可以敲编号
    > gdata <- gather(tdata, key = "Key",value = "Value",2:4)
    

    (2) spread()函数
    与gather相反


    spread.png
    #首先确定哪一列打散
    > spread(gdata, key = "Key",value = "Value")
                   names cyl  disp  mpg
    1         Datsun 710   4 108.0 22.8
    2         Duster 360   8 360.0 14.3
    3     Hornet 4 Drive   6 258.0 21.4
    4  Hornet Sportabout   8 360.0 18.7
    5          Mazda RX4   6 160.0 21.0
    6      Mazda RX4 Wag   6 160.0 21.0
    7           Merc 230   4 140.8 22.8
    8          Merc 240D   4 146.7 24.4
    9           Merc 280   6 167.6 19.2
    10           Valiant   6 225.0 18.1
    

    (3)separate()
    可以将一列拆分成多列


    separate.png
    > df <- data.frame(x = c(NA,"a.b","a.d","b.c"))
    > df
         x
    1 <NA>
    2  a.b
    3  a.d
    4  b.c
    #将这一列,按“.”分为两列
    > separate(df,col = x,into = c("A","B"))
         A    B
    1 <NA> <NA>
    2    a    b
    3    a    d
    4    b    c
    > df <- data.frame(x = c(NA,"a.b-c","a-d","b-c"))
    > separate(df,col = x,into = c("A","B"))
         A    B
    1 <NA> <NA>
    2    a    b
    3    a    d
    4    b    c
    Warning message:
    Expected 2 pieces. Additional pieces discarded in 1 rows [2]. 
    #可看出第一个还是按“.”分割,c的值被丢掉了
    #指定sep参数为连字符
    > separate(df,col = x,into = c("A","B"),sep = "-")
         A    B
    1 <NA> <NA>
    2  a.b    c
    3    a    d
    4    b    c
    

    (4)unite()
    将separate之后的数据连接起来


    unite.png
    > x <- separate(df,col = x,into = c("A","B"),sep = "-")
    > unite(x,col = "AB",A,B,sep = "-")
         AB
    1 NA-NA
    2 a.b-c
    3   a-d
    4   b-c
    
    1. dplyr包
      不仅可以对单个表格操作,还可以对双表格操作
      [1] "%>%"                   "across"                "add_count"            
      [4] "add_count_"            "add_row"               "add_rownames"         
      [7] "add_tally"             "add_tally_"            "all_equal"            
     [10] "all_of"                "all_vars"              "anti_join"            
     [13] "any_of"                "any_vars"              "arrange"              
     [16] "arrange_"              "arrange_all"           "arrange_at"           
     [19] "arrange_if"            "as.tbl"                "as_data_frame"        
     [22] "as_label"              "as_tibble"             "auto_copy"            
     [25] "band_instruments"      "band_instruments2"     "band_members"         
     [28] "bench_tbls"            "between"               "bind_cols"            
     [31] "bind_rows"             "c_across"              "case_when"            
     [34] "changes"               "check_dbplyr"          "coalesce"             
     [37] "collapse"              "collect"               "combine"              
     [40] "common_by"             "compare_tbls"          "compare_tbls2"        
     [43] "compute"               "contains"              "copy_to"              
     [46] "count"                 "count_"                "cumall"               
     [49] "cumany"                "cume_dist"             "cummean"              
     [52] "cur_column"            "cur_data"              "cur_data_all"         
     [55] "cur_group"             "cur_group_id"          "cur_group_rows"       
     [58] "current_vars"          "data_frame"            "data_frame_"          
     [61] "db_analyze"            "db_begin"              "db_commit"            
     [64] "db_create_index"       "db_create_indexes"     "db_create_table"      
     [67] "db_data_type"          "db_desc"               "db_drop_table"        
     [70] "db_explain"            "db_has_table"          "db_insert_into"       
     [73] "db_list_tables"        "db_query_fields"       "db_query_rows"        
     [76] "db_rollback"           "db_save_query"         "db_write_table"       
     [79] "dense_rank"            "desc"                  "dim_desc"             
     [82] "distinct"              "distinct_"             "distinct_all"         
     [85] "distinct_at"           "distinct_if"           "distinct_prepare"     
     [88] "do"                    "do_"                   "dplyr_col_modify"     
     [91] "dplyr_reconstruct"     "dplyr_row_slice"       "ends_with"            
     [94] "enexpr"                "enexprs"               "enquo"                
     [97] "enquos"                "ensym"                 "ensyms"               
    [100] "eval_tbls"             "eval_tbls2"            "everything"           
    [103] "explain"               "expr"                  "failwith"             
    [106] "filter"                "filter_"               "filter_all"           
    [109] "filter_at"             "filter_if"             "first"                
    [112] "frame_data"            "full_join"             "funs"                 
    [115] "funs_"                 "glimpse"               "group_by"             
    [118] "group_by_"             "group_by_all"          "group_by_at"          
    [121] "group_by_drop_default" "group_by_if"           "group_by_prepare"     
    [124] "group_cols"            "group_data"            "group_indices"        
    [127] "group_indices_"        "group_keys"            "group_map"            
    [130] "group_modify"          "group_nest"            "group_rows"           
    [133] "group_size"            "group_split"           "group_trim"           
    [136] "group_vars"            "group_walk"            "grouped_df"           
    [139] "groups"                "id"                    "ident"                
    [142] "if_else"               "inner_join"            "intersect"            
    [145] "is.grouped_df"         "is.src"                "is.tbl"               
    [148] "is_grouped_df"         "lag"                   "last"                 
    [151] "last_col"              "lead"                  "left_join"            
    [154] "location"              "lst"                   "lst_"                 
    [157] "make_tbl"              "matches"               "min_rank"             
    [160] "mutate"                "mutate_"               "mutate_all"           
    [163] "mutate_at"             "mutate_each"           "mutate_each_"         
    [166] "mutate_if"             "n"                     "n_distinct"           
    [169] "n_groups"              "na_if"                 "near"                 
    [172] "nest_by"               "nest_join"             "new_grouped_df"       
    [175] "nth"                   "ntile"                 "num_range"            
    [178] "one_of"                "order_by"              "percent_rank"         
    [181] "progress_estimated"    "pull"                  "quo"                  
    [184] "quo_name"              "quos"                  "recode"               
    [187] "recode_factor"         "relocate"              "rename"               
    [190] "rename_"               "rename_all"            "rename_at"            
    [193] "rename_if"             "rename_vars"           "rename_vars_"         
    [196] "rename_with"           "right_join"            "row_number"           
    [199] "rows_delete"           "rows_insert"           "rows_patch"           
    [202] "rows_update"           "rows_upsert"           "rowwise"              
    [205] "same_src"              "sample_frac"           "sample_n"             
    [208] "select"                "select_"               "select_all"           
    [211] "select_at"             "select_if"             "select_var"           
    [214] "select_vars"           "select_vars_"          "semi_join"            
    [217] "setdiff"               "setequal"              "show_query"           
    [220] "slice"                 "slice_"                "slice_head"           
    [223] "slice_max"             "slice_min"             "slice_sample"         
    [226] "slice_tail"            "sql"                   "sql_escape_ident"     
    [229] "sql_escape_string"     "sql_join"              "sql_select"           
    [232] "sql_semi_join"         "sql_set_op"            "sql_subquery"         
    [235] "sql_translate_env"     "src"                   "src_df"               
    [238] "src_local"             "src_mysql"             "src_postgres"         
    [241] "src_sqlite"            "src_tbls"              "starts_with"          
    [244] "starwars"              "storms"                "summarise"            
    [247] "summarise_"            "summarise_all"         "summarise_at"         
    [250] "summarise_each"        "summarise_each_"       "summarise_if"         
    [253] "summarize"             "summarize_"            "summarize_all"        
    [256] "summarize_at"          "summarize_each"        "summarize_each_"      
    [259] "summarize_if"          "sym"                   "syms"                 
    [262] "tally"                 "tally_"                "tbl"                  
    [265] "tbl_df"                "tbl_nongroup_vars"     "tbl_ptype"            
    [268] "tbl_sum"               "tbl_vars"              "tibble"               
    [271] "top_frac"              "top_n"                 "transmute"            
    [274] "transmute_"            "transmute_all"         "transmute_at"         
    [277] "transmute_if"          "tribble"               "trunc_mat"            
    [280] "type_sum"              "ungroup"               "union"                
    [283] "union_all"             "validate_grouped_df"   "vars"                 
    [286] "with_groups"           "with_order"            "wrap_dbplyr_obj"   
    ##一共有288个函数
    

    (1)对单个表格进行操作
    ①过滤filter

    #将长度小于等于7的数据过滤掉
    > dplyr::filter(iris,Sepal.Length>7)
       Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    1           7.1         3.0          5.9         2.1 virginica
    2           7.6         3.0          6.6         2.1 virginica
    3           7.3         2.9          6.3         1.8 virginica
    4           7.2         3.6          6.1         2.5 virginica
    5           7.7         3.8          6.7         2.2 virginica
    6           7.7         2.6          6.9         2.3 virginica
    7           7.7         2.8          6.7         2.0 virginica
    8           7.2         3.2          6.0         1.8 virginica
    9           7.2         3.0          5.8         1.6 virginica
    10          7.4         2.8          6.1         1.9 virginica
    11          7.9         3.8          6.4         2.0 virginica
    12          7.7         3.0          6.1         2.3 virginica
    

    由于dplyr的函数太多,所以一般运用函数时,是dplyr::函数名,这样就不会出现歧义
    ②去除重复行distinct()
    类似unique

    > dplyr::distinct(rbind(iris[1:10,],iris[1:15,]))
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1           5.1         3.5          1.4         0.2  setosa
    2           4.9         3.0          1.4         0.2  setosa
    3           4.7         3.2          1.3         0.2  setosa
    4           4.6         3.1          1.5         0.2  setosa
    5           5.0         3.6          1.4         0.2  setosa
    6           5.4         3.9          1.7         0.4  setosa
    7           4.6         3.4          1.4         0.3  setosa
    8           5.0         3.4          1.5         0.2  setosa
    9           4.4         2.9          1.4         0.2  setosa
    10          4.9         3.1          1.5         0.1  setosa
    11          5.4         3.7          1.5         0.2  setosa
    12          4.8         3.4          1.6         0.2  setosa
    13          4.8         3.0          1.4         0.1  setosa
    14          4.3         3.0          1.1         0.1  setosa
    15          5.8         4.0          1.2         0.2  setosa
    

    ③切片slice()
    用于取出数据的任意行

    > dplyr::slice(iris,10:15)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          4.9         3.1          1.5         0.1  setosa
    2          5.4         3.7          1.5         0.2  setosa
    3          4.8         3.4          1.6         0.2  setosa
    4          4.8         3.0          1.4         0.1  setosa
    5          4.3         3.0          1.1         0.1  setosa
    6          5.8         4.0          1.2         0.2  setosa
    

    ④随机取样sample_n()

    > dplyr::sample_n(iris,10)
       Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    1           6.9         3.1          4.9         1.5 versicolor
    2           5.4         3.9          1.7         0.4     setosa
    3           5.9         3.0          4.2         1.5 versicolor
    4           6.3         2.9          5.6         1.8  virginica
    5           5.1         3.5          1.4         0.2     setosa
    6           6.1         2.9          4.7         1.4 versicolor
    7           5.5         3.5          1.3         0.2     setosa
    8           5.7         2.9          4.2         1.3 versicolor
    9           5.8         2.8          5.1         2.4  virginica
    10          4.8         3.4          1.6         0.2     setosa
    

    ⑤按比例随机选取sample_frac()

    > dplyr::sample_frac(iris,0.1)
       Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    1           6.7         3.3          5.7         2.1  virginica
    2           5.5         2.4          3.8         1.1 versicolor
    3           5.1         3.5          1.4         0.2     setosa
    4           4.9         2.4          3.3         1.0 versicolor
    5           5.5         3.5          1.3         0.2     setosa
    6           4.8         3.0          1.4         0.3     setosa
    7           5.6         3.0          4.1         1.3 versicolor
    8           6.7         3.3          5.7         2.5  virginica
    9           4.8         3.1          1.6         0.2     setosa
    10          6.1         2.6          5.6         1.4  virginica
    11          6.0         2.9          4.5         1.5 versicolor
    12          4.6         3.6          1.0         0.2     setosa
    13          4.4         3.2          1.3         0.2     setosa
    14          4.4         3.0          1.3         0.2     setosa
    15          6.5         3.0          5.8         2.2  virginica
    

    ⑥排序arrange()

    #按花的长度进行排序
    > dplyr::arrange(iris,Sepal.Length)
    #按相反方向进行排序
    > dplyr::arrange(iris,desc(Sepal.Length))
    

    ⑦取子集select()

    #理解select功能
    > ?select
    

    ⑧统计函数summarise()

    > summarise(iris,avg = mean(Sepal.Length))
           avg
    1 5.843333
    #还可将mean换成sum,计算总长度
    

    链式操作符%>%
    用于实现将一个函数的输出传递给下一个函数,作为下一个函数的输入,相当于是管道函数
    在Rstudio中可以使用Ctrl+shift+M快捷键输出

    > head(mtcars,20)
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    #取出数据第十一到第十二行
    > head(mtcars,20) %>% tail(10)
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    

    ⑨分组group_by

    > dplyr::group_by(iris,Species)
    # A tibble: 150 x 5
    # Groups:   Species [3]
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
              <dbl>       <dbl>        <dbl>       <dbl> <fct>  
     1          5.1         3.5          1.4         0.2 setosa 
     2          4.9         3            1.4         0.2 setosa 
     3          4.7         3.2          1.3         0.2 setosa 
     4          4.6         3.1          1.5         0.2 setosa 
     5          5           3.6          1.4         0.2 setosa 
     6          5.4         3.9          1.7         0.4 setosa 
     7          4.6         3.4          1.4         0.3 setosa 
     8          5           3.4          1.5         0.2 setosa 
     9          4.4         2.9          1.4         0.2 setosa 
    10          4.9         3.1          1.5         0.1 setosa 
    # ... with 140 more rows
    > iris %>% group_by(Species)
    # A tibble: 150 x 5
    # Groups:   Species [3]
       Sepal.Length Sepal.Width Petal.Length Petal.Width Species
              <dbl>       <dbl>        <dbl>       <dbl> <fct>  
     1          5.1         3.5          1.4         0.2 setosa 
     2          4.9         3            1.4         0.2 setosa 
     3          4.7         3.2          1.3         0.2 setosa 
     4          4.6         3.1          1.5         0.2 setosa 
     5          5           3.6          1.4         0.2 setosa 
     6          5.4         3.9          1.7         0.4 setosa 
     7          4.6         3.4          1.4         0.3 setosa 
     8          5           3.4          1.5         0.2 setosa 
     9          4.4         2.9          1.4         0.2 setosa 
    10          4.9         3.1          1.5         0.1 setosa 
    # ... with 140 more rows
    ##再计算一下平均值
    > iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Width))
    # A tibble: 3 x 2
      Species      avg
    * <fct>      <dbl>
    1 setosa      3.43
    2 versicolor  2.77
    3 virginica   2.97
    ##再按结果大小进行排序
    > iris %>% group_by(Species) %>% summarise(avg=mean(Sepal.Width)) %>% arrange(avg)
    # A tibble: 3 x 2
      Species      avg
      <fct>      <dbl>
    1 versicolor  2.77
    2 virginica   2.97
    3 setosa      3.43
    

    ⑩添加变量mutate()

    > dplyr::mutate(iris,new = Sepal.Length+Petal.Length)
        Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  new
    1            5.1         3.5          1.4         0.2     setosa  6.5
    2            4.9         3.0          1.4         0.2     setosa  6.3
    3            4.7         3.2          1.3         0.2     setosa  6.0
    4
    ……
    

    (2)对双表格操作
    主要是如何将两个表格进行整合
    多种方式:左连接,右连接,内连接,全连接,半连接,反连接等

    #创建两个数据框
    > a=data.frame(x1=c("A","B","C"),x2=c(1,2,3))
    > b=data.frame(x1=c("A","B","D"),x3=c(T,F,T))
    > a
      x1 x2
    1  A  1
    2  B  2
    3  C  3
    > b
      x1    x3
    1  A  TRUE
    2  B FALSE
    3  D  TRUE
    

    ①左连接
    以左边的表为基础

    > dplyr::left_join(a,b,by="x1")
      x1 x2    x3
    1  A  1  TRUE
    2  B  2 FALSE
    3  C  3    NA
    

    ②右连接

    > dplyr::right_join(a,b,by="x1")
      x1 x2    x3
    1  A  1  TRUE
    2  B  2 FALSE
    3  D NA  TRUE
    

    ③内连接(取x1的交集)
    全连接(取x1的并集)

    > dplyr::full_join(a,b,by="x1")
      x1 x2    x3
    1  A  1  TRUE
    2  B  2 FALSE
    3  C  3    NA
    4  D NA  TRUE
    > dplyr::inner_join(a,b,by="x1")
      x1 x2    x3
    1  A  1  TRUE
    2  B  2 FALSE
    

    ④半连接(根据右侧表内容,对左侧表进行过滤,即将a中与b的交集取出来)
    反连接(根据右侧表内容过滤,但是是将a中与b的补集取出来)
    主要是进行集合的各种运算

    > dplyr::semi_join(a,b,by="x1")
      x1 x2
    1  A  1
    2  B  2
    > dplyr::anti_join(a,b,by="x1")
      x1 x2
    1  C  3
    

    数据集合并
    intrsect union secdev
    本质:集合运算
    以mtcars数据集作为演示

    #slice取出的数据不包含行名
    #使用mute()为数据集添加一行
    > mtcars <- mutate(mtcars,Model=rownames(mtcars))
    > first <- slice(mtcars,1:20)
    > second <- slice (mtcars,10:30)
    ##这两者之间有重合部分
    
    
    ##接下来测试这些函数
    #intersect()取交集
    > intersect(first, second)
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb               Model
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla
    
    #union_all取并集
    > union_all(first, second)
                              mpg cyl  disp  hp drat    wt  qsec vs am gear carb               Model
    Mazda RX4                21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4
    Mazda RX4 Wag            21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag
    Datsun 710               22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710
    Hornet 4 Drive           21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive
    Hornet Sportabout        18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   Hornet Sportabout
    Valiant                  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1             Valiant
    Duster 360               14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4          Duster 360
    Merc 240D                24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           Merc 240D
    Merc 230                 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            Merc 230
    Merc 280...10            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280
    Merc 280C...11           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C
    Merc 450SE...12          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE
    Merc 450SL...13          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL
    Merc 450SLC...14         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC
    Cadillac Fleetwood...15  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood
    Lincoln Continental...16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental
    Chrysler Imperial...17   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial
    Fiat 128...18            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128
    Honda Civic...19         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic
    Toyota Corolla...20      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla
    Merc 280...21            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280
    Merc 280C...22           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C
    Merc 450SE...23          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE
    Merc 450SL...24          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL
    Merc 450SLC...25         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC
    Cadillac Fleetwood...26  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood
    Lincoln Continental...27 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental
    Chrysler Imperial...28   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial
    Fiat 128...29            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128
    Honda Civic...30         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic
    Toyota Corolla...31      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla
    Toyota Corona            21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       Toyota Corona
    Dodge Challenger         15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2    Dodge Challenger
    AMC Javelin              15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2         AMC Javelin
    Camaro Z28               13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4          Camaro Z28
    Pontiac Firebird         19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2    Pontiac Firebird
    Fiat X1-9                27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           Fiat X1-9
    Porsche 914-2            26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       Porsche 914-2
    Lotus Europa             30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2        Lotus Europa
    Ford Pantera L           15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4      Ford Pantera L
    Ferrari Dino             19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6        Ferrari Dino
    #union取非冗余的并集
    > union(first, second)
                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb               Model
    Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4
    Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag
    Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710
    Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive
    Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   Hornet Sportabout
    Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1             Valiant
    Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4          Duster 360
    Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           Merc 240D
    Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            Merc 230
    Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280
    Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C
    Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE
    Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL
    Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC
    Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood
    Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental
    Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial
    Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128
    Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic
    Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla
    Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       Toyota Corona
    Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2    Dodge Challenger
    AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2         AMC Javelin
    Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4          Camaro Z28
    Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2    Pontiac Firebird
    Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           Fiat X1-9
    Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       Porsche 914-2
    Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2        Lotus Europa
    Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4      Ford Pantera L
    Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6        Ferrari Dino
    
    #setdiff()取first补集
    > setdiff(first, second)
                       mpg cyl  disp  hp drat    wt  qsec vs am gear carb             Model
    Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
    Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
    Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
    Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
    Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1           Valiant
    Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4        Duster 360
    Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2         Merc 240D
    Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2          Merc 230
    #取second的补集
    > setdiff(second, first) 
                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb            Model
    Toyota Corona    21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1    Toyota Corona
    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 Dodge Challenger
    AMC Javelin      15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2      AMC Javelin
    Camaro Z28       13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4       Camaro Z28
    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 Pontiac Firebird
    Fiat X1-9        27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1        Fiat X1-9
    Porsche 914-2    26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2    Porsche 914-2
    Lotus Europa     30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2     Lotus Europa
    Ford Pantera L   15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4   Ford Pantera L
    Ferrari Dino     19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6     Ferrari Dino
    

    相关文章

      网友评论

        本文标题:R语言学习笔记(14)-常用包

        本文链接:https://www.haomeiwen.com/subject/euenzktx.html