Using the dplyr package

Author: 生信菜菜鸟 | Published: 2022-01-20 12:19

    dplyr defines a consistent grammar for data manipulation, built around the following 10 main functions.

    • mutate
    • select
    • rename
    • filter
    • summarise
    • group_by
    • arrange
    • left_join
    • right_join
    • full_join

    select: choosing columns

    select(df, 1)                  # selects the first column
    select(df, last_col())         # selects the last column
    select(df, c(a, b, c))         # selects columns a, b, and c
    select(df, starts_with("a"))   # selects all columns whose name starts with "a"
    select(df, ends_with("z"))     # selects all columns whose name ends with "z"
    select(df, where(is.numeric))  # selects all numeric columns

    # Select columns whose names are stored in a character vector
    vars <- c("mpg", "vs")
    mtcars %>% select(all_of(vars))   # every name must exist, otherwise an error
    mtcars %>% select(any_of(vars))   # names that do not exist are ignored
    mtcars %>% select(!all_of(vars))  # every column except those in vars
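
As a rough illustration of the helpers above, the sketch below runs on the built-in mtcars and iris data sets. The column name "not_a_column" is made up for this example; it is there only to show how all_of() and any_of() behave when a requested column is missing.

```r
library(dplyr)

# "not_a_column" is a hypothetical name that does not exist in mtcars
vars <- c("mpg", "vs", "not_a_column")

# any_of() silently skips names that are missing
picked <- mtcars %>% select(any_of(vars))
names(picked)  # "mpg" "vs"

# all_of() is strict: a missing name raises an error
strict <- tryCatch(
  mtcars %>% select(all_of(vars)),
  error = function(e) "error"
)
strict  # "error"

# where() selects columns by a predicate applied to their contents
numeric_cols <- iris %>% select(where(is.numeric))
names(numeric_cols)
```

any_of() tends to be the safer choice when the name vector comes from outside the script; all_of() is useful when a missing column should stop the pipeline.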
    

    mutate: adding columns

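    # Assumes df has 6 rows: each new column needs one value per row,
    # or a single value that is recycled to every row
    mutate(df,
           extra1 = c(2, 5, 9, 8, 5, 6),
           extra2 = c(1, 2, 3, 3, 2, 1),
           extra3 = 8
           )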
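
The following is a minimal sketch of how mutate() treats vector lengths and how a new column can be built from one created earlier in the same call. The data frame and its values are invented for this illustration.

```r
library(dplyr)

# Hypothetical 6-row data frame, invented for this illustration
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "Dave", "Eve", "Frank"),
  score = c(80, 65, 92, 70, 88, 55)
)

df2 <- df %>%
  mutate(
    extra1 = c(2, 5, 9, 8, 5, 6),  # one value per row
    extra3 = 8,                    # a length-1 value is recycled to every row
    total  = score + extra1        # may refer to a column created just above
  )

df2$total  # 82 70 101 78 93 61

# A vector whose length is neither 1 nor nrow(df) is rejected
bad <- tryCatch(
  df %>% mutate(extra = c(1, 2)),
  error = function(e) "error"
)
bad  # "error"
```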
    

    rename: renaming columns

    df %>% 
      select(name, type, total) %>% 
      rename(total_score = total)  # syntax is new_name = old_name
    

    filter: filtering rows

    df %>% filter(type == "english", score >= 75)
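
To make the comma in the call above concrete, here is a small sketch on a made-up scores table (the names and values are assumptions for this example). Conditions separated by commas are combined with AND; use | for OR.

```r
library(dplyr)

# Hypothetical scores table, invented for this illustration
df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob"),
  type  = c("english", "math", "english", "math"),
  score = c(90, 70, 60, 85)
)

# Conditions separated by commas are combined with AND
a <- df %>% filter(type == "english", score >= 75)

# Equivalent form with an explicit &
b <- df %>% filter(type == "english" & score >= 75)

# Use | when either condition is enough
c_or <- df %>% filter(type == "english" | score >= 75)

nrow(a)     # 1 (Alice's english score)
nrow(c_or)  # 3
```

Rows for which a condition evaluates to NA are dropped as well, so missing values need an explicit is.na() check if they should be kept.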
    

    summarise: summary statistics

    df %>% summarise(
      mean_score   = mean(score),
      median_score = median(score),
      n            = n(),
      sum          = sum(score)
    )
    

    group_by: grouped summaries

    In practice, summarise() is usually combined with group_by(): group first, then summarise.

    df %>%
      group_by(name) %>%
      summarise(
        mean_score = mean(total),
        sd_score   = sd(total)
      )
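
A small worked example of the group-then-summarise pattern, using an invented table so the result can be checked by hand. The .groups argument is optional here; it is included only to make the ungrouped result explicit.

```r
library(dplyr)

# Hypothetical totals, invented for this illustration
df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob"),
  total = c(90, 70, 60, 80)
)

by_name <- df %>%
  group_by(name) %>%
  summarise(
    mean_score = mean(total),
    n          = n(),       # number of rows in each group
    .groups    = "drop"     # return an ungrouped tibble
  )

by_name$mean_score  # 80 70 (Alice, Bob)
```

The result has one row per group, with the grouping column first.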
    

    arrange: sorting rows

    df %>% 
      arrange(type, desc(total))
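
As a quick check of the call above, this sketch sorts an invented table by type in ascending order and then by total in descending order within each type.

```r
library(dplyr)

# Hypothetical data, invented for this illustration
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "Dave"),
  type  = c("math", "english", "math", "english"),
  total = c(70, 60, 95, 85)
)

# Sort by type (ascending), then by total (descending) within each type
sorted <- df %>% arrange(type, desc(total))

sorted$name  # "Dave" "Bob" "Carol" "Alice"
```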
    

    left_join: left join

    left_join(df1, df2, by = "name")
    

    right_join: right join

    df1 %>% dplyr::right_join(df2, by = "name")
    

    full_join: full join

    df1 %>% dplyr::full_join(df2, by = "name")
    

    inner_join: inner join

    df1 %>% inner_join(df2, by = "name")
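
The four joins shown so far differ only in which unmatched rows they keep. The sketch below compares them on two small invented tables; df1, df2, and their contents are assumptions for this example.

```r
library(dplyr)

# Hypothetical tables, invented for this illustration
df1 <- data.frame(name = c("Alice", "Bob", "Carol"), math = c(90, 60, 75))
df2 <- data.frame(name = c("Bob", "Carol", "Dave"), english = c(80, 70, 65))

left  <- left_join(df1, df2, by = "name")   # keeps every row of df1
right <- right_join(df1, df2, by = "name")  # keeps every row of df2
full  <- full_join(df1, df2, by = "name")   # keeps rows from either table
inner <- inner_join(df1, df2, by = "name")  # keeps only rows found in both

c(left = nrow(left), right = nrow(right), full = nrow(full), inner = nrow(inner))
# left 3, right 3, full 4, inner 2

left$english  # NA 80 70 (Alice has no match in df2)
```

Columns that come from the table without a match are filled with NA.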
    

    semi_join: keep rows that have a match

    df1 %>% semi_join(df2, by = "name")
    

    anti_join: keep rows that have no match

    df1 %>% anti_join(df2, by = "name")
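
semi_join() and anti_join() are filtering joins: they only decide which rows of the first table to keep and never add columns from the second. A minimal sketch on two invented tables (df1 and df2 are assumptions for this example):

```r
library(dplyr)

# Hypothetical tables, invented for this illustration
df1 <- data.frame(name = c("Alice", "Bob", "Carol"), math = c(90, 60, 75))
df2 <- data.frame(name = c("Bob", "Carol", "Dave"), english = c(80, 70, 65))

# Rows of df1 that have a match in df2
matched <- df1 %>% semi_join(df2, by = "name")

# Rows of df1 that have no match in df2
unmatched <- df1 %>% anti_join(df2, by = "name")

names(matched)   # "name" "math" (no columns from df2)
matched$name     # "Bob" "Carol"
unmatched$name   # "Alice"
```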
    

    across: applying functions to multiple columns

    across(.cols = everything(), .fns = NULL, ..., .names = NULL)
    
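    • Used inside mutate() and summarise()
    • Applies the same function(s) to several columns and returns a data frame

    # penguins comes from the palmerpenguins package; these columns contain
    # missing values, so na.rm = TRUE is needed to avoid NA results
    penguins %>%
      summarize(
        across(c(bill_depth_mm, bill_length_mm, flipper_length_mm),
               ~ mean(.x, na.rm = TRUE))
      )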
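
The .fns and .names arguments in the signature above can be sketched as follows. This example uses the built-in mtcars data rather than penguins so that it runs without extra packages; the choice of columns and summary functions is arbitrary.

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    across(
      c(mpg, hp),
      # a named list applies several functions to each selected column
      list(mean = ~ mean(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE)),
      # {.col} is the column name, {.fn} is the name from the list above
      .names = "{.col}_{.fn}"
    ),
    .groups = "drop"
  )

names(res)  # "cyl" "mpg_mean" "mpg_max" "hp_mean" "hp_max"
```

The output has one row per value of cyl and one column per combination of input column and function.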
    


          Article link: https://www.haomeiwen.com/subject/cxxdhrtx.html