Using the dplyr package

Author: 生信成长日记 | Published: 2022-01-20 12:19

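dplyr defines a consistent grammar for data manipulation, built mainly around the following 10 functions. This post also covers inner_join, semi_join, anti_join, and across.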

  • mutate
  • select
  • rename
  • filter
  • summarise
  • group_by
  • arrange
  • left_join
  • right_join
  • full_join

select: selecting columns

select(df, 1)                     # selects the first column
select(df, last_col())            # selects the last column
select(df, c(a, b, c))            # selects columns a, b, and c
select(df, starts_with("a"))      # selects all columns whose name starts with "a"
select(df, ends_with("z"))        # selects all columns whose name ends with "z"
select(df, where(is.numeric))     # selects all numeric columns

# Select columns whose names are stored in a character vector
vars <- c("mpg", "vs")
mtcars %>% select(all_of(vars))   # errors if any name in vars is missing
mtcars %>% select(any_of(vars))   # silently skips names that are missing
mtcars %>% select(!all_of(vars))  # every column except those in vars
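
The difference between all_of() and any_of() only shows up when a requested name is absent. The following is a minimal sketch of that case; the name "not_a_column" is made up for illustration and does not exist in mtcars.

```r
library(dplyr)

# Hypothetical lookup vector: "not_a_column" is not a column of mtcars
vars <- c("mpg", "vs", "not_a_column")

# any_of() silently ignores the missing name
picked <- mtcars %>% select(any_of(vars))
names(picked)

# all_of() is strict: the missing name raises an error, caught here
strict <- tryCatch(
  mtcars %>% select(all_of(vars)),
  error = function(e) "error"
)
strict
```

any_of() is convenient when the set of columns varies between inputs, while all_of() is the safer choice when a missing column should stop the pipeline.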

mutate: adding columns

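# Each new column needs one value per row (df is assumed to have 6 rows) or a single value
mutate(df, 
       extra1 = c(2, 5, 9, 8, 5, 6),
       extra2 = c(1, 2, 3, 3, 2, 1),
       extra3 = 8    # a length-1 value is recycled to every row
       ) 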
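
In practice, new columns are more often computed from existing ones than typed in by hand. A minimal sketch, assuming a hypothetical scores table (the column names and the pass threshold of 165 are invented for illustration):

```r
library(dplyr)

# Hypothetical scores table, used only for illustration
df <- data.frame(
  name    = c("Alice", "Bob", "Carol"),
  math    = c(90, 72, 85),
  english = c(80, 88, 79)
)

result <- df %>%
  mutate(
    total  = math + english,  # computed row by row from existing columns
    passed = total >= 165     # may refer to the column created just above
  )
result
```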

rename: renaming columns

df %>% 
  select(name, type, total) %>% 
  rename(total_score = total)

filter: filtering rows

df %>% filter(type == "english", score >= 75)
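
Conditions separated by commas in filter() are combined with AND; an OR has to be written explicitly with |. A small sketch on a hypothetical data frame (names and scores are invented):

```r
library(dplyr)

# Hypothetical long-format scores table
df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob"),
  type  = c("english", "math", "english", "math"),
  score = c(80, 90, 70, 60)
)

# Comma-separated conditions: both must hold
both <- df %>% filter(type == "english", score >= 75)

# Explicit OR: at least one must hold
either <- df %>% filter(type == "english" | score >= 75)

nrow(both)
nrow(either)
```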

summarise: summary statistics

df %>% summarise(
  mean_score   = mean(score),
  median_score = median(score),
  n            = n(),
  sum          = sum(score)
)

group_by: grouped summaries

In practice, summarise() is usually combined with group_by(): group first, then summarise.

df %>%
  group_by(name) %>%
  summarise(
    mean_score = mean(total),
    sd_score   = sd(total)
  )
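
To make the group-then-summarise pattern concrete, here is a minimal sketch on a hypothetical table with two rows per person; the values are invented for illustration.

```r
library(dplyr)

# Hypothetical data: two totals per person
df <- data.frame(
  name  = c("Alice", "Alice", "Bob", "Bob"),
  total = c(80, 90, 70, 60)
)

by_name <- df %>%
  group_by(name) %>%
  summarise(
    mean_score = mean(total),  # one value per group
    n          = n(),          # number of rows in each group
    .groups    = "drop"        # return an ungrouped result
  )
by_name
```

The result has one row per distinct value of name, rather than the single row that summarise() returns on ungrouped data.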

arrange: sorting rows

df %>% 
  arrange(type, desc(total))
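
With several sort keys, arrange() sorts by the first key and uses the later ones to break ties. A brief sketch with an invented data frame:

```r
library(dplyr)

# Hypothetical data: two subject types, four people
df <- data.frame(
  name  = c("Alice", "Bob", "Carol", "Dan"),
  type  = c("math", "english", "math", "english"),
  total = c(90, 70, 95, 88)
)

# type ascending, then total descending within each type
sorted <- df %>%
  arrange(type, desc(total))
sorted$name
```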

left_join: left join

left_join(df1, df2, by = "name")

right_join: right join

df1 %>% dplyr::right_join(df2, by = "name")

full_join: full join

df1 %>% dplyr::full_join(df2, by = "name")

inner_join: inner join

df1 %>% inner_join(df2, by = "name")
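
The four joins above differ only in which unmatched rows they keep. A minimal side-by-side sketch, assuming two hypothetical tables that share a name column and overlap on Bob and Carol:

```r
library(dplyr)

# Hypothetical tables: Alice appears only in df1, Dan only in df2
df1 <- data.frame(name = c("Alice", "Bob", "Carol"), math = c(90, 72, 85))
df2 <- data.frame(name = c("Bob", "Carol", "Dan"), english = c(88, 79, 91))

left  <- left_join(df1, df2, by = "name")   # keeps every row of df1
right <- right_join(df1, df2, by = "name")  # keeps every row of df2
full  <- full_join(df1, df2, by = "name")   # keeps every row of both tables
inner <- inner_join(df1, df2, by = "name")  # keeps only names found in both

c(left = nrow(left), right = nrow(right), full = nrow(full), inner = nrow(inner))
```

Rows without a partner are filled with NA in the columns that come from the other table, for example english for Alice in the left join.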

semi_join

df1 %>% semi_join(df2, by = "name")

anti_join

df1 %>% anti_join(df2, by = "name")
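
semi_join and anti_join are filtering joins: they use df2 only to decide which rows of df1 to keep and never add columns from it. A small sketch with hypothetical tables:

```r
library(dplyr)

# Hypothetical tables: Alice appears only in df1, Dan only in df2
df1 <- data.frame(name = c("Alice", "Bob", "Carol"), math = c(90, 72, 85))
df2 <- data.frame(name = c("Bob", "Carol", "Dan"), english = c(88, 79, 91))

semi <- df1 %>% semi_join(df2, by = "name")  # rows of df1 with a match in df2
anti <- df1 %>% anti_join(df2, by = "name")  # rows of df1 without a match in df2

semi
anti
```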

across

across(.cols = everything(), .fns = NULL, ..., .names = NULL)
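  • Used inside mutate and summarise
  • Applies the same function to multiple columns and returns a data frame
# penguins comes from the palmerpenguins package; these columns contain NAs,
# so na.rm = TRUE is needed to get a numeric mean instead of NA
penguins %>%
  summarize(
    across(c(bill_depth_mm, bill_length_mm, flipper_length_mm), ~ mean(.x, na.rm = TRUE))
  )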
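
across() can also apply several functions at once and name the outputs through .names. A minimal sketch on a hypothetical data frame (column names and values are invented), combined with group_by():

```r
library(dplyr)

# Hypothetical measurements with one missing value
df <- data.frame(
  group  = c("a", "a", "b", "b"),
  height = c(1, 3, 5, NA),
  weight = c(10, 20, 30, 40)
)

stats <- df %>%
  group_by(group) %>%
  summarise(
    across(
      c(height, weight),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        max  = ~ max(.x, na.rm = TRUE)
      ),
      .names = "{.col}_{.fn}"  # e.g. height_mean, weight_max
    )
  )
names(stats)
```

Passing a named list of functions yields one output column per column/function pair, which avoids writing each summary by hand.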

Original link: https://www.haomeiwen.com/subject/cxxdhrtx.html