dplyr defines a consistent grammar for data manipulation, built mainly around the following 10 functions:
- mutate
- select
- rename
- filter
- summarise
- group_by
- arrange
- left_join
- right_join
- full_join
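The verbs are designed to be chained with the pipe. The sketch below strings several of them together on a small made-up table (`scores` and its columns `name`, `type`, `score` are hypothetical, chosen only for illustration) to show how each step hands a data frame to the next.

```r
library(dplyr)

# Hypothetical exam scores, invented for illustration only
scores <- data.frame(
  name  = c("Ann", "Ann", "Bob", "Bob", "Cai", "Cai"),
  type  = c("math", "english", "math", "english", "math", "english"),
  score = c(90, 72, 65, 80, 88, 95)
)

result <- scores %>%
  filter(score >= 70) %>%                            # keep rows meeting a condition
  group_by(name) %>%                                 # one group per student
  summarise(mean_score = mean(score), n = n()) %>%   # one summary row per group
  arrange(desc(mean_score))                          # highest mean first

print(result)
```

Each verb takes a data frame as its first argument and returns a new one, which is what makes the chain readable from top to bottom.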
select: choosing columns
- select(df, 1) selects the first column
- select(df, last_col()) selects the last column
- select(df, c(a, b, c)) selects columns a, b, and c
- select(df, starts_with("a")) selects all columns whose name starts with "a"
- select(df, ends_with("z")) selects all columns whose name ends with "z"
- select(df, where(is.numeric)) selects all numeric columns
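library(dplyr)
vars <- c("mpg", "vs")
mtcars %>% select(all_of(vars))   # errors if any name in vars is not a column
mtcars %>% select(any_of(vars))   # silently skips names that are not columns
mtcars %>% select(!all_of(vars))  # every column except mpg and vs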
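The selection helpers listed above can be tried directly on the built-in `mtcars` data, whose columns are mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb. This sketch also shows the practical difference between `all_of()` and `any_of()` when a name is missing; `"not_a_column"` is a deliberately invented name.

```r
library(dplyr)

first_col <- select(mtcars, 1)                   # mpg
last_one  <- select(mtcars, last_col())          # carb
d_cols    <- select(mtcars, starts_with("d"))    # disp, drat
t_cols    <- select(mtcars, ends_with("t"))      # drat, wt
num_cols  <- select(mtcars, where(is.numeric))   # all 11 mtcars columns are numeric

# "not_a_column" is deliberately absent from mtcars
vars <- c("mpg", "not_a_column")
kept   <- select(mtcars, any_of(vars))           # absent names are skipped: mpg only
strict <- tryCatch(select(mtcars, all_of(vars)), # absent names raise an error
                   error = function(e) "error")

names(d_cols)
names(t_cols)
```

`all_of()` is the safer choice when a missing column should be treated as a bug; `any_of()` suits optional columns.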
mutate: adding columns
mutate(df,
  extra1 = c(2, 5, 9, 8, 5, 6),  # a vector must have one value per row (6 here)
  extra2 = c(1, 2, 3, 3, 2, 1),
  extra3 = c(8)                  # a length-1 value is recycled to every row
)
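New columns are more often computed from existing ones than typed in by hand. A minimal sketch, assuming a hypothetical table with `math` and `english` score columns: within a single `mutate()` call, a column created earlier can already be used by a later one.

```r
library(dplyr)

# Hypothetical data; column names are made up for illustration
df <- data.frame(
  name    = c("Ann", "Bob", "Cai"),
  math    = c(90, 65, 88),
  english = c(72, 80, 95)
)

df2 <- df %>%
  mutate(
    total   = math + english,  # computed row by row from existing columns
    average = total / 2,       # refers to total, created just above in the same call
    bonus   = 5                # length-1 value recycled to every row
  )

print(df2)
```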
rename: renaming columns
df %>%
  select(name, type, total) %>%
  rename(total_score = total)   # syntax is new_name = old_name
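`rename()` keeps every column and only changes names, while `select()` can also rename (using the same `new_name = old_name` form) but drops anything not listed. The table below is hypothetical; the extra `id` column exists only to make the difference visible.

```r
library(dplyr)

# Hypothetical table for illustration
df <- data.frame(
  name  = c("Ann", "Bob"),
  type  = c("math", "english"),
  total = c(162, 145),
  id    = 1:2
)

renamed  <- df %>% rename(total_score = total)              # all four columns kept
selected <- df %>% select(name, type, total_score = total)  # renames, but id is dropped

names(renamed)
names(selected)
```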
filter: filtering rows
df %>% filter(type == "english", score >= 75)   # comma-separated conditions are combined with AND
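Besides AND via commas, conditions can be combined with `|` (OR) or tested with `%in%`. One detail worth knowing: rows where a condition evaluates to `NA` are dropped. A sketch on a hypothetical long-format score table with one missing score:

```r
library(dplyr)

# Hypothetical scores; Bob's math score is missing
df <- data.frame(
  name  = c("Ann", "Ann", "Bob", "Bob", "Cai"),
  type  = c("english", "math", "english", "math", "english"),
  score = c(72, 90, 81, NA, 95)
)

both    <- df %>% filter(type == "english", score >= 75)  # AND
either  <- df %>% filter(type == "math" | score >= 90)    # OR
high    <- df %>% filter(score >= 75)                     # the NA row is dropped
members <- df %>% filter(name %in% c("Ann", "Cai"))       # membership test
```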
summarise: summary statistics
df %>% summarise(
  mean_score   = mean(score),
  median_score = median(score),
  n            = n(),
  sum          = sum(score)
)
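Summary functions such as `mean()` and `median()` return `NA` as soon as one input is missing, so real data usually needs `na.rm = TRUE`. A small sketch with a hypothetical `score` column containing one `NA`:

```r
library(dplyr)

# Hypothetical scores with one missing value
df <- data.frame(score = c(72, 90, 81, NA, 95))

naive <- df %>% summarise(mean_score = mean(score))   # NA propagates

clean <- df %>% summarise(
  mean_score   = mean(score, na.rm = TRUE),   # ignores the missing value
  median_score = median(score, na.rm = TRUE),
  n            = n(),                         # counts all rows, NA included
  n_valid      = sum(!is.na(score))           # counts non-missing values only
)

print(clean)
```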
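group_by: grouped summaries
In practice, summarise() is usually combined with group_by(): the data are grouped first, and the statistics are then computed within each group.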
df %>%
  group_by(name) %>%
  summarise(
    mean_score = mean(total),
    sd_score   = sd(total)
  )
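Grouping can use several columns at once, and `.groups = "drop"` returns an ungrouped result so later steps are not silently applied per group. `count()` is a shorthand for the common "group, then count rows" case. A sketch on a hypothetical table:

```r
library(dplyr)

# Hypothetical scores for illustration
df <- data.frame(
  name  = c("Ann", "Ann", "Bob", "Bob", "Bob"),
  type  = c("math", "english", "math", "math", "english"),
  total = c(90, 72, 65, 75, 81)
)

by_name_type <- df %>%
  group_by(name, type) %>%                      # one group per name/type pair
  summarise(mean_total = mean(total), n = n(),
            .groups = "drop")                   # return an ungrouped result

per_name <- df %>% count(name)                  # like group_by(name) + summarise(n = n())

print(by_name_type)
```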
arrange: sorting rows
df %>%
  arrange(type, desc(total))   # ascending by type, then descending by total within each type
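The sort keys are applied left to right, and for local data dplyr places `NA` values at the end even inside `desc()`. A sketch on a hypothetical table where one total is missing:

```r
library(dplyr)

# Hypothetical data; Dan's total is missing
df <- data.frame(
  name  = c("Ann", "Bob", "Cai", "Dan"),
  type  = c("math", "english", "math", "english"),
  total = c(150, 170, 180, NA)
)

sorted <- df %>% arrange(type, desc(total))
print(sorted)
# english rows come first (Bob, then Dan with NA last),
# followed by math rows in descending total (Cai, then Ann)
```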
left_join: left join (keeps every row of df1)
left_join(df1, df2, by = "name")
right_join: right join (keeps every row of df2)
df1 %>% dplyr::right_join(df2, by = "name")
full_join: full join (keeps every row of both tables)
df1 %>% dplyr::full_join(df2, by = "name")
inner_join: inner join (keeps only rows whose key appears in both tables)
df1 %>% inner_join(df2, by = "name")
semi_join: keeps the rows of df1 that have a match in df2, without adding any columns from df2
df1 %>% semi_join(df2, by = "name")
anti_join: keeps the rows of df1 that have no match in df2
df1 %>% anti_join(df2, by = "name")
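The six joins differ only in which rows (and columns) survive. Running them all on the same pair of small tables makes the differences concrete; `df1`, `df2` and their `math`/`english` columns are hypothetical, sharing the key column `name` with two overlapping values (Bob, Cai).

```r
library(dplyr)

# Two hypothetical tables sharing the key column "name"
df1 <- data.frame(name = c("Ann", "Bob", "Cai"), math    = c(90, 65, 88))
df2 <- data.frame(name = c("Bob", "Cai", "Dan"), english = c(81, 95, 70))

left  <- left_join(df1, df2, by = "name")   # Ann, Bob, Cai; english is NA for Ann
right <- right_join(df1, df2, by = "name")  # Bob, Cai, Dan; math is NA for Dan
full  <- full_join(df1, df2, by = "name")   # Ann, Bob, Cai, Dan
inner <- inner_join(df1, df2, by = "name")  # Bob, Cai
semi  <- semi_join(df1, df2, by = "name")   # Bob, Cai, with df1's columns only
anti  <- anti_join(df1, df2, by = "name")   # Ann

sapply(list(left = left, right = right, full = full,
            inner = inner, semi = semi, anti = anti), nrow)
```

semi_join and anti_join are "filtering joins": they never add columns, so they are handy for checking which keys match before a real join.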
across
across(.cols = everything(), .fns = NULL, ..., .names = NULL)
- Used inside mutate() and summarise()
- Applies the same function(s) to several columns at once and returns a data frame
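library(dplyr)
library(palmerpenguins)  # provides the penguins data set
# a few penguins have missing measurements, so na.rm = TRUE avoids NA means
penguins %>%
  summarize(
    across(c(bill_depth_mm, bill_length_mm, flipper_length_mm), ~ mean(.x, na.rm = TRUE))
  )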
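across() combines naturally with the selection helpers from select() and with group_by(), and its `.names` argument controls the names of the output columns. The sketch below uses a hypothetical table (`species`, `len`, `width` are invented names) so it runs without extra packages.

```r
library(dplyr)

# Hypothetical measurements with one missing value
df <- data.frame(
  species = c("a", "a", "b", "b"),
  len     = c(10, 12, NA, 20),
  width   = c(4, 6, 5, 7)
)

# Same summary over every numeric column, per group
# (the grouping column species is excluded automatically)
means <- df %>%
  group_by(species) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# Inside mutate(): keep the originals and add scaled copies named <col>_x10
scaled <- df %>%
  mutate(across(c(len, width), ~ .x * 10, .names = "{.col}_x10"))

print(means)
print(scaled)
```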