Reduce multiple values down to a single value
What summarising means: describing the original variables with summary statistics
summary functions to create summary statistics
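summarise() creates one or more scalar values that summarise the variables in a tbl. If the tbl was grouped beforehand, it returns one row per group; if it is ungrouped, it returns a single row.
The plain summarise() function names the variables to summarise, for example: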
library(dplyr)
summarise(mtcars, avg = mean(mpg)) # summarise the mpg variable by its mean
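Since summarise() can create more than one scalar in a single call, a minimal sketch with several statistics may help. The column names avg_mpg, sd_mpg and n_cars are illustrative choices, not anything prescribed by dplyr.

```r
library(dplyr)

# Several summary statistics in one call; each argument becomes one column
mpg_summary <- summarise(
  mtcars,
  avg_mpg = mean(mpg),  # mean of mpg
  sd_mpg  = sd(mpg),    # standard deviation of mpg
  n_cars  = n()         # number of rows summarised
)

mpg_summary
```

Because mtcars is not grouped here, the result is a single row with three columns.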
summarise() also has three scoped variants:
- summarise_all() summarises every column
summarise_all(mtcars, mean) # compute the mean of every column in mtcars
- summarise_at() summarises the columns selected by name
starwars %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE)
- summarise_if() summarises the columns that satisfy a predicate
starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
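In dplyr 1.0.0 and later, the three scoped variants are marked as superseded in favour of across(). The sketch below shows what the equivalent calls might look like, assuming a dplyr version that provides across(); the result names all_means, at_means and if_means are illustrative.

```r
library(dplyr)

# Counterpart of summarise_all(mtcars, mean)
all_means <- mtcars %>%
  summarise(across(everything(), mean))

# Counterpart of summarise_at(c("height", "mass"), mean, na.rm = TRUE)
at_means <- starwars %>%
  summarise(across(c(height, mass), ~ mean(.x, na.rm = TRUE)))

# Counterpart of summarise_if(is.numeric, mean, na.rm = TRUE)
if_means <- starwars %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
```

The scoped variants still work, so existing code does not need to change; across() simply keeps column selection and summarising inside a single summarise() call.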
Pipe operator: %>%
x %>% f(y) is equivalent to f(x, y)
Pipe example
the_data <- read.csv('file.csv') %>% subset(variable_a > x)
This is equivalent to:
data <- read.csv('file.csv')
the_data <- subset(data, variable_a > x)
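Advantage: shorter code that avoids creating throwaway intermediate variables.
Disadvantage: long chains of pipes can hurt readability.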
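The file.csv example cannot be run without that file, so here is a hedged, runnable variant of the same idea. It swaps in the built-in mtcars data set and an arbitrary threshold of 25 on mpg, both of which are assumptions made only for illustration.

```r
library(dplyr)  # attaching dplyr makes %>% available

# Piped form: mtcars on the left becomes the first argument of subset()
piped <- mtcars %>% subset(mpg > 25)

# The same call written without the pipe
direct <- subset(mtcars, mpg > 25)

# Compare the two results
identical(piped, direct)
```

Both forms select the same rows; the piped version reads left to right and needs no intermediate variable.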
summarise() on an ungrouped tbl
# A summary applied to an ungrouped tbl returns a single row
library(dplyr)
mtcars %>%
  summarise(mean = mean(disp), n = n())
summarise() on a grouped tbl
library(dplyr)
mtcars %>%
  group_by(cyl) %>% # group by cyl
  summarise(mean = mean(disp), n = n())
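To make the difference between the two cases concrete, the following sketch runs the same summary with and without group_by() and compares the row counts. The variable names ungrouped and grouped are illustrative.

```r
library(dplyr)

# Ungrouped: the whole data frame collapses to one row
ungrouped <- mtcars %>%
  summarise(mean = mean(disp), n = n())

# Grouped: one row per distinct value of cyl
grouped <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp), n = n())

nrow(ungrouped)               # one row in total
nrow(grouped)                 # one row per cylinder group
sum(grouped$n) == ungrouped$n # group sizes add up to the ungrouped count
```

The grouped result also carries the grouping column cyl, so each summary row can be traced back to its group.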