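Properties of tidy data:
- Each column is a variable
- Each row is an observation
The four most commonly used tidyr functions
- gather() reshapes "wide" data into long data
- spread() reshapes "long" data into wide data
- separate() splits a single column into several columns
- unite() combines several columns into a single column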
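In current tidyr releases, gather() and spread() are marked as superseded in favor of pivot_longer() and pivot_wider(), although the older functions still work. The following is only a minimal sketch of the newer pair on a small made-up table; the names `wide`, `long`, and `back` are illustrative and do not come from the original post.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per id, one column per measurement
wide <- tibble(id = 1:2, x = c(10, 20), y = c(30, 40))

# Wide to long: the counterpart of gather()
long <- pivot_longer(wide, cols = c(x, y), names_to = "key", values_to = "value")

# Long to wide: the counterpart of spread()
back <- pivot_wider(long, names_from = key, values_from = value)
```

The rest of this post keeps using gather() and spread(), so the sketch is only a pointer for readers on newer tidyr versions.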
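Loading the data
library(dplyr)
library(tidyr)
library(lubridate)
# Build a wide table: one row per month, one column per year (2006 onward)
ec2 <- ggplot2::economics %>%
  as_tibble() %>%
  transmute(year = year(date), month = month(date), rate = uempmed) %>%
  filter(year > 2005) %>%
  spread(year, rate)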
1. gather()
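gather() takes four main arguments
- data: the data set
- key: name of the new column that will hold the gathered column names
- value: name of the new column that will hold the original cell values
- ...: the columns to gather; prefix a column with - to exclude it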
gather(ec2, key=year, value=unemp, `2006`:`2015`)
# convert = TRUE converts the year key from character to a numeric type
# na.rm = TRUE drops rows whose value is missing
economics_2 <- gather(ec2, year, rate, `2006`:`2015`, convert = TRUE, na.rm = TRUE)
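The `-` prefix described in the argument list can be sketched as follows. This is a minimal example on a hypothetical `scores` table (not part of the original data), assuming we want to gather every column except the identifier.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per student, one column per subject
scores <- tibble(
  name    = c("Ann", "Bob"),
  math    = c(90, 75),
  english = c(85, 80)
)

# -name means "gather all columns except name"
long_scores <- gather(scores, key = "subject", value = "score", -name)
```

Excluding the identifier column is often shorter than listing every column to gather, especially when the wide table has many value columns.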
2. spread()
- The inverse of gather()
weather <- dplyr::tibble(
day = rep(1:3,2),
obs = rep(c("temp","rain"),each = 3),
val = c(c(23,22,20),c(0,0,5))
)
spread(weather, key = obs, value = val)
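To make the "inverse of gather()" relationship concrete, here is a small round-trip sketch. It rebuilds the same `weather` table so that it runs on its own; the names `wide` and `long_again` are illustrative.

```r
library(tidyr)
library(tibble)

weather <- tibble(
  day = rep(1:3, 2),
  obs = rep(c("temp", "rain"), each = 3),
  val = c(23, 22, 20, 0, 0, 5)
)

# Long to wide: one column per distinct value of obs
wide <- spread(weather, key = obs, value = val)

# Wide back to long: gathering the new columns restores the original layout
long_again <- gather(wide, key = "obs", value = "val", rain, temp)
```

The row order after the round trip may differ from the original, but the columns and the set of (day, obs, val) combinations are the same.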
3. separate()
trt <- dplyr::tibble(
  var = paste0(rep(c("beg", "end"), each = 3), "_", rep(c("a", "b", "c"))),
  val = c(1, 4, 2, 10, 5, 11)
)
separate(trt,var,c("time","treatment"),"_")
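The call above passes its arguments by position. As a hedged illustration, the same kind of split can be written with named arguments, and `convert = TRUE` asks separate() to convert the new columns to a more specific type where possible. The `sales` table below is hypothetical.

```r
library(tidyr)
library(tibble)

# Hypothetical data: year and quarter packed into one text column
sales <- tibble(
  period = c("2020_1", "2020_2", "2021_1"),
  amount = c(5, 7, 9)
)

# Split on "_" and convert the resulting pieces from character to integer
parts <- separate(sales, col = period, into = c("year", "quarter"),
                  sep = "_", convert = TRUE)
```

Without `convert = TRUE`, the new `year` and `quarter` columns would stay as character vectors.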
4. unite()
a <- separate(trt,var,c("time","treatment"),"_")
a %>% unite(var,time,treatment,sep="_")
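By default unite() drops the input columns after combining them. A minimal sketch, assuming we want to keep them, uses `remove = FALSE`; the table `a2` is a hypothetical stand-in for the separated data above.

```r
library(tidyr)
library(tibble)

# Hypothetical already-separated table
a2 <- tibble(
  time      = rep(c("beg", "end"), each = 3),
  treatment = rep(c("a", "b", "c"), 2),
  val       = c(1, 4, 2, 10, 5, 11)
)

# Combine time and treatment into var, keeping the original columns
united <- unite(a2, col = "var", time, treatment, sep = "_", remove = FALSE)
```

Keeping the inputs can be handy when the combined key is only needed for labelling or joining and the separate parts are still used elsewhere.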