R | data.table

Author: shwzhao | Posted: 2022-07-27 21:15

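"data.table's highly abstract syntax undoubtedly raises the learning cost, but its high performance and its ability to handle large data make it well worth learning. Readers who want both data.table's performance and the tidyverse's tidy syntax can use packages that bridge the two, such as dtplyr and tidyfst."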


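Creating

  • data.table(): create a data.table from vectors, like data.frame()
  • as.data.table(): convert an existing object (e.g. a data.frame) to a data.table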
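Here is a minimal sketch of the two constructors. The column names and values are made up for illustration.

```r
library(data.table)

# Build a data.table directly from vectors (same argument style as data.frame()).
dt <- data.table(a = 1:6, b = c("x", "x", "y", "y", "y", "z"))

# Convert an existing data.frame; as.data.table() returns a new object.
df <- data.frame(id = 1:3, score = c(90, 85, 77))
dt2 <- as.data.table(df)

print(class(dt2))  # "data.table" "data.frame"
```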

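Reading

  • fread("file.csv"): read a delimited file (the separator is detected automatically)
  • fread("file.csv", select = c("a", "b")): read only the specified columns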
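The sketch below writes a small hypothetical CSV to a temporary file. It then uses select to read back only two of the three columns.

```r
library(data.table)

# Hypothetical example data written to a temporary CSV file.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,x,TRUE", "2,y,FALSE", "3,z,TRUE"), path)

# Read only columns a and b; column c is not loaded.
dt <- fread(path, select = c("a", "b"))
print(dt)
```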

Writing

  • fwrite(dt, "file.csv"): write dt to a CSV file
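Here is a round trip on made-up data. fwrite() writes the table to a temporary CSV file, and fread() reads it back.

```r
library(data.table)

dt <- data.table(id = 1:3, value = c(2.5, 3.0, 4.25))

# fwrite() writes comma-separated output by default; here to a temp file.
out <- tempfile(fileext = ".csv")
fwrite(dt, out)

# Reading it back with fread() recovers the same columns.
back <- fread(out)
print(back)
```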

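Row operations

  • dt[1:2,]: the first two rows
  • dt[a > 5,]: rows where a > 5
  • dt[, c := 1:.N, by = b]: row number within each group of b
  • dt[, c := shift(a, 1), by = b]: previous value of a (lag) within each group
  • dt[, c := shift(a, 1, type = "lead"), by = b]: next value of a (lead) within each group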
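The sketch below applies these row operations to a small hypothetical table grouped by column b. The new column names idx, prev_a and next_a are only illustrative. seq_len(.N) gives the same numbers as 1:.N.

```r
library(data.table)

dt <- data.table(a = c(3, 7, 1, 9, 6), b = c("g1", "g1", "g2", "g2", "g2"))

first_two <- dt[1:2, ]   # first two rows
big       <- dt[a > 5, ] # rows where a > 5

# Row number within each group of b.
dt[, idx := seq_len(.N), by = b]

# Previous (lag) and next (lead) value of a within each group; NA at group edges.
dt[, prev_a := shift(a, 1), by = b]
dt[, next_a := shift(a, 1, type = "lead"), by = b]
print(dt)
```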

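Operators commonly used in row filters:

  • >, <, >=, <=
  • is.na(), !is.na()
  • %in%
  • |, &, !
  • %like%: regular-expression match
  • %between%: value within a closed interval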
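The sketch below uses several of these operators inside i, on hypothetical fruit prices. %between% includes both bounds by default, and %like% takes a regular expression.

```r
library(data.table)

dt <- data.table(
  name  = c("apple", "banana", "cherry", "grape", NA),
  price = c(1.2, 0.5, 3.0, 2.4, 1.0)
)

r1 <- dt[price %between% c(1, 2.5)]      # 1 <= price <= 2.5
r2 <- dt[name %like% "^b|^c"]            # names starting with b or c
r3 <- dt[name %in% c("apple", "grape")]  # membership test
r4 <- dt[!is.na(name) & price > 2]       # combine conditions with &, |, !
```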

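Column operations

  • dt[, c(2)]: select the 2nd column (returned as a data.table)
  • dt[, .(b, c)]: select columns b and c
  • dt[, .(x = sum(a))]: compute a summary column x
  • dt[, c := 1+2]: add or modify column c in place
  • dt[, `:=`(c = 1, d = 2)]: add or modify several columns at once
  • dt[, c := NULL]: delete column c
  • dt[, b := as.integer(b)]: convert the type of column b
  • dt[, lapply(.SD, mean), .SDcols = c("a", "b")]: apply mean() to columns a and b
  • cols <- c("a")
    dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]: add new columns (here a_m) whose names come from a character vector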
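Here is a sketch of the column operations on made-up data, including the .SDcols form. The new column names (c, d, a_m, b_m) are only illustrative.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3, 4), b = c("10", "20", "30", "40"))

dt[, b := as.integer(b)]               # convert type in place
dt[, `:=`(c = a * 2, d = "tag")]       # add several columns at once
dt[, d := NULL]                        # drop column d

# Means of a and b as a one-row data.table.
means <- dt[, lapply(.SD, mean), .SDcols = c("a", "b")]

# New columns named from a character vector: "a_m" and "b_m".
cols <- c("a", "b")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]
print(dt)
```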

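Grouping

  • dt[, j, by = .(a)]: evaluate j for each group of a
  • dt[, j, keyby = .(a)]: same, but the result is sorted by a (and keyed on it)
  • dt[, .(c = sum(b)), by = a]: one summary row per group
  • dt[, c := sum(b), by = a]: add the group sum to every row
  • dt[, .SD[1], by = a]: first row of each group
  • dt[, .SD[.N], by = a]: last row of each group
  • dt[...][...]: chain several operations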
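The sketch below shows grouping on a small hypothetical table. by keeps the groups in order of first appearance, and keyby sorts them.

```r
library(data.table)

dt <- data.table(a = c("y", "x", "y", "x", "y"), b = c(5, 1, 2, 4, 3))

s1 <- dt[, .(c = sum(b)), by = a]     # groups in order of appearance: y, x
s2 <- dt[, .(c = sum(b)), keyby = a]  # groups sorted: x, y

# Add the group total to every row instead of collapsing.
dt[, total := sum(b), by = a]

first_rows <- dt[, .SD[1], by = a]    # first row of each group
last_rows  <- dt[, .SD[.N], by = a]   # last row of each group

# Chaining: aggregate, then filter the aggregated result.
big <- dt[, .(c = sum(b)), by = a][c > 6]
```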

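Functions

  • setorder(dt, a, -b): sort by a ascending, then b descending
  • unique(dt, by = c("a", "b")): remove duplicate rows, judged on a and b
  • uniqueN(dt, by = c("a", "b")): count distinct combinations of a and b
  • setnames(dt, c("a", "b"), c("x", "y")): rename columns
  • setkey(dt, a, b): sort by a and b and set them as the key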
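The helper functions above are applied to a toy table below. setorder(), setnames() and setkey() modify dt in place. unique() and uniqueN() return new results.

```r
library(data.table)

dt <- data.table(a = c(2, 1, 2, 1, 2), b = c(1, 3, 1, 2, 5))

setorder(dt, a, -b)                        # a ascending, then b descending
dedup <- unique(dt, by = c("a", "b"))      # drop rows duplicated on (a, b)
n_combos <- uniqueN(dt, by = c("a", "b"))  # number of distinct (a, b) pairs

setnames(dt, c("a", "b"), c("x", "y"))     # rename in place
setkey(dt, x, y)                           # sort by x, y and mark them as the key
```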

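In data.table, functions prefixed with set and the := operator modify data in place (by reference), so no <- is needed.
For example, setDT(df) has the same effect as df <- as.data.table(df), but without making a copy.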
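Here is a short sketch of what "by reference" means in practice. Plain <- does not copy a data.table, so copy() is needed when you want an independent table. The object names are hypothetical.

```r
library(data.table)

df <- data.frame(x = 1:3)
setDT(df)                  # df itself is now a data.table; no <- needed

dt_ref <- df               # plain assignment does NOT copy a data.table
dt_ref[, y := x * 10L]     # so this also adds y to df

independent <- copy(df)    # copy() makes a real, independent copy
independent[, z := 0L]     # df is unaffected
```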

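Joining

  • dt_a[dt_b, on = .(b = y)]: join on dt_a$b == dt_b$y; every row of dt_b is kept
  • dt_a[dt_b, on = .(b = y, c > z)]: non-equi join that additionally requires dt_a$c > dt_b$z
  • rbind(dt_a, dt_b): stack rows
  • cbind(dt_a, dt_b): place columns side by side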
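The sketch below runs these joins on two hypothetical tables. The nomatch = NULL argument is not in the bullets above. It turns the non-equi join into an inner join, and without it unmatched rows of dt_b are kept with NA.

```r
library(data.table)

dt_a <- data.table(b = c("k1", "k2", "k3"), c = c(10, 20, 30), val = c("A1", "A2", "A3"))
dt_b <- data.table(y = c("k1", "k3", "k4"), z = c(5, 35, 0))

# Equi-join: every row of dt_b is kept; k4 has no match, so val is NA.
j1 <- dt_a[dt_b, on = .(b = y)]

# Non-equi join: also require dt_a$c > dt_b$z; drop unmatched rows.
j2 <- dt_a[dt_b, on = .(b = y, c > z), nomatch = NULL]

stacked      <- rbind(dt_a[1], dt_a[3])                   # stack rows
side_by_side <- cbind(dt_a, flag = c(TRUE, FALSE, TRUE))  # add a column
```

In a non-equi join, the result's join column (c here) holds dt_b's z values rather than dt_a's c. Check it before relying on it.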

Reshaping

  • dcast(): long to wide
  • melt(): wide to long
dcast(dt,
      id ~ y,
      value.var = c("a", "b"))

melt(dt,
     id.vars = c("id"),
     measure.vars = patterns("^a", "^b"),
     variable.name = "y",
     value.name = c("a", "b"))
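The sketch below does a round trip with both functions. It assumes a hypothetical wide table whose columns a_1, a_2, b_1 and b_2 hold two measures over two periods. With several patterns, melt() stores the pattern position (1, 2) in y.

```r
library(data.table)

wide <- data.table(id = 1:2, a_1 = c(1, 2), a_2 = c(3, 4), b_1 = c(5, 6), b_2 = c(7, 8))

# Wide -> long: columns matching "^a" go to value a, "^b" to value b.
long <- melt(wide,
             id.vars = c("id"),
             measure.vars = patterns("^a", "^b"),
             variable.name = "y",
             value.name = c("a", "b"))

# Long -> wide: one row per id, one column per (value.var, y) pair.
back <- dcast(long, id ~ y, value.var = c("a", "b"))
```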

Article link: https://www.haomeiwen.com/subject/frgzirtx.html