R | data.table

Author: shwzhao | Published 2022-07-27 21:15

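    “data.table's highly abstract syntax undoubtedly raises the learning cost, but its efficiency and its ability to handle large data make it well worth learning. Readers who want both data.table's performance and the tidyverse's tidy syntax can also use packages that bridge the two, such as dtplyr and tidyfst.”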


    Creating

    • data.table()
    • as.data.table()
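
A minimal sketch of both constructors; the column names and values are made up for illustration.

```r
library(data.table)

# Build a data.table directly from column vectors.
dt <- data.table(a = 1:4, b = c("x", "x", "y", "y"))

# Convert an existing data.frame; the original df is left untouched.
df <- data.frame(a = 1:3, b = letters[1:3])
dt2 <- as.data.table(df)

class(dt2)  # "data.table" "data.frame"
```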

    Reading

    • fread("file.csv")
    • fread("file.csv", select = c("a", "b")): read only the specified columns

    Writing

    • fwrite(dt, "file.csv")
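
The two calls can be combined into a small round trip. This is only a sketch: the temporary file path and the sample columns are assumptions, not part of the original notes.

```r
library(data.table)

dt <- data.table(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1.5, 2.5))

path <- tempfile(fileext = ".csv")  # hypothetical file location
fwrite(dt, path)

# Read back only columns a and b; column c is skipped while parsing.
dt_ab <- fread(path, select = c("a", "b"))
names(dt_ab)  # "a" "b"
```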

    Row operations

    • dt[1:2,]
    • dt[a > 5,]
    • dt[, c := 1:.N, by = b]
    • dt[, c := shift(a, 1), by = b]
    • dt[, c := shift(a, 1, type = "lead"), by = b]

    Operators that can be used in the row filter:

    > <
    >= <=
    is.na()
    !is.na()
    %in%
    | & !
    %like%
    %between%
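
As a rough illustration of the filters and the grouped helpers listed above, the sketch below uses a small made-up table; the new column names idx, prev and nxt are hypothetical.

```r
library(data.table)

dt <- data.table(a = c(3, 8, NA, 6, 10), b = c("x", "x", "y", "y", "y"))

dt[a > 5]                     # rows where a > 5 (rows with NA in a are dropped)
dt[!is.na(a) & b %in% "y"]    # combine conditions with & | !
dt[a %between% c(5, 9)]       # inclusive range: 5 <= a <= 9
dt[b %like% "^x"]             # regular-expression match on b

# Grouped helpers: a row counter and lag/lead values within each b.
dt[, idx := 1:.N, by = b]
dt[, prev := shift(a, 1), by = b]                 # previous value in the group
dt[, nxt := shift(a, 1, type = "lead"), by = b]   # next value in the group
```

Because := modifies dt in place, the three new columns are present afterwards without any reassignment.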

    Column operations

    • dt[, c(2)]
    • dt[, .(b, c)]
    • dt[, .(x = sum(a))]
    • dt[, c := 1+2]
    • dt[,`:=`(c = 1, d = 2)]
    • dt[, c := NULL]
    • dt[, b := as.integer(b)]
    • dt[, lapply(.SD, mean), .SDcols = c("a", "b")]
    • cols <- c("a")
      dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]
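
The column idioms above can be tried on a toy table. The data and the derived column names here are assumptions chosen for illustration.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3, 4), b = c("10", "20", "30", "40"))

dt[, b := as.integer(b)]            # convert a column's type in place
dt[, `:=`(c = a * 2, d = a + b)]    # add several columns at once
dt[, d := NULL]                     # drop a column

# Summarise selected columns; the result is a new one-row data.table.
means <- dt[, lapply(.SD, mean), .SDcols = c("a", "b")]

# Add "<name>_m" columns holding the mean of each column listed in cols.
cols <- c("a", "c")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]
```

Passing .SDcols = cols keeps the number of new column names equal to the number of columns in .SD, which the assignment requires.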

    Grouping

    • dt[, j, by = .(a)]
    • dt[, j, keyby = .(a)]
    • dt[, .(c = sum(b)), by = a]
    • dt[, c := sum(b), by = a]
    • dt[, .SD[1], by = a]
    • dt[, .SD[.N], by = a]
    • dt[...][...]
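
A short sketch of the grouping forms above, including chaining with dt[...][...]. The sample data and the name big are made up.

```r
library(data.table)

dt <- data.table(a = c("x", "y", "x", "y", "x"), b = 1:5)

# One row per group; groups appear in order of first appearance.
dt[, .(c = sum(b)), by = a]        # x: 9, y: 6

# keyby additionally sorts the result by the grouping column.
dt[, .(c = sum(b)), keyby = a]

# First and last row of each group.
first_rows <- dt[, .SD[1], by = a]
last_rows  <- dt[, .SD[.N], by = a]

# Chaining: aggregate, then filter the aggregated result.
big <- dt[, .(c = sum(b)), by = a][c > 6]
```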

    Functions

    • setorder(dt, a, -b)
    • unique(dt, by = c("a", "b")): remove duplicate rows
    • uniqueN(dt, by = c("a", "b")): count unique rows
    • setnames(dt, c("a", "b"), c("x", "y")): rename columns
    • setkey(dt, a, b)

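    In data.table, functions prefixed with set and the := operator modify data in place, with no need for <-.
    For example, setDT(df) has the same effect as df <- as.data.table(df), but it converts df by reference instead of making a copy.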
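
The in-place behaviour can be seen in a small example. The data frame below is invented; n_unique and dedup are hypothetical names.

```r
library(data.table)

df <- data.frame(a = c(2, 1, 2, 1), b = c("q", "p", "q", "r"))
setDT(df)                      # df is now a data.table, no assignment needed

setorder(df, a, -b)            # sort by a ascending, then b descending
n_unique <- uniqueN(df, by = c("a", "b"))   # number of distinct (a, b) pairs
dedup <- unique(df, by = c("a", "b"))       # returns a new, de-duplicated table

setnames(df, c("a", "b"), c("x", "y"))      # rename in place
setkey(df, x, y)                            # sort by x, y and mark them as the key
```

Note that unique() is not a set function: it returns a new table and leaves df unchanged.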

    Joins

    • dt_a[dt_b, on = .(b = y)]
    • dt_a[dt_b, on = .(b = y, c > z)]
    • rbind(dt_a, dt_b)
    • cbind(dt_a, dt_b)
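
A minimal sketch of the equi-join form and the two binding functions, assuming two small made-up tables. The nomatch = NULL variant is an addition not listed above.

```r
library(data.table)

dt_a <- data.table(b = c(1, 2, 3), v = c("p", "q", "r"))
dt_b <- data.table(y = c(2, 3, 4), w = c("Q", "R", "S"))

# For every row of dt_b, look up the rows of dt_a where b == y.
# Rows of dt_b without a match are kept, with NA in dt_a's columns.
joined <- dt_a[dt_b, on = .(b = y)]

# nomatch = NULL drops the unmatched rows (an inner join).
inner <- dt_a[dt_b, on = .(b = y), nomatch = NULL]

# rbind stacks rows; cbind places columns side by side.
stacked <- rbind(dt_a, dt_a)
wide <- cbind(dt_a, dt_b)
```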

    Reshaping

    • dcast(): long to wide
    • melt(): wide to long
    dcast(dt,
          id ~ y,
          value.var = c("a", "b"))
    
    melt(dt,
         id.vars = c("id"),
         measure.vars = patterns("^a", "^b"),
         variable.name = "y",
         value.name = c("a", "b"))
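
The two calls above can be run back to back on a toy table. The data is invented; long, wide and back are hypothetical names.

```r
library(data.table)

long <- data.table(id = c(1, 1, 2, 2),
                   y  = c(1, 2, 1, 2),
                   a  = c(10, 11, 20, 21),
                   b  = c("p", "q", "r", "s"))

# Long to wide: one row per id, one column per (value column, y) pair.
wide <- dcast(long, id ~ y, value.var = c("a", "b"))
names(wide)  # "id" "a_1" "a_2" "b_1" "b_2"

# Wide to long again: columns starting with "a" and with "b" are melted together.
back <- melt(wide,
             id.vars = "id",
             measure.vars = patterns("^a", "^b"),
             variable.name = "y",
             value.name = c("a", "b"))
```

After the round trip, back has the same four columns as long, although y comes back as a factor and the row order differs.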
    


          Article link: https://www.haomeiwen.com/subject/frgzirtx.html