Suppose we have two data frames, df1 and df2. The goal is to find the rows of df1 that do not appear in df2, i.e. the rows unique to df1.
df1
df1 <- data.frame(id=c(1:5), animal=c("cat", "dog", "parakeet", "lion", "duck"))
df1
id animal
1 1 cat
2 2 dog
3 3 parakeet
4 4 lion
5 5 duck
df2
df2 <- df1[c(1,3,5),]
df2
id animal
1 1 cat
3 3 parakeet
5 5 duck
Method 1. anti_join from dplyr
library(dplyr)
anti_join(df1, df2)
# Joining, by = c("id", "animal")
# id animal
# 1 2 dog
# 2 4 lion
Method 2. setdiff from dplyr
library(dplyr)
setdiff(df1, df2)
Both setdiff and anti_join are available from dplyr, so how do they differ? setdiff requires the two data frames to have the same columns, whereas anti_join only needs them to share at least one column name.
df1 <- data.frame(id=c(1:5), A=c(10, 11, 12, 13, 14), B = c("a", "b", "c", "d", "d"))
df2 <- data.frame(id=c(1:3), A=c(10, 100, 1000))
df1
id A B
1 1 10 a
2 2 11 b
3 3 12 c
4 4 13 d
5 5 14 d
df2
id A
1 1 10
2 2 100
3 3 1000
setdiff(df1,df2)
# Error: not compatible: Cols in x but not y: `B`.
anti_join(df1, df2)
# Joining, by = c("id", "A")
# id A B
# 1 2 11 b
# 2 3 12 c
# 3 4 13 d
# 4 5 14 d
# anti_join also lets you choose which columns to compare. For example, compare on the id column only to find the rows unique to df1:
anti_join(df1, df2, by="id")
# id A B
# 1 4 13 d
# 2 5 14 d
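One further difference worth checking on your own data is how the two functions treat duplicate rows. The sketch below uses two small hypothetical data frames, x and y (not part of the examples above), and assumes a reasonably recent dplyr, where setdiff follows set semantics and collapses duplicates while anti_join keeps every unmatched row.

```r
library(dplyr)

# Hypothetical data: x contains a duplicated row that has no match in y.
x <- data.frame(id = c(1, 2, 2, 3),
                animal = c("cat", "dog", "dog", "lion"),
                stringsAsFactors = FALSE)
y <- data.frame(id = 1, animal = "cat", stringsAsFactors = FALSE)

# Set semantics: the duplicated (2, "dog") row is returned once.
only_setdiff <- setdiff(x, y)

# Filtering join: every unmatched row of x is kept, duplicates included.
only_anti <- anti_join(x, y, by = c("id", "animal"))

nrow(only_setdiff)
nrow(only_anti)
```

If the duplicates in df1 are meaningful, anti_join is probably the safer choice; if you want distinct rows only, setdiff does that deduplication for you.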
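Method 3. Use the %in% operator and negate it with !
Using the animal data frames from the start of the post, keep the rows of df1 whose animal value is not in (! %in%) the animal column of df2: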
df1[!df1$animal %in% df2$animal, ]
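Because df1 and df2 were redefined in the previous section, here is a self-contained base R sketch of the same idea that rebuilds the animal data frames first. The all-column variant at the end is an assumption of mine rather than part of the original method: it pastes each row into a single key string (the names key1 and key2 are hypothetical) so that %in% can compare whole rows instead of one column.

```r
# Rebuild the animal data frames from the start of the post.
df1 <- data.frame(id = 1:5,
                  animal = c("cat", "dog", "parakeet", "lion", "duck"),
                  stringsAsFactors = FALSE)
df2 <- df1[c(1, 3, 5), ]

# Single-column comparison: TRUE where df1$animal does not occur in df2$animal.
keep <- !(df1$animal %in% df2$animal)
result <- df1[keep, ]
result
#   id animal
# 2  2    dog
# 4  4   lion

# All-column variant (illustrative): build one key string per row.
key1 <- do.call(paste, c(df1, sep = "\r"))
key2 <- do.call(paste, c(df2, sep = "\r"))
result_all <- df1[!(key1 %in% key2), ]
```

Note that the single-column form only looks at animal, so two rows with the same animal but different id values would be treated as matching.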
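Method 4. Use data.table
Again with the animal data frames: setDT converts df1 to a data.table by reference, and setkey with no columns given keys it on all columns, so [!df2] acts as an anti-join on every column.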
library(data.table)
setkey(setDT(df1))[!df2]
# id animal
# 1: 2 dog
# 2: 4 lion
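If you would rather not modify df1 in place, a minimal alternative sketch is to copy the data with as.data.table and name the join columns explicitly with on=. The names dt1 and dt2 are hypothetical, and this assumes a data.table version that supports the on= argument.

```r
library(data.table)

# Copies of the animal data frames as data.tables (dt1, dt2 are illustrative names).
dt1 <- as.data.table(data.frame(id = 1:5,
                                animal = c("cat", "dog", "parakeet", "lion", "duck"),
                                stringsAsFactors = FALSE))
dt2 <- dt1[c(1, 3, 5)]

# Anti-join: rows of dt1 with no match in dt2 on the listed columns.
res <- dt1[!dt2, on = c("id", "animal")]
res
```

Spelling out on= makes it obvious which columns define a match, and the original data frames are left untouched.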