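[Previous: 46. Mutating joins functions and the merge function]
[Next: 48. Set Operations functions]
The previous post covered the four Mutating joins functions and the merge function from base R; this post covers Filtering joins.
The Filtering joins functions are semi_join() and anti_join(); what they return is a filtered set of observations.
The Set Operations functions are intersect(), union(), and setdiff(); they compare two data frames and return the intersection, union, and difference of their rows.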
The Usage of the two Filtering joins functions is much simpler than that of the four Mutating joins functions:
semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)
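Filtering joins filter the rows of x according to whether a match exists in y. semi_join() returns all rows of x that have a match in y; anti_join() returns all rows of x that have no match in y, which is exactly the opposite of semi_join(). The by argument works the same way as in the Mutating joins functions.
In summary: the result is a subset of x, and all attributes of x are preserved; the row order follows the order in x as far as possible, and the columns are not modified.
Both functions are simple, as the following example shows: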
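As a rough sketch of the behaviour described above, the snippet below uses two small made-up tibbles (students and clubs are hypothetical names, not data from this series) whose key columns have different names, so by is given as a named vector. It also builds the same results with filter() and %in%, which should agree when there is a single key column.

```r
library(dplyr)

# Hypothetical data, for illustration only
students <- tibble(
  name  = c("A", "B", "C", "D"),
  score = c(90, 85, 70, 60)
)
clubs <- tibble(
  member = c("B", "D", "E"),
  club   = c("chess", "go", "tennis")
)

# The key columns have different names, so map x's column to y's column
kept    <- semi_join(students, clubs, by = c("name" = "member"))
dropped <- anti_join(students, clubs, by = c("name" = "member"))

# With a single key column, filter() with %in% selects the same rows
kept_filter    <- filter(students, name %in% clubs$member)
dropped_filter <- filter(students, !name %in% clubs$member)

print(kept)     # rows for B and D; only the columns of students
print(dropped)  # rows for A and C
```

Note that no column from clubs appears in either result: y is only used to decide which rows of x to keep.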
library(dplyr)
nt1 <- tibble(
name = c("ZhangS","LiS","WangW","ZhaoL","QianQ","SunB","LiJ","ZhouS"),
score = c("90","89","95","93","88","96","92","85"),
sex = c("male","male","female","female","male","female","female","female"),
address = c("Beijing","Tianjin","Guangzhou","Shanghai","Shenzen","Wuhan","Liuyang","Fuzhou")
)
nt2<-tibble(
name = c("ZhangS","ZhaoL","QianQ","ZhouS"),
hobbies = c("baseball","basketball","tennis","swim")
)
> semi_join(nt1,nt2)
Joining, by = "name"
# A tibble: 4 x 4
name score sex address
<chr> <chr> <chr> <chr>
1 ZhangS 90 male Beijing
2 ZhaoL 93 female Shanghai
3 QianQ 88 male Shenzen
4 ZhouS 85 female Fuzhou
> anti_join(nt1,nt2)
Joining, by = "name"
# A tibble: 4 x 4
name score sex address
<chr> <chr> <chr> <chr>
1 LiS 89 male Tianjin
2 WangW 95 female Guangzhou
3 SunB 96 female Wuhan
4 LiJ 92 female Liuyang
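Duplicate values come up here as well, but the situation is much simpler: if nt1 contains duplicate rows, all of those rows that meet the condition are output, and whether nt2 contains duplicates, or how many times, has no effect on the result.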
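The point about duplicates can be checked with a small sketch. The tibbles x and y below are made up for illustration; inner_join() is included only as a contrast with a mutating join, and suppressWarnings() is used on the assumption that newer dplyr versions warn about many-to-many matches.

```r
library(dplyr)

# Hypothetical data: "A" appears twice in x and three times in y
x <- tibble(name = c("A", "A", "B", "C"), score = c(1, 2, 3, 4))
y <- tibble(
  name  = c("A", "A", "A", "C"),
  hobby = c("run", "swim", "read", "chess")
)

# Filtering joins: each row of x is kept or dropped exactly once
semi <- semi_join(x, y, by = "name")  # both "A" rows and the "C" row
anti <- anti_join(x, y, by = "name")  # only the "B" row

# Mutating join for contrast: each "A" row of x is repeated once per match in y
inner <- suppressWarnings(inner_join(x, y, by = "name"))

nrow(semi)   # 3
nrow(anti)   # 1
nrow(inner)  # 7
```

The duplicated "A" rows of x both survive semi_join(), while the three "A" rows of y do not multiply them.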