美文网首页
DataCamp课程 <用dplyr合并数据> Chapter1

DataCamp课程 <用dplyr合并数据> Chapter1

作者: Jason数据分析生信教室 | 来源:发表于2021-08-07 08:03 被阅读0次

用dplyr合并数据

Chapter1. 合并数据表
Chapter2. 向左,向右合并
Chapter3. 完全合并,半完全合并
Chapter4. 问题实践

Chapter1. 合并数据表

有两组数据,一组叫parts,一组叫part_categories

>
head(parts)
# A tibble: 6 x 3
  part_num name                                                   part_cat_id
  <chr>    <chr>                                                        <dbl>
1 0901     Baseplate 16 x 30 with Set 080 Yellow House Print                1
2 0902     Baseplate 16 x 24 with Set 080 Small White House Print           1
3 0903     Baseplate 16 x 24 with Set 080 Red House Print                   1
4 0904     Baseplate 16 x 24 with Set 080 Large White House Print           1
5 1        Homemaker Bookcase 2 x 4 x 4                                     7
6 10016414 Sticker Sheet #1 for 41055-1 
>
head(part_categories)
# A tibble: 6 x 2
     id name                   
  <dbl> <chr>                  
1     1 Baseplates             
2     3 Bricks Sloped          
3     4 Duplo, Quatro and Primo
4     5 Bricks Special         
5     6 Bricks Wedged          
6     7 Containers 

要通过part_cat_idid对两个数组进行合并。

parts %>% 
 inner_join(part_categories,by=c("part_cat_id"="id"))
# A tibble: 17,501 x 4
   part_num   name.x                                  part_cat_id name.y        
   <chr>      <chr>                                         <dbl> <chr>         
 1 0901       Baseplate 16 x 30 with Set 080 Yellow ~           1 Baseplates    
 2 0902       Baseplate 16 x 24 with Set 080 Small W~           1 Baseplates    
 3 0903       Baseplate 16 x 24 with Set 080 Red Hou~           1 Baseplates    
 4 0904       Baseplate 16 x 24 with Set 080 Large W~           1 Baseplates    
 5 1          Homemaker Bookcase 2 x 4 x 4                      7 Containers    
 6 10016414   Sticker Sheet #1 for 41055-1                     58 Stickers      
 7 10026stk01 Sticker for Set 10026 - (44942/4184185)          58 Stickers      
 8 10039      Pullback Motor 8 x 4 x 2/3                       44 Mechanical    
 9 10048      Minifig Hair Tousled                             65 Minifig Headw~
10 10049      Minifig Shield Broad with Spiked Botto~          27 Minifig Acces~
# ... with 17,491 more rows

发现一个问题,就是有两个变量重名了,默认变成了name.xname.y。很明显看着不爽,要改成别的名字。可以用suffix来给变量名添加后缀。

parts %>% 
    inner_join(part_categories, by = c("part_cat_id" = "id"), suffix=c("_part","_category"))
# A tibble: 17,501 x 4
   part_num   name_part                             part_cat_id name_category   
   <chr>      <chr>                                       <dbl> <chr>           
 1 0901       Baseplate 16 x 30 with Set 080 Yello~           1 Baseplates      
 2 0902       Baseplate 16 x 24 with Set 080 Small~           1 Baseplates      
 3 0903       Baseplate 16 x 24 with Set 080 Red H~           1 Baseplates      
 4 0904       Baseplate 16 x 24 with Set 080 Large~           1 Baseplates      
 5 1          Homemaker Bookcase 2 x 4 x 4                    7 Containers      
 6 10016414   Sticker Sheet #1 for 41055-1                   58 Stickers        
 7 10026stk01 Sticker for Set 10026 - (44942/41841~          58 Stickers        
 8 10039      Pullback Motor 8 x 4 x 2/3                     44 Mechanical      
 9 10048      Minifig Hair Tousled                           65 Minifig Headwear
10 10049      Minifig Shield Broad with Spiked Bot~          27 Minifig Accesso~
# ... with 17,491 more rows

当然同样的方法还可以用来合并多个数据。

sets %>%
    # Add inventories using an inner join 
    inner_join(inventories,by="set_num") %>%
    # Add inventory_parts using an inner join 
    inner_join(inventory_parts,by=c("id"="inventory_id"))
# A tibble: 258,958 x 9
   set_num name           year theme_id    id version part_num color_id quantity
   <chr>   <chr>         <dbl>    <dbl> <dbl>   <dbl> <chr>       <dbl>    <dbl>
 1 700.3-1 Medium Gift ~  1949      365 24197       1 bdoor01         2        2
 2 700.3-1 Medium Gift ~  1949      365 24197       1 bdoor01        15        1
 3 700.3-1 Medium Gift ~  1949      365 24197       1 bdoor01         4        1
 4 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02        15        6
 5 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02         2        6
 6 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02         4        6
 7 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02         1        6
 8 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02        14        6
 9 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02a       15        6
10 700.3-1 Medium Gift ~  1949      365 24197       1 bslot02a        2        6
# ... with 258,948 more rows

相关文章

网友评论

      本文标题:DataCamp课程 <用dplyr合并数据> Chapter1

      本文链接:https://www.haomeiwen.com/subject/lqtyvltx.html