美文网首页
DAY 6 R 语言

DAY 6 R 语言

作者: Peng_001 | 来源:发表于2020-05-07 10:43 被阅读0次

    Joining Data with dplyr

    inner join

    inner_join,通过某个列表信息作为纽带,将两个不同的表格连接起来。

    sets %>%
      inner_join(theme, by = c("theme_id" = "id"), suffix = c("_set", "_theme"))
    # 通过theme_id与id将sets 与theme 连接起来。
    

    c("first_table_column" = "second_table_column"). 需要注意的是,by后面内容有顺序要求。通过by 后面内容中的向量使用,分割,可以设定列表通过多行内容连接。

    1. 连接多行列表



      只要是相连的两列有相同内容的信息即可,即便所有列表可能不全部包含该列信息。

    left join

    如果相连接的列表中有某个列表没有某个内容,inner join 会自动忽略掉那一行oberservation 的信息。


    使用left join 可以保留first table 中的信息,而被连接列表中没有的信息则显示为NA


    例子
    inventory_version_1 <- inventories %>%
        filter(version == 1)
    
    # Join versions to sets
    sets %>%
      left_join(inventory_version_1, by = "set_num") %>%
      # Filter for where version is na
      filter(is.na(version))
    

    right join

    与left join 相反,保留连接的second table中内容


    • 三者对比


    replace_na

    replace_na(list(n=0))
    例子

    batman_parts %>%
      # Combine the star_wars_parts table 
      full_join(star_wars_parts, by = c("part_num", "color_id"), suffix = c("_batman", "_star_wars")) %>%
      # Replace NAs with 0s in the n_batman and n_star_wars columns 
      replace_na(list(n_batman = 0, n_star_wars = 0))
    

    join table to themselves


    通过自连接可以将这种具有层级关系的表格连接在一起。
    子数据关于父数据的id(parent_id)直接与对于的父数据id 相等。
    例子

    # themes 表格
          id name           parent_id
       <dbl> <chr>              <dbl>
     1     1 Technic               NA
     2     2 Arctic Technic         1
     3     3 Competition            1
     4     4 Expert Builder         1
     5     5 Model                  1
     6     6 Airport                5
     7     7 Construction           5
     8     8 Farm                   5
     9     9 Fire                   5
    10    10 Harbor                 5
    
    themes %>% 
        # Inner join the themes table
        inner_join(themes, by = c("id" = "parent_id"),suffix = c("_parent", "_child")) %>%
        # Filter for the "Harry Potter" parent name 
        filter(name_parent == "Harry Potter")
    
    # 自连接后输出
        id name_parent  parent_id id_child name_child          
      <dbl> <chr>            <dbl>    <dbl> <chr>               
    1   246 Harry Potter        NA      247 Chamber of Secrets  
    2   246 Harry Potter        NA      248 Goblet of Fire      
    3   246 Harry Potter        NA      249 Order of the Phoenix
    4   246 Harry Potter        NA      250 Prisoner of Azkaban 
    5   246 Harry Potter        NA      251 Sorcerer's Stone    
    6   246 Harry Potter        NA      667 Fantastic Beasts
    

    full join

    全连接,不管连的(first table)还是被连的(second table)若数据存在缺失,均连接起来。
    缺失内容用NA表示。

    batman_parts %>%
      # Combine the star_wars_parts table 
      full_join(star_wars_parts, by = c("part_num", "color_id"), suffix = c("_batman", "_star_wars")) %>%
      # Replace NAs with 0s in the n_batman and n_star_wars columns 
      replace_na(list(n_batman = 0, n_star_wars = 0))
    

    inner_join, left/right_join, full_join 都可以称为mutating join。

    filtering join

    semi join & anti join


    例子

    # Filter the batwing set for parts that are also in the batmobile set
    batwing %>%
    semi_join(batmobile, by = c("part_num"))
    
    # Filter the batwing set for parts that aren't in the batmobile set
    batwing %>%
    anti_join(batmobile, by = c("part_num"))
    

    总结


    ps: 使用n() 获得总共的变量数量

    questions_with_tags %>%
        # Group by tag_name
        group_by(tag_name) %>%
        # Get mean score and num_questions
        summarize(score = mean(score),
                  num_questions = n()) %>%
        # Sort num_questions in descending order
        arrange(desc(num_questions))
    

    其他语句

    bind_rows() 可以将两个表格的内容上下拼接起来(行与行)
    例子

    # Combine the two tables into posts_with_tags
    posts_with_tags <- bind_rows(questions_with_tags %>% mutate(type = "question"),
                                  answers_with_tags %>% mutate(type = "answer"))
    
    # Add a year column, then aggregate by type, year, and tag_name
    posts_with_tags %>%
      mutate(year = year(creation_date)) %>%
      count(type, year, tag_name)
    

    总结

    相关文章

      网友评论

          本文标题:DAY 6 R 语言

          本文链接:https://www.haomeiwen.com/subject/ubcughtx.html