[r<-Basics|Analysis] Learning tidyverse as a Beginner

Author: 王诗翔 | Published 2018-06-03 15:33

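    tidyverse is a collection of R packages for data manipulation and visualization, of which ggplot2 and dplyr are the best known.

    The core packages are:

    • ggplot2 - data visualization
    • dplyr - a grammar of data manipulation that handles most data-wrangling tasks
    • tidyr - tidying data
    • readr - reading rectangular (tabular) data
    • purrr - a complete and consistent toolkit that enhances functional programming in R
    • tibble - a modern take on the data frame
    • stringr - functions for working with strings
    • forcats - tools for working with factors

    I have not used a few of these yet. R has a great many packages, but these powerful ones are worth learning: knowing them lets you solve problems with far less effort.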
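
    For the less familiar packages in the list, here is a minimal sketch of tibble, stringr and forcats working together. The data and the names flowers, upper_names and by_freq are made up for illustration and are not part of the original post.

```r
library(tibble)
library(stringr)
library(forcats)

# A small hypothetical tibble (illustrative data only)
flowers <- tibble(
    name   = c("setosa", "virginica", "virginica"),
    petals = c(1.4, 6.0, 5.1)
)

# stringr: convert the names to upper case
upper_names <- str_to_upper(flowers$name)

# forcats: build a factor whose levels are ordered by frequency
by_freq <- fct_infreq(flowers$name)
levels(by_freq)
```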

    Installing tidyverse

    install.packages("tidyverse")
    

    Loading:

    library(tidyverse)
    ## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
    ## √ ggplot2 2.2.1     √ purrr   0.2.4
    ## √ tibble  1.4.2     √ dplyr   0.7.4
    ## √ tidyr   0.8.0     √ stringr 1.3.0
    ## √ readr   1.1.1     √ forcats 0.3.0
    ## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()
    

    Useful functions

    # Conflicts between tidyverse and other packages
    tidyverse_conflicts()
    # List all tidyverse dependencies
    tidyverse_deps()
    # Print the tidyverse logo
    tidyverse_logo()
    # List all tidyverse packages
    tidyverse_packages()
    # Update tidyverse packages
    tidyverse_update()
    

    Loading data

    library(datasets)
    #install.packages("gapminder")
    library(gapminder)
    attach(iris)
    

    dplyr

    Filtering

    The filter() function extracts a subset of rows.

    iris %>% 
        filter(Species == "virginica") # keep rows that meet the condition
    ##    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    ## 1           6.3         3.3          6.0         2.5 virginica
    ## 2           5.8         2.7          5.1         1.9 virginica
    ## 3           7.1         3.0          5.9         2.1 virginica
    ## 4           6.3         2.9          5.6         1.8 virginica
    ## 5           6.5         3.0          5.8         2.2 virginica
    ## 6           7.6         3.0          6.6         2.1 virginica
    ## 7           4.9         2.5          4.5         1.7 virginica
    ## 8           7.3         2.9          6.3         1.8 virginica
    ## 9           6.7         2.5          5.8         1.8 virginica
    ## 10          7.2         3.6          6.1         2.5 virginica
    ## 11          6.5         3.2          5.1         2.0 virginica
    ## 12          6.4         2.7          5.3         1.9 virginica
    ## 13          6.8         3.0          5.5         2.1 virginica
    ## 14          5.7         2.5          5.0         2.0 virginica
    ## 15          5.8         2.8          5.1         2.4 virginica
    ## 16          6.4         3.2          5.3         2.3 virginica
    ## 17          6.5         3.0          5.5         1.8 virginica
    ## 18          7.7         3.8          6.7         2.2 virginica
    ## 19          7.7         2.6          6.9         2.3 virginica
    ## 20          6.0         2.2          5.0         1.5 virginica
    ## 21          6.9         3.2          5.7         2.3 virginica
    ## 22          5.6         2.8          4.9         2.0 virginica
    ## 23          7.7         2.8          6.7         2.0 virginica
    ## 24          6.3         2.7          4.9         1.8 virginica
    ## 25          6.7         3.3          5.7         2.1 virginica
    ## 26          7.2         3.2          6.0         1.8 virginica
    ## 27          6.2         2.8          4.8         1.8 virginica
    ## 28          6.1         3.0          4.9         1.8 virginica
    ## 29          6.4         2.8          5.6         2.1 virginica
    ## 30          7.2         3.0          5.8         1.6 virginica
    ## 31          7.4         2.8          6.1         1.9 virginica
    ## 32          7.9         3.8          6.4         2.0 virginica
    ## 33          6.4         2.8          5.6         2.2 virginica
    ## 34          6.3         2.8          5.1         1.5 virginica
    ## 35          6.1         2.6          5.6         1.4 virginica
    ## 36          7.7         3.0          6.1         2.3 virginica
    ## 37          6.3         3.4          5.6         2.4 virginica
    ## 38          6.4         3.1          5.5         1.8 virginica
    ## 39          6.0         3.0          4.8         1.8 virginica
    ## 40          6.9         3.1          5.4         2.1 virginica
    ## [ reached getOption("max.print") -- omitted 10 rows ]
    iris %>% 
        filter(Species == "virginica", Sepal.Length > 6) # separate multiple conditions with commas
    ##    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
    ## 1           6.3         3.3          6.0         2.5 virginica
    ## 2           7.1         3.0          5.9         2.1 virginica
    ## 3           6.3         2.9          5.6         1.8 virginica
    ## 4           6.5         3.0          5.8         2.2 virginica
    ## 5           7.6         3.0          6.6         2.1 virginica
    ## 6           7.3         2.9          6.3         1.8 virginica
    ## 7           6.7         2.5          5.8         1.8 virginica
    ## 8           7.2         3.6          6.1         2.5 virginica
    ## 9           6.5         3.2          5.1         2.0 virginica
    ## 10          6.4         2.7          5.3         1.9 virginica
    ## 11          6.8         3.0          5.5         2.1 virginica
    ## 12          6.4         3.2          5.3         2.3 virginica
    ## 13          6.5         3.0          5.5         1.8 virginica
    ## 14          7.7         3.8          6.7         2.2 virginica
    ## 15          7.7         2.6          6.9         2.3 virginica
    ## 16          6.9         3.2          5.7         2.3 virginica
    ## 17          7.7         2.8          6.7         2.0 virginica
    ## 18          6.3         2.7          4.9         1.8 virginica
    ## 19          6.7         3.3          5.7         2.1 virginica
    ## 20          7.2         3.2          6.0         1.8 virginica
    ## 21          6.2         2.8          4.8         1.8 virginica
    ## 22          6.1         3.0          4.9         1.8 virginica
    ## 23          6.4         2.8          5.6         2.1 virginica
    ## 24          7.2         3.0          5.8         1.6 virginica
    ## 25          7.4         2.8          6.1         1.9 virginica
    ## 26          7.9         3.8          6.4         2.0 virginica
    ## 27          6.4         2.8          5.6         2.2 virginica
    ## 28          6.3         2.8          5.1         1.5 virginica
    ## 29          6.1         2.6          5.6         1.4 virginica
    ## 30          7.7         3.0          6.1         2.3 virginica
    ## 31          6.3         3.4          5.6         2.4 virginica
    ## 32          6.4         3.1          5.5         1.8 virginica
    ## 33          6.9         3.1          5.4         2.1 virginica
    ## 34          6.7         3.1          5.6         2.4 virginica
    ## 35          6.9         3.1          5.1         2.3 virginica
    ## 36          6.8         3.2          5.9         2.3 virginica
    ## 37          6.7         3.3          5.7         2.5 virginica
    ## 38          6.7         3.0          5.2         2.3 virginica
    ## 39          6.3         2.5          5.0         1.9 virginica
    ## 40          6.5         3.0          5.2         2.0 virginica
    ## [ reached getOption("max.print") -- omitted 1 rows ]
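
    As a small sketch of how filter() combines conditions: comma-separated conditions are joined with a logical AND, so the comma form gives the same rows as `&`. The `|` and `%in%` forms are shown as alternatives. The result names (both_comma, both_and, either, two_species) are illustrative only.

```r
library(dplyr)

# Comma-separated conditions are combined with AND
both_comma <- iris %>%
    filter(Species == "virginica", Sepal.Length > 6)
both_and <- iris %>%
    filter(Species == "virginica" & Sepal.Length > 6)

# Use | for OR, and %in% to match any of several values
either <- iris %>%
    filter(Species == "setosa" | Sepal.Length > 7)
two_species <- iris %>%
    filter(Species %in% c("setosa", "versicolor"))

# Both AND forms select the same number of rows
nrow(both_comma) == nrow(both_and)
```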
    

    Sorting

    The arrange() function sorts observations, in ascending order by default.

    iris %>% 
        arrange(Sepal.Length)
    ##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    ## 1            4.3         3.0          1.1         0.1     setosa
    ## 2            4.4         2.9          1.4         0.2     setosa
    ## 3            4.4         3.0          1.3         0.2     setosa
    ## 4            4.4         3.2          1.3         0.2     setosa
    ## 5            4.5         2.3          1.3         0.3     setosa
    ## 6            4.6         3.1          1.5         0.2     setosa
    ## 7            4.6         3.4          1.4         0.3     setosa
    ## 8            4.6         3.6          1.0         0.2     setosa
    ## 9            4.6         3.2          1.4         0.2     setosa
    ## 10           4.7         3.2          1.3         0.2     setosa
    ## 11           4.7         3.2          1.6         0.2     setosa
    ## 12           4.8         3.4          1.6         0.2     setosa
    ## 13           4.8         3.0          1.4         0.1     setosa
    ## 14           4.8         3.4          1.9         0.2     setosa
    ## 15           4.8         3.1          1.6         0.2     setosa
    ## 16           4.8         3.0          1.4         0.3     setosa
    ## 17           4.9         3.0          1.4         0.2     setosa
    ## 18           4.9         3.1          1.5         0.1     setosa
    ## 19           4.9         3.1          1.5         0.2     setosa
    ## 20           4.9         3.6          1.4         0.1     setosa
    ## 21           4.9         2.4          3.3         1.0 versicolor
    ## 22           4.9         2.5          4.5         1.7  virginica
    ## 23           5.0         3.6          1.4         0.2     setosa
    ## 24           5.0         3.4          1.5         0.2     setosa
    ## 25           5.0         3.0          1.6         0.2     setosa
    ## 26           5.0         3.4          1.6         0.4     setosa
    ## 27           5.0         3.2          1.2         0.2     setosa
    ## 28           5.0         3.5          1.3         0.3     setosa
    ## 29           5.0         3.5          1.6         0.6     setosa
    ## 30           5.0         3.3          1.4         0.2     setosa
    ## 31           5.0         2.0          3.5         1.0 versicolor
    ## 32           5.0         2.3          3.3         1.0 versicolor
    ## 33           5.1         3.5          1.4         0.2     setosa
    ## 34           5.1         3.5          1.4         0.3     setosa
    ## 35           5.1         3.8          1.5         0.3     setosa
    ## 36           5.1         3.7          1.5         0.4     setosa
    ## 37           5.1         3.3          1.7         0.5     setosa
    ## 38           5.1         3.4          1.5         0.2     setosa
    ## 39           5.1         3.8          1.9         0.4     setosa
    ## 40           5.1         3.8          1.6         0.2     setosa
    ## [ reached getOption("max.print") -- omitted 110 rows ]
    
    iris %>% 
        arrange(desc(Sepal.Length)) # descending order
    ##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    ## 1            7.9         3.8          6.4         2.0  virginica
    ## 2            7.7         3.8          6.7         2.2  virginica
    ## 3            7.7         2.6          6.9         2.3  virginica
    ## 4            7.7         2.8          6.7         2.0  virginica
    ## 5            7.7         3.0          6.1         2.3  virginica
    ## 6            7.6         3.0          6.6         2.1  virginica
    ## 7            7.4         2.8          6.1         1.9  virginica
    ## 8            7.3         2.9          6.3         1.8  virginica
    ## 9            7.2         3.6          6.1         2.5  virginica
    ## 10           7.2         3.2          6.0         1.8  virginica
    ## 11           7.2         3.0          5.8         1.6  virginica
    ## 12           7.1         3.0          5.9         2.1  virginica
    ## 13           7.0         3.2          4.7         1.4 versicolor
    ## 14           6.9         3.1          4.9         1.5 versicolor
    ## 15           6.9         3.2          5.7         2.3  virginica
    ## 16           6.9         3.1          5.4         2.1  virginica
    ## 17           6.9         3.1          5.1         2.3  virginica
    ## 18           6.8         2.8          4.8         1.4 versicolor
    ## 19           6.8         3.0          5.5         2.1  virginica
    ## 20           6.8         3.2          5.9         2.3  virginica
    ## 21           6.7         3.1          4.4         1.4 versicolor
    ## 22           6.7         3.0          5.0         1.7 versicolor
    ## 23           6.7         3.1          4.7         1.5 versicolor
    ## 24           6.7         2.5          5.8         1.8  virginica
    ## 25           6.7         3.3          5.7         2.1  virginica
    ## 26           6.7         3.1          5.6         2.4  virginica
    ## 27           6.7         3.3          5.7         2.5  virginica
    ## 28           6.7         3.0          5.2         2.3  virginica
    ## 29           6.6         2.9          4.6         1.3 versicolor
    ## 30           6.6         3.0          4.4         1.4 versicolor
    ## 31           6.5         2.8          4.6         1.5 versicolor
    ## 32           6.5         3.0          5.8         2.2  virginica
    ## 33           6.5         3.2          5.1         2.0  virginica
    ## 34           6.5         3.0          5.5         1.8  virginica
    ## 35           6.5         3.0          5.2         2.0  virginica
    ## 36           6.4         3.2          4.5         1.5 versicolor
    ## 37           6.4         2.9          4.3         1.3 versicolor
    ## 38           6.4         2.7          5.3         1.9  virginica
    ## 39           6.4         3.2          5.3         2.3  virginica
    ## 40           6.4         2.8          5.6         2.1  virginica
    ## [ reached getOption("max.print") -- omitted 110 rows ]
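
    arrange() also accepts several sort keys, with later keys breaking ties in earlier ones. The sketch below sorts by species first and then by descending sepal length within each species; the name sorted is illustrative.

```r
library(dplyr)

# Sort by Species (ascending), then by Sepal.Length (descending) within species
sorted <- iris %>%
    arrange(Species, desc(Sepal.Length))

head(sorted, 3)
```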
    

    Adding variables

    mutate() updates an existing column of a data frame or adds a new one.

    iris %>% 
        mutate(Sepal.Length = Sepal.Length * 10) # convert this column from cm to mm
    ##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    ## 1             51         3.5          1.4         0.2     setosa
    ## 2             49         3.0          1.4         0.2     setosa
    ## 3             47         3.2          1.3         0.2     setosa
    ## 4             46         3.1          1.5         0.2     setosa
    ## 5             50         3.6          1.4         0.2     setosa
    ## 6             54         3.9          1.7         0.4     setosa
    ## 7             46         3.4          1.4         0.3     setosa
    ## 8             50         3.4          1.5         0.2     setosa
    ## 9             44         2.9          1.4         0.2     setosa
    ## 10            49         3.1          1.5         0.1     setosa
    ## 11            54         3.7          1.5         0.2     setosa
    ## 12            48         3.4          1.6         0.2     setosa
    ## 13            48         3.0          1.4         0.1     setosa
    ## 14            43         3.0          1.1         0.1     setosa
    ## 15            58         4.0          1.2         0.2     setosa
    ## 16            57         4.4          1.5         0.4     setosa
    ## 17            54         3.9          1.3         0.4     setosa
    ## 18            51         3.5          1.4         0.3     setosa
    ## 19            57         3.8          1.7         0.3     setosa
    ## 20            51         3.8          1.5         0.3     setosa
    ## 21            54         3.4          1.7         0.2     setosa
    ## 22            51         3.7          1.5         0.4     setosa
    ## 23            46         3.6          1.0         0.2     setosa
    ## 24            51         3.3          1.7         0.5     setosa
    ## 25            48         3.4          1.9         0.2     setosa
    ## 26            50         3.0          1.6         0.2     setosa
    ## 27            50         3.4          1.6         0.4     setosa
    ## 28            52         3.5          1.5         0.2     setosa
    ## 29            52         3.4          1.4         0.2     setosa
    ## 30            47         3.2          1.6         0.2     setosa
    ## 31            48         3.1          1.6         0.2     setosa
    ## 32            54         3.4          1.5         0.4     setosa
    ## 33            52         4.1          1.5         0.1     setosa
    ## 34            55         4.2          1.4         0.2     setosa
    ## 35            49         3.1          1.5         0.2     setosa
    ## 36            50         3.2          1.2         0.2     setosa
    ## 37            55         3.5          1.3         0.2     setosa
    ## 38            49         3.6          1.4         0.1     setosa
    ## 39            44         3.0          1.3         0.2     setosa
    ## 40            51         3.4          1.5         0.2     setosa
    ## [ reached getOption("max.print") -- omitted 110 rows ]
    iris %>% 
        mutate(SLMn = Sepal.Length * 10) # create a new column
    ##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species SLMn
    ## 1            5.1         3.5          1.4         0.2     setosa   51
    ## 2            4.9         3.0          1.4         0.2     setosa   49
    ## 3            4.7         3.2          1.3         0.2     setosa   47
    ## 4            4.6         3.1          1.5         0.2     setosa   46
    ## 5            5.0         3.6          1.4         0.2     setosa   50
    ## 6            5.4         3.9          1.7         0.4     setosa   54
    ## 7            4.6         3.4          1.4         0.3     setosa   46
    ## 8            5.0         3.4          1.5         0.2     setosa   50
    ## 9            4.4         2.9          1.4         0.2     setosa   44
    ## 10           4.9         3.1          1.5         0.1     setosa   49
    ## 11           5.4         3.7          1.5         0.2     setosa   54
    ## 12           4.8         3.4          1.6         0.2     setosa   48
    ## 13           4.8         3.0          1.4         0.1     setosa   48
    ## 14           4.3         3.0          1.1         0.1     setosa   43
    ## 15           5.8         4.0          1.2         0.2     setosa   58
    ## 16           5.7         4.4          1.5         0.4     setosa   57
    ## 17           5.4         3.9          1.3         0.4     setosa   54
    ## 18           5.1         3.5          1.4         0.3     setosa   51
    ## 19           5.7         3.8          1.7         0.3     setosa   57
    ## 20           5.1         3.8          1.5         0.3     setosa   51
    ## 21           5.4         3.4          1.7         0.2     setosa   54
    ## 22           5.1         3.7          1.5         0.4     setosa   51
    ## 23           4.6         3.6          1.0         0.2     setosa   46
    ## 24           5.1         3.3          1.7         0.5     setosa   51
    ## 25           4.8         3.4          1.9         0.2     setosa   48
    ## 26           5.0         3.0          1.6         0.2     setosa   50
    ## 27           5.0         3.4          1.6         0.4     setosa   50
    ## 28           5.2         3.5          1.5         0.2     setosa   52
    ## 29           5.2         3.4          1.4         0.2     setosa   52
    ## 30           4.7         3.2          1.6         0.2     setosa   47
    ## 31           4.8         3.1          1.6         0.2     setosa   48
    ## 32           5.4         3.4          1.5         0.4     setosa   54
    ## 33           5.2         4.1          1.5         0.1     setosa   52
    ## [ reached getOption("max.print") -- omitted 117 rows ]
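
    A single mutate() call can create several columns, and later expressions may use columns created earlier in the same call. This is a minimal sketch; the column names SL_mm, SW_mm and ratio are hypothetical. Note that mutate() returns a new data frame, so the result has to be assigned if it is to be kept.

```r
library(dplyr)

iris_ratio <- iris %>%
    mutate(SL_mm = Sepal.Length * 10,   # sepal length in mm
           SW_mm = Sepal.Width * 10,    # sepal width in mm
           ratio = SL_mm / SW_mm)       # uses the two columns created just above

# The original iris data frame is left unchanged
ncol(iris)
ncol(iris_ratio)
```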
    

    Chaining functions together:

    # Note: string comparison is case-sensitive, so "Virginica" would match no rows
    iris %>% 
        filter(Species == "virginica") %>% 
        mutate(SLMm = Sepal.Length * 10) %>%   # sepal length in mm
        arrange(desc(SLMm))
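
    To make explicit what the pipe does: `x %>% f(y)` passes x as the first argument, so it is equivalent to `f(x, y)`. The sketch below writes the same steps once with pipes and once as nested calls; the names piped and nested are illustrative.

```r
library(dplyr)

# Piped form: read top to bottom
piped <- iris %>%
    filter(Species == "virginica") %>%
    arrange(desc(Sepal.Length))

# Equivalent nested form: read inside out
nested <- arrange(filter(iris, Species == "virginica"), desc(Sepal.Length))

identical(piped, nested)
```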
    

    Summarizing

    The summarize() function collapses many observations into a single summary value.

    iris %>% 
        summarize(medianSL = median(Sepal.Length))
    ##   medianSL
    ## 1      5.8
    iris %>% 
        filter(Species == "virginica") %>% 
        summarize(medianSL=median(Sepal.Length))
    ##   medianSL
    ## 1      6.5
    

    Several summaries can be computed at once

    iris %>% 
        filter(Species == "virginica") %>% 
        summarize(medianSL = median(Sepal.Length),
                  maxSL = max(Sepal.Length))
    ##   medianSL maxSL
    ## 1      6.5   7.9
    

    group_by() lets us summarize the data by the specified groups rather than over the whole data frame

    iris %>% 
        group_by(Species) %>% 
        summarize(medianSL = median(Sepal.Length),
                  maxSL = max(Sepal.Length))
    ## # A tibble: 3 x 3
    ##   Species    medianSL maxSL
    ##   <fct>         <dbl> <dbl>
    ## 1 setosa         5.00  5.80
    ## 2 versicolor     5.90  7.00
    ## 3 virginica      6.50  7.90
    
    iris %>% 
        filter(Sepal.Length>6) %>% 
        group_by(Species) %>% 
        summarize(medianPL = median(Petal.Length), 
                  maxPL = max(Petal.Length))
    ## # A tibble: 2 x 3
    ##   Species    medianPL maxPL
    ##   <fct>         <dbl> <dbl>
    ## 1 versicolor     4.60  5.00
    ## 2 virginica      5.60  6.90
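
    The last result has only two rows because the largest setosa sepal length is 5.8, so no setosa rows survive the filter and that group disappears. As a further sketch, n() counts the rows in each group and can be combined with other summaries; the name species_stats is illustrative.

```r
library(dplyr)

species_stats <- iris %>%
    group_by(Species) %>%
    summarize(n = n(),                      # number of rows per group
              meanSL = mean(Sepal.Length))  # mean sepal length per group

species_stats
```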
    

    ggplot2

    Scatter plots

    A scatter plot helps us understand the relationship between two variables; geom_point() draws one:

    iris_small <- iris %>% 
        filter(Sepal.Length > 5)
    
    ggplot(iris_small, aes(x = Petal.Length,
                           y = Petal.Width)) + 
        geom_point()
    

    Additional aesthetic mappings

    • Color
    ggplot(iris_small, aes(x = Petal.Length,
                           y = Petal.Width,
                           color = Species)) + 
        geom_point()
    
    • Size
    ggplot(iris_small, aes(x = Petal.Length,
                           y = Petal.Width,
                           color = Species,
                           size = Sepal.Length)) + 
        geom_point()
    
    • Faceting
    ggplot(iris_small, aes(x = Petal.Length,
                           y = Petal.Width)) + 
        geom_point() + 
        facet_wrap(~Species)
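
    One distinction worth sketching: an aesthetic placed inside aes() is mapped to a variable and gets a legend, while one given directly to the geom is set to a fixed value for every point. The color "steelblue", the size 2 and the names p_mapped and p_fixed are arbitrary choices for illustration; iris_small is rebuilt here with base subset() so the block runs on its own.

```r
library(ggplot2)

iris_small <- subset(iris, Sepal.Length > 5)

# Mapped: color varies with Species and a legend is drawn
p_mapped <- ggplot(iris_small, aes(x = Petal.Length,
                                   y = Petal.Width,
                                   color = Species)) +
    geom_point()

# Set: every point gets the same fixed color and size, no legend
p_fixed <- ggplot(iris_small, aes(x = Petal.Length,
                                  y = Petal.Width)) +
    geom_point(color = "steelblue", size = 2)
```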
    

    Line chart

    by_year <- gapminder %>% 
        group_by(year) %>% 
        summarize(medianGdpPerCap = median(gdpPercap))
    
    ggplot(by_year, aes(x = year,
                        y = medianGdpPerCap)) +
        geom_line() + 
        expand_limits(y=0)
    

    Bar chart

    by_species <- iris %>%  
        filter(Sepal.Length > 6) %>% 
        group_by(Species) %>% 
        summarize(medianPL=median(Petal.Length))
    
    ggplot(by_species, aes(x = Species, y=medianPL)) + 
        geom_col()
    

    Histogram

    ggplot(iris_small, aes(x = Petal.Length)) + 
        geom_histogram()
    ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
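
    The message above appears because no bin width was chosen. A minimal sketch of setting one explicitly follows; the value 0.25 is an arbitrary choice for illustration and should be tuned to the data. iris_small is rebuilt with base subset() so the block runs on its own.

```r
library(ggplot2)

iris_small <- subset(iris, Sepal.Length > 5)

# Choose the bin width explicitly instead of relying on the default of 30 bins
p_hist <- ggplot(iris_small, aes(x = Petal.Length)) +
    geom_histogram(binwidth = 0.25)

p_hist
```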
    

    Box plot

    ggplot(iris_small, aes(x=Species, y=Sepal.Length)) + 
        geom_boxplot()
    

    Source: DataCamp
