美文网首页Rshiny
R Shiny 基础 3. 实战演练 数据探索

R Shiny 基础 3. 实战演练 数据探索

作者: Jason数据分析生信教室 | 来源:发表于2022-07-30 22:21 被阅读0次

    1. 引入

    融汇贯通之前介绍的内容,来实现一个简单的数据可视化app。首先会在R里进行一些数据探索,然后把这些功能转换为shiny的可交互模式。

    本章节会用到下面的包。

    library(shiny)
    library(vroom)
    library(tidyverse)
    

    2. 数据

    数据会用到National Electronic Injury Surveillance System (NEISS)的一份数据。记录了长期事故入院的病例数据。

    https://github.com/hadley/neiss

    本章只会用到2017年的数据,数据大小在10M左右。

    dir.create("neiss")
    #> Warning in dir.create("neiss"): 'neiss' already exists
    download <- function(name) {
      url <- "https://github.com/hadley/mastering-shiny/raw/master/neiss/"
      download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
    }
    download("injuries.tsv.gz")
    download("population.tsv")
    download("products.tsv")
    

    injuries 是长这样的

    injuries <- vroom::vroom("neiss/injuries.tsv.gz")
    injuries
    #> # A tibble: 255,064 × 10
    #>   trmt_date    age sex   race  body_part   diag         location prod_code weight
    #>   <date>     <dbl> <chr> <chr> <chr>       <chr>        <chr>        <dbl>  <dbl>
    #> 1 2017-01-01    71 male  white Upper Trunk Contusion O… Other P…      1807   77.7
    #> 2 2017-01-01    16 male  white Lower Arm   Burns, Ther… Home           676   77.7
    #> 3 2017-01-01    58 male  white Upper Trunk Contusion O… Home           649   77.7
    #> 4 2017-01-01    21 male  white Lower Trunk Strain, Spr… Home          4076   77.7
    #> 5 2017-01-01    54 male  white Head        Inter Organ… Other P…      1807   77.7
    #> 6 2017-01-01    21 male  white Hand        Fracture     Home          1884   77.7
    #> # … with 255,058 more rows, and 1 more variable: narrative <chr>
    

    每个变量的定义:

    • trmt_date is date the person was seen in the hospital (not when the accident occurred).
    • age, sex, and race give demographic information about the person who experienced the accident.
    • body_part is the location of the injury on the body (like ankle or ear); location is the place where the accident occurred (like home or school).
    • diag gives the basic diagnosis of the injury (like fracture or laceration).
    • prod_code is the primary product associated with the injury.
    • weight is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US.
    • narrative is a brief story about how the accident occurred.

    然后把这个数据和其他两个数据匹配起来

    products <- vroom::vroom("neiss/products.tsv")
    products
    #> # A tibble: 38 × 2
    #>   prod_code title                            
    #>       <dbl> <chr>                            
    #> 1       464 knives, not elsewhere classified 
    #> 2       474 tableware and accessories        
    #> 3       604 desks, chests, bureaus or buffets
    #> 4       611 bathtubs or showers              
    #> 5       649 toilets                          
    #> 6       676 rugs or carpets, not specified   
    #> # … with 32 more rows
    
    population <- vroom::vroom("neiss/population.tsv")
    population
    #> # A tibble: 170 × 3
    #>     age sex    population
    #>   <dbl> <chr>       <dbl>
    #> 1     0 female    1924145
    #> 2     0 male      2015150
    #> 3     1 female    1943534
    #> 4     1 male      2031718
    #> 5     2 female    1965150
    #> 6     2 male      2056625
    #> # … with 164 more rows
    

    3. 数据探索

    在做成app之前,先把数据探索一下。可以从比较好玩的地方开始入手,比方说prod_code 为649的数据,这个表示发生漏电事故的产品编号是厕所。

    selected <- injuries %>% filter(prod_code == 649)
    nrow(selected)
    #> [1] 2993
    

    接着往下看可以找到发生事故的场所最多的是Home家里,身体部位是Head

    selected %>% count(location, wt = weight, sort = TRUE)
    #> # A tibble: 6 × 2
    #>   location                         n
    #>   <chr>                        <dbl>
    #> 1 Home                       99603. 
    #> 2 Other Public Property      18663. 
    #> 3 Unknown                    16267. 
    #> 4 School                       659. 
    #> 5 Street Or Highway             16.2
    #> 6 Sports Or Recreation Place    14.8
    
    selected %>% count(body_part, wt = weight, sort = TRUE)
    #> # A tibble: 24 × 2
    #>   body_part        n
    #>   <chr>        <dbl>
    #> 1 Head        31370.
    #> 2 Lower Trunk 26855.
    #> 3 Face        13016.
    #> 4 Upper Trunk 12508.
    #> 5 Knee         6968.
    #> 6 N.S./Unk     6741.
    #> # … with 18 more rows
    
    selected %>% count(diag, wt = weight, sort = TRUE)
    #> # A tibble: 20 × 2
    #>   diag                       n
    #>   <chr>                  <dbl>
    #> 1 Other Or Not Stated   32897.
    #> 2 Contusion Or Abrasion 22493.
    #> 3 Inter Organ Injury    21525.
    #> 4 Fracture              21497.
    #> 5 Laceration            18734.
    #> 6 Strain, Sprain         7609.
    #> # … with 14 more rows
    

    然后也可以用ggplot做简单的可视化分析,比方说按照agesex count一下数量

    summary <- selected %>% 
      count(age, sex, wt = weight)
    summary
    #> # A tibble: 208 × 3
    #>     age sex         n
    #>   <dbl> <chr>   <dbl>
    #> 1     0 female   4.76
    #> 2     0 male    14.3 
    #> 3     1 female 253.  
    #> 4     1 male   231.  
    #> 5     2 female 438.  
    #> 6     2 male   632.  
    #> # … with 202 more rows
    
    summary %>% 
      ggplot(aes(age, n, colour = sex)) + 
      geom_line() + 
      labs(y = "Estimated number of injuries")
    

    上图显示的是实际数字,也可以改成按照比例显示。比方说10000个人里有多少个。

    summary <- selected %>% 
      count(age, sex, wt = weight) %>% 
      left_join(population, by = c("age", "sex")) %>% 
      mutate(rate = n / population * 1e4)
    
    summary
    #> # A tibble: 208 × 5
    #>     age sex         n population   rate
    #>   <dbl> <chr>   <dbl>      <dbl>  <dbl>
    #> 1     0 female   4.76    1924145 0.0247
    #> 2     0 male    14.3     2015150 0.0708
    #> 3     1 female 253.      1943534 1.30  
    #> 4     1 male   231.      2031718 1.14  
    #> 5     2 female 438.      1965150 2.23  
    #> 6     2 male   632.      2056625 3.07  
    #> # … with 202 more rows
    
    summary %>% 
      ggplot(aes(age, rate, colour = sex)) + 
      geom_line(na.rm = TRUE) + 
      labs(y = "Injuries per 10,000 people")
    

    最后还可以查看一下里面一些案件的文字描述。比方说随机取样10个样本。

    selected %>% 
      sample_n(10) %>% 
      pull(narrative)
    #>  [1] "97 YOM FELL HITTING HEAD ON TOILET SEAT.DX:  NECK PX, BACK PX, FREQUENT FALLS."                                                   
    #>  [2] "95 YOF - CONTUSION HEAD - PT WAS TRANSFERRING FROM W.C TO TOILETAND FELL HITTING HEAD ON FLOOR@ N.H"                              
    #>  [3] "54YOM HAD A MECHANICAL FALL ATTEMPTING TO USE THE TOILET, C/O LT-SIDEDCHEST PAIN. DX - RIB FX, PNEUMOTHORAX, CHEST WALL CONTUSION"
    #>  [4] "99YF ATTEMPTING TO GET OFF THE TOILET&FELL FWD STRIKING HEAD&CHEST AGAINST THE WALKER, -LOC>>CHI, RIB FX, FREQ FALLS"             
    #>  [5] "79 YOF HAD SYNCOPAL EPISODE AND FELL FROM TOILET HITTING FACE ONFLOOR     DX  NASAL BONE FRACTURE"                                
    #>  [6] "76YOM C/O GLF @HOME JUST PTA. ATTEMPTING TO SIT ON TOILET, MISSED TOILET AND FELL. NO HI, NO LOC DX=LEFT HIP FRACTURE="           
    #>  [7] "6 YO M LAC HEAD-FELL,CLIMBING ON TOILET,STRUCK THE COUNTERTOP"                                                                    
    #>  [8] "85YOM SITTING ON THE TOILET AT HOME AND LEANED FORWARD FELL ONTO HEAD SUSTAINED A SUBDURAL HEMATOMA"                              
    #>  [9] "79YOF H'TMA HEAD- LOWERING ONTO TOILET, FELL ON FLOOR"                                                                            
    #> [10] "81YOM WENT TO SIT ON A TOILET AND MISSED IT AND SUSTAINED A CLOSED HEADINJURY"
    

    都是一些比较基础的数据探索。接下来要做的就是把这些事情转交给shiny,全都改写成shiny code。

    4. Shiny小试牛刀

    首先是设置UI界面。
    这里会用最简单的方式演示三张表格,一张图。可以提前用笔在纸上打一下草稿规划一下界面的排版。这里打算做成一个2行3列的界面。第一行显示三张表格,第二行显示一张图。
    由于每一行的最大宽度是12列,所以三张表格均匀分布的话是每一张占4列。

    prod_codes <- setNames(products$prod_code, products$title)
    
    ui <- fluidPage(
      fluidRow(
        column(6,
          selectInput("code", "Product", choices = prod_codes)
        )
      ),
      fluidRow(
        column(4, tableOutput("diag")),
        column(4, tableOutput("body_part")),
        column(4, tableOutput("location"))
      ),
      fluidRow(
        column(12, plotOutput("age_sex"))
      )
    )
    

    虽然到目前为止还没有对fluidRow()column()有过详细讲解,但是大概可以猜到是用来做什么的。包括setNames()selectInput() 之后的章节里都会有说明。

    然后是server端。

    server <- function(input, output, session) {
      selected <- reactive(injuries %>% filter(prod_code == input$code))
    
      output$diag <- renderTable(
        selected() %>% count(diag, wt = weight, sort = TRUE)
      )
      output$body_part <- renderTable(
        selected() %>% count(body_part, wt = weight, sort = TRUE)
      )
      output$location <- renderTable(
        selected() %>% count(location, wt = weight, sort = TRUE)
      )
    
      summary <- reactive({
        selected() %>%
          count(age, sex, wt = weight) %>%
          left_join(population, by = c("age", "sex")) %>%
          mutate(rate = n / population * 1e4)
      })
    
      output$age_sex <- renderPlot({
    
        summary() %>%
          ggplot(aes(age, n, colour = sex)) +
          geom_line() +
          labs(y = "Estimated number of injuries")
      }, res = 96)
    }
    

    5. 对齐表格样式

    刚才在刚才的结果里看出表格的行数层次不齐,不是很美观。如果可以像下面那样指定排名前几的就好了。

    injuries %>%
      mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
      group_by(diag) %>%
      summarise(n = as.integer(sum(weight)))
    #> # A tibble: 6 × 2
    #>   diag                        n
    #>   <fct>                   <int>
    #> 1 Other Or Not Stated   1806436
    #> 2 Fracture              1558961
    #> 3 Laceration            1432407
    #> 4 Strain, Sprain        1432556
    #> 5 Contusion Or Abrasion 1451987
    #> 6 Other                 1929147
    

    可以写一个function,不会写也没关系,之后会有详细的解说。

    count_top <- function(df, var, n = 5) {
      df %>%
        mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
        group_by({{ var }}) %>%
        summarise(n = as.integer(sum(weight)))
    }
    

    然后在server端安排一下function。这里面有个细节,width=100% 指定撑满行距,让其看上去不会那么凌乱。

      output$diag <- renderTable(count_top(selected(), diag), width = "100%")
      output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
      output$location <- renderTable(count_top(selected(), location), width = "100%")
    

    5. 添加选项 rate vs count

    fluidRow(
        column(8,
          selectInput("code", "Product",
            choices = setNames(products$prod_code, products$title),
            width = "100%"
          )
        ),
        column(2, selectInput("y", "Y axis", c("rate", "count")))
      ),
    
    output$age_sex <- renderPlot({
        if (input$y == "count") {
          summary() %>%
            ggplot(aes(age, n, colour = sex)) +
            geom_line() +
            labs(y = "Estimated number of injuries")
        } else {
          summary() %>%
            ggplot(aes(age, rate, colour = sex)) +
            geom_line(na.rm = TRUE) +
            labs(y = "Injuries per 10,000 people")
        }
      }, res = 96)
    

    6. 添加文字敘述

    首先在UI里添加新的元素。比如说actionButton

    fluidRow(
        column(2, actionButton("story", "Tell me a story")),
        column(10, textOutput("narrative"))
      )
    

    然后在sever里添加一个eventReactive ,这个是表示只有被点击才会被激活。

    arrative_sample <- eventReactive(
        list(input$story, selected()),
        selected() %>% pull(narrative) %>% sample(1)
      )
      output$narrative <- renderText(narrative_sample())
    

    相关文章

      网友评论

        本文标题:R Shiny 基础 3. 实战演练 数据探索

        本文链接:https://www.haomeiwen.com/subject/djyhwrtx.html