
R | Counting how often a continuous variable falls into each interval

Author: 尘世中一个迷途小书僮 | Published: 2021-12-25 23:02

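    We usually look at a histogram to inspect how data are distributed, but we rarely count in R how many observations actually fall into each interval. This post therefore summarizes three functions for discretizing a continuous variable: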

    ggplot2::cut_width()
    ggplot2::cut_interval()
    ggplot2::cut_number()
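Since the starting point is the histogram, it may help to see that base R can already return the counts behind one. The following is a minimal sketch, not part of the original post: the break points `seq(-3, 3, by = 0.5)` are an assumed choice that happens to cover the simulated data used later in this post.

```r
set.seed(123)
x <- rnorm(100)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(x, breaks = seq(-3, 3, by = 0.5), plot = FALSE)

# One count per bin, together with the bin edges that were used
print(h$breaks)
print(h$counts)
```

The three ggplot2 helpers below do the same job but return a factor, which is easier to use inside a dplyr pipeline.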
    

    Simulating the data

    First, we simulate 100 values from a standard normal distribution to use in the rest of the post.

    library(tidyverse)  # provides tibble(), %>%, mutate(), count() and ggplot2

    set.seed(123)
    dat1 <- tibble(Num = rnorm(100))
    

    cut_width

    cut_width() cuts the data into intervals of a fixed length, given by width.

    dat1 %>% 
      mutate(Interval = cut_width(Num, width = 0.5)) %>%
      count(Interval)
    
    # A tibble: 10 x 2
       Interval          n
       <fct>         <int>
     1 [-2.75,-2.25]     1
     2 (-2.25,-1.75]     1
     3 (-1.75,-1.25]     4
     4 (-1.25,-0.75]     8
     5 (-0.75,-0.25]    23
     6 (-0.25,0.25]     21
     7 (0.25,0.75]      18
     8 (0.75,1.25]      13
     9 (1.25,1.75]       7
    10 (1.75,2.25]       4
    
    # Plot the per-interval counts as a bar chart
    dat1 %>% 
      mutate(Interval = cut_width(Num, width = 0.5)) %>% 
      count(Interval) %>% 
      ggplot(aes(x = Interval, y = n)) + 
      geom_col()
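In the table above, the bin edges fall on values such as -0.25 and 0.25 rather than on multiples of 0.5. As a minimal sketch, assuming you would prefer edges aligned with 0, the boundary argument of cut_width() can be set explicitly. The column name Interval0 is only an illustrative choice.

```r
library(tibble)
library(dplyr)
library(ggplot2)

set.seed(123)
dat1 <- tibble(Num = rnorm(100))

# boundary = 0 requests bin edges that include 0 (..., -0.5, 0, 0.5, ...)
aligned <- dat1 %>%
  mutate(Interval0 = cut_width(Num, width = 0.5, boundary = 0)) %>%
  count(Interval0)

print(aligned)
```

cut_width() also accepts center (to position a bin's midpoint instead of its edge) and closed (to choose which side of each interval is closed); boundary and center cannot be supplied together.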
    

    cut_interval

    cut_interval() cuts the data into n intervals of equal range.

    dat1 %>% 
      mutate(Interval = cut_interval(Num, n = 10)) %>%
      count(Interval)
    
    # A tibble: 10 x 2
       Interval             n
       <fct>            <int>
     1 [-2.31,-1.86]        2
     2 (-1.86,-1.41]        2
     3 (-1.41,-0.96]       10
     4 (-0.96,-0.511]      10
     5 (-0.511,-0.0609]    22
     6 (-0.0609,0.389]     18
     7 (0.389,0.838]       15
     8 (0.838,1.29]        11
     9 (1.29,1.74]          6
    10 (1.74,2.19]          4
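To make the "equal range" idea concrete, the same binning can be sketched with base R: spread n + 1 break points evenly between the minimum and the maximum, then pass them to cut(). This is an illustration of the idea under that assumption, not a statement about how ggplot2 implements it; the names n_bins, brks and manual_counts are introduced here for the example.

```r
library(ggplot2)

set.seed(123)
x <- rnorm(100)

# n + 1 equally spaced break points spanning the observed range
n_bins <- 10
brks <- seq(min(x), max(x), length.out = n_bins + 1)

# include.lowest = TRUE keeps the minimum inside the first interval
manual <- cut(x, breaks = brks, include.lowest = TRUE)
manual_counts <- as.integer(table(manual))
print(manual_counts)

# Check whether the counts agree with cut_interval()
auto_counts <- as.integer(table(cut_interval(x, n = n_bins)))
print(identical(manual_counts, auto_counts))
```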
    

    cut_number

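    cut_number() splits the data into n groups that each contain (approximately) the same number of observations, much like cutting at the n-quantiles. Note that n is the number of groups, not the number of observations per group; here 100 values split into 10 groups happen to give 10 per group.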

    dat1 %>% 
      mutate(Interval = cut_number(Num, n = 10)) %>%
      count(Interval)
    
    # A tibble: 10 x 2
       Interval            n
       <fct>           <int>
     1 [-2.31,-1.07]      10
     2 (-1.07,-0.626]     10
     3 (-0.626,-0.387]    10
     4 (-0.387,-0.223]    10
     5 (-0.223,0.0618]    10
     6 (0.0618,0.315]     10
     7 (0.315,0.513]      10
     8 (0.513,0.882]      10
     9 (0.882,1.26]       10
    10 (1.26,2.19]        10
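The link to quantiles can be sketched with base R as well: take the sample quantiles at evenly spaced probabilities and use them as break points. This is a hedged illustration of the idea rather than ggplot2's own code; n_groups, probs and brks are names made up for the example.

```r
library(ggplot2)

set.seed(123)
x <- rnorm(100)

# Break points at the 0%, 10%, ..., 100% sample quantiles
n_groups <- 10
probs <- seq(0, 1, length.out = n_groups + 1)
brks <- quantile(x, probs = probs)

manual <- cut(x, breaks = brks, include.lowest = TRUE)
manual_counts <- as.integer(table(manual))
print(manual_counts)

# Check whether the counts agree with cut_number()
auto_counts <- as.integer(table(cut_number(x, n = n_groups)))
print(identical(manual_counts, auto_counts))
```

With many tied values, several quantiles can coincide; base cut() then stops with an error about non-unique breaks, so this approach works best on data that are genuinely continuous.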
    

    Ref:

    https://ggplot2.tidyverse.org/reference/cut_interval.html

    The end.

    Article link: https://www.haomeiwen.com/subject/adpjqrtx.html