Statistical Plotting | Normalization vs. Standardization


Author: shwzhao | Published: 2022-08-19 11:52

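    For details, see:
    https://en.wikipedia.org/wiki/Feature_scaling
    https://en.wikipedia.org/wiki/Normalization_(statistics)
    CSDN | Why normalize or standardize features?
    CSDN | Standardization and normalization are not the same thing: understanding data transformations
    WeChat official account | What exactly are standardization and normalization in data processing?
    WeChat official account | Data standardization: z-score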

    First, look at the distributions of three variables (mpg, disp, hp) before any transformation.

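    library(tidyverse)  # attaches dplyr, tidyr, tibble, ggplot2 and the %>% pipe

    mtcars %>%
      select(mpg, disp, hp) %>%
      rownames_to_column("car") %>%  # keep the car names as a regular column
      pivot_longer(-car, names_to = "terms", values_to = "values") %>%  # one row per car and variable
      ggplot() +
      geom_violin(aes(terms, values)) +
      theme_bw()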
    
    [Figure: violin plots of mpg, disp, and hp on their original scales]

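    Now look at the distributions after transformation. The three variables are on a common scale, and their shapes look broadly similar. Normalization (min-max scaling) maps the values into a fixed interval, here [0, 1]; standardization (z-score) has no fixed bounds.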

    mtcars %>%
      select(mpg, disp, hp) %>%
      rownames_to_column("car") %>%
      pivot_longer(-car, names_to = "terms", values_to = "values") %>%
      group_by(terms) %>%  # transform each variable separately
      mutate(normal_values = (values-min(values))/(max(values)-min(values)),  # min-max normalization
             standard_values = (values-mean(values))/sd(values)) %>%          # z-score standardization
      pivot_longer(-c(car, terms), names_to = "termss", values_to = "valuess") %>%
      filter(termss != "values") %>%  # drop the untransformed values
      ggplot() +
      geom_violin(aes(termss, valuess)) +
      # geom_boxplot(aes(termss, valuess)) +
      facet_wrap(~terms) +  # one panel per variable
      theme_bw()
    
    [Figure: violin plots of the normalized and standardized values, one panel per variable]
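
The claim about intervals can also be checked numerically. The following is a minimal base-R sketch, assuming the same two formulas used in the pipeline; the helper names `min_max` and `z_score` are hypothetical and do not come from the original post.

```r
# Hypothetical helpers: same formulas as in the mutate() call of the pipeline.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
z_score <- function(x) (x - mean(x)) / sd(x)

vars <- mtcars[, c("mpg", "disp", "hp")]

# Min and max of each variable after min-max normalization.
normalized_range <- sapply(vars, function(x) range(min_max(x)))

# Min and max of each variable after standardization.
standardized_range <- sapply(vars, function(x) range(z_score(x)))

# What standardized columns share instead: their mean and standard deviation.
standardized_mean <- sapply(vars, function(x) mean(z_score(x)))
standardized_sd   <- sapply(vars, function(x) sd(z_score(x)))

print(normalized_range)
print(round(standardized_range, 2))
```

Each normalized column spans exactly 0 to 1, while the standardized columns have different minima and maxima but a common mean of 0 and standard deviation of 1.
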
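
    The built-in scale() function standardizes every column by default:

    mtcars %>%
      select(mpg, disp, hp) %>%
      scale() %>%  # defaults: center = TRUE, scale = TRUE (z-score per column)
      head()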
    #>                          mpg        disp         hp
    #> Mazda RX4          0.1508848 -0.57061982 -0.5350928
    #> Mazda RX4 Wag      0.1508848 -0.57061982 -0.5350928
    #> Datsun 710         0.4495434 -0.99018209 -0.7830405
    #> Hornet 4 Drive     0.2172534  0.22009369 -0.5350928
    #> Hornet Sportabout -0.2307345  1.04308123  0.4129422
    #> Valiant           -0.3302874 -0.04616698 -0.6080186

    Computing the z-scores by hand and reshaping back to wide format gives the same values:

    mtcars %>%
      select(mpg, disp, hp) %>%
      rownames_to_column("car") %>%
      pivot_longer(-car, names_to = "terms", values_to = "values") %>%
      group_by(terms) %>%
      mutate(standard_values = (values-mean(values))/sd(values)) %>%  # manual z-score
      pivot_wider(id_cols = car, names_from = terms, values_from = standard_values) %>%
      head()
    #> # A tibble: 6 x 4
    #>   car                  mpg    disp     hp
    #>   <chr>              <dbl>   <dbl>  <dbl>
    #> 1 Mazda RX4          0.151 -0.571  -0.535
    #> 2 Mazda RX4 Wag      0.151 -0.571  -0.535
    #> 3 Datsun 710         0.450 -0.990  -0.783
    #> 4 Hornet 4 Drive     0.217  0.220  -0.535
    #> 5 Hornet Sportabout -0.231  1.04    0.413
    #> 6 Valiant           -0.330 -0.0462 -0.608
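
The agreement between the two outputs can be confirmed programmatically rather than by eye. This is a rough base-R sketch, not part of the original post: it compares `scale()` with the manual z-score, then reuses `scale()` for min-max normalization by passing the column minima as `center` and the column ranges as `scale`. The variable names are illustrative.

```r
vars <- as.matrix(mtcars[, c("mpg", "disp", "hp")])

# Standardization: scale() subtracts the column mean and divides by the
# column standard deviation when called with its defaults.
by_scale <- scale(vars)
by_hand  <- apply(vars, 2, function(x) (x - mean(x)) / sd(x))

# check.attributes = FALSE ignores the "scaled:center" and "scaled:scale"
# attributes that scale() attaches to its result.
same_standardized <- isTRUE(all.equal(by_scale, by_hand, check.attributes = FALSE))

# Min-max normalization through the same function: center on the column
# minimum and divide by the column range.
col_min    <- apply(vars, 2, min)
col_range  <- apply(vars, 2, max) - col_min
normalized <- scale(vars, center = col_min, scale = col_range)

print(same_standardized)
print(head(normalized, 3))
```

Passing explicit `center` and `scale` vectors keeps the parameters of the transformation in the result's attributes, which is convenient if the same scaling has to be applied to new data later.
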
    
