[R] Interesting packages related to ggplot2

Author: 半为花间酒 | Published: 2020-04-04 13:15

    patchwork package

    Partly based on the article: ggplot2拼图包patchwork推荐与使用

    - Plots can be added together directly, without assigning them to variables first

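    library(ggplot2)
    library(patchwork)

    # Two complete plots joined with +; patchwork places them side by side
    ggplot(mtcars) +
        geom_point(aes(mpg, disp)) +
        ggplot(mtcars) +
        geom_boxplot(aes(gear, disp, group = gear))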
    
    - plot_layout(): adjust the layout

    p1 <- ggplot(mtcars) + geom_point(aes(mpg, disp))
    p2 <- ggplot(mtcars) + geom_boxplot(aes(gear, disp, group = gear))
    
    # One column, with the first plot three times as tall as the second
    p1 + p2 + plot_layout(ncol = 1, heights = c(3, 1))
    
    p1 <- ggplot(diamonds, aes(x=cut, y=price)) +
      geom_violin(aes(fill = cut))
    
    p2 <- ggplot(diamonds, aes(price)) +
      geom_histogram() +
      facet_wrap(~ cut, nrow = 2)
      
    p3 <- ggplot(diamonds, aes(price)) +
      geom_freqpoly(aes(color = cut))
    
    p2 + (p1 / p3) + plot_layout(ncol = 2, widths = c(2, 1))
    
    p1 + p2 + p3 + plot_layout(ncol = 1)
    # equivalent to
    p1 / p2 / p3
    

    - plot_spacer(): insert an empty area between plots

    p1 + plot_spacer() + p2
    

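    - Operators

    1. The - operator
    # - puts its left side (p1 + p2, kept together as one unit) and its right side (p3) on the same nesting level
    (p1 + p2 - p3) /
    (p1 + p2 + p3)
    
    2. * and &
    p3 <- ggplot(mtcars) + geom_smooth(aes(disp, qsec))
    p4 <- ggplot(mtcars) + geom_bar(aes(carb))
    
    # * applies the theme only to plots in the current nesting level
    (p1 / (p2 + p3) / p4) * theme_bw()
    
    # & applies the theme to plots in all nesting levels
    (p1 / (p2 + p3) / p4) & theme_bw()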
    

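    lvplot package

    For large data sets, letter-value plots have advantages over ordinary boxplots.
    (I have not fully worked this package out yet; its more advanced features will be added later.)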

    An extension of standard boxplots which draws k letter statistics. Conventional boxplots (Tukey 1977) are useful displays for conveying rough information about the central 50% of the data and the extent of the data. For moderate-sized data sets (n < 1000), detailed estimates of tail behavior beyond the quartiles may not be trustworthy, so the information provided by boxplots is appropriately somewhat vague beyond the quartiles, and the expected number of “outliers” and “far-out” values for a Gaussian sample of size n is often less than 10 (Hoaglin, Iglewicz, and Tukey 1986). Large data sets (n ≈ 10,000-100,000) afford more precise estimates of quantiles in the tails beyond the quartiles and also can be expected to present a large number of “outliers” (about 0.4 + 0.007 n). The letter-value box plot addresses both these shortcomings: it conveys more detailed information in the tails using letter values, only out to the depths where the letter values are reliable estimates of their corresponding quantiles (corresponding to tail areas of roughly 2^{-i}); “outliers” are defined as a function of the most extreme letter value shown. All aspects shown on the letter-value boxplot are actual observations, thus remaining faithful to the principles that governed Tukey's original boxplot.

    library(lvplot)
    
    p1 <- ggplot(diamonds, aes(x=cut, y=price)) +
      geom_lv()
    
    p2 <- ggplot(diamonds, aes(x = cut, y = price)) +
      geom_boxplot()
    
    p1 + p2
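
To make the term "letter values" in the description above concrete, the following base-R sketch computes the first few by hand using Tukey's depth rule: the median has depth (n + 1) / 2, and each further depth is (floor(previous depth) + 1) / 2. The function name `letter_values` and its argument `k` are made up for this illustration. This is not the code lvplot runs internally, and it does not implement lvplot's rule for deciding how many letter values are reliable.

```r
# Hypothetical helper: first k letter values (M = median, F = fourths, E = eighths, ...)
letter_values <- function(x, k = 3) {
  labels <- c("M", "F", "E", "D", "C", "B", "A")
  stopifnot(k >= 1, k <= length(labels))
  x <- sort(x)
  n <- length(x)
  depth <- (n + 1) / 2                      # depth of the median
  rows <- vector("list", k)
  for (i in seq_len(k)) {
    # A fractional depth means averaging the two neighbouring order statistics
    idx <- c(floor(depth), ceiling(depth))
    rows[[i]] <- data.frame(
      letter = labels[i],
      depth  = depth,
      lower  = mean(x[idx]),                # counted from the smallest value
      upper  = mean(x[n + 1 - idx])         # counted from the largest value
    )
    depth <- (floor(depth) + 1) / 2         # next depth, one step further into the tails
  }
  do.call(rbind, rows)
}

lv <- letter_values(1:15, k = 3)
print(lv)
```

Each row goes one step further into the tails (tail areas of roughly 1/2, 1/4, 1/8), and each lower/upper pair corresponds to one of the nested boxes that geom_lv() draws.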
    

    ggbeeswarm package

    Partly based on the article: ggbeeswarm应用

    - quasirandom

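    library(ggbeeswarm)

    # A plain scatter plot of a categorical variable against a continuous one
    # overplots heavily and reveals little
    p1 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_point()
    
    # Jitter built into ggplot2
    p2 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_jitter()
    
    # geom_quasirandom() from ggbeeswarm; by default the result resembles a violin plot
    p3 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_quasirandom()
    
    # With groupOnX = FALSE the result here looks the same as plain geom_point()
    # (newer ggbeeswarm releases may warn that groupOnX is deprecated)
    p4 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_quasirandom(groupOnX = FALSE)
    
    # varwidth: the width of each group grows with the number of observations in it
    p5 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_quasirandom(varwidth = TRUE)
    
    # dodge.width: horizontal dodging width between groups
    p6 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_quasirandom(dodge.width = 0.8)
    
    
    (p1 + p2) / (p3 + p4) / (p5 + p6)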
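
As a point of comparison for the panels above, here is a small base-R sketch of what "adding jitter" amounts to: each category is mapped to an integer position and independent uniform noise is added to it, regardless of how dense the data are at that height. The built-in mtcars data is used as a stand-in for mpg, and the amount of 0.4 is an assumption chosen to mirror geom_jitter's documented default of 40% of the data resolution. Quasirandom placement differs in that the offsets follow the local density, which is why its result resembles a violin plot.

```r
# Plain jitter by hand (illustrative sketch, not ggplot2's internal code path)
set.seed(42)
x_pos <- as.integer(factor(mtcars$cyl))   # categories 4, 6, 8 become positions 1, 2, 3
x_jit <- jitter(x_pos, amount = 0.4)      # adds uniform noise in [-0.4, 0.4]

plot(x_jit, mtcars$disp, xaxt = "n", xlab = "cyl", ylab = "disp")
axis(1, at = 1:3, labels = levels(factor(mtcars$cyl)))
```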
    

    The method argument controls how the points are spread out

    p1 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom(method="tukey")
    
    p2 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom(method="tukeyDense")
    
    p3 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom(method="frowney")
    
    p4 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom(method="smiley")
    
    p5 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom(method="pseudorandom")
    
    p6 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom() # default
    
    (p1 + p2) / (p3 + p4) / (p5 + p6)
    

    - beeswarm

    # beeswarm gives a more regular arrangement than quasirandom
    p1 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_beeswarm()
    
    p2 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_quasirandom() 
    
    p1 / p2
    
    p1 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_beeswarm()
    
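    # cex controls the spacing between points
    p2 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_beeswarm(cex = 1.5)
    
    
    p3 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_beeswarm(groupOnX = FALSE)
    
    # priority sets the order in which points are placed when the swarm is laid out
    # (the original note here said the author was unsure what it controls)
    p4 <- ggplot(mpg, aes(class, displ, color = cyl)) +
      geom_beeswarm(priority = 'density')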
    
    p5 <- ggplot(mpg,aes(class, displ,color = cyl)) +
      geom_beeswarm(dodge.width=1)
    
    (p1 + p2) / (p3 + p4) / (p5 + plot_spacer())
    

    seriation package

    Seriation is an interesting technique for ordering the matrix (see this interesting post: http://nicolas.kruchten.com/content/2018/02/seriation/). The powerful seriation package implements quite a lot of methods for seriation. It is easy to extract row orders and column orders from the object returned by the core function seriate() of the seriation package, so they can be directly assigned to row_order and column_order to make the heatmap.

    Besides the post mentioned in the introduction above, Make Patterns Pop Out of Heatmaps with Seriation, there are two other important references:

    1. Getting Things in Order: An Introduction to the R Package seriation: explains the principles and usage of seriation in detail
    2. ComplexHeatmap-reference: the ComplexHeatmap reference manual

    A good tutorial for learning ComplexHeatmap: 使用ComplexHeatmap包绘制热图

    # Set up the data
    set.seed(123)
    nr1 = 4; nr2 = 8; nr3 = 6; nr = nr1 + nr2 + nr3
    nc1 = 6; nc2 = 8; nc3 = 10; nc = nc1 + nc2 + nc3
    mat = cbind(rbind(matrix(rnorm(nr1*nc1, mean = 1,   sd = 0.5), nr = nr1),
                      matrix(rnorm(nr2*nc1, mean = 0,   sd = 0.5), nr = nr2),
                      matrix(rnorm(nr3*nc1, mean = 0,   sd = 0.5), nr = nr3)),
                rbind(matrix(rnorm(nr1*nc2, mean = 0,   sd = 0.5), nr = nr1),
                      matrix(rnorm(nr2*nc2, mean = 1,   sd = 0.5), nr = nr2),
                      matrix(rnorm(nr3*nc2, mean = 0,   sd = 0.5), nr = nr3)),
                rbind(matrix(rnorm(nr1*nc3, mean = 0.5, sd = 0.5), nr = nr1),
                      matrix(rnorm(nr2*nc3, mean = 0.5, sd = 0.5), nr = nr2),
                      matrix(rnorm(nr3*nc3, mean = 1,   sd = 0.5), nr = nr3))
    )
    mat = mat[sample(nr, nr), sample(nc, nc)] # random shuffle rows and columns
    rownames(mat) = paste0("row", seq_len(nr))
    colnames(mat) = paste0("column", seq_len(nc))
    
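    library(seriation)
    library(ComplexHeatmap)  # provides Heatmap()

    # "BEA_TSP" only accepts a non-negative matrix, hence max(mat) - mat
    o = seriate(max(mat) - mat, method = "BEA_TSP")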
    Heatmap(max(mat) - mat, name = "mat", 
            row_order = get_order(o, 1), column_order = get_order(o, 2))
    
    o1 = seriate(dist(mat), method = "TSP")
    o2 = seriate(dist(t(mat)), method = "TSP")
    Heatmap(mat, name = "mat", row_order = get_order(o1), column_order = get_order(o2))
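
The vectors passed to row_order and column_order above are plain integer permutations of the row and column indices. The base-R sketch below shows that idea without extra packages. Note that hclust() is used here only as a stand-in for seriate(): it is a different ordering method, and the matrix `m` and all variable names are made up for this illustration.

```r
# An ordering is just a permutation of indices; indexing with it reorders the matrix
set.seed(1)
m <- rbind(matrix(rnorm(12, mean = 3), nrow = 3),   # three "high" rows
           matrix(rnorm(12, mean = 0), nrow = 3))   # three "low" rows
m <- m[sample(nrow(m)), ]                           # shuffle the rows
rownames(m) <- paste0("row", seq_len(nrow(m)))
colnames(m) <- paste0("column", seq_len(ncol(m)))

row_ord <- hclust(dist(m))$order      # stand-in for get_order(o1)
col_ord <- hclust(dist(t(m)))$order   # stand-in for get_order(o2)

m_sorted <- m[row_ord, col_ord]       # same values, rows and columns rearranged
print(row_ord)
print(round(m_sorted, 2))
```

Heatmap() performs the same kind of rearrangement internally when it is given row_order and column_order, so the input matrix itself can stay in its original order.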
    


        Article link: https://www.haomeiwen.com/subject/qeplphtx.html