ggplot2进阶：统计+可视化

作者: 程凉皮儿 | 来源:发表于2020-06-30 08:45 被阅读0次

R语言入门——ggplot2
ggplot2进阶：统计+可视化
python--seaborn散点图
python--seaborn直方图
R语言可视化学习笔记之ggpubr包
python--seaborn盒形图
python--seaborn折线图
python--seaborn热力图
R for data science （第一章） ②
一饱眼福，高级的数据可视化案例来啦

Tidyverse course: ggplot Code

clp

29 June, 2020

前言

本教程包含tidyverse课程的ggplot部分幻灯片中显示的所有代码。并提供了课程中使用的练习的答案。

加载必要的包及内置数据集

library("tidyverse")
#> Warning: package 'ggplot2' was built under R version 3.6.2
#> Warning: package 'tibble' was built under R version 3.6.2
#> Warning: package 'tidyr' was built under R version 3.6.2
#> Warning: package 'purrr' was built under R version 3.6.2
#> Warning: package 'dplyr' was built under R version 3.6.2
library("ggplot2")
msleep
#> # A tibble: 83 x 11
#>    name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#>    <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
#>  1 Chee… Acin… carni Carn… lc                  12.1      NA        NA      11.9
#>  2 Owl … Aotus omni  Prim… <NA>                17         1.8      NA       7  
#>  3 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
#>  4 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
#>  5 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20  
#>  6 Thre… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
#>  7 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
#>  8 Vesp… Calo… <NA>  Rode… <NA>                 7        NA        NA      17  
#>  9 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
#> 10 Roe … Capr… herbi Arti… lc                   3        NA        NA      21  
#> # … with 73 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>
class(msleep)
#> [1] "tbl_df"     "tbl"        "data.frame"

清洗数据

从msleep中删除NA值

msleep %>% filter(!is.na(vore)) -> msleep.clean
msleep.clean
#> # A tibble: 76 x 11
#>    name  genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
#>    <chr> <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
#>  1 Chee… Acin… carni Carn… lc                  12.1      NA        NA      11.9
#>  2 Owl … Aotus omni  Prim… <NA>                17         1.8      NA       7  
#>  3 Moun… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
#>  4 Grea… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
#>  5 Cow   Bos   herbi Arti… domesticated         4         0.7       0.667  20  
#>  6 Thre… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
#>  7 Nort… Call… carni Carn… vu                   8.7       1.4       0.383  15.3
#>  8 Dog   Canis carni Carn… domesticated        10.1       2.9       0.333  13.9
#>  9 Roe … Capr… herbi Arti… lc                   3        NA        NA      21  
#> 10 Goat  Capri herbi Arti… lc                   5.3       0.6      NA      18.7
#> # … with 66 more rows, and 2 more variables: brainwt <dbl>, bodywt <dbl>

绘制散点图(Scatterplot)

基础图形

ggplot(
  msleep.clean, 
  aes(x=bodywt, y=sleep_total)
)+geom_point() -> scatterplot

scatterplot

image.png

其实，我们可以直接从过滤通过管道符%>%进入ggplot，而不需要保存中间数据直接出图(这一波操作很秀！)。

msleep %>% 
  filter(!is.na(vore)) %>%
    ggplot(
      aes(x=bodywt, y=sleep_total)
  )+geom_point()

image.png

然后是各种美化策略

加点颜色

ggplot(
  msleep.clean, 
  aes(x=bodywt, y=sleep_total, colour=vore)
)+geom_point()

image.png

另一种加颜色的方法

ggplot(
  msleep.clean, 
  aes(x=bodywt, y=sleep_total)
)+geom_point(aes(colour=vore))

image.png

看上去这个数据比较集中在X轴的2000以内，现在这个显示不出更更多信息，要做到一图胜千言，那就要对X轴进行数据转换，常用的方法有取对数值(with log: log axis)

对X轴取取对数展示

ggplot(
  msleep.clean, 
  aes(x=bodywt, y=sleep_total, colour=vore)
)+geom_point() -> scatterplot

scatterplot+scale_x_log10()

image.png

一个字：秀！

另一个展示方法

ggplot(
  msleep.clean, 
  aes(x=log(bodywt), y=sleep_total,colour=vore)
)+geom_point()

image.png

展示更大的点、轴和图形标题

ggplot(
  msleep.clean, 
  aes(x=log(bodywt), y=sleep_total,colour=vore)
) +
  geom_point(size=4) +
  xlab("Log Body Weight") + 
  ylab("Total Hours Sleep") + 
  ggtitle("Some Sleep Data")  -> scatterplot

scatterplot

image.png

更换显示主题

theme_set(theme_bw(base_size=18))

scatterplot+theme(plot.title = element_text(hjust = 0.5)) -> scatterplot
scatterplot

image.png

更改x轴和y轴上的配色方案和刻度，并改进图例这将添加到先前的图形中，而不是重新创建它。

scatterplot +
  scale_colour_brewer(
    palette="Set1", 
    name="Trophic levels", 
    labels=c("Carnivore", "Herbivore", "Insectivore", "Omnivore")
  ) +
  scale_x_continuous(breaks=-5:10) +
  scale_y_continuous(breaks=seq(0,20, 2)) -> scatterplot
scatterplot

image.png

手动更改颜色

scatterplot +
  scale_color_manual(
    values=c("chocolate3", "chartreuse3", "darkorchid2","cyan3"),
    name="Trophic levels", 
    labels=c("Carnivore", "Herbivore", "Insectivore", "Omnivore")
  ) -> scatterplot
## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.
scatterplot

image.png

超喜欢这个“Insectivore”颜色

接下来就是练习题了

练习1

文件up_down_expression sion.txt包含一个表达矩阵，该数据集带有一个额外的列，该列将行分类为3组(上调、下调或不变)。加载up_down_expression sion.txt检查下文件的结构并绘制散点图geom_point()：红色表示上调，蓝色下调灰色不变， - 主标题：表达数据 - 颜色图例：下调，不变，上调 - 轴标签：条件1和条件2

expression <- read_tsv("up_down_expression.txt")
expression
#> # A tibble: 5,196 x 4
#>    Gene       Condition1 Condition2 State     
#>    <chr>           <dbl>      <dbl> <chr>     
#>  1 A4GNT          -3.68      -3.44  unchanging
#>  2 AAAS            4.55       4.39  unchanging
#>  3 AASDH           3.72       3.48  unchanging
#>  4 AATF            5.08       5.02  unchanging
#>  5 AATK            0.471      0.560 unchanging
#>  6 AB015752.4     -3.68      -3.59  unchanging
#>  7 ABCA7           3.45       3.83  unchanging
#>  8 ABCA9-AS1      -3.68      -3.59  unchanging
#>  9 ABCC11         -3.53      -1.86  unchanging
#> 10 ABCC3           0.931      3.26  up        
#> # … with 5,186 more rows

expression.scatter<-ggplot(expression, aes(Condition1, Condition2, colour=State))+
  geom_point()+
  scale_colour_manual(values=c("blue", "grey", "red"),
                      name="State", 
    labels=c("Down", "Unchanging", "Up"))+
  xlab("Condition 1") + 
  ylab("Condition 2") + 
  ggtitle("Expression data")+
  theme(plot.title = element_text(hjust = 0.5))

expression.scatter

image.png

现在，让我们尝试另一种类型的图：条形图。它类似于散点图，但x变量本质上是定性的或绝对的。

Stripchart

ggplot(
    msleep.clean, 
    aes(vore, sleep_total)
  )+geom_point()

image.png

抖动，变大，上色 jitter, bigger points and colours

ggplot(
  msleep.clean,
  aes(vore,sleep_total, colour=vore)
) + geom_point(size=4,position="jitter")

image.png

调节抖动的范围

ggplot(
  msleep.clean, 
  aes(vore, sleep_total, colour=vore)
) +
  geom_jitter(
    width = .2,
    size=4
  ) -> stripchart

stripchart

image.png

为平均值添加一条线，并为y轴添加标题

stripchart +
  stat_summary(
    fun.y="mean",
    geom='errorbar', 
    aes(ymin=..y.., ymax=..y..), 
    width=0.6, 
    size=1.5,
    colour="grey25"
  ) -> stripchart
#> Warning: `fun.y` is deprecated. Use `fun` instead.

stripchart

image.png

箱式图的雏形

一小段计算平均值(mean)和标准误(SEm)的tidyverse式的代码

msleep.clean %>%
  group_by(vore) %>%
    summarise(sleep=mean(sleep_total), sem=sd(sleep_total)/sqrt(n()))
#> # A tibble: 4 x 3
#>   vore    sleep   sem
#>   <chr>   <dbl> <dbl>
#> 1 carni   10.4  1.07 
#> 2 herbi    9.51 0.862
#> 3 insecti 14.9  2.65 
#> 4 omni    10.9  0.659

继续美化

stripchart +
  ylab("Total Hours Sleep") +
  xlab("Trophic Levels") +
  ggtitle("Some Sleep Data") +
  scale_y_continuous(breaks=seq(0, 20, 2)) +
  scale_x_discrete(labels=c("Carnivore", "Herbivore", "Insectivore", "Omnivore")) +
  theme(legend.position = "none") -> stripchart

stripchart

image.png

同前

stripchart +
  scale_colour_brewer(palette="Dark2")+
  scale_x_discrete(
    limit=c("insecti","omni","carni", "herbi"),
    labels=c("Insectivore", "Herbivore", "Carnivore", "Omnivore"))+
      theme(plot.title = element_text(hjust = 0.5)
  ) -> stripchart
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
stripchart

image.png

library("ggthemes")
## Warning: package 'ggthemes' was built under R version 3.5.3
stripchart+
  theme_wsj()+
  scale_colour_wsj("colors6")+
  theme(legend.position = "none")+
  theme(plot.title = element_text(hjust = 0.5))

image.png

## Scale for 'colour' is already present. Adding another scale for
## 'colour', which will replace the existing scale.

现在，让我们尝试一些其他的数据。DownloadFestival数据记录了为期三天的音乐节期间810名音乐会观众的hygiene scores(0-5)。

读入数据进行预处理

read_csv("DownloadFestival.csv") -> festival.data
festival.data
#> # A tibble: 810 x 5
#>    ticknumb gender  day1  day2   day3
#>       <dbl> <chr>  <dbl> <dbl>  <dbl>
#>  1     2111 Male    2.64  1.35  1.61 
#>  2     2229 Female  0.97  1.41  0.290
#>  3     2338 Male    0.84 NA    NA    
#>  4     2384 Female  3.03 NA    NA    
#>  5     2401 Female  0.88  0.08 NA    
#>  6     2405 Male    0.85 NA    NA    
#>  7     2467 Female  1.56 NA    NA    
#>  8     2478 Female  3.02 NA    NA    
#>  9     2490 Male    2.29 NA    NA    
#> 10     2504 Female  1.11  0.44  0.55 
#> # … with 800 more rows
max(festival.data$day1)
#> [1] 3.69

Histogram

ggplot(
  festival.data, 
  aes(day1)
)+geom_histogram()

image.png

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

稍美化一下

ggplot(
  festival.data, 
  aes(day1)
)+geom_histogram(binwidth=0.3)

image.png

进一步美化

ggplot(
  festival.data, 
  aes(day1)
)+geom_histogram(binwidth=0.3, color="black", fill="yellow")+
  labs(x="Score", y="Counts")+
  theme(plot.title = element_text(hjust = 0.5))+
  ggtitle("Hygiene at Day 1") -> Day1Histogram
Day1Histogram

image.png

现在我们想要画出所有3天每个性别情况。所以对数据进行清洗(reshape)。我们还将删除NAs。

festival.data %>%
  gather(day,score,-ticknumb,-gender) -> festival.data.stack

festival.data.stack %>% filter(!is.na(score)) -> festival.data.stack
festival.data.stack
#> # A tibble: 1,197 x 4
#>    ticknumb gender day   score
#>       <dbl> <chr>  <chr> <dbl>
#>  1     2111 Male   day1   2.64
#>  2     2229 Female day1   0.97
#>  3     2338 Male   day1   0.84
#>  4     2384 Female day1   3.03
#>  5     2401 Female day1   0.88
#>  6     2405 Male   day1   0.85
#>  7     2467 Female day1   1.56
#>  8     2478 Female day1   3.02
#>  9     2490 Male   day1   2.29
#> 10     2504 Female day1   1.11
#> # … with 1,187 more rows

ggplot(festival.data.stack,aes(score))+
  geom_histogram(binwidth=0.3, color="black", fill="yellow")+
  labs(x="Hygiene score", y="Counts")+
  facet_grid(gender~day) -> histogram.3days
histogram.3days

image.png

可以修改小分面(facets)的标签。下面是一些示例。

histogram.3days<-ggplot(festival.data.stack,aes(score))+
  geom_histogram(binwidth=0.3, color="black", fill="yellow")+
  labs(x="Hygiene score", y="Counts")+
  facet_grid(gender~day)+
  theme(strip.text.x = element_text(size = 16, colour = "purple", face="bold"),
        strip.text.y = element_text(size=12, face="bold"))
histogram.3days

image.png

密度图

density.3days<-ggplot(festival.data.stack, aes(score))+
  geom_density(aes(group=day, fill=day), alpha=0.5)+
  facet_grid(~gender)
density.3days

image.png

练习 2: Plot a stripchart representing all 3 days and each gender

stripchart <-ggplot(festival.data.stack, aes(gender, score, colour=gender))+ 
    facet_grid(~day)+
    geom_point(position="jitter")+
    scale_colour_manual(values=c("darkorange", "darkorchid4"))+ 
    stat_summary(geom='errorbar',fun.y=mean, aes(ymin=..y.., ymax=..y..), 
        colour="black", width=0.8, size=1.5)+
            labs(x="Gender", y="Score")+
            theme(legend.position = "none")
#> Warning: `fun.y` is deprecated. Use `fun` instead.
stripchart

image.png

从条形图中，我们可以为平均值或任何其他描述性地理位置添加一条线作为统计汇总。

具体操作分2步：

stripchart<-ggplot(festival.data.stack, aes(gender, score,colour=gender))+facet_grid(~day)+
  geom_point(position="jitter")+
  scale_colour_manual(values=c("darkorange", "darkorchid4"))+
  labs(x="Gender", y="Score")+
  theme(legend.position = "none")

stripchart

image.png

加均值线

stripchart+
  stat_summary(fun.y="mean",geom="errorbar", aes(ymin=..y.., ymax=..y..), width=0.8, colour="black", size = 1.3)
#> Warning: `fun.y` is deprecated. Use `fun` instead.

image.png

加个框框

stripchart+
  geom_boxplot(alpha=0, colour="black")

image.png

进一步美化

stripchart+
  geom_boxplot(aes(gender, score, fill=gender), alpha=0.5, colour="black")+
    scale_fill_manual(values=c("darkorange", "darkorchid4"))

image.png

说到把图表做得更漂亮，我们可以改进箱式图(boxplot)。

如果需要，我们可以更改x轴上的顺序：boxplot+scale_x_discrete(limits=c(“Male”,“Female”))

boxplot<-ggplot(festival.data.stack, aes(gender,score))+
  geom_boxplot()+
  facet_grid(~day)

boxplot

image.png

boxplot <-ggplot(festival.data.stack, aes(gender,score, fill=gender))+
    facet_grid(~day)+
    stat_boxplot(geom="errorbar", width=0.5)+   
    geom_boxplot(outlier.shape=8)+
    theme(legend.position = "none")+
    scale_fill_manual(values=c("sienna1","darkorchid3 "))+
    labs(x="Gender", y="Score")
boxplot

image.png

Violinplot (beanplot) 小提琴图

stripchart+
  geom_violin(alpha=0, colour="black")

image.png

基础款

violinplot<-ggplot(festival.data.stack, aes(gender,score))+geom_violin()+facet_grid(~day)
violinplot

image.png

美化版

violinplot<-ggplot(festival.data.stack, aes(gender,score,fill=gender))+
        facet_grid(~day)+
        geom_violin(trim = FALSE)+
        scale_fill_manual(values=c("goldenrod2","darkgrey"))+
        theme(legend.position="none")+
        stat_summary(fun.y=median, geom="point", size=2, color="black")+
        labs(x="Gender", y="Hygiene scores")
#> Warning: `fun.y` is deprecated. Use `fun` instead.
violinplot

image.png

叠加箱式图

violinplot+geom_boxplot(width=0.3)

image.png

violinplot+geom_jitter(width=0.1,size=1, shape=1)

image.png

Barchart 柱状图：首先，我们要计算平均值和sem，并将这些值存储在一个文件中。

festival.data.stack %>%
    group_by(gender,day) %>%
      summarise(mean=mean(score), sem=sd(score)/sqrt(n())) -> score.sem

score.sem
#> # A tibble: 6 x 4
#> # Groups:   gender [2]
#>   gender day    mean    sem
#>   <chr>  <chr> <dbl>  <dbl>
#> 1 Female day1  1.88  0.0316
#> 2 Female day2  1.08  0.0608
#> 3 Female day3  1.10  0.0990
#> 4 Male   day1  1.60  0.0362
#> 5 Male   day2  0.773 0.0585
#> 6 Male   day3  0.829 0.0721

barchart<-ggplot(score.sem, aes(day,mean, fill=gender))+
  geom_bar(stat="identity")
barchart

image.png

加误差线

barchart<-ggplot(score.sem, aes(day,mean, fill=gender))+
  geom_bar(stat="identity", position="dodge")+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), position="dodge")
barchart

image.png

barchart<-ggplot(score.sem, aes(day,mean, fill=gender))+
  geom_bar(position="dodge", stat="identity")+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), position="dodge")
barchart

image.png

美化

barchart<-ggplot(score.sem, aes(day,mean, fill=gender))+
  geom_bar(position="dodge", colour="black",stat="identity",size=1)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), width=.5, position=position_dodge(0.9), size=1)+
  ylab("Mean scores")+ 
  ggtitle("Levels of hygiene over 3 days of concert")+
  theme(plot.title = element_text(hjust = 0.5))+
  theme(plot.title = element_text(size = 19))+
  theme(axis.title.x=element_blank())+
  scale_fill_manual(values=c("darkorange3", "darkorchid4"), name="Gender")
barchart

image.png

Linegraph 折线图

linegraph<-ggplot(score.sem, aes(day, mean, group=gender))+
    geom_line()+
    geom_point()+
    geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem))

linegraph

image.png

美化

linegraph<-ggplot(score.sem, aes(day,mean, colour=gender, group=gender))+
  geom_line(size=1.5)+
  geom_point(size=4)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), width=.2, size=1.5)
linegraph

image.png

进一步美化

linegraph<-ggplot(score.sem, aes(day,mean, colour=gender, group=gender))+
  geom_line(size=1.5)+
  geom_point(size=5)+
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem), width=.2, size=1.5)+
  labs(x="", y="Mean scores")+
  scale_y_continuous(breaks=seq(0, 2, 0.2))+
  ggtitle("Levels of hygiene over 3 days of concert")+
  theme(plot.title = element_text(hjust = 0.5))+
  scale_colour_manual(values=c("purple","darkorange3"), name="")+
  theme(legend.position = c(0.85, 0.9))+
  theme(legend.text=element_text(size=14))+
  theme(legend.background = element_rect(fill = "transparent"))

linegraph

image.png

练习 3 该文件包含3个不同数据集(一个WT和两个mutants)的positional count数据。绘制一个图,显示同一图下的所有3个数据集先读Chrometic_position_data.txt检查文件的结构, 用gather()将文件从宽格式重新构造为长格式, 重命名列：Gentype和Value绘制基本折线图.

chromosome<-read_tsv("chromosome_position_data.txt")
chromosome
#> # A tibble: 184 x 4
#>    Position  Mut1  Mut2    WT
#>       <dbl> <dbl> <dbl> <dbl>
#>  1 91757273  2.71  1.34  1.25
#>  2 91757323  2.71  1.3   1.25
#>  3 91757373  5.41  1.14  1.25
#>  4 91757423  2.71  1.58  1.88
#>  5 91757473  2.71  1.19  1.25
#>  6 91757523  2.71  2.82  2.5 
#>  7 91757573  2.71  3.15  3.75
#>  8 91757623  0     4.05  3.75
#>  9 91757673  0     2.94  3.12
#> 10 91757723  0     2.69  3.12
#> # … with 174 more rows
chromosome %>%
  gather(Genotype, Value,-Position) -> chromosome.long
chromosome.long
#> # A tibble: 552 x 3
#>    Position Genotype Value
#>       <dbl> <chr>    <dbl>
#>  1 91757273 Mut1      2.71
#>  2 91757323 Mut1      2.71
#>  3 91757373 Mut1      5.41
#>  4 91757423 Mut1      2.71
#>  5 91757473 Mut1      2.71
#>  6 91757523 Mut1      2.71
#>  7 91757573 Mut1      2.71
#>  8 91757623 Mut1      0   
#>  9 91757673 Mut1      0   
#> 10 91757723 Mut1      0   
#> # … with 542 more rows
chromosome.linegraph<-ggplot(chromosome.long, aes(x=Position, y=Value, group=Genotype, colour=Genotype))+
geom_line(size=2)
chromosome.linegraph

image.png

画一张图表，显示一个典型婴儿在出生后9个月的年龄和体重之间的关系。读入数据weight_chart.txt检查文件结构绘制基本折线图绘制更漂亮的版本：更改点的大小和颜色更改线的粗细和颜色更改y轴：比例从2 kg更改为10 kg 更改x轴：比例从0 t 10个月更改两个轴上的标签向图表添加标题

weight<-read_tsv("weight_chart.txt")
weight
#> # A tibble: 10 x 2
#>      Age Weight
#>    <dbl>  <dbl>
#>  1     0    3.6
#>  2     1    4.4
#>  3     2    5.2
#>  4     3    6  
#>  5     4    6.6
#>  6     5    7.2
#>  7     6    7.8
#>  8     7    8.4
#>  9     8    8.8
#> 10     9    9.2
#基础
weight.linegraph<-ggplot(weight, aes(Age, Weight))+
  geom_line()+
  geom_point()
weight.linegraph

image.png

#美化
weight.linegraph<-ggplot(weight, aes(Age, Weight))+
 geom_line(size=1, colour="lightblue2")+
 geom_point(shape=16, size=3, colour="darkorchid1")+
  scale_y_continuous(breaks=2:10, limits = c(2, 10))+
  scale_x_continuous(breaks=0:10, limits = c(0, 10))+
  labs(x="Age (months)", y="Weight (kg)")+
  ggtitle("Relation between age and weight")+
  theme(plot.title = element_text(hjust = 0.5))
weight.linegraph

image.png

练习 5 文件brain_bodyweight.txt包含一系列物种的log10大脑和体重数据，以及每个点的SEM测量结果。将这些数据绘制在带有误差条的散点图上，显示每个点下的平均值+/-SEM和数据集的名称。读入brain_bodyweitt.txt检查文件的结构，绘制一个基本的图形。绘制一个更漂亮的版本，对于水平误差线需要更改：geom_barh()，对于标签需要更改：geom_text()。

由于从教程中获取的.txt文档有点问题，所以费了点时间

options(stringsAsFactors = F)
brain.bodyweight<- read.csv("brain_bodyweight.txt",sep=',',header = T)
brain.bodyweight
#>             Species Bodyweight Brainweight Bodyweight.SEM Brainweight.SEM  X
#> 1               Cow      2.670       2.630        0.12500         0.12700 NA
#> 2              Goat      1.440       2.060        0.01930         0.14200 NA
#> 3        Guinea Pig      0.017       0.740        0.00163         0.03590 NA
#> 4        Diplodocus      4.070       1.700        0.10100         0.13400 NA
#> 5             Horse      2.720       2.820        0.00367         0.16900 NA
#> 6               Cat      0.519       1.410        0.02920         0.10200 NA
#> 7           Gorilla      2.320       2.610        0.02680         0.24300 NA
#> 8             Human      1.790       3.120        0.03610         0.23100 NA
#> 9  African Elephant      3.820       3.760        0.11200         0.32900 NA
#> 10    Rhesus Monkey      0.833       2.250        0.04650         0.14600 NA
#> 11         Kangaroo      1.540       1.750        0.01750         0.05870 NA
#> 12          Hamster     -0.921       0.100        0.02740         0.00981 NA
#> 13            Mouse     -1.640      -0.398        0.06860         0.02340 NA
#> 14           Rabbit      0.398       1.080        0.01150         0.06690 NA
#> 15            Sheep      1.740       2.240        0.04100         0.13600 NA
#> 16       Chimpanzee      1.720       2.640        0.01640         0.23800 NA
#> 17    Brachiosaurus      4.940       2.190        0.26900         0.11200 NA
#> 18              Rat     -0.553       0.279        0.03370         0.00188 NA
#> 19             Mole     -0.914       0.477        0.06170         0.04770 NA
#> 20              Pig      2.280       2.260        0.01330         0.16000 NA
brain.bodyweight=brain.bodyweight[,-6]
brain.bodyweight
#>             Species Bodyweight Brainweight Bodyweight.SEM Brainweight.SEM
#> 1               Cow      2.670       2.630        0.12500         0.12700
#> 2              Goat      1.440       2.060        0.01930         0.14200
#> 3        Guinea Pig      0.017       0.740        0.00163         0.03590
#> 4        Diplodocus      4.070       1.700        0.10100         0.13400
#> 5             Horse      2.720       2.820        0.00367         0.16900
#> 6               Cat      0.519       1.410        0.02920         0.10200
#> 7           Gorilla      2.320       2.610        0.02680         0.24300
#> 8             Human      1.790       3.120        0.03610         0.23100
#> 9  African Elephant      3.820       3.760        0.11200         0.32900
#> 10    Rhesus Monkey      0.833       2.250        0.04650         0.14600
#> 11         Kangaroo      1.540       1.750        0.01750         0.05870
#> 12          Hamster     -0.921       0.100        0.02740         0.00981
#> 13            Mouse     -1.640      -0.398        0.06860         0.02340
#> 14           Rabbit      0.398       1.080        0.01150         0.06690
#> 15            Sheep      1.740       2.240        0.04100         0.13600
#> 16       Chimpanzee      1.720       2.640        0.01640         0.23800
#> 17    Brachiosaurus      4.940       2.190        0.26900         0.11200
#> 18              Rat     -0.553       0.279        0.03370         0.00188
#> 19             Mole     -0.914       0.477        0.06170         0.04770
#> 20              Pig      2.280       2.260        0.01330         0.16000

brain.bodyweight.graph<-ggplot(brain.bodyweight, aes(x=Bodyweight, y=Brainweight))+
  geom_point()+
  geom_errorbar(aes(ymin=Brainweight-Brainweight.SEM, ymax=Brainweight+Brainweight.SEM))+
  geom_errorbarh(aes(xmin=Bodyweight-Bodyweight.SEM, xmax=Bodyweight+Bodyweight.SEM))+
  geom_text(aes(label=Species), hjust = 1.05, vjust = -0.6, size=2.7)
brain.bodyweight.graph

image.png


brain.bodyweight.graph<-ggplot(brain.bodyweight, aes(x=Bodyweight, y=Brainweight))+
  geom_point()+
  geom_errorbar(aes(ymin=Brainweight-Brainweight.SEM, ymax=Brainweight+Brainweight.SEM), width=.1, size=1, colour="tomato3")+
  geom_errorbarh(aes(xmin=Bodyweight-Bodyweight.SEM, xmax=Bodyweight+Bodyweight.SEM), height=.1, size=1, colour="tomato3")+
  geom_point(size=2)+
  geom_text(aes(label=Species), hjust = 1.1, vjust = -0.6, size=2.7)
brain.bodyweight.graph

image.png

进一步美化

library("ggrepel")
## Warning: package 'ggrepel' was built under R version 3.5.3
ggplot(brain.bodyweight, aes(x=Bodyweight, y=Brainweight))+
  geom_errorbar(aes(ymin=Brainweight-Brainweight.SEM, ymax=Brainweight+Brainweight.SEM), 
width=.1, size=0.5, colour="grey28")+
  geom_errorbarh(aes(xmin=Bodyweight-Bodyweight.SEM, xmax=Bodyweight+Bodyweight.SEM), 
height=.1, size=0.5, colour="grey28")+
  geom_point(shape=21, size=3, colour="black", fill="maroon3")+
  geom_label_repel(aes(label = Species), box.padding=0.6, point.padding =0.5, 
fill="mintcream", segment.colour="grey", size=3) -> brain.bodyweight.graph
brain.bodyweight.graph

image.png

Stacked bar: categorical data 堆叠柱状图

Changing<-read_csv("Changing.csv")
Changing
#> # A tibble: 60 x 3
#>    Type.of.Behaviour Sample.Size Stage.of.Change 
#>    <chr>                   <dbl> <chr>           
#>  1 Smoking cessation         108 Precontemplation
#>  2 Smoking cessation         187 Contemplation   
#>  3 Smoking cessation           0 Preparation     
#>  4 Smoking cessation         134 Action          
#>  5 Smoking cessation         247 Maintenance     
#>  6 Quitting cocaine            8 Precontemplation
#>  7 Quitting cocaine           15 Contemplation   
#>  8 Quitting cocaine            0 Preparation     
#>  9 Quitting cocaine           71 Action          
#> 10 Quitting cocaine           62 Maintenance     
#> # … with 50 more rows

stackedBar<-ggplot(Changing, aes(Type.of.Behaviour, Sample.Size, fill=Stage.of.Change))+
geom_bar(stat="identity")
stackedBar

image.png

更改比较的顺序：factor(variable name, levels = c(“”, “” .)) 。旋转图表以读取x轴标签：coord_flip()

Changing$Stage.of.Change <- factor(Changing$Stage.of.Change, levels = c("Maintenance","Action","Preparation","Contemplation","Precontemplation"))

stackedBar<-ggplot(Changing, aes(Type.of.Behaviour, Sample.Size, fill = Stage.of.Change))+
  geom_bar(stat="identity", colour="black")+
  coord_flip()
stackedBar

image.png

进一步美化

stackedBar<-stackedBar+
  labs(title="Stages for Each of the 12 Problem Behaviours", y="Sample Size", fill="Stages of Change")+
  theme(plot.title = element_text(hjust = 0.5, size=12, face="bold"))+
  theme(axis.title.y=element_blank())+
  scale_fill_brewer(palette = 4)+
  theme(axis.text.x = element_text(size=10), axis.text.y = element_text(size=9))+
   theme(legend.text=element_text(size=8), legend.title=element_text(size=10, face="bold"))+
  theme(axis.title.x = element_text(size=10))
stackedBar

image.png

提高对比度

stackedBar+scale_fill_brewer(palette="RdYlGn", direction=-1)

image.png

练习 6 让我们将相同的数据绘制为百分比，将变化的数据绘制为百分比。将文件格式更改为应急xtabs() 计算百分比prop.table() 将格式更改为dataframe as.data.frame() 检查前几行head() 像以前一样使用不同的调色板绘制数据

contingency.table100<-prop.table(xtabs(Sample.Size~Type.of.Behaviour+Stage.of.Change, Changing),1)*100
contingency.table100
#>                         Stage.of.Change
#> Type.of.Behaviour        Maintenance    Action Preparation Contemplation
#>   Adolescent delinquency   25.786164 27.044025    0.000000     28.930818
#>   Condom use               35.294118  6.191950    0.000000     17.956656
#>   Exercise acquisition     19.386332 14.086471   25.383543     33.751743
#>   High fat diet            56.666667  2.777778    0.000000     17.777778
#>   Mammography screening    42.553191 18.439716    0.000000     17.021277
#>   Physicians'practices     49.629630  1.481481    2.222222     14.814815
#>   Quitting cocaine         39.743590 45.512821    0.000000      9.615385
#>   Radon gas exposure        0.000000  8.166189    0.000000     17.335244
#>   Safer sex                 0.000000 47.887324    0.000000      7.981221
#>   Smoking cessation        36.538462 19.822485    0.000000     27.662722
#>   Sunscreen use            35.242291  4.405286    0.000000      7.929515
#>   Weight control           14.634146 17.886179    0.000000     52.845528
#>                         Stage.of.Change
#> Type.of.Behaviour        Precontemplation
#>   Adolescent delinquency        18.238994
#>   Condom use                    40.557276
#>   Exercise acquisition           7.391911
#>   High fat diet                 22.777778
#>   Mammography screening         21.985816
#>   Physicians'practices          31.851852
#>   Quitting cocaine               5.128205
#>   Radon gas exposure            74.498567
#>   Safer sex                     44.131455
#>   Smoking cessation             15.976331
#>   Sunscreen use                 52.422907
#>   Weight control                14.634146

Changing.percent<-as.data.frame(contingency.table100)
Changing.percent
#>         Type.of.Behaviour  Stage.of.Change      Freq
#> 1  Adolescent delinquency      Maintenance 25.786164
#> 2              Condom use      Maintenance 35.294118
#> 3    Exercise acquisition      Maintenance 19.386332
#> 4           High fat diet      Maintenance 56.666667
#> 5   Mammography screening      Maintenance 42.553191
#> 6    Physicians'practices      Maintenance 49.629630
#> 7        Quitting cocaine      Maintenance 39.743590
#> 8      Radon gas exposure      Maintenance  0.000000
#> 9               Safer sex      Maintenance  0.000000
#> 10      Smoking cessation      Maintenance 36.538462
#> 11          Sunscreen use      Maintenance 35.242291
#> 12         Weight control      Maintenance 14.634146
#> 13 Adolescent delinquency           Action 27.044025
#> 14             Condom use           Action  6.191950
#> 15   Exercise acquisition           Action 14.086471
#> 16          High fat diet           Action  2.777778
#> 17  Mammography screening           Action 18.439716
#> 18   Physicians'practices           Action  1.481481
#> 19       Quitting cocaine           Action 45.512821
#> 20     Radon gas exposure           Action  8.166189
#> 21              Safer sex           Action 47.887324
#> 22      Smoking cessation           Action 19.822485
#> 23          Sunscreen use           Action  4.405286
#> 24         Weight control           Action 17.886179
#> 25 Adolescent delinquency      Preparation  0.000000
#> 26             Condom use      Preparation  0.000000
#> 27   Exercise acquisition      Preparation 25.383543
#> 28          High fat diet      Preparation  0.000000
#> 29  Mammography screening      Preparation  0.000000
#> 30   Physicians'practices      Preparation  2.222222
#> 31       Quitting cocaine      Preparation  0.000000
#> 32     Radon gas exposure      Preparation  0.000000
#> 33              Safer sex      Preparation  0.000000
#> 34      Smoking cessation      Preparation  0.000000
#> 35          Sunscreen use      Preparation  0.000000
#> 36         Weight control      Preparation  0.000000
#> 37 Adolescent delinquency    Contemplation 28.930818
#> 38             Condom use    Contemplation 17.956656
#> 39   Exercise acquisition    Contemplation 33.751743
#> 40          High fat diet    Contemplation 17.777778
#> 41  Mammography screening    Contemplation 17.021277
#> 42   Physicians'practices    Contemplation 14.814815
#> 43       Quitting cocaine    Contemplation  9.615385
#> 44     Radon gas exposure    Contemplation 17.335244
#> 45              Safer sex    Contemplation  7.981221
#> 46      Smoking cessation    Contemplation 27.662722
#> 47          Sunscreen use    Contemplation  7.929515
#> 48         Weight control    Contemplation 52.845528
#> 49 Adolescent delinquency Precontemplation 18.238994
#> 50             Condom use Precontemplation 40.557276
#> 51   Exercise acquisition Precontemplation  7.391911
#> 52          High fat diet Precontemplation 22.777778
#> 53  Mammography screening Precontemplation 21.985816
#> 54   Physicians'practices Precontemplation 31.851852
#> 55       Quitting cocaine Precontemplation  5.128205
#> 56     Radon gas exposure Precontemplation 74.498567
#> 57              Safer sex Precontemplation 44.131455
#> 58      Smoking cessation Precontemplation 15.976331
#> 59          Sunscreen use Precontemplation 52.422907
#> 60         Weight control Precontemplation 14.634146

绘制百分比图

stackedBar.percent<-ggplot(Changing.percent,aes(Type.of.Behaviour, Freq, fill = Stage.of.Change))+
        geom_bar(stat="identity",colour="black")+
    coord_flip()+
        scale_fill_brewer(palette = "Spectral", direction=-1)+
        labs(title="Stages for Each of the 12 Problem Behaviours", y="Frequencies")+
    theme(axis.title.y=element_blank())+
        theme(plot.title = element_text(hjust = 0.5, size=12, face="bold"))+
    theme(axis.text.x = element_text(size=10), axis.text.y = element_text(size=9))+
    theme(legend.text=element_text(size=8), legend.title=element_text(size=10, face="bold"))+
    theme(axis.title.x = element_text(size=10))
stackedBar.percent

image.png

保存图片

stackedBar_percent <-ggsave(stackedBar.percent, file="stackedBar_percent.png")

本教程所用数据部分来自http://www.bioinformatics.babraham.ac.uk/training.html

R语言入门——ggplot2
常用可视化R包 ggplot2 统计变换总结
ggplot2进阶：统计+可视化
Tidyverse course: ggplot Code clp 29 June, 2020 前言本教程包含t...
python--seaborn散点图
seaborn是专门用于统计数据可视化的包，可媲美R语言中的ggplot2包。本文介绍用seaborn绘制散点图。...
python--seaborn直方图
seaborn是专门用于统计数据可视化的包，可媲美R语言中的ggplot2包。本文介绍用seaborn绘制直方图。...
R语言可视化学习笔记之ggpubr包
Hadley Wickham创建的可视化包ggplot2可以流畅地进行优美的可视化，但是如果要通过ggplot2定...
python--seaborn盒形图
seaborn是专门用于统计数据可视化的包，可媲美R语言中的ggplot2包。本文介绍用seaborn绘制盒形图。...
python--seaborn折线图
seaborn是专门用于统计数据可视化的包，可媲美R语言中的ggplot2包。本文介绍用seaborn绘制折线图。...
python--seaborn热力图
seaborn是专门用于统计数据可视化的包，可媲美R语言中的ggplot2包。本文介绍用seaborn绘制热力图。...
R for data science （第一章） ②
使用ggplot2进行数据可视化①使用ggplot2进行数据可视化② 添加其他变量的一种方法是aesthetics...
一饱眼福，高级的数据可视化案例来啦
可视化的进阶历史渊源，古有拿破仑行军图： 1854 年宽街霍乱爆发事件地图将数据化成可视化统计图优点多多，能够直...