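A scatter plot is a common data-visualization tool for showing the relationship between two variables. It draws one point per observation on a pair of axes, and each point's position is determined by that observation's values of the two variables.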
Adding a smoothing curve
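A smoothing curve is a smooth fit through the data points. It makes the overall trend, pattern, or relationship stand out instead of the fluctuation of individual points. In statistics and data visualization, smoothing is used to reveal underlying structure while damping noise and abrupt swings. In ggplot2, geom_smooth() adds the curve, and its method argument chooses the smoother: 'lm' fits a linear regression, 'loess' fits a local polynomial regression (stats::loess), and 'gam' fits a generalized additive model (mgcv::gam). The code below compares the three. It also shows the default method with the points grouped by cut.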
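To see what the 'lm' and 'loess' options actually fit, here is a minimal base-R sketch on simulated data. The curved trend exp(x) and the noise level are assumptions made for illustration; this is not the diamonds data. loess() fits a weighted quadratic regression in a moving window that covers 75% of the points (span = 0.75, the same default geom_smooth() uses). This lets it follow curvature that a single straight line cannot.

```r
# Sketch: the models behind geom_smooth(method = 'lm') and method = 'loess',
# fitted directly with base R on simulated (hypothetical) data.
set.seed(1)
x <- seq(0, 3, length.out = 200)
y <- exp(x) + rnorm(200, sd = 1)   # curved trend plus noise (assumption)
dat <- data.frame(x = x, y = y)

fit_lm    <- lm(y ~ x, data = dat)                  # one global straight line
fit_loess <- loess(y ~ x, data = dat, span = 0.75)  # local quadratic fits (degree = 2 by default)

# Predictions of both smoothers at a few x values inside the data range
grid <- data.frame(x = c(0.5, 1.5, 2.5))
pred <- data.frame(
  x     = grid$x,
  lm    = predict(fit_lm, newdata = grid),
  loess = predict(fit_loess, newdata = grid)
)
print(pred)

# Residual sum of squares: loess follows the curvature, so its RSS is smaller here
rss <- c(lm = sum(residuals(fit_lm)^2), loess = sum(residuals(fit_loess)^2))
print(rss)
```

A smaller span makes the loess curve more wiggly, and a larger span makes it smoother. In geom_smooth() the same knob is the span argument.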
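library(ggplot2)   # plotting
library(cowplot)   # plot_grid() to arrange several plots
library(tidyverse) # sample_n() from dplyr; also attaches ggplot2
library(ggsci)     # scale_fill_npg() / scale_color_npg() palettes

data("diamonds")
set.seed(123)
# Random subsample of 1,000 diamonds to limit overplotting
# (sample_n() is superseded by slice_sample() in current dplyr but still works)
small_diamonds <- sample_n(diamonds, size = 1000)

# p1: linear regression fit
p1 <- ggplot(data = small_diamonds,
             aes(x = carat, y = price)) +
  # fill = cut is mapped only in geom_point(), so the smoother treats all points as one group
  geom_point(shape = 21, size = 4,
             color = 'black', aes(fill = cut)) +
  geom_smooth(method = 'lm') +
  scale_fill_npg() +
  labs(title = 'lm',
       x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  theme_test() +
  # A numeric legend.position places the legend inside the panel (0-1 units);
  # ggplot2 >= 3.5.0 deprecates this form in favour of legend.position.inside
  theme(plot.title = element_text(hjust = 0.5),
        legend.background = element_blank(),
        legend.position = c(0.25, 0.76))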
# p2: local polynomial regression (stats::loess)
p2 <- ggplot(data = small_diamonds,
             aes(x = carat, y = price)) +
  geom_point(shape = 21, size = 4,
             color = 'black', aes(fill = cut)) +
  geom_smooth(method = 'loess') +
  scale_fill_npg() +
  labs(title = 'loess',
       x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  theme_test() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.background = element_blank(),
        legend.position = c(0.25, 0.76))
# p3: generalized additive model (needs the mgcv package, which ships with R)
p3 <- ggplot(data = small_diamonds,
             aes(x = carat, y = price)) +
  geom_point(shape = 21, size = 4,
             color = 'black', aes(fill = cut)) +
  geom_smooth(method = 'gam') +
  scale_fill_npg() +
  labs(title = 'gam',
       x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  theme_test() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.background = element_blank(),
        legend.position = c(0.25, 0.76))
# p4: default method; fill = cut is mapped globally, so geom_smooth()
# fits a separate curve (and confidence ribbon) for each cut
p4 <- ggplot(data = small_diamonds,
             aes(x = carat, y = price, fill = cut)) +
  geom_point(shape = 21, size = 4, color = 'black') +
  geom_smooth() +
  scale_fill_npg() +
  labs(title = 'ggplot(aes(fill = cut))',
       x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  theme_test() +
  theme(plot.title = element_text(hjust = 0.5),
        legend.background = element_blank(),
        legend.position = c(0.25, 0.76))

# Arrange the four plots in a 2 x 2 grid (cowplot)
plot_grid(p1, p2, p3, p4, ncol = 2)
![](https://img.haomeiwen.com/i27313279/f817ec2979cc0057.png)
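The 'gam' option delegates to mgcv::gam(). Below is a sketch of the equivalent direct call. It assumes the mgcv package is available (it ships with standard R installations) and reuses simulated, hypothetical data. When no formula is given, ggplot2 uses y ~ s(x, bs = "cs"), a penalized cubic regression spline, and recent ggplot2 versions select the amount of smoothing by REML. When method is left at its default, geom_smooth() chooses loess if the largest group has fewer than 1,000 points and gam otherwise. In p4, each cut forms a group smaller than 1,000, so each group gets a loess curve.

```r
# Sketch: the model geom_smooth(method = 'gam') fits when no formula is supplied.
library(mgcv)  # gam(); a recommended package bundled with R

set.seed(1)
x <- seq(0, 3, length.out = 200)
y <- exp(x) + rnorm(200, sd = 1)  # hypothetical curved trend plus noise
dat <- data.frame(x = x, y = y)

# Penalized cubic regression spline ("cs" basis), smoothness chosen by REML
fit_gam <- gam(y ~ s(x, bs = "cs"), data = dat, method = "REML")

# Smooth-term table: an edf well above 1 means the fitted curve is not a straight line
print(summary(fit_gam)$s.table)

pred <- predict(fit_gam, newdata = data.frame(x = c(0.5, 1.5, 2.5)))
print(pred)
```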
Adding a correlation coefficient
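The (Pearson) correlation coefficient is a statistic that measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1. Values close to 1 indicate a strong positive correlation: as one variable increases, so does the other. Values close to -1 indicate a strong negative correlation: as one variable increases, the other decreases. Values close to 0 indicate little or no linear relationship, although a non-linear relationship may still exist.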
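Pearson's r is the covariance of the two variables divided by the product of their standard deviations: r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)). The sketch below computes r from this definition and checks it against stats::cor(). pearson_r is a hypothetical helper name, and the built-in mtcars data (car weight vs. fuel economy) stands in for the diamonds sample.

```r
# Sketch: Pearson's r from its definition, compared with stats::cor().
pearson_r <- function(x, y) {  # hypothetical helper
  dx <- x - mean(x)            # deviations from the mean
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_r(mtcars$wt, mtcars$mpg)  # heavier cars tend to have lower mpg
r_base   <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
print(round(c(manual = r_manual, base = r_base), 3))
```

Both values agree, and the negative sign reflects a strong negative linear relationship.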
# Compute the correlation first: Pearson's r, its 95% confidence interval and p-value
cor.test(small_diamonds$carat, small_diamonds$price, method = "pearson")

# Scatter plot with a linear fit, annotated with the values reported by cor.test()
ggplot(data = small_diamonds,
       aes(x = carat, y = price)) +
  geom_point(shape = 21, size = 4,
             color = 'black', aes(fill = cut)) +
  geom_smooth(method = 'lm') +
  scale_fill_npg() +
  labs(x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  theme_test() +
  theme(legend.background = element_blank(),
        legend.position = c(0.15, 0.76)) +
  # x and y are in data units (carat, US dollars): y = 10 places the label
  # just above the x-axis, in the empty lower-right part of the panel
  annotate("text", x = 2, y = 10,
           label = "Pearson's r = 0.92; p-value < 2.2e-16")
![](https://img.haomeiwen.com/i27313279/198665e8eb44c0cd.png)
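The annotation above hard-codes the numbers printed by cor.test(). The minimal sketch below builds the label from the test result instead, so the label stays in sync if the sample changes. cor_label is a hypothetical helper, and mtcars again stands in for the data. A p-value below R's machine epsilon (about 2.2e-16, the threshold print.htest uses) is shown as an inequality, matching the original label.

```r
# Sketch: build an annotation label from cor.test() instead of typing the numbers.
cor_label <- function(x, y) {  # hypothetical helper
  ct <- cor.test(x, y, method = "pearson")
  r  <- unname(ct$estimate)    # Pearson's r
  p  <- ct$p.value
  p_txt <- if (p < .Machine$double.eps) {
    "p-value < 2.2e-16"          # too small to report exactly
  } else {
    sprintf("p-value = %.2g", p) # two significant digits
  }
  sprintf("Pearson's r = %.2f; %s", r, p_txt)
}

lab <- cor_label(mtcars$wt, mtcars$mpg)
print(lab)
```

In the plot above, the label would then be written as label = cor_label(small_diamonds$carat, small_diamonds$price).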
Adding marginal plots
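Marginal plots show the univariate distribution of each variable along the edges of a scatter plot. They let you see both the relationship between the two variables and each variable's own distribution at the same time. In ggplot2, the simplest form is a rug plot: geom_rug() draws a short tick on each axis for every observation.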
ggplot(data = small_diamonds,
       aes(x = carat, y = price)) +
  geom_point(shape = 21, size = 4,
             color = 'black', aes(fill = cut)) +
  geom_smooth(method = 'lm') +
  # One tick per diamond along each axis, colored by cut;
  # its legend is hidden because the fill legend already shows cut
  geom_rug(aes(color = cut), show.legend = FALSE) +
  scale_fill_npg() +
  scale_color_npg() +
  labs(x = 'weight of the diamond',
       y = 'price in US dollars',
       fill = 'quality of the cut') +
  scale_x_continuous(breaks = seq(0, 3, 0.5)) +
  scale_y_continuous(breaks = seq(0, 15000, 5000),
                     labels = c('0', '5K', '10K', '15K')) +
  # theme_classic(): axis lines, no grid lines or panel border
  theme_classic() +
  theme(legend.background = element_blank(),
        legend.position = c(0.15, 0.76))
![](https://img.haomeiwen.com/i27313279/e2371f853973ec07.png)
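geom_rug() marks every observation but does not summarize how densely the values are packed. As a swapped-in alternative that needs no extra packages, the base-R graphics sketch below places marginal histograms next to a scatter plot using layout(). The simulated x and y are hypothetical stand-ins for carat and price. For ggplot2 objects, packages such as ggExtra (ggMarginal()) provide this directly, but that is outside this sketch.

```r
# Sketch: scatter plot with marginal histograms in base R graphics
# (a different technique from the geom_rug() approach above).
set.seed(123)
n <- 300
x <- rexp(n, rate = 1.5)            # hypothetical right-skewed variable (like carat)
y <- 4000 * x + rnorm(n, sd = 800)  # hypothetical roughly linear response (like price)

# Marginal distributions, computed without drawing
hx <- hist(x, breaks = 20, plot = FALSE)
hy <- hist(y, breaks = 20, plot = FALSE)
xlim <- range(hx$breaks)
ylim <- range(hy$breaks)

op <- par(no.readonly = TRUE)
# Region 1 = scatter (bottom-left), 2 = x histogram (top-left), 3 = y histogram (bottom-right)
layout(matrix(c(2, 0,
                1, 3), nrow = 2, byrow = TRUE),
       widths = c(3, 1), heights = c(1, 3))

# Scatter plot
par(mar = c(4, 4, 0.5, 0.5))
plot(x, y, xlim = xlim, ylim = ylim, pch = 21, bg = "grey70")

# Top: histogram of x, same x range and left/right margins as the scatter
par(mar = c(0, 4, 0.5, 0.5))
plot.new()
plot.window(xlim = xlim, ylim = c(0, max(hx$counts)))
rect(head(hx$breaks, -1), 0, tail(hx$breaks, -1), hx$counts, col = "grey85")

# Right: histogram of y drawn sideways, same y range and bottom/top margins
par(mar = c(4, 0, 0.5, 0.5))
plot.new()
plot.window(xlim = c(0, max(hy$counts)), ylim = ylim)
rect(0, head(hy$breaks, -1), hy$counts, tail(hy$breaks, -1), col = "grey85")

par(op)  # restore the original graphics settings
```

The histograms share the scatter plot's axis ranges and matching margins, so their bins line up with the points.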