Preface:
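This post follows the code from the video series by 孟浩巍 (Meng Haowei) on Bilibili [1], applied to my own sequencing data. It is purely for practice and therefore code-heavy; I have tried to explain each step.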
1. Data loading and preprocessing
> rm(list = ls())
> setwd("c:/Users/Administrator/Documents/data_analysis/")
# Load the gene count CSV file
> sign.gene <- read.csv(file = "c:/Users/Administrator/Documents/data_analysis/2018-10-06J.F.XIE_sequencing_data/Sugar_A_vs_Yeast_A_diff_exp.csv", header = TRUE)
# Treatment group (Sugar_A) mean TPM
> sugar_A_TPM <- sign.gene$Sugar_A_mean_TPM
# Control group (Yeast_A) mean TPM
> Yeast_A_TPM <- sign.gene$Yeast_A_mean_TPM
# x-axis: log2 fold change (treatment / control)
> log2_foldchange <- log2(sugar_A_TPM / Yeast_A_TPM)
# A zero TPM in either group yields -Inf, Inf or NaN; replace those with 0
> log2_foldchange[sugar_A_TPM == 0] <- 0
> log2_foldchange[Yeast_A_TPM == 0] <- 0
# y-axis: -log10(p-value)
> log10_p_value <- -log10(sign.gene$pvalue)
For handling invalid values, see the article "Handling the special values NaN, Inf, NA and NULL in R" [2].
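A minimal illustration of why the zero replacement is needed, using made-up toy values (not the sequencing data): a zero in the denominator or numerator produces `Inf`, `-Inf` or `NaN`, and `is.finite()` catches all three at once. The helper name `safe_log2fc` is hypothetical; for non-negative TPM values it gives the same result as the two replacement lines above.

```r
# Toy TPM values (hypothetical, for illustration only)
treat <- c(10, 0, 0, 8)
ctrl  <- c(5, 4, 0, 0)

# Raw log2 ratios: log2(10/5) = 1, log2(0/4) = -Inf, log2(0/0) = NaN, log2(8/0) = Inf
raw <- log2(treat / ctrl)
print(raw)

# Hypothetical helper: log2 fold change with every non-finite value set to 0
safe_log2fc <- function(treat, ctrl) {
  fc <- log2(treat / ctrl)
  fc[!is.finite(fc)] <- 0   # catches Inf, -Inf, NaN (and NA)
  fc
}

print(safe_log2fc(treat, ctrl))   # values 1, 0, 0, 0
```

One difference to keep in mind: `is.finite(NA)` is `FALSE`, so missing TPM values would also become 0 here.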
2. Drawing the plot with plot()
- 1st edition
> plot(x = log2_foldchange, y = log10_p_value, xlim = c(-4, 4), ylim = c(0.01, 2))
[Figure: 20181005-1.png]
- 2nd edition
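Set a filter threshold. The x and y vectors must stay aligned (same length, same gene order), so both are filtered with the same condition.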
> log10_p_value.filter <- log10_p_value[log10_p_value >= 0.001]
> log2_foldchange.filter <- log2_foldchange[log10_p_value >= 0.001]
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+ xlim = c(-4, 4), ylim = c(0, 2))
[Figure: 20181005-2.png]
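Filtering two parallel vectors with the same index works, but it is easy for them to drift apart if one line is edited later. Here is a sketch of an alternative that holds both axes in one `data.frame`, so a single row filter moves x and y together. The column names and toy values are hypothetical.

```r
# Toy volcano coordinates (hypothetical values)
volcano <- data.frame(
  log2fc      = c(1.5, -0.2, 3.1, 0.0),
  neg_log10_p = c(2.0, 0.0005, 0.8, 0.3)
)

# One row filter keeps x and y aligned by construction
kept <- volcano[volcano$neg_log10_p >= 0.001, ]

plot(kept$log2fc, kept$neg_log10_p,
     xlim = c(-4, 4), ylim = c(0, 2),
     xlab = "log2 fold change", ylab = "-log10(p-value)")
```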
- 3rd edition
Add color to the points.
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+ xlim = c(-4, 4), ylim = c(0, 2),
+ col = rgb(0, 0, 1, 0.1), pch = 16)
[Figure: 20181005-3.png]
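The color `rgb(0, 0, 1, 0.1)` is blue with an alpha (opacity) of 0.1, so areas where many points overlap appear darker, and `pch = 16` draws solid dots. A quick check of what these base R `grDevices` calls return:

```r
# rgb() takes red, green, blue and alpha on a 0-1 scale (maxColorValue = 1 by default)
# and returns a hex string of the form "#RRGGBBAA"
translucent_blue <- rgb(0, 0, 1, 0.1)
print(translucent_blue)   # alpha 0.1 * 255 rounds to 26, i.e. hex 1A

# Without an alpha argument the string has no alpha part
print(rgb(1, 0, 0))

# col2rgb() converts back; alpha = TRUE adds the alpha channel (0-255) as a 4th row
print(col2rgb(translucent_blue, alpha = TRUE))
```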
- 4th edition
Define a significance criterion and give the points that meet it a different color.
# Number of points
> length(log2_foldchange.filter)
[1] 21987
# Vector holding one color per point (translucent blue by default)
> col_vector = rep(rgb(0,0,1,0.1), length(log2_foldchange.filter))
# Points with p <= 0.01, i.e. -log10(p) >= 2, are colored red
> col_vector[log10_p_value.filter >= -1*log10(0.01)] = rgb(1,0,0)
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+ xlim = c(-4, 4), ylim = c(0.01, 2),
+ col = col_vector, pch = 16)
[Figure: 20181005-4.png]
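The "fill with a default color, then overwrite a subset" pattern above can also be written in a single step with `ifelse()`. A sketch on hypothetical toy values, using the same p <= 0.01 cutoff:

```r
# Toy -log10(p) values (hypothetical)
neg_log10_p <- c(0.5, 2.5, 3.2, 1.9)

# Same cutoff as above: p <= 0.01  <=>  -log10(p) >= -log10(0.01), which is 2
cutoff <- -log10(0.01)

# Red for points passing the cutoff, translucent blue otherwise
point_col <- ifelse(neg_log10_p >= cutoff, rgb(1, 0, 0), rgb(0, 0, 1, 0.1))
print(point_col)
```

`ifelse()` returns a vector the same length as its test, so the colors stay aligned with the points without a separate `rep()` call.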
- 5th edition
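Filtering criteria:
- p-value <= 0.05 (statistical significance)
- normalized expression: TPM > 50 in both Sugar_A and Yeast_A, and TPM >= 100 in at least one of them
- fold change >= 2 or <= 0.5, i.e. |log2FC| >= 1 (magnitude of the difference)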
# Combine the criteria above into one logical vector (used for coloring)
> select_sign_vector = (sign.gene$pvalue <= 0.05) &
+ (sign.gene$Sugar_A_mean_TPM > 50) & (sign.gene$Yeast_A_mean_TPM > 50) &
+ (sign.gene$Sugar_A_mean_TPM >= 100 | sign.gene$Yeast_A_mean_TPM >= 100) &
+ (abs(log2_foldchange) >= 1)
# Count how many genes pass the filter
> table(select_sign_vector)
select_sign_vector
FALSE TRUE
57862 244
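The combined filter can be wrapped in a small reusable function so the thresholds are easy to change. This is a hypothetical helper (`select_sig`) with the thresholds from the code above as defaults. The column names follow this post's CSV, and the toy data frame is made up.

```r
# Hypothetical helper reproducing the filter above; thresholds are parameters
select_sig <- function(df, p_max = 0.05, min_tpm = 50,
                       min_peak_tpm = 100, min_abs_lfc = 1) {
  treat <- df$Sugar_A_mean_TPM
  ctrl  <- df$Yeast_A_mean_TPM
  lfc   <- log2(treat / ctrl)
  lfc[!is.finite(lfc)] <- 0                      # same zero handling as in section 1
  (df$pvalue <= p_max) &
    (treat > min_tpm) & (ctrl > min_tpm) &       # expressed in both groups
    (treat >= min_peak_tpm | ctrl >= min_peak_tpm) &  # high in at least one group
    (abs(lfc) >= min_abs_lfc)                    # at least 2-fold up or down
}

# Toy data (hypothetical)
toy <- data.frame(
  Sugar_A_mean_TPM = c(400, 60, 300, 0),
  Yeast_A_mean_TPM = c(100, 55, 280, 200),
  pvalue           = c(0.001, 0.01, 0.02, 0.001)
)
sig <- select_sig(toy)
table(sig)
```

Only the first toy gene passes: the second never reaches 100 TPM, the third changes less than 2-fold, and the fourth is not expressed in Sugar_A.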
# Filter x, y and the significance flag with the same condition
> log10_p_value.filter = log10_p_value[log10_p_value >= 0.001]
> log2_foldchange.filter = log2_foldchange[log10_p_value >= 0.001]
> select_sign_vector.filter = select_sign_vector[log10_p_value >= 0.001]
# Assign colors: translucent blue by default, red for genes passing all criteria
> col_vector = rep(rgb(0,0,1,0.1), length(log2_foldchange.filter))
> col_vector[select_sign_vector.filter] = rgb(1,0,0)
# The filtered vectors must correspond one-to-one
> length(select_sign_vector.filter)
[1] 21987
> length(col_vector)
[1] 21987
> length(log10_p_value.filter)
[1] 21987
# Draw the plot, with a dotted horizontal line at p = 0.05
> plot(x = log2_foldchange.filter, y = log10_p_value.filter,
+ xlim = c(-4, 4), ylim = c(0.01, 2),
+ col = col_vector, pch =16)
> abline(h = -1*log10(0.05), lwd = 3, lty = 3, col = "#4C5B61")
[Figure: 20181005-5.png]
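The plotting steps can be collected into one function. This is a sketch, not the code from the video series: `volcano_plot` is a hypothetical name, the vertical lines at ±1 (the fold-change cutoff) are an addition this post did not draw, and the data are simulated rather than taken from the sequencing results.

```r
# Hypothetical wrapper around the base-graphics steps used in this post
volcano_plot <- function(log2fc, neg_log10_p, is_sig,
                         p_line = 0.05, lfc_line = 1,
                         xlim = c(-4, 4), ylim = c(0, 2)) {
  # Fail loudly if the three vectors are not aligned
  stopifnot(length(log2fc) == length(neg_log10_p),
            length(neg_log10_p) == length(is_sig))
  # Translucent blue by default, solid red for significant genes
  col <- ifelse(is_sig, rgb(1, 0, 0), rgb(0, 0, 1, 0.1))
  plot(log2fc, neg_log10_p, xlim = xlim, ylim = ylim,
       col = col, pch = 16,
       xlab = "log2 fold change", ylab = "-log10(p-value)")
  # Dotted guide lines: p-value cutoff (horizontal) and fold-change cutoffs (vertical)
  abline(h = -log10(p_line), lwd = 3, lty = 3, col = "#4C5B61")
  abline(v = c(-lfc_line, lfc_line), lty = 3, col = "#4C5B61")
  invisible(col)
}

# Simulated data (not the sequencing data from this post)
set.seed(1)
n    <- 500
lfc  <- rnorm(n, sd = 1.2)
p    <- runif(n)
sig  <- p <= 0.05 & abs(lfc) >= 1
cols <- volcano_plot(lfc, -log10(p), sig)
```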
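The figure supplied by the sequencing company was drawn with ggplot2; the filtering criteria they used are not known.
By comparison, our plot is a bit ugly, but the overall workflow is the same.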
[Figure: Sugar_A_vs_Yeast_A_volcano.png]