2023-MutationalPatterns包学习笔记（四）-

作者: 信你个鬼 | 来源:发表于2023-10-22 11:37 被阅读0次

单细胞转录组3大R包系列学习笔记
大师兄的Python学习笔记(五）: 常用库之os包
使用Shiny搭建基于Seurat包的单细胞数据可视化平台
单细胞组学系列学习笔记
单细胞多组学系列学习笔记汇总
JS学习笔记（四）（闭包）
大师兄的Python学习笔记(七）: re包与正则表达式
Seurat包学习笔记（四）：Using sctransform
avaScript 闭包的理解
学习小组笔记Day6-Jocelyn

老规矩，先奉上学习资料链接：

https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html

这一次学习Strand bias analyses章节部分的内容。

一、 Strand bias analyses

1）Transcriptional strand bias analysis

对于基因内的突变，可以确定突变是在转录链上还是在非转录链上，这可以用来评估转录偶联修复的参与程度。
为此，确定“C”或“T”碱基(因为按照惯例，我们将碱基替换为C>X或T>X)是否与基因定义在同一条链上。与基因定义相同链上的碱基替换被认为是“未转录的”，而在基因体的相反链上的碱基替换被认为是“已转录的”，因为基因定义报告的是未转录的编码链或正义链。没有报道碱基置换在不同链上与多个基因体重叠的链信息。

基因定义

从获取参考基因组的基因定义开始：

genes_hg19 <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

##   403 genes were dropped because they have exons located on both strands of the same
##   reference sequence or on more than one reference sequence, so cannot be represented
##   by a single genomic range.
##   Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object, or
##   use suppressMessages() to suppress this message.

genes_hg19

## GRanges object with 23056 ranges and 1 metadata column:
##         seqnames              ranges strand |     gene_id
##            <Rle>           <IRanges>  <Rle> | <character>
##       1    chr19   58858172-58874214      - |           1
##      10     chr8   18248755-18258723      + |          10
##     100    chr20   43248163-43280376      - |         100
##    1000    chr18   25530930-25757445      - |        1000
##   10000     chr1 243651535-244006886      - |       10000
##     ...      ...                 ...    ... .         ...
##    9991     chr9 114979995-115095944      - |        9991
##    9992    chr21   35736323-35743440      + |        9992
##    9993    chr22   19023795-19109967      - |        9993
##    9994     chr6   90539619-90584155      + |        9994
##    9997    chr22   50961997-50964905      - |        9997
##   -------
##   seqinfo: 93 sequences (1 circular) from hg19 genome

Strand bias profile

使用mut_strand函数可以获得VCF对象中所有位置的转录链信息。对于基因体外的位置，以及与不同链上的多个基因重叠的位置，此函数返回“-”

library(BSgenome)
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)
vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns"),
  pattern = "sample.vcf", full.names = TRUE)
sample_names <- c("colon1", "colon2", "colon3","intestine1", "intestine2", "intestine3","liver1", "liver2", "liver3")
tissue <- c(rep("colon", 3), rep("intestine", 3), rep("liver", 3))

grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

# 获取转录链信息
strand <- mut_strand(grl[[1]], genes_hg19)
head(strand, 10)

##  [1] transcribed   -             transcribed   -             untranscribed -            
##  [7] -             -             untranscribed -            
## Levels: untranscribed transcribed -

然后可以用转录链信息制作突变计数矩阵(96个三核苷酸* 2条链= 192个特征)。注:只计算那些位于基因体内的突变。

mut_mat_s <- mut_matrix_stranded(grl, ref_genome, genes_hg19)
mut_mat_s[1:5, 1:5]

##                       colon1 colon2 colon3 intestine1 intestine2
## A[C>A]A-untranscribed      0      0      0          0          4
## A[C>A]A-transcribed        1      1      2          4          3
## A[C>A]C-untranscribed      0      0      1          1          1
## A[C>A]C-transcribed        0      0      0          0          1
## A[C>A]G-untranscribed      1      0      0          0          0

然后，可以可视化这个矩阵：

library(ggplot2)
# 改成自己的结果输出路径，最好用绝对路径
opt <- list(od = "/MutationalPatterns/Test_Result")

p <- plot_192_profile(mut_mat_s[, 1:2])
ggsave(filename = paste0(opt$od,"/192_profile.png"), width = 9, height = 5)

image-20231023015920957.png

Strand bias test

还可以计算每条链，每一个组织，每一种突变类型的突变数量：

strand_counts <- strand_occurrences(mut_mat_s, by = tissue)
head(strand_counts)

image-20231023020801436.png

接下来，可以使用这些计数来执行链不对称的泊松测试，并进行多次测试校正。

strand_bias <- strand_bias_test(strand_counts)
head(strand_bias)

image-20231023020915765.png

用不同的链绘制突变谱ps1，绘制链偏倚的效应大小(log2(未转录/转录))ps2。星号表示显著的链偏倚。这里我们使用p值来绘制星号。默认使用fdr。两幅图合并在一起进行对比：

library(gridExtra)
ps1 <- plot_strand(strand_counts, mode = "relative")
ps2 <- plot_strand_bias(strand_bias, sig_type = "p")
p <- grid.arrange(ps1, ps2)
ggsave(filename = paste0(opt$od,"/strand_bias.png"), width = 8, height = 5, plot = p)

image-20231023021311160.png

还可以更改fdr和p值的显著性截止值。最多可以为每个字段使用三个截止级别，这将更改显著和显著_fdr列中的星号数量。这些星号将在绘图中使用：

strand_bias_notstrict <- strand_bias_test(strand_counts, p_cutoffs = c(0.5, 0.1, 0.05), fdr_cutoffs = 0.5)
p <- plot_strand_bias(strand_bias_notstrict, sig_type = "p")
ggsave(filename = paste0(opt$od,"/strand_bias.diff.png"), width = 8, height = 5, plot = p)

image-20231023021704937.png

2）Replicative strand bias analysis

复制相关机制的参与可以通过检测前导链和滞后链之间的突变偏差来评估。复制链取决于DNA复制起始的位置。然而，复制时间是动态的和细胞类型特异性的，这使得复制链的测定不如转录链偏差分析简单。复制时间谱可以通过Repli-Seq实验生成。一旦确定了复制方向，就可以进行与转录链偏倚分析类似的链不对称分析。唯一的区别是您需要为mut_strand和mut_strand_matrix函数使用复制模式。

Define replication direction

输入需要有个bed文件

这个GRanges对象应该有一个strand_info元数据列，它只包含两个不同的注释，例如“左”和“右”，或者“领先”和“滞后”。基因组范围不能重叠，每个位置只允许一个注释。

repli_file <- system.file("extdata/ReplicationDirectionRegions.bed", package = "MutationalPatterns")
repli_strand <- read.table(repli_file, header = TRUE)
# Store in GRanges object
repli_strand_granges <- GRanges(
  seqnames = repli_strand$Chr,
  ranges = IRanges(
    start = repli_strand$Start + 1,
    end = repli_strand$Stop
  ),
  strand_info = repli_strand$Class
)
# UCSC seqlevelsstyle
seqlevelsStyle(repli_strand_granges) <- "UCSC"
repli_strand_granges

image-20231023102320324.png

GRanges对象中strand_info元数据的levels决定了mut_matrix_strand返回的突变矩阵中报告链的顺序，因此，如果希望先右后左计数，可以在运行mut_matrix_strand之前指定这一点：

repli_strand_granges$strand_info <- factor(repli_strand_granges$strand_info,levels = c("right", "left") )

Replication bias analysis

现在我们定义了复制方向，接下来的分析与转录偏置分析类似:

你可以像这样计算链矩阵、计数和偏差

mut_mat_s_rep <- mut_matrix_stranded(grl, ref_genome, repli_strand_granges, mode = "replication")
strand_counts_rep <- strand_occurrences(mut_mat_s_rep, by = tissue)
strand_bias_rep <- strand_bias_test(strand_counts_rep)

然后可视化：

ps1 <- plot_strand(strand_counts_rep, mode = "relative")
ps2 <- plot_strand_bias(strand_bias_rep)
p <- grid.arrange(ps1, ps2)
ggsave(filename = paste0(opt$od,"/Replication_bias.png"), width = 8, height = 5, plot = p)

image-20231023102901405.png

3）Signatures with strand bias

链偏差可以包括在特征分析中。例如，您可以对具有链特征的突变计数矩阵执行NMF：

nmf_res_strand <- extract_signatures(mut_mat_s, rank = 2, single_core = TRUE)
colnames(nmf_res_strand$signatures) <- c("Signature A", "Signature B")
str(nmf_res_strand, max.level = 2)

image-20231023113027811.png

这个包的内容很多，这次学习只能粗略过一遍了，后面还有一些，下次见~

单细胞转录组3大R包系列学习笔记
Seurat包学习笔记 Seurat包学习笔记（一）：Guided Clustering TutorialSeur...
大师兄的Python学习笔记(五）: 常用库之os包
大师兄的Python学习笔记(四）: Python的内置函数大师兄的Python学习笔记(六）: 常用库之sys包...
使用Shiny搭建基于Seurat包的单细胞数据可视化平台
image ▼Seurat包系列学习笔记▼：Seurat包学习笔记（一）：Guided Clustering Tu...
单细胞组学系列学习笔记
scRNA-seq数据分析 Seurat包学习笔记 Seurat包学习笔记（一）：Guided Clusterin...
单细胞多组学系列学习笔记汇总
单细胞转录组数据分析 Seurat包学习笔记 Seurat包学习笔记（一）：Guided Clustering T...
JS学习笔记（四）（闭包）
最近学这块知识学得有些吃力。还有很多遗漏的地方，只能以后多看些书来弥补了。第7章函数表达式函数定义的两种方式...
大师兄的Python学习笔记(七）: re包与正则表达式
大师兄的Python学习笔记(六）: 常用库之os包大师兄的Python学习笔记(八）: 异常处理一、关于re包...
Seurat包学习笔记（四）：Using sctransform
在本教程中，我们将学习Seurat3中使用SCTransform方法对单细胞测序数据进行标准化处理的方法。该方法是...
avaScript 闭包的理解
JavaScript 学习笔记之闭包这篇文章是我学习闭包的笔记与总结，后面会有一些例子，结合画图的形式来理解，什...
学习小组笔记Day6-Jocelyn
#2020年7月8日学习笔记学习R包 ##一、安装R包（dplyr） ##二、使用R包之常用的五个基本函数 ##...