老规矩,先奉上学习资料链接:
这一次学习Strand bias analyses章节部分的内容。
一、 Strand bias analyses
1)Transcriptional strand bias analysis
对于基因内的突变,可以确定突变是在转录链上还是在非转录链上,这可以用来评估转录偶联修复的参与程度。
为此,确定“C”或“T”碱基(因为按照惯例,我们将碱基替换为C>X或T>X)是否与基因定义在同一条链上。与基因定义相同链上的碱基替换被认为是“未转录的”,而在基因体的相反链上的碱基替换被认为是“已转录的”,因为基因定义报告的是未转录的编码链或正义链。没有报道碱基置换在不同链上与多个基因体重叠的链信息。
基因定义
从获取参考基因组的基因定义开始:
genes_hg19 <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
## 403 genes were dropped because they have exons located on both strands of the same
## reference sequence or on more than one reference sequence, so cannot be represented
## by a single genomic range.
## Use 'single.strand.genes.only=FALSE' to get all the genes in a GRangesList object, or
## use suppressMessages() to suppress this message.
genes_hg19
## GRanges object with 23056 ranges and 1 metadata column:
## seqnames ranges strand | gene_id
## <Rle> <IRanges> <Rle> | <character>
## 1 chr19 58858172-58874214 - | 1
## 10 chr8 18248755-18258723 + | 10
## 100 chr20 43248163-43280376 - | 100
## 1000 chr18 25530930-25757445 - | 1000
## 10000 chr1 243651535-244006886 - | 10000
## ... ... ... ... . ...
## 9991 chr9 114979995-115095944 - | 9991
## 9992 chr21 35736323-35743440 + | 9992
## 9993 chr22 19023795-19109967 - | 9993
## 9994 chr6 90539619-90584155 + | 9994
## 9997 chr22 50961997-50964905 - | 9997
## -------
## seqinfo: 93 sequences (1 circular) from hg19 genome
Strand bias profile
使用mut_strand函数可以获得VCF对象中所有位置的转录链信息。对于基因体外的位置,以及与不同链上的多个基因重叠的位置,此函数返回“-”
library(BSgenome)
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)
vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns"),
pattern = "sample.vcf", full.names = TRUE)
sample_names <- c("colon1", "colon2", "colon3","intestine1", "intestine2", "intestine3","liver1", "liver2", "liver3")
tissue <- c(rep("colon", 3), rep("intestine", 3), rep("liver", 3))
grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)
# 获取转录链信息
strand <- mut_strand(grl[[1]], genes_hg19)
head(strand, 10)
## [1] transcribed - transcribed - untranscribed -
## [7] - - untranscribed -
## Levels: untranscribed transcribed -
然后可以用转录链信息制作突变计数矩阵(96个三核苷酸* 2条链= 192个特征)。注:只计算那些位于基因体内的突变。
mut_mat_s <- mut_matrix_stranded(grl, ref_genome, genes_hg19)
mut_mat_s[1:5, 1:5]
## colon1 colon2 colon3 intestine1 intestine2
## A[C>A]A-untranscribed 0 0 0 0 4
## A[C>A]A-transcribed 1 1 2 4 3
## A[C>A]C-untranscribed 0 0 1 1 1
## A[C>A]C-transcribed 0 0 0 0 1
## A[C>A]G-untranscribed 1 0 0 0 0
然后,可以可视化这个矩阵:
library(ggplot2)
# 改成自己的结果输出路径,最好用绝对路径
opt <- list(od = "/MutationalPatterns/Test_Result")
p <- plot_192_profile(mut_mat_s[, 1:2])
ggsave(filename = paste0(opt$od,"/192_profile.png"), width = 9, height = 5)
![](https://img.haomeiwen.com/i17096784/ab1aa9fa3d617622.png)
Strand bias test
还可以计算每条链,每一个组织,每一种突变类型的突变数量:
strand_counts <- strand_occurrences(mut_mat_s, by = tissue)
head(strand_counts)
![](https://img.haomeiwen.com/i17096784/1f478e099a33fe1b.png)
接下来,可以使用这些计数来执行链不对称的泊松测试,并进行多次测试校正。
strand_bias <- strand_bias_test(strand_counts)
head(strand_bias)
![](https://img.haomeiwen.com/i17096784/0f35823368f47cd5.png)
用不同的链绘制突变谱ps1,绘制链偏倚的效应大小(log2(未转录/转录))ps2。星号表示显著的链偏倚。这里我们使用p值来绘制星号。默认使用fdr。两幅图合并在一起进行对比:
library(gridExtra)
ps1 <- plot_strand(strand_counts, mode = "relative")
ps2 <- plot_strand_bias(strand_bias, sig_type = "p")
p <- grid.arrange(ps1, ps2)
ggsave(filename = paste0(opt$od,"/strand_bias.png"), width = 8, height = 5, plot = p)
![](https://img.haomeiwen.com/i17096784/4d43488cf4173955.png)
还可以更改fdr和p值的显著性截止值。最多可以为每个字段使用三个截止级别,这将更改显著和显著_fdr列中的星号数量。这些星号将在绘图中使用:
strand_bias_notstrict <- strand_bias_test(strand_counts, p_cutoffs = c(0.5, 0.1, 0.05), fdr_cutoffs = 0.5)
p <- plot_strand_bias(strand_bias_notstrict, sig_type = "p")
ggsave(filename = paste0(opt$od,"/strand_bias.diff.png"), width = 8, height = 5, plot = p)
![](https://img.haomeiwen.com/i17096784/bfcf4a8079cd65b8.png)
2)Replicative strand bias analysis
复制相关机制的参与可以通过检测前导链和滞后链之间的突变偏差来评估。复制链取决于DNA复制起始的位置。然而,复制时间是动态的和细胞类型特异性的,这使得复制链的测定不如转录链偏差分析简单。复制时间谱可以通过Repli-Seq实验生成。一旦确定了复制方向,就可以进行与转录链偏倚分析类似的链不对称分析。唯一的区别是您需要为mut_strand和mut_strand_matrix函数使用复制模式。
Define replication direction
输入需要有个bed文件
这个GRanges对象应该有一个strand_info元数据列,它只包含两个不同的注释,例如“左”和“右”,或者“领先”和“滞后”。基因组范围不能重叠,每个位置只允许一个注释。
repli_file <- system.file("extdata/ReplicationDirectionRegions.bed", package = "MutationalPatterns")
repli_strand <- read.table(repli_file, header = TRUE)
# Store in GRanges object
repli_strand_granges <- GRanges(
seqnames = repli_strand$Chr,
ranges = IRanges(
start = repli_strand$Start + 1,
end = repli_strand$Stop
),
strand_info = repli_strand$Class
)
# UCSC seqlevelsstyle
seqlevelsStyle(repli_strand_granges) <- "UCSC"
repli_strand_granges
![](https://img.haomeiwen.com/i17096784/9440befe96572eb0.png)
GRanges对象中strand_info元数据的levels决定了mut_matrix_strand返回的突变矩阵中报告链的顺序,因此,如果希望先右后左计数,可以在运行mut_matrix_strand之前指定这一点:
repli_strand_granges$strand_info <- factor(repli_strand_granges$strand_info,levels = c("right", "left") )
Replication bias analysis
现在我们定义了复制方向,接下来的分析与转录偏置分析类似:
你可以像这样计算链矩阵、计数和偏差
mut_mat_s_rep <- mut_matrix_stranded(grl, ref_genome, repli_strand_granges, mode = "replication")
strand_counts_rep <- strand_occurrences(mut_mat_s_rep, by = tissue)
strand_bias_rep <- strand_bias_test(strand_counts_rep)
然后可视化:
ps1 <- plot_strand(strand_counts_rep, mode = "relative")
ps2 <- plot_strand_bias(strand_bias_rep)
p <- grid.arrange(ps1, ps2)
ggsave(filename = paste0(opt$od,"/Replication_bias.png"), width = 8, height = 5, plot = p)
![](https://img.haomeiwen.com/i17096784/e00367c6d6b299c2.png)
3)Signatures with strand bias
链偏差可以包括在特征分析中。例如,您可以对具有链特征的突变计数矩阵执行NMF:
nmf_res_strand <- extract_signatures(mut_mat_s, rank = 2, single_core = TRUE)
colnames(nmf_res_strand$signatures) <- c("Signature A", "Signature B")
str(nmf_res_strand, max.level = 2)
![](https://img.haomeiwen.com/i17096784/e63627b76fb61cc9.png)
这个包的内容很多,这次学习只能粗略过一遍了,后面还有一些,下次见~
网友评论