在之前的步骤中,已经筛选出感兴趣的转录本信息,并添加了多种注释了。接下来就可以做最后的分析了,比如哪个基因的转录本信息变化最大,如何变化。
由于之前的示例文件包含基因数比较少,且涉及了三种情况,较复杂。因此这里换一个理想的示例文件进行演练。
data("exampleSwitchListAnalyzed")
sar <- subsetSwitchAnalyzeRlist(
exampleSwitchListAnalyzed,
exampleSwitchListAnalyzed$isoformFeatures$condition_1 == 'COAD_ctrl'
)
sar
image.png
步骤三:结果分析 Post Analysis
1、查看Switch排名
- 这里的排名依据主要是依靠q-value/dIF值
(1)看看变化最大的两个基因
extractTopSwitches(
sar,
filterForConsequences = TRUE,
n = 2, #如果设置n=NA,则会返回所有排名的一个表格
sortByQvals = TRUE #若为FALSE则按dIF值排名,默认为TRUE
)
-
filterForConsequences = TRUE
filter for genes with functional consequences,参考步骤二最后一步,默认为FALSE。 -
sortByQvals =
参数若为TRUE,则按Q值排名;若为FALSE,则按dIF值排名; -
n=
参数设置数量,若设置为NA,则会得到所有排名的表格。
image.png
(2)获得IsoformSwitch排名表格 - gene
sar.ge_df=extractTopSwitches(
sar,
filterForConsequences = TRUE,
n = NA,
extractGenes = TRUE, # when FALSE isoforms are returned
sortByQvals = TRUE
)
sar.ge_df
- isoform
sar.is_df=extractTopSwitches(
sar,
filterForConsequences = TRUE,
n = NA,
extractGenes = FALSE, # when FALSE isoforms are returned
sortByQvals = TRUE
)
head(sar.is_df,2)
head(sar.is_df,2)
2、单个基因switch可视化★
(1)以比较显著的ZAK
基因为例
subset(sar.is_df, gene_name == 'ZAK')
ZAK基因的两个转录本情况
-
switchPlot()
函数将其可视化
switchPlot(sar, gene = 'ZAK')
如下图结果:
- 上半部分为:The isoform structures along with the concatenated annotations (including transcript classification, ORF, Coding Potential, NMD sensitivity, annotated protein domains as well as annotated signal peptides)。
上面两个 Decreased/Increased useage的应该是dIF分别小于0、大于0的isoform,而第三个isoform说明ZAK基因表达这个转录本没有显著变化。 - 下半部分为三个柱状图,从三个角度反映表达情况。重点关注两边的,可以看出二者基因表达没有显著差异,但是isoform usage变化显著!
switchPlot柱状图上方的横线标注应该表示p值,即有无显著差异。
ns
表示 not significant;*
~***
表示有显著差异。
- 综上,COAD_cancer患者的ZAK基因的uc002uhz.2转录本表达明显比健康人高许多,而uc002uhy.2相应得比健康人表达少。后期深入分析可结合注释信息差异,进行转录蛋白结构等深入挖掘。
- 最后将该结果下载保存,建议为pdf
pdf(file = '~/li/test/ZAK.pdf', onefile = FALSE, height=6, width = 9)
switchPlot(sar, gene='ZAK')
dev.off()
(2)批量绘制
switchPlotTopSwitches(
switchAnalyzeRlist = sar,
n = 10, #绘制排名前10的
filterForConsequences = FALSE,
splitFunctionalConsequences = TRUE
)
-
splitFunctionalConsequences =
参数设置 to create a sub-folder for those switches with predicted consequences and another sub-folder for those without.
如下图结果,因为设置了filterForConsequences = FALSE
,splitFunctionalConsequences = TRUE
两个参数,因此按照排名将结果分成两类,分别存储在两个子文件夹里。
process
3、Genome-wide Consequences Summaries
extractConsequenceSummary(
sar,
consequencesToAnalyze='all', #也可以指定感兴趣的consequences
plotGenes = FALSE, # enables analysis of genes (instead of isoforms)
asFractionTotal = FALSE # enables analysis of fraction of significant features
)
-
From this summary plots above many conclusions are possible. First of all, the most frequent changes are changes affecting protein domains, Intrinsically Disordered Regions (IDR) and ORFs.
extractConsequenceSummary - To easily see the opposite consequence that are unevenly distributed,可以将上图转化为富集分析图。
extractConsequenceEnrichment(
exampleSwitchListAnalyzed,
consequencesToAnalyze='all',
analysisOppositeConsequence = TRUE,
returnResult = FALSE # if TRUE returns a data.frame with the results
)
-
对比上面的柱状图,中心点在虚线右边的应该表示相应的consequence增多。If this fraction is significantly different from 0.5 it indicates there is a systematic bias in isoform switch consequences.
extractConsequenceEnrichment
From the analysis above, it is therefore quite clear, that many of the opposing consequences are significantly unevenly distributed. In other words, many types of consequences seem to be used in a condition-specific manner.
4、Genome-wide Splicing Summaries
extractSplicingSummary(sar)
extractSplicingSummary
extractSplicingEnrichment(
sar,
splicingToAnalyze = c('A3','MES','ATSS','ATTS'),
returnResult = TRUE # if TRUE returns a data.frame with the results
)
image.png
5、最后画一个火山图吧~
ggplot(data=sar$isoformFeatures, aes(x=dIF, y=-log10(isoform_switch_q_value))) +
geom_point(
aes( color=abs(dIF) > 0.1 & isoform_switch_q_value < 0.05 ), # default cutoff
size=1
) +
geom_hline(yintercept = -log10(0.05), linetype='dashed') + # default cutoff
geom_vline(xintercept = c(-0.1, 0.1), linetype='dashed') + # default cutoff
scale_color_manual('Signficant\nIsoform Switch', values = c('black','red')) +
labs(x='dIF', y='-Log10 ( Isoform Switch Q Value )') +
theme_bw()
volcano figure
As there are many dIF values (effect size) very close to zero, which have a significant isoform switch (black dots above dashed horizontal line) this nicely illustrates why a cutoff on both the dIF and the q-value are necessary.
- 以上是IsoformSwitchAnalyzeR分析的一个大致流程了,其实这个包还可以分析更加复杂的情况,比如三组实验设计的两两比较,存在协变量的情况都可以进行分析。之后实际遇到这些数据再深入了解下。
- 在注释过程中,涉及到许多生物知识,比如蛋白结构域,信号肽,内在无序区域,以及CDS与ORF区别等,以后有机会再进行深入的学习吧~
- 官方说明书除了流程以外,关于可能遇到的一些问题说得很详细,值得仔细学习。
网友评论