Nature Protocols: Differential Expression Analysis of the Transcriptome


Author: 胡童远 | Published: 2020-07-19 14:36

Overview

These notes work through the 2012 Nature Protocols method as a way to learn transcriptome analysis.

Reference: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 2012
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3334321/

I. Data download

GEO dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32038

II. Software download and installation

1. Install tophat

conda install tophat

III. Workflow

1. Align reads to the reference genome

tophat -p 8 -G genes.gtf -o C1_R1_thout genome C1_R1_1.fq C1_R1_2.fq
tophat -p 8 -G genes.gtf -o C1_R2_thout genome C1_R2_1.fq C1_R2_2.fq
tophat -p 8 -G genes.gtf -o C1_R3_thout genome C1_R3_1.fq C1_R3_2.fq
tophat -p 8 -G genes.gtf -o C2_R1_thout genome C2_R1_1.fq C2_R1_2.fq
tophat -p 8 -G genes.gtf -o C2_R2_thout genome C2_R2_1.fq C2_R2_2.fq
tophat -p 8 -G genes.gtf -o C2_R3_thout genome C2_R3_1.fq C2_R3_2.fq
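
The six alignment commands differ only in the sample name, so they can be generated in a loop. The sketch below is a dry run that only prints the commands. It assumes the paired-end files are named `<sample>_1.fq` and `<sample>_2.fq`, and the function name `print_tophat_cmds` is hypothetical.

```shell
#!/bin/sh
# Dry run: print one tophat command per sample instead of running it.
# Assumes the paired-end files are named <sample>_1.fq and <sample>_2.fq.
print_tophat_cmds() {
    for cond in C1 C2; do
        for rep in R1 R2 R3; do
            sample="${cond}_${rep}"
            echo "tophat -p 8 -G genes.gtf -o ${sample}_thout genome ${sample}_1.fq ${sample}_2.fq"
        done
    done
}

print_tophat_cmds
```

Once the printed commands look right, piping the output to `sh` would run the alignments one after another.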

2. Assemble transcripts

cufflinks -p 8 -o C1_R1_clout C1_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R2_clout C1_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C1_R3_clout C1_R3_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R1_clout C2_R1_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R2_clout C2_R2_thout/accepted_hits.bam
cufflinks -p 8 -o C2_R3_clout C2_R3_thout/accepted_hits.bam

3. Create assemblies.txt
File contents:

./C1_R1_clout/transcripts.gtf
./C2_R2_clout/transcripts.gtf
./C1_R2_clout/transcripts.gtf
./C2_R1_clout/transcripts.gtf
./C1_R3_clout/transcripts.gtf
./C2_R3_clout/transcripts.gtf
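
Rather than typing the list by hand, the file can be written by a short loop. This is a minimal sketch that assumes the cufflinks output directories follow the `<sample>_clout` naming used above. The function name `write_assemblies` and its optional output-path argument are illustrative.

```shell
#!/bin/sh
# Write one transcripts.gtf path per cufflinks output directory.
# Assumes the directories follow the <sample>_clout naming used above;
# the function name and its optional output-path argument are illustrative.
write_assemblies() {
    out="${1:-assemblies.txt}"
    : > "$out"                      # create or truncate the list file
    for cond in C1 C2; do
        for rep in R1 R2 R3; do
            echo "./${cond}_${rep}_clout/transcripts.gtf" >> "$out"
        done
    done
}

write_assemblies assemblies.txt
cat assemblies.txt
```

The same six paths are written, here grouped by condition.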

4. Merge the assemblies into a single transcriptome annotation

cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt

5. Identify differentially expressed genes and transcripts

cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \
./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \
./C2_R1_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam

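6. Explore the differential analysis results

library(cummeRbund)
cuff_data <- readCufflinks('diff_out')

# Density plot: distribution of expression levels for each sample
csDensity(genes(cuff_data))
# Scatter plot: compare the expression of each gene between the two conditions
csScatter(genes(cuff_data), 'C1', 'C2')
# Volcano plot: inspect differentially expressed genes
csVolcano(genes(cuff_data), 'C1', 'C2')
# Bar plot of the expression level of a gene of interest
mygene <- getGene(cuff_data, 'regucalcin')
expressionBarplot(mygene)
# Bar plot of the isoform expression levels of the gene of interest
expressionBarplot(isoforms(mygene))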

7. Count the reads mapped to each chromosome

for i in *thout/accepted_hits.bam; do
    echo $i
    samtools index $i
done

for i in *thout/accepted_hits.bam; do
    echo $i
    samtools idxstats $i
done
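
The raw idxstats table is easier to read as a share of mapped reads per chromosome. `samtools idxstats` prints tab-separated columns: reference name, sequence length, mapped reads, unmapped reads. The sketch below is one way to summarise that output. The helper name `mapped_share` is hypothetical and the input numbers are made up for illustration.

```shell
#!/bin/sh
# Summarise idxstats-style input: columns are reference name, sequence
# length, mapped reads, unmapped reads (tab-separated).
# The "*" row (reads with no reference) is skipped.
mapped_share() {
    awk -F '\t' '
        $1 != "*" { n[$1] = $3; total += $3; order[++k] = $1 }
        END {
            for (i = 1; i <= k; i++)
                printf "%s\t%d\t%.1f%%\n", order[i], n[order[i]],
                       (total ? 100 * n[order[i]] / total : 0)
        }'
}

# Made-up example input, not real data.
printf 'chrA\t1000\t600\t0\nchrB\t1000\t300\t0\nchrC\t1000\t100\t0\n*\t0\t0\t50\n' | mapped_share
```

With real data, the call would be `samtools idxstats $i | mapped_share` inside the loop above.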

8. Compare the assembled transcriptome with the reference transcriptome

find . -name transcripts.gtf > gtf_out_list.txt
cuffcompare -i gtf_out_list.txt -r genes.gtf
for i in $(find . -name '*.tmap'); do
    echo $i
    awk 'NR > 1 { s[$3]++ } END { for (j in s) { print j, s[j] }}' $i
done
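
The awk step tallies the third column of each .tmap file, which is the cuffcompare class code. The sketch below shows the same tally on a tiny made-up excerpt, so the output format can be checked without running cuffcompare. The function name `count_class_codes` and the sample rows are hypothetical.

```shell
#!/bin/sh
# Tally the class_code column (column 3) of a cuffcompare .tmap file,
# skipping the header line. awk does not guarantee the order of the
# codes, so the result is sorted.
count_class_codes() {
    awk 'NR > 1 { s[$3]++ } END { for (j in s) print j, s[j] }' "$1" | sort
}

# Tiny made-up .tmap excerpt (first five columns only) for illustration.
tmp="$(mktemp)"
printf 'ref_gene_id\tref_id\tclass_code\tcuff_gene_id\tcuff_id\n' > "$tmp"
printf 'geneA\ttxA\t=\tCUFF.1\tCUFF.1.1\n' >> "$tmp"
printf 'geneB\ttxB\tj\tCUFF.2\tCUFF.2.1\n' >> "$tmp"
printf 'geneC\ttxC\t=\tCUFF.3\tCUFF.3.1\n' >> "$tmp"
count_class_codes "$tmp"
rm -f "$tmp"
```

In this excerpt the code `=` (matches a reference transcript) is counted twice and `j` (potentially novel isoform) once.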

9. Record differentially expressed genes and transcripts to new files

library(cummeRbund)
cuff_data <- readCufflinks('diff_out')
gene_diff_data <- diffData(genes(cuff_data))
sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
nrow(sig_gene_data)

10. Extract differentially expressed transcripts and differentially spliced and regulated genes

isoform_diff_data <- diffData(isoforms(cuff_data), 'C1', 'C2')
sig_isoform_data <- subset(isoform_diff_data, (significant == 'yes'))
nrow(sig_isoform_data)
tss_diff_data <- diffData(TSS(cuff_data), 'C1', 'C2')
sig_tss_data <- subset(tss_diff_data, (significant == 'yes'))
nrow(sig_tss_data)
cds_diff_data <- diffData(CDS(cuff_data), 'C1', 'C2')
sig_cds_data <- subset(cds_diff_data, (significant == 'yes'))
nrow(sig_cds_data)
promoter_diff_data <- distValues(promoters(cuff_data))
sig_promoter_data <- subset(promoter_diff_data, (significant == 'yes'))
nrow(sig_promoter_data)
splicing_diff_data <- distValues(splicing(cuff_data))
sig_splicing_data <- subset(splicing_diff_data, (significant == 'yes'))
nrow(sig_splicing_data)
relCDS_diff_data <- distValues(relCDS(cuff_data))
sig_relCDS_data <- subset(relCDS_diff_data, (significant == 'yes'))
nrow(sig_relCDS_data)

11. Save the differentially expressed genes

gene_diff_data <- diffData(genes(cuff_data))
sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
write.table(sig_gene_data, 'diff_genes.txt', sep = '\t', row.names = F, col.names = T, quote = F)

To be continued...
