一篇PNAS Fig.1的复现

作者: 大吉岭猹 | 来源:发表于2019-09-27 20:47 被阅读0次
    • 文章链接:https://www.pnas.org/content/115/8/E1839.long
    • GSE编号:GSE102339
    • 特别提示:如果只是复现Fig1,不需要下载RNA-seq数据和非WT的ChIP-seq数据
    • 注1:运行本文中的脚本时建议均采用nohup bash name.sh &的方式
    • 注2:本文脚本默认存放在epi/script文件夹下
    • 注3:所有循环脚本运行前一定要用echo检查一下变量

    1. 安装软件和下载数据(sra文件)

    2. 建文件夹

    mkdir sra raw clean align bw mergeBam peaks script
    

    3. 准备config

    # SraRunTable.txt是下载的,file.list是自己复制的
    awk '{print $11,$13}' SraRunTable.txt |grep -v 'Run'|sort -k2,2|paste - file.list |awk '{print $1,$4}' > config
    
    • config做好了长这样


      config

    4. sra转fq

    • 脚本
    analysis_dir=../raw
    sra=.sra
    cat config|while read id;
    do echo $id
    arr=($id)
    srr=${arr[0]}$sra
    sample=${arr[1]}
    fastq-dump -A $sample -O $analysis_dir --gzip --split-3 /trainee/ZJU/epi/sra/$srr
    done
    

    5. ChIP-seq过滤fq

    • 脚本
    ls ../raw/ChIP*.gz | while read id;
    do
    trim_galore -q 25 --phred33 --length 25 -e 0.1 --stringency 4 -o ../clean $id
    done
    

    6. 准备trim_config_RNA

    cd ../clean
    ls ../raw/RNA*.fq.gz | paste - - > trim_config_RNA
    

    7. RNA-seq过滤fq

    • 脚本
    cat trim_config_RNA | while read sample;
    do
    arr=$sample
    fq1=${arr[0]}
    fq2=${arr[1]}
    trim_galore -q 25 --phred33 --length 35 -e 0.1 --stringency 4  --paired -o ../clean $fq1 $fq2
    done
    

    8. ChIP-seq比对

    • 脚本
    bowtie2_index=/trainee/ZJU/reference/index_bowtie2/dm6
    ls ../clean/ChIP-Seq_*.gz | while read id;
    do
    file=$(basename $id)
    sample=${file%%.*}
    #echo $file $sample
    bowtie2 -p 6 -x $bowtie2_index -U $id | samtools sort -O bam -@ 6 -o - > ../align/${sample}.bam
    done
    

    9. 准备RNA_bowtie2_config

    cd ../align
    ls ../clean/RNA*.fq.gz | paste - - > RNA_bowtie2_config
    

    10. RNA-seq比对

    • 脚本
    bowtie2_index=/trainee/ZJU/reference/index_bowtie2/dm6
    cat RNA_bowtie2_config | while read sample;
    do
    arr=($sample)
    fq1=${arr[0]}
    fq2=${arr[1]}
    file=$(basename $sample )
    tmp=$(basename $fq1)
    sample=${tmp%%_1_trimmed.fq.gz}
    #echo $file $sample
    bowtie2 -p 6 -x $bowtie2_index -1 $fq1 -2 $fq2 | samtools sort -O bam -@ 6 -o - > ../align/${sample}.bam
    done
    

    11. ChIP-seq比对去重

    • 脚本
    ls ../align/ChIP*.bam | while read id
    do
    sample=${id%%.bam}
    #echo $id $sample
    samtools rmdup -s $id $sample.rmdup.bam
    done
    
    cd ../align
    ls  *.rmdup.bam  | xargs -i samtools index {}
    ls  *.rmdup.bam  | while read id ;do samtools flagstat $id > $(basename $id ".bam").stat ;done #basename的神奇用法
    
    • 自此,RNA-seq数据被我抛弃了

    12. bam merge

    cd align
    ls  ChIP*.rmdup.bam|sed 's/_[0-9]_trimmed.rmdup.bam//g'|sort -u |while read id;do samtools merge ../mergeBam/$id.merge.bam $id*.rmdup.bam ;done
    

    13. call peaks

    cd align
    ls  *_WT.merge.bam |cut -d"." -f 1 |while read id;
    do
        if [ ! -s ${id}_summits.bed ];
        then
    echo $id 
    nohup macs2 callpeak -c  ChIP-Seq_Input_WT.merge.bam -t $id.merge.bam -f BAM -B -g dm -n $id --outdir ../peaks  2> $id.log &
        fi
    done
    

    14. peak转bed(for Fig.1A)

    • H3K27me3_peakdomain.bed从文章补充材料中获得
    • 脚本
    intersectBed -a ChIP-Seq_Spps_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Spps_inH3K27.bed
    intersectBed -a ChIP-Seq_Pho_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pho_inH3K27.bed
    intersectBed -a ChIP-Seq_Ph_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Ph_inH3K27.bed
    intersectBed -a ChIP-Seq_Psc_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Psc_inH3K27.bed
    intersectBed -a ChIP-Seq_Pc_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pc_inH3K27.bed
    intersectBed -a ChIP-Seq_Ez_WT_peaks.narrowPeak -b H3K27me3_peakdomain.bed -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Ez_inH3K27.bed
    
    • 下一步在R中完成
    library(regioneR)
    library(ChIPpeakAnno)
    Psc <- toGRanges("Psc_inH3K27.bed")
    Spps <- toGRanges("Spps_inH3K27.bed")
    Pho <- toGRanges("Pho_inH3K27.bed")
    ol <- findOverlapsOfPeaks(Psc, Spps, Pho, maxgap=1,connectedPeaks="min")
    pdf("Psc_Spps_Pho_overlap.pdf",width=5,height=5)
    makeVennDiagram(ol, totalTest=2500)
    dev.off()
    
    • 成品


      Fig.1A

    15. bam转bw(for Fig.1C)

    • 脚本
    ls ../mergeBam/*_WT.merge.bam | while read fq1;
    do
    sample=${fq1%%.bam}
    samtools index $fq1
    bamCoverage -b $fq1 -o $sample.bw --normalizeUsing RPKM -p 5
    done
    #第一次运行这个脚本没有在bw文件夹里看到预期的输出结果,其实是代码有一个小问题,一开始没有发现这个问题就是因为没有完全养成echo检查的习惯
    
    • 载入IGV后查看基因Abd-B,成品


      Fig.1B

    16. deeptools可视化(for Fig.1B)

    16.1. step1

    • 按文章要求进行log2处理的脚本
    ls ../mergeBam/*_WT.merge.bam | while read fq1;
    do
    tmp=$(basename $fq1)
    sample=${tmp%%.bam}
    samtools index $fq1
    bamCompare -b1 $fq1 -b2 ChIP-Seq_Input_WT.merge.bam --operation log2 -of bigwig -o ../bw/$sample.log2.bw -p 2
    done
    

    16.2. step2

    • 思路是把Spps,Cg,Pho的peak merge到一起,形成一个reference,然后分别将上面三个peak和这个reference做intersectBed,也就是把三个的peak对应成一个标准模式
    mergePeaks ChIP-Seq_Spps_WT_peaks.narrowPeak ChIP-Seq_Pho_WT_peaks.narrowPeak ChIP-Seq_Psc_WT_peaks.narrowPeak | cut -f2- > reference.bed
    intersectBed -a reference.bed -b ChIP-Seq_Spps_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Spps_standard.bed
    intersectBed -a reference.bed -b ChIP-Seq_Pho_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Pho_standard.bed
    intersectBed -a reference.bed -b ChIP-Seq_Psc_WT_peaks.narrowPeak -c|awk 'BEGIN{OFS="\t"}{if($NF>0){print $1,$2,$3}}' > Fig1/Psc_standard.bed
    

    16.3. step3

    • 使用网页工具把每一个区域的位点信息做成一个bed文件

    16.4. step4

    computeMatrix reference-point  --referencePoint center  -p 6 -b 5000 -a 5000 -R ../peaks/Fig1/veen_result/all_3.bed ../peaks/Fig1/veen_result/Pho_Spps.bed ../peaks/Fig1/veen_result/Spps_Psc.bed ../peaks/Fig1/veen_result/Pho_Psc.bed ../peaks/Fig1/veen_result/only_Spps.bed ../peaks/Fig1/veen_result/only_Pho.bed ../peaks/Fig1/veen_result/only_Psc.bed -S ChIP-Seq_Spps_WT.merge.log2.bw ChIP-Seq_Pho_WT.merge.log2.bw ChIP-Seq_Ez_WT.merge.log2.bw ChIP-Seq_Ph_WT.merge.log2.bw ChIP-Seq_Psc_WT.merge.log2.bw ChIP-Seq_Pc_WT.merge.log2.bw --skipZeros  -o ../peaks/Fig1/b/all_center.gz
    plotHeatmap -m all_center.gz  -out all_Heatmap.png --colorList "blue,white,red" --zMin -2 --zMax 2 --whatToShow "heatmap only" --regionsLabel G1 G2 G3 G4 G5 G6 G7 --samplesLabel Spps Pho Ez Ph Psc Pc
    
    • 成品


      Fig.1B

    最后,向大家隆重推荐生信技能树的一系列干货!

    1. 生信技能树全球公益巡讲:https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g
    2. B站公益74小时生信工程师教学视频合辑:https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw
    3. 招学徒:https://mp.weixin.qq.com/s/K8WiX5C7BYC3WonM2VcdHQ

    相关文章

      网友评论

        本文标题:一篇PNAS Fig.1的复现

        本文链接:https://www.haomeiwen.com/subject/vvzpyctx.html