美文网首页三维基因组三维基因组学
AB区室计算【juicer/homer】

AB区室计算【juicer/homer】

作者: caokai001 | 来源:发表于2020-09-14 10:57 被阅读0次

    Tips:

    目前主流的计算AB区室工具: juicer/homer/cworld

    https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02095-z#Sec11
    • juicer: 第一篇HiC文章实验室提供的工具,计算多个分辨率每个bin的PC1 值,但是输出只有一列,不是bedgraph 文件,不太人性化;同时不能判断AB区室,需要自己手动校正.
    • homer: NGS分析套件,也有HiC分析流程




    实例1:HiC-Pro+homer

    使用hicpro进行预处理,用homer进行AB区室分析,自己可以通过TSS进行判断AB区室
    (Tips:有时候也需要表观信号进行人工校正)

    • Step1:安装配置 homer
    ## 安装homer.pl
    wget http://homer.ucsd.edu/homer/configureHomer.pl
    perl configureHomer.pl -install
    echo PATH=/home/kcao/biosoft/homer/bin:$PATH
    source ~/.bashrc
    
    • Step2:下载基因组文件信息
    下载内置的hg19基因组文件
    perl configureHomer.pl -install hg19
    

    homer 内置一些基因组文件:比如hg19 genome


    image.png
    或者手动构建基因组信息(非常见的物种:Pig)

    大多数情况,直接指定基因组和注释文件位置就可以,比如annotation.pl 注释peak 文件.

    ## 1.gff 转换成gtf
    (base) [00:16:03] kcao@login:~/Work/Pig_NCBI/Pig_source/other_ref
    $ gffread   GCF_000003025.5_Sscrofa10.2_genomic.gff   -T  -o  GCF_000003025.5_Sscrofa10.2_genomic.gtf
    $ less /public/home/kcao/Work/Pig_NCBI/Pig_source/other_ref/GCF_000003025.5_Sscrofa10.2_genomic.gtf
    
    ## 2.基因组信息建立
    (base) [00:17:13] kcao@login:~/Work/Pig_NCBI/Pig_source/other_ref
    $ loadGenome.pl -name pig_10.2 -org null \
    -fasta /public/home/kcao/Work/Pig_NCBI/Pig_source/GCF_000003025.5_Sscrofa10.2_genomic.fa \
    -gtf /public/home/kcao/Work/Pig_NCBI/Pig_source/other_ref/GCF_000003025.5_Sscrofa10.2_genomic.gtf 
    
    • Step3: 修改输入文件格式,将hicpro 输出文件,转换成homer格式
      homer格式如下:
      image.png

    hicpro validpair 格式: https://nservant.github.io/HiC-Pro/RESULTS.html#list-of-valid-interaction-products

    #read name / chr_reads1 / pos_reads1 / strand_reads1 / chr_reads2 / pos_reads2 / strand_reads2 / fragment_size [/ allele_specific_tag]
    

    格式转换

    $ ls *.allValidPairs| while read sample ;do
    echo $sample; 
    ( nohup cat $sample | awk 'BEGIN{FS=OFS="\t"}{print NR,$2,$3,$4,$5,$6,$7}' > ${sample}.homer & );
    done
    
    • Step4:运行homer,计算AB区室
      运行三个分辨率
    for sample in *.homer; do
    echo "makeTagDirectory tag_${sample%%.*} -format HiCsummary ${sample};
    analyzeHiC tag_${sample%%.*} -cpu 16 -res 100000 -bgonly;
    analyzeHiC tag_${sample%%.*} -cpu 8 -res 100000 -norm -override >${sample%%.*}_whole_genome_matrix.txt;
    runHiCpca.pl ${sample%%.*}_100kb_pca tag_${sample%%.*} -cpu 16 -res 100000 -superRes 100000 -genome pig_10.2 -corrDepth 1;
    runHiCpca.pl ${sample%%.*}_100kb_pca tag_${sample%%.*} -cpu 16 -res 300000 -superRes 300000 -genome pig_10.2 -corrDepth 1;
    runHiCpca.pl ${sample%%.*}_500kb_pca tag_${sample%%.*} -cpu 16 -res 500000 -superRes 500000 -genome pig_10.2 -corrDepth 1;"|qsub -d ./ -N ${sample%%.*};
    done
    
    



    实例2:juicer + juicer

    juicer上游分析,下游用juicer 计算AB区室
    下面以Pig为例,染色体编号以NC命名的。

    • Step1: 产生需要计算AB区室的染色体列表
    (base) [14:06:58] kcao@login:~/Work/Pig_NCBI/Pig_HiC_PEFs/6_AB_juicer/PEFs
    $ cat /public/home/kcao/Work/Pig_NCBI/Pig_source/GCF_000003025.5_Sscrofa10.2_genomic.fa | grep ">" >chrom.txt
    $ cat chrom.txt | grep "NC" | awk '{print $1}' | sed 's/>//g' | grep -v "NC_000845.1" > chrom_2.txt
    
    • Step2:批量运行juicer eigenvector分析
    $ for sample in *.hic; do echo $sample;   mkdir -p ./${sample%%.*};( cat chrom_2.txt | while read i; do echo $i ; java -jar /public/home/kcao/tools/Juicer/juicer/CPU/juicer_tools.jar eigenvector -p KR ${sample} ${i} BP 500000 ./${sample%%.*}/${sample/.hic/}.${i}_500kb.KR.txt ; done & ) ;done 
    
    
    • Step3:将计算的PC1结果加上前面三列及其header变成bedgraph 格式
    ## 产生染色体长度
    (base) [15:50:34] kcao@login:~/Work/Pig_NCBI/Pig_source
    $ cat GCF_000003025.5_Sscrofa10.2_genomic.fa.fai | grep "NC" | grep -v "NC_000845.1" | awk '{print $1"\t"$2}' > GCF_000003025.5_Sscrofa10.2_genomic.fa.lensize
    
    ## 产生binsize为500k,bin区间信息
    $  bedtools makewindows -g GCF_000003025.5_Sscrofa10.2_genomic.fa.lensize -w 500000 > chrom.bin.500k.size
    
    ## binsize 按照染色体分文件
    $ mkdir sizeChrome && cd sizeChrome
    $ mv ../chrom.bin.500k.size ./
    
    $ cat chrom.bin.500k.size | awk '{print $0 >>$1}' 
    
    
    
    
    将bin区间添加到向量文件中
    ## Step1.创建最后的bedgraph文件[添加nohup与do冲突]
    $ ls | grep -e  "inter_30$" | while read sample ; do
    
    ( cat chrom_2.txt | while read i ; do
    echo $i;
    cat ./sizeChrome/${i} | paste -  ./${sample%%.*}/${sample/.hic/}.${i}_500kb.KR.txt >>${sample}.inter.clean.bedgraph; 
    done & )
    
    done
    
    
    
    ## Step2:添加头文件(注意换行符位置)
    $  ls | grep -e  "inter_30$" | while read id;
    do 
    sample=$(basename $id /);
    
    echo -e "track name=${sample}  yLineMark="0.0" alwaysZero=on maxHeightPixels=100:75:11 visibility=full viewLimits=-1:1 autoScale=on type=bedGraph" >${sample}.inter_bedgraph.header;
    cat ${sample}.inter_bedgraph.header ${sample}.inter.clean.bedgraph | grep -v "NaN" > ${sample}.inter.clean.checked.bedgraph ;
    done
    
    
    • Step4: 进行AB区室调整
      由于PC1为正数的不一定为A区室,需要进行判断;这里使用的TSS数目进行判断,
      tips: TSS 判断不一定准确,有时候需要加上表观信息进行判断
    ## TSS.bed
    $ cat GCF_000003025.5_Sscrofa10.2_genomic.gene.bed | awk 'BEGIN{FS=OFS="\t"}{if($6=="+"){print $1,$2,$2+1,$4} else {print $1,$3-1,$3,$4}}' > GCF_000003025.5_Sscrofa10.2_genomic.TSS.bed
    
    ## 计算区室TSS数目
    $ 
    sample=PEFs_rep2_inter_30.inter.clean.bedgraph
    refdir=/public/home/kcao/Work/Pig_NCBI/Pig_source/other_ref
    echo -e "chr\tpos_TSS_num\tneg_TSS_num\tstatu" > rep2_AB_status.txt
    
    cat chrom_2.txt | while read id ; do
    echo $id;
    grep $id $sample > ${id}.bedgraph
    # pos_TSS
    cat ${id}.bedgraph | awk '$4>0{print}' > ${id}.pos.bedgraph
    pos_tss_num=`intersectBed -a  ${id}.pos.bedgraph -b ${refdir}/GCF_000003025.5_Sscrofa10.2_genomic.TSS.bed -wa -wb| wc -l`
    
    # neg_TSS
    cat ${id}.bedgraph | awk '$4<0{print}' > ${id}.neg.bedgraph
    neg_tss_num=`intersectBed -a  ${id}.neg.bedgraph -b ${refdir}/GCF_000003025.5_Sscrofa10.2_genomic.TSS.bed -wa -wb| wc -l`
    
    if [ $pos_tss_num -gt $neg_tss_num ]
    then echo -e "$id\t$pos_tss_num\t$neg_tss_num\tTRUE" >> rep2_AB_status.txt
    else echo -e "$id\t$pos_tss_num\t$neg_tss_num\tFALSE" >> rep2_AB_status.txt
    fi
    
    rm -rf ${id}.pos.bedgraph ${id}.neg.bedgraph ${id}.bedgraph
    done
    

    查看需要转换成染色体信息,最后一列为FALSE表示需要PC1取反

    $ cat rep2_AB_status.txt
    chr pos_TSS_num neg_TSS_num statu
    NC_010443.4 2159    1413    TRUE
    NC_010444.3 1506    1624    FALSE
    NC_010445.3 1288    992 TRUE
    NC_010446.4 1283    782 TRUE
    NC_010447.4 936 884 TRUE
    NC_010448.3 1977    913 TRUE
    NC_010449.4 1344    1243    TRUE
    NC_010450.3 744 693 TRUE
    NC_010451.3 1215    984 TRUE
    NC_010452.3 420 604 FALSE
    NC_010453.4 257 536 FALSE
    NC_010454.3 550 1025    FALSE
    NC_010455.4 1483    898 TRUE
    NC_010456.4 1266    823 TRUE
    NC_010457.4 519 1117    FALSE
    NC_010458.3 632 165 TRUE
    NC_010459.4 285 748 FALSE
    NC_010460.3 464 398 TRUE
    NC_010461.4 704 591 TRUE
    NC_010462.2 14  7   TRUE
    
    (base) [11:39:08] kcao@login:~/Work/Pig_NCBI/Pig_HiC_PEFs/6_AB_juicer/PEFs
    $ cat rep2_AB_status.txt | grep "FALSE" | cut -f 1 | xargs| sed "s/ /|/g"
    NC_010443.4|NC_010446.4|NC_010447.4|NC_010450.3|NC_010451.3|NC_010456.4|NC_010457.4|NC_010460.3|NC_010461.4
    
    #
    (base) [11:30:51] kcao@login:~/Work/Pig_NCBI/Pig_HiC_PEFs/6_AB_juicer/PEFs
    $ cat PEFs_rep2_inter_30.inter.clean.checked.bedgraph | awk 'BEGIN{FS=OFS="\t"}$1~/(NC_010443.4|NC_010446.4|NC_010447.4|NC_010450.3|NC_010451.3|NC_010456.4|NC_010457.4|NC_010460.3|NC_010461.4)$/{print $1,$2,$3,-$4} ;$1!~/(NC_010443.4|NC_010446.4|NC_010447.4|NC_010450.3|NC_010451.3|NC_010456.4|NC_010457.4|NC_010460.3|NC_010461.4)$/ {print $0}' > PEFs_rep2_inter_30.inter.clean.checked.ok.bedgraph
    
    image.png

    思考:

    • 大部分情况,使用不同分辨率,homer可能计算结果一致性更好
    • 从人性化来看,homer 输出为bedgraph及其进行了AB校正.



    欢迎评论留言~😀

    相关文章

      网友评论

        本文标题:AB区室计算【juicer/homer】

        本文链接:https://www.haomeiwen.com/subject/fbwaektx.html