Peak annotation

作者: 余绕 | 来源:发表于2022-03-09 15:39 被阅读0次

Peak annotation
什么是PCU，ACU
2019-01-16
万科V_learn万科父母学院_家庭双人工程打卡，seven30
chip-seq信息分析
你了解注解内的@interface List么
Annotation Processor
memory_get_peak_usage-返回分配给 PHP
Java注解(Annotation)详解
强化四二分答案

To annotate the peaks generated by Macs2， the following codes can be used。
Peak注释需要考虑一下几种情况：

Gene是在+还是-链上，不同链方向不一致，启动子区间设置时候需要注意，由于基因无论是在正义链还是反义链其注释的坐标都是5‘->3’,但是基因是有方向的，所以+链的promoter与-promoter计算是有区别的（下图a）。
注释的适合需要peak与基因overlap的如下三种种情况（下图b）；这是后续写注释程序的逻辑所在。
Gene in peak，这种很好判断。
Gene 与peak overlap，这种一般考虑与peak的overlap>=50%。
没有overlap，gene在peak的上游或者下游（3的对立面是1和2两种情况）。

image.png

open PEAK, "$ARGV[0]";

while(<PEAK>){
    chomp;
    if(/^Chr/){
        @temp=split /\t/,$_;
        $chr=$temp[0];
        $chr=~s/ *//;
        $hash_peak{$chr}{"$temp[1]\t$temp[2]\t$temp[3]"}=$temp[5];
    }
}

#foreach $chr(keys %hash_peak){
#       foreach(keys %{$hash_peak{$chr}}){
#       print"$chr\t$_\t$hash_peak{$chr}{$_}\n";
#   }
#}



open GFF3,"$ARGV[1]";

while(<GFF3>){
    chomp;
    if(/^Chr/){
    @temp=split /\t/,$_;
    $GENE{$temp[2]}=[$temp[0],$temp[3],$temp[4],$temp[5]];
    
}
}

$gene_in_peak="Gene in peak";
$gene_body="Gene body";
$promoter="Promoter";
my %peak_anno=();

foreach (keys %GENE){
    
    $LOC=$_;
    my($chr,$gene_start,$gene_end,$strand)=@{$GENE{$LOC}};
    
    foreach(keys %{$hash_peak{$chr}}){
    my($peak_start,$peak_end,$peak_length)=split /\t/,$_;
        if($strand eq "+"){
            my($a1,$a2,$b1,$b2)=($gene_start,$gene_end,$peak_start,$peak_end); #gene_body
            if($a1>=$b1 and $a1<=$b2 and $a2>$b1 and $a2<=$b2){
                $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$gene_body\t$gene_in_peak\t";
                
            }
            elsif(!($a1>$b2 or $a2<$b1)){
                @compare=sort{$a<=>$b}($a1,$a2,$b1,$b2);
                if(($compare[2]-$compare[1])/$peak_length>=0.5){
                    $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$gene_body\t";
                    
                }
                
            }
            
            my($a1,$a2,$b1,$b2)=($gene_start-2000,$gene_start,$peak_start,$peak_end);  #gene_2k_promoter
            
            if($a1>=$b1 and $a1<=$b2 and $a2>$b1 and $a2<=$b2){
                $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$promoter\t";
                
            }
            elsif(!($a1>$b2 or $a2<$b1)){
                @compare=sort{$a<=>$b}($a1,$a2,$b1,$b2);
                if(($compare[2]-$compare[1])/$peak_length>=0.5){
                    $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$promoter\t";
                    
                }
                
            }
            
            
        }
        
        if($strand eq "-"){
            
            my($a1,$a2,$b1,$b2)=($gene_start,$gene_end,$peak_start,$peak_end); #gene_body
            if($a1>=$b1 and $a1<=$b2 and $a2>$b1 and $a2<=$b2){
                $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$gene_body\t$gene_in_peak\t";
                
            }
            elsif(!($a1>$b2 or $a2<$b1)){
                @compare=sort{$a<=>$b}($a1,$a2,$b1,$b2);
                if(($compare[2]-$compare[1])/$peak_length>=0.5){
                    $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$gene_body\t";
                    
                }
                
            }
            
        my($a1,$a2,$b1,$b2)=($gene_end+2000,$gene_end,$peak_start,$peak_end);  #gene_2k_promoter

            if($a1>=$b1 and $a1<=$b2 and $a2>$b1 and $a2<=$b2){
                $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$promoter\t";
                
            }
            elsif(!($a1>$b2 or $a2<$b1)){
                @compare=sort{$a<=>$b}($a1,$a2,$b1,$b2);
                if(($compare[2]-$compare[1])/$peak_length>=0.5){
                    $peak_anno{$chr}{"$peak_start\t$peak_end"}.="$LOC\t$promoter\t";
                    
                }
                
            }
            
            
            
        }
        
        
        
    }
}

open OU,">$ARGV[2]";

foreach $chr (sort keys %peak_anno ){
    
    foreach(keys %{$peak_anno{$chr}}){
        
        print OU "$chr\t$_\t$peak_anno{$chr}{$_}\n";
        
    }
    
}

运行：perl new_5.pl WT_K5_rep1_peaks.xls GENE_FF3 Peak_anno_v3.txt

Prepare the GENE coordinates filels for all.gff3 files.

产生GENE_FF3

awk -F "\t" '$3=="gene"{print $0}' all.gff3 |awk -F ";" '{print $1}' |sed s/ID=// |awk  '{print $1,$3,$9,$4,$5,$7}' OFS="\t"  >  GENE_FF3

To test whether the perl script was correct or not, we used the bedtools to annotate the same peak file and did the comparasions.
Prepare the the bedfile for bedtools. As the bedfile was 0-based and half-closed-half-open interval, thus the left coordinate should be 1 bp less （$4-1）.

产生GENE_FF3.bed file

awk '{print $1,$4-1,$5,$2,$3}' GENE_FF3 >GENE_FF3.bed

bedtools进行注释(注意要用macs产生的peak 文件)：

bedtools intersect -f 0.5 -a WT_K5_rep1_peaks.narrowPeak  -b  GENE_FF3.bed  -wa -wb >Bedttols_annotaion_only_for_gene

对二者进行注释的结果进行比较，由于bedtoo注释只考虑了Gene body没有考虑promoter，所以只对Gene body注释的基因进行比较，结果如下:

image.png

从结果看，基本符合，由于bedtools这里只考虑了peak 与Gene 有至少>=50%的overlap部分（下图情况1，2），而对于Gene in peak（下图情况3）的这种情况没有计算（wa wb互换且，-F 1），所以有一些出入。

image.png

网友评论

本文标题：Peak annotation

本文链接：https://www.haomeiwen.com/subject/kyutdrtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

Peak annotation

Prepare the GENE coordinates filels for all.gff3 files.

产生GENE_FF3.bed file

相关文章

Peak annotation

什么是PCU，ACU

2019-01-16

万科V_learn万科父母学院_家庭双人工程打卡，seven30

chip-seq信息分析

你了解注解内的@interface List么

Annotation Processor

memory_get_peak_usage-返回分配给 PHP

Java注解(Annotation)详解

强化四二分答案

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

重点

DNase-seq