Annovar Annotation Details Explained (Part 1)

Author: 京古 | Published: 2019-10-30 17:39

    Func.refGeneWithVer categories in ANNOVAR annotation output

    Some columns in the ANNOVAR output deserve careful study. The commonly used values of the function column are listed below:

    Value | Default precedence | Explanation | Sequence Ontology
    exonic | 1 | variant overlaps a coding exon | exon_variant (SO:0001791)
    splicing | 1 | variant is within 2 bp of a splicing junction (use -splicing_threshold to change this) | splicing_variant (SO:0001568)
    ncRNA | 2 | variant overlaps a transcript without coding annotation in the gene definition (see note (5) below) | non_coding_transcript_variant (SO:0001619)
    UTR5 | 3 | variant overlaps a 5' untranslated region | 5_prime_UTR_variant (SO:0001623)
    UTR3 | 3 | variant overlaps a 3' untranslated region | 3_prime_UTR_variant (SO:0001624)
    intronic | 4 | variant overlaps an intron | intron_variant (SO:0001627)
    upstream | 5 | variant overlaps the 1-kb region upstream of the transcription start site | upstream_gene_variant (SO:0001631)
    downstream | 5 | variant overlaps the 1-kb region downstream of the transcription end site (use -neargene to change this) | downstream_gene_variant (SO:0001632)
    intergenic | 6 | variant is in an intergenic region | intergenic_variant (SO:0001628)
    The annotation precedence is as follows:

    The value of the first column takes the following precedence (in the December 2010 and later versions of ANNOVAR): exonic = splicing > ncRNA > UTR5/UTR3 > intron > upstream/downstream > intergenic. This precedence is used to decide which function to print when a variant fits multiple functional categories.
    If the users want to have all functional consequences printed out (rather than just the most important one defined by the precedence above), the --separate argument should be used. In this case, several output lines may be present for each variant, representing several possible functional consequences.
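    In short: by default, each output line shows one variant with only its highest-precedence category. With --separate, all categories are reported, so the same variant may occupy several lines, one per category.
    The precedence order itself can also be customized with the -precedence argument.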
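To make the precedence rule concrete, here is a minimal Python sketch. It is not ANNOVAR's own code (ANNOVAR is written in Perl), and the names `DEFAULT_PRECEDENCE`, `pick_function` and `all_functions` are hypothetical. Given every category a variant overlaps, it keeps only the best-ranked one(s) from the table, joins ties with a comma as in outputs like "UTR5,UTR3", and lets a custom dictionary stand in for the -precedence argument.

```python
# Default precedence from the table above: a lower number wins.
# Illustrative re-implementation only, not ANNOVAR's Perl code.
DEFAULT_PRECEDENCE = {
    "exonic": 1, "splicing": 1, "ncRNA": 2, "UTR5": 3, "UTR3": 3,
    "intronic": 4, "upstream": 5, "downstream": 5, "intergenic": 6,
}


def pick_function(categories, precedence=DEFAULT_PRECEDENCE):
    """Return the category to report for one variant (default behavior).

    `categories` holds every category the variant overlaps. Categories
    tied at the best rank are joined with a comma, e.g. "UTR5,UTR3".
    Passing another `precedence` dict mimics a customized order.
    """
    if not categories:
        return "intergenic"
    best = min(precedence[c] for c in categories)
    # Keep every category sharing the best rank, in table order.
    tied = [c for c in precedence if c in categories and precedence[c] == best]
    return ",".join(tied)


def all_functions(categories, precedence=DEFAULT_PRECEDENCE):
    """Rough analogue of --separate: every category, most important first."""
    order = list(precedence)
    return sorted(categories, key=lambda c: (precedence[c], order.index(c)))
```

For example, `pick_function({"UTR3", "intronic"})` returns "UTR3", while `all_functions({"intronic", "UTR3"})` keeps both, most important first, which is roughly the information --separate exposes as separate lines.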

    Detailed explanation of each category:

    (1) "exonic" here refers only to the coding exonic portion, not the UTR portion, because two keywords (UTR5, UTR3) are reserved specifically for UTR annotations.
    (2) "splicing" in ANNOVAR is defined by default as a variant within 2 bp of an exon/intron boundary, but the threshold can be changed with the --splicing_threshold argument. Before Feb 2013, "exonic,splicing" meant a variant within an exon but close to an exon/intron boundary; this behavior existed for historical reasons, after a user requested that exonic variants near splicing sites also be annotated as splicing. However, I continue to get user emails complaining about this behavior despite my best efforts to explain it in detail on the ANNOVAR website. Therefore, starting from Feb 2013, "splicing" refers only to the 2 bp in the intron closest to an exon; to get the previous behavior, add the -exonicsplicing argument.
    (3) If a variant is located in both a 5' UTR and a 3' UTR region (possibly of two different genes), "UTR5,UTR3" is printed as the output; that is, the site lies in the 5' UTR of one gene and the 3' UTR of the other.
    (4) "upstream" and "downstream" are defined as within 1 kb of the transcription start site or the transcription end site, respectively, taking into account the strand of the mRNA; the --neargene argument can be used to adjust this threshold.
    (5) Technical Notes: ncRNA above refers to RNA without coding annotation. It does not mean that this RNA will never be translated; it merely means that the user-selected gene annotation system was not able to give a coding sequence annotation. It could still code for protein products and may have such annotations in future versions of the gene annotation or in another gene annotation system. For example, BC039000 is regarded as ncRNA by ANNOVAR when using UCSC Known Gene annotation, but as a protein-coding gene when using ENSEMBL annotation. If the user's goal is to find known (well-annotated) microRNAs or other known (well-annotated) non-coding RNAs, region-based annotation should be used with the wgRNA track selected.
    (6) Technical Notes: if the first codon of a transcript is deleted, it will be reported as wholegene deletion by ANNOVAR because the gene cannot be translated.
    (7) If a variant is located in both a downstream and an upstream region (possibly of two different genes), "upstream,downstream" is printed as the output. In the June 2011 version of ANNOVAR, splicing annotation was improved: if the splicing site is in an intron, all isoforms and the corresponding base changes are printed. For example:

    splicing SMS(NM_004595:c.447+2T>G) X 21895357 21895357 T G hetero 8 15
    splicing DMD(NM_004011:c.48+1A>C) X 31803228 31803228 T G homo 117 30
    splicing BAGE(NM_001187:c.14+1A>G),BAGE4(NM_181704:c.14+1A>G),BAGE5(NM_182484:c.14+1A>G) 21 10120594 10120594 T C hetero 66 53
    


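The splicing examples above can also be read programmatically. This is a minimal Python sketch, assuming the gene field has exactly the comma-separated `GENE(TRANSCRIPT:c.HGVS)` layout shown in those lines; the function names are hypothetical and not part of ANNOVAR. It splits the field into per-isoform tuples and checks that the intronic offset (the `+2` in `c.447+2T>G`) falls within the default 2-bp splicing threshold.

```python
import re

# One "GENE(TRANSCRIPT:c.HGVS)" entry; entries are comma-separated.
ENTRY_RE = re.compile(r"([^,()]+)\(([^:()]+):(c\.[^()]+)\)")
# Simple intronic c. positions only, e.g. c.447+2 or c.100-1.
INTRONIC_RE = re.compile(r"c\.(\d+)([+-])(\d+)")


def parse_splicing_field(field):
    """Split a splicing gene field into (gene, transcript, change) tuples."""
    return ENTRY_RE.findall(field)


def intron_offset(change):
    """Distance in bp from the nearest exon into the intron, or None."""
    m = INTRONIC_RE.match(change)
    return int(m.group(3)) if m else None


def within_splicing_threshold(change, threshold=2):
    """True if the change is intronic and within `threshold` bp of an exon."""
    offset = intron_offset(change)
    return offset is not None and offset <= threshold
```

Applied to the BAGE line, `parse_splicing_field` yields one tuple each for BAGE, BAGE4 and BAGE5, all with `c.14+1A>G`, and each passes the 2-bp check. Raising `threshold` mirrors what --splicing_threshold does to the definition.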
    The following example illustrates how variants are classified:

    [Figure: SNPs and deletions positioned around the NADK gene]

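    SNP1 is an intergenic variant, as it is more than 1 kb away from the genes on either side;
    SNP2 is a downstream variant, as it is within 1 kb of the 3' end of the NADK gene (note the direction of transcription);
    SNP3 is a UTR3 variant (in the figure, exons and UTRs are both drawn as blue bars, but the UTR bars are shorter);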
    SNP4 is an intronic variant;
    SNP5 is an exonic variant.
    Deletions are classified the same way as SNPs:
    Deletion 1 is an intergenic variant;
    Deletion 2 is a downstream variant;
    Deletion 3 is a UTR3 variant;
    Deletion 4 overlaps both a UTR3 and an intron, and based on the precedence rule, it is a UTR3 variant (only the UTR3 annotation is kept by default);
    Deletion 5 is an intronic variant;
    Deletion 6 overlaps both an exon and an intron, and based on the precedence rule, it is an exonic variant.
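The classification in this example can be sketched in Python. This is a simplified model, not ANNOVAR's implementation: it assumes a single coding transcript with 1-based inclusive coordinates (the `Transcript` class and both function names are hypothetical), labels every base the variant covers, and reports the highest-precedence label. Splicing and ncRNA handling are deliberately left out to keep it short.

```python
from dataclasses import dataclass

# Default precedence (lower wins); splicing and ncRNA omitted in this sketch.
PRECEDENCE = {"exonic": 1, "UTR5": 3, "UTR3": 3, "intronic": 4,
              "upstream": 5, "downstream": 5, "intergenic": 6}


@dataclass
class Transcript:
    """Hypothetical coding-transcript model, 1-based inclusive coordinates."""
    strand: str      # "+" or "-"
    tx_start: int    # leftmost genomic position of the transcript
    tx_end: int      # rightmost genomic position of the transcript
    cds_start: int
    cds_end: int
    exons: list      # list of (start, end) tuples


def classify_base(pos, tx, neargene=1000):
    """Category of a single genomic base relative to one transcript."""
    if tx.tx_start <= pos <= tx.tx_end:
        if any(s <= pos <= e for s, e in tx.exons):
            if tx.cds_start <= pos <= tx.cds_end:
                return "exonic"
            # UTR side depends on strand: left of the CDS is 5' on "+".
            before_cds = pos < tx.cds_start
            if tx.strand == "+":
                return "UTR5" if before_cds else "UTR3"
            return "UTR3" if before_cds else "UTR5"
        return "intronic"
    # Outside the transcript: upstream/downstream also depend on strand.
    if tx.tx_start - neargene <= pos < tx.tx_start:
        return "upstream" if tx.strand == "+" else "downstream"
    if tx.tx_end < pos <= tx.tx_end + neargene:
        return "downstream" if tx.strand == "+" else "upstream"
    return "intergenic"


def classify_variant(start, end, tx, neargene=1000):
    """Label every base of [start, end], then apply the precedence."""
    cats = {classify_base(p, tx, neargene) for p in range(start, end + 1)}
    best = min(PRECEDENCE[c] for c in cats)
    return ",".join(c for c in PRECEDENCE if c in cats and PRECEDENCE[c] == best)
```

With this model, a deletion spanning an intron and a UTR3 exon comes out as "UTR3" (like deletion 4), and one spanning coding exon and intron bases comes out as "exonic" (like deletion 6). The strand check is what makes a variant just left of a minus-strand gene "downstream", matching the note about transcription direction for SNP2.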

    How gene names in the output are determined:

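    (1) ANNOVAR takes gene names from the gene definition database in use (such as RefSeq, UCSC Gene or Ensembl Gene), which is generally chosen by the user;
    (2) In more complex cases:
    ① If a gene has both coding and non-coding definitions (multiple transcripts, some coding, some non-coding), it is annotated as coding by default;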
    ② If a gene or a transcript has one or several non-coding definitions but without coding definition, it will be regarded as ncRNA in annotation output.
    ③ If a transcript maps to multiple locations as "coding transcripts", but some have a complete ORF and some do not (that is, they contain a premature stop codon), the mappings without a complete ORF are ignored;
    ④ If a transcript maps to multiple locations, all as "coding transcripts", but none has a complete ORF, then this transcript will not be used in exonic_variant_function annotation and the corresponding annotation will be marked as "UNKNOWN".
    ⑤ NEW in July 2014: If a transcript maps to multiple genomic locations, all mappings are used in the annotation process. Previously, only the "most likely" mapping was used.
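Rules ① to ⑤ can be summarized in a small Python sketch. This is one reading of the rules above, not ANNOVAR's code: the `Mapping` record and both function names are hypothetical, and "complete ORF" is reduced to a boolean flag rather than being checked against a sequence.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """One genomic placement of a transcript (hypothetical model)."""
    transcript: str
    coding: bool         # has a coding (CDS) definition
    complete_orf: bool   # False if, e.g., a premature stop codon is present


def gene_type(mappings):
    """Rules 1-2: coding wins if any definition is coding, else ncRNA."""
    return "coding" if any(m.coding for m in mappings) else "ncRNA"


def usable_coding_mappings(mappings):
    """Rules 3-5 for the mappings of one transcript.

    Returns (mappings to use, status). Following rule 5, every remaining
    mapping is kept, not just a single "most likely" one.
    """
    coding = [m for m in mappings if m.coding]
    complete = [m for m in coding if m.complete_orf]
    if complete:
        return complete, "OK"    # rule 3: drop incomplete-ORF mappings
    if coding:
        return [], "UNKNOWN"     # rule 4: no complete ORF anywhere
    return [], "ncRNA"           # no coding definition at all
```

For instance, a transcript with three coding mappings of which one has a premature stop keeps the other two; if all three were incomplete, the sketch would mark it "UNKNOWN", as rule ④ describes for the exonic_variant_function output.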

    Original source:
    http://annovar.openbioinformatics.org/en/latest/user-guide/gene/

        Article link: https://www.haomeiwen.com/subject/yazovctx.html