snpEff User Guide (Part 2): Annotating SNPs/INDELs with SnpEff

Author: APExBIO | Published: 2020-11-05 16:21

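    The previous installment introduced SnpEff annotation databases. This installment focuses on the SnpEff commands, and the final installment will cover how to interpret the annotation results.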

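    Files to prepare

    1. A pre-built SnpEff annotation database for the species: GRCh37.100 (~/snpeff/genome/GRCh37.100; see Part 1 of this guide for the detailed procedure)
    2. The SNP/INDEL file to annotate, in VCF format (any folder, here ~/database/SNP/human_GRCh37.vcf.gz)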

    🎃1 Quick annotation takes a single command

    snpeffDir=~/snpeff
    snpEff=${snpeffDir}/snpEff.jar
    cd ~/database/SNP/
    ## Standard annotation: options first, then the genome version and the input VCF
    nohup java -Xmx10G -jar "$snpEff" -csvStats human_GRCh37_snpeff.snp.csv -stats human_GRCh37_snpeff.snp.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.vcf &
    

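    Note: the annotated file human_GRCh37_snpeff.snp.vcf holds the detailed per-variant annotations, and human_GRCh37_snpeff.snp.html holds the summary charts. The charts failed to display in Microsoft Edge; if that happens, open the page in a different browser.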
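
As a quick sanity check on the annotated VCF, the annotations can be tallied from the ANN field. The sketch below assumes the ANN layout (entries separated by commas, sub-fields separated by '|', with the annotation term in the second sub-field) and counts only the first entry on each variant line, which SnpEff normally orders by putative impact. count_effects is a hypothetical helper name, not part of SnpEff.

```shell
#!/bin/sh
# Tally the first ANN entry of each variant in a SnpEff-annotated VCF.
# Reads VCF text on stdin; prints "<count> <annotation>" lines, largest first.
count_effects() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        {
            n = split($8, info, ";")                    # INFO is column 8
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    split(substr(info[i], 5), ann, ",") # one entry per effect
                    split(ann[1], f, "|")               # Allele|Annotation|Impact|...
                    count[f[2]]++
                }
            }
        }
        END { for (e in count) print count[e], e }
    ' | sort -k1,1nr -k2,2
}
```

Usage, once the annotation has finished: count_effects < human_GRCh37_snpeff.snp.vcf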

    🎃2 Annotating specific regions

    Options for filtering results (used with the ann command):
    -fi , -filterInterval <file> : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
    -no-downstream : Do not show DOWNSTREAM changes
    -no-intergenic : Do not show INTERGENIC changes
    -no-intron : Do not show INTRON changes
    -no-upstream : Do not show UPSTREAM changes
    -no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
    -no <EffectType> : Do not show 'EffectType'. This option can be used several times.

    # Example: keep only annotations that fall within genes
    java -Xmx10G -jar "$snpEff" ann -no-intron -no-utr -no-downstream -no-upstream -no-intergenic -csvStats human_GRCh37_snpeff.csv -stats human_GRCh37_snpeff.html GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.snp.gene.vcf
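
The -fi option listed above restricts annotation to intervals read from a file. A minimal sketch follows, assuming the interval file is in BED format (chromosome, 0-based start, end, tab-separated). The coordinates are made up, and regions.bed and the output file name are hypothetical. Chromosome names are written without a 'chr' prefix on the assumption that they match the VCF and the database.

```shell
#!/bin/sh
# Hypothetical regions of interest (made-up coordinates), one BED record per line:
# chromosome <TAB> 0-based start <TAB> end
printf '1\t1000000\t1500000\n2\t500000\t800000\n' > regions.bed

snpEff=~/snpeff/snpEff.jar
# Run only where the jar and the input VCF are actually present.
if [ -f "$snpEff" ] && [ -f human_GRCh37.vcf.gz ]; then
    java -Xmx10G -jar "$snpEff" ann -fi regions.bed GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.regions.vcf
fi
```

Since the help text says -fi may be given many times, several interval files can be combined in one run.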
    

    General annotation options explained
    Options:
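    -chr <string> : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Sets a prefix for chromosome names in the output.
    -classic : Use old style annotations instead of Sequence Ontology and Hgvs. This selects the old annotation format; Sequence Ontology terms are now the default. Old and new terms are compared below.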
    -download : Download reference genome if not available. Default: true
    -i <format> : Input format [ vcf, bed ]. Default: VCF.
    -fileList : Input actually contains a list of files to process.
    -o <format> : Output format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
    -s , -stats : Name of stats file (summary). Default is 'snpEff_summary.html'
    -noStats : Do not create stats (summary) file
    -csvStats <file> : Create CSV summary file.

    Commonly used options: -chr, -classic, -csvStats
    -classic

    Sequence Ontology term                          Classic term
    coding_sequence_variant                         CDS
    chromosome                                      CHROMOSOME_LARGE_DELETION
    coding_sequence_variant                         CODON_CHANGE
    inframe_insertion                               CODON_INSERTION
    disruptive_inframe_insertion                    CODON_CHANGE_PLUS_CODON_INSERTION
    inframe_deletion                                CODON_DELETION
    disruptive_inframe_deletion                     CODON_CHANGE_PLUS_CODON_DELETION
    downstream_gene_variant                         DOWNSTREAM
    exon_variant                                    EXON
    exon_loss_variant                               EXON_DELETED
    frameshift_variant                              FRAME_SHIFT
    gene_variant                                    GENE
    intergenic_region                               INTERGENIC
    conserved_intergenic_variant                    INTERGENIC_CONSERVED
    intragenic_variant                              INTRAGENIC
    intron_variant                                  INTRON
    conserved_intron_variant                        INTRON_CONSERVED
    miRNA                                           MICRO_RNA
    missense_variant                                NON_SYNONYMOUS_CODING
    initiator_codon_variant                         NON_SYNONYMOUS_START
    stop_retained_variant                           NON_SYNONYMOUS_STOP
    rare_amino_acid_variant                         RARE_AMINO_ACID
    splice_acceptor_variant                         SPLICE_SITE_ACCEPTOR
    splice_donor_variant                            SPLICE_SITE_DONOR
    splice_region_variant                           SPLICE_SITE_REGION
    splice_region_variant                           SPLICE_SITE_BRANCH
    splice_region_variant                           SPLICE_SITE_BRANCH_U12
    stop_lost                                       STOP_LOST
    5_prime_UTR_premature_start_codon_gain_variant  START_GAINED
    start_lost                                      START_LOST
    stop_gained                                     STOP_GAINED
    synonymous_variant                              SYNONYMOUS_CODING
    start_retained                                  SYNONYMOUS_START
    stop_retained_variant                           SYNONYMOUS_STOP
    transcript_variant                              TRANSCRIPT
    regulatory_region_variant                       REGULATION
    upstream_gene_variant                           UPSTREAM
    3_prime_UTR_variant                             UTR_3_PRIME
    3_prime_UTR_truncation + exon_loss              UTR_3_DELETED
    5_prime_UTR_variant                             UTR_5_PRIME
    5_prime_UTR_truncation + exon_loss_variant      UTR_5_DELETED
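
Older scripts that filter on classic names may need translating to Sequence Ontology terms. The sketch below covers only a handful of rows from the mapping above. classic_to_so is a hypothetical helper name, not part of SnpEff.

```shell
#!/bin/sh
# Translate a classic SnpEff effect name to its Sequence Ontology term.
# Only a few common terms are covered; unknown names return status 1.
classic_to_so() {
    case "$1" in
        NON_SYNONYMOUS_CODING) echo missense_variant ;;
        SYNONYMOUS_CODING)     echo synonymous_variant ;;
        FRAME_SHIFT)           echo frameshift_variant ;;
        STOP_GAINED)           echo stop_gained ;;
        STOP_LOST)             echo stop_lost ;;
        SPLICE_SITE_ACCEPTOR)  echo splice_acceptor_variant ;;
        SPLICE_SITE_DONOR)     echo splice_donor_variant ;;
        INTRON)                echo intron_variant ;;
        INTERGENIC)            echo intergenic_region ;;
        UPSTREAM)              echo upstream_gene_variant ;;
        DOWNSTREAM)            echo downstream_gene_variant ;;
        *) echo "unknown classic term: $1" >&2; return 1 ;;
    esac
}
```

For example, classic_to_so FRAME_SHIFT prints frameshift_variant.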

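    Selected annotation terms: start (initiator) codon variant (initiator_codon_variant), downstream gene variant (downstream_gene_variant), intergenic region (intergenic_region), intragenic variant (intragenic_variant), intron variant (intron_variant), missense variant (missense_variant), non-coding transcript exon variant (non_coding_transcript_exon_variant), splice acceptor variant (splice_acceptor_variant), splice donor variant (splice_donor_variant), splice region variant (splice_region_variant), stop codon gained (stop_gained), stop codon lost (stop_lost), stop codon retained (stop_retained_variant), synonymous variant (synonymous_variant), upstream gene variant (upstream_gene_variant), premature start codon gained in the 5' UTR (5_prime_UTR_premature_start_codon_gain_variant), 5' UTR variant (5_prime_UTR_variant), and 3' UTR variant (3_prime_UTR_variant).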

    🎃3 Annotation options

    Annotations options:
    -cancer : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
    -cancerSamples <file> : Two column TXT file defining 'original \t derived' samples.
    -formatEff : Use 'EFF' field compatible with older versions (instead of 'ANN').
    -geneId : Use gene ID instead of gene name (VCF output). Default: false
    -hgvs : Use HGVS annotations for amino acid sub-field. Default: true
    -lof : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
    -noHgvs : Do not add HGVS annotations.
    -noLof : Do not add LOF and NMD annotations.
    -noShiftHgvs : Do not shift variants according to HGVS notation (most 3prime end).
    -oicr : Add OICR tag in VCF file. Default: false
    -sequenceOntology : Use Sequence Ontology terms. Default: true (the counterpart of -classic)
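
The -cancerSamples option expects a two-column, tab-separated file pairing each original sample with its derived sample. A minimal sketch follows, assuming germline samples are the "original" column and tumor samples the "derived" column. The sample names, cancer_samples.txt, and the output file name are hypothetical, and the sample names would have to match the sample columns in the VCF.

```shell
#!/bin/sh
# Hypothetical sample pairs: original (germline) <TAB> derived (tumor).
printf 'patient1_germline\tpatient1_tumor\npatient2_germline\tpatient2_tumor\n' > cancer_samples.txt

snpEff=~/snpeff/snpEff.jar
# Run only where the jar and the input VCF are actually present.
if [ -f "$snpEff" ] && [ -f human_GRCh37.vcf.gz ]; then
    java -Xmx10G -jar "$snpEff" ann -cancer -cancerSamples cancer_samples.txt GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37_snpeff.cancer.vcf
fi
```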

    🎃4 Annotating canonical transcripts

    The output lists the gene name, gene ID, transcript ID (transcriptId), and CDS length (cdsLength).

    java -Xmx10G -jar $snpEff -v -canon GRCh37.100 human_GRCh37.vcf.gz > human_GRCh37ann.canon.vcf
    

    That covers SnpEff's main features and how to use them. If you have any questions, leave a comment below 🧶
