2023-02-03 | snpEff help documentation (error-output version)

Author: 汪大山 | Published: 2023-02-02 10:15

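This help text covers input and output format selection, result filtering, the switches between old-style and new-style annotation fields, and more. It is more detailed than what the official -h help prints, so I am keeping a copy here.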

Options:
        -chr <string>                   : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
        -classic                        : Use old style annotations instead of Sequence Ontology and Hgvs.
        -csvStats <file>                : Create CSV summary file.
        -download                       : Download reference genome if not available. Default: true
        -i <format>                     : Input format [ vcf, bed ]. Default: VCF.
        -fileList                       : Input actually contains a list of files to process.
        -o <format>                     : Output format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
        -s , -stats, -htmlStats         : Create HTML summary file.  Default is 'snpEff_summary.html'
        -noStats                        : Do not create stats (summary) file

Results filter options:
        -fi , -filterInterval  <file>   : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
        -no-downstream                  : Do not show DOWNSTREAM changes
        -no-intergenic                  : Do not show INTERGENIC changes
        -no-intron                      : Do not show INTRON changes
        -no-upstream                    : Do not show UPSTREAM changes
        -no-utr                         : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
        -no <effectType>                : Do not show 'EffectType'. This option can be used several times.

Annotations options:
        -cancer                         : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
        -cancerSamples <file>           : Two column TXT file defining 'original \t derived' samples.
        -fastaProt <file>               : Create an output file containing the resulting protein sequences.
        -formatEff                      : Use 'EFF' field compatible with older versions (instead of 'ANN').
        -geneId                         : Use gene ID instead of gene name (VCF output). Default: false
        -hgvs                           : Use HGVS annotations for amino acid sub-field. Default: true
        -hgvsOld                        : Use old HGVS notation. Default: false
        -hgvs1LetterAa                  : Use one letter Amino acid codes in HGVS notation. Default: false
        -hgvsTrId                       : Use transcript ID in HGVS notation. Default: false
        -lof                            : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
        -noHgvs                         : Do not add HGVS annotations.
        -noLof                          : Do not add LOF and NMD annotations.
        -noShiftHgvs                    : Do not shift variants according to HGVS notation (most 3prime end).
        -oicr                           : Add OICR tag in VCF file. Default: false
        -sequenceOntology               : Use Sequence Ontology terms. Default: true

Generic options:
        -c , -config                 : Specify config file
        -configOption name=value     : Override a config file option
        -d , -debug                  : Debug mode (very verbose).
        -dataDir <path>              : Override data_dir parameter from config file.
        -download                    : Download a SnpEff database, if not available locally. Default: true
        -nodownload                  : Do not download a SnpEff database, if not available locally.
        -h , -help                   : Show this help and exit
        -noLog                       : Do not report usage statistics to server
        -q , -quiet                  : Quiet mode (do not show any messages or errors)
        -v , -verbose                : Verbose mode
        -version                     : Show version number and exit

Database options:
        -canon                       : Only use canonical transcripts.
        -canonList <file>            : Only use canonical transcripts, replace some transcripts using the 'gene_id       transcript_id' entries in <file>.
        -interaction                 : Annotate using interactions (requires interaction database). Default: true
        -interval <file>             : Use custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
        -maxTSL <TSL_number>         : Only use transcripts having Transcript Support Level lower than <TSL_number>.
        -motif                       : Annotate using motifs (requires Motif database). Default: true
        -nextProt                    : Annotate using NextProt (requires NextProt database).
        -noGenome                    : Do not load any genomic database (e.g. annotate using custom files).
        -noExpandIUB                 : Disable IUB code expansion in input variants
        -noInteraction               : Disable interaction annotations
        -noMotif                     : Disable motif annotations.
        -noNextProt                  : Disable NextProt annotations.
        -onlyReg                     : Only use regulation tracks.
        -onlyProtein                 : Only use protein coding transcripts. Default: false
        -onlyTr <file.txt>           : Only use the transcripts in this file. Format: One transcript ID per line.
        -reg <name>                  : Regulation track to use (this option can be used several times).
        -ss , -spliceSiteSize <int>  : Set size for splice sites (donor and acceptor) in bases. Default: 2
        -spliceRegionExonSize <int>  : Set size for splice site region within exons. Default: 3 bases
        -spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
        -spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
        -strict                      : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
        -ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
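
To show how a few of the options listed above combine on one command line, here is a minimal sketch. The jar location (SNPEFF_JAR), the genome name AT_10 and the file names input.vcf, output.ann.vcf, stats.csv and summary.html are assumptions chosen for illustration, not values from the help text. The script only calls snpEff when both the jar and the input file exist.

```shell
#!/bin/sh
# Sketch: annotate a VCF using a few options from the help text above.
# SNPEFF_JAR, AT_10, input.vcf and output.ann.vcf are illustrative assumptions.
SNPEFF_JAR="${SNPEFF_JAR:-$HOME/snpEff/snpEff.jar}"
GENOME="AT_10"
INPUT="input.vcf"
OUTPUT="output.ann.vcf"

annotate_vcf() {
  # -canon: canonical transcripts only; -no-*: hide those effect types;
  # -csvStats / -s: write CSV and HTML summary files.
  java -jar "$SNPEFF_JAR" ann -v -canon \
    -no-intergenic -no-upstream -no-downstream \
    -csvStats stats.csv -s summary.html \
    "$GENOME" "$1"
}

if [ -f "$SNPEFF_JAR" ] && [ -f "$INPUT" ]; then
  # snpEff writes the annotated VCF to standard output.
  annotate_vcf "$INPUT" > "$OUTPUT"
else
  echo "snpEff jar or $INPUT not found; nothing was run."
fi
```

Dropping the -no-* switches keeps every effect type in the output, which is usually the safer starting point when exploring a new data set.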

As an aside, here is how to build a database with snpEff 4.3t

Setting up the config file

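1. In ~/snpEff/, create a folder named data.
2. In ~/snpEff/data, create a folder named AT_10/.
3. Rename the GFF file and the genome file to genes.gff and sequences.fa, then put them in AT_10.
4. Edit ~/snpEff/snpEff.config
   and add the line: AT_10.genome: AT_10
5. Move snpEff.jar and snpEff.config into the data folder.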
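
Steps 1 to 4 can be sketched as a short script. To keep it safe to run anywhere, this version works in a scratch directory created by mktemp instead of the real ~/snpEff, and uses empty placeholder files (my_annotation.gff3, my_genome.fasta, both hypothetical names) in place of a real annotation and genome. Step 5 is left out because the sketch has no real jar to move.

```shell
#!/bin/sh
# Sketch of steps 1-4 in a scratch directory (not the real ~/snpEff).
# The empty placeholder files stand in for a real GFF3 annotation and genome FASTA.
SNPEFF_HOME="$(mktemp -d)"
GENOME="AT_10"

# Placeholders for the original annotation, genome and config (hypothetical names).
: > "$SNPEFF_HOME/my_annotation.gff3"
: > "$SNPEFF_HOME/my_genome.fasta"
: > "$SNPEFF_HOME/snpEff.config"

# Steps 1-2: create data/ and data/AT_10/.
mkdir -p "$SNPEFF_HOME/data/$GENOME"

# Step 3: rename the inputs to the file names used in this post.
mv "$SNPEFF_HOME/my_annotation.gff3" "$SNPEFF_HOME/data/$GENOME/genes.gff"
mv "$SNPEFF_HOME/my_genome.fasta"    "$SNPEFF_HOME/data/$GENOME/sequences.fa"

# Step 4: register the genome in snpEff.config, but only once.
grep -q "^$GENOME\.genome" "$SNPEFF_HOME/snpEff.config" ||
  echo "$GENOME.genome: $GENOME" >> "$SNPEFF_HOME/snpEff.config"

# Show the resulting layout.
ls -R "$SNPEFF_HOME"
```

The grep guard makes the script safe to rerun without adding a duplicate genome entry to the config.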

Building the database

Run the following from inside the data folder:
java -jar snpEff.jar build -gff3 -v AT_10

Finally, if snpEffectPredictor.bin appears in the AT_10 folder, the database was built successfully.
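
That final check is easy to script. The helper below is a minimal sketch: the function name db_is_built is made up for this example, and requiring the file to be non-empty is my own assumption on top of the check described above. The demo runs against a scratch directory with a placeholder file, so it never touches a real database.

```shell
#!/bin/sh
# Succeeds if the given genome directory holds a non-empty snpEffectPredictor.bin.
# (db_is_built is a hypothetical helper name; the non-empty test is an assumption.)
db_is_built() {
  [ -s "$1/snpEffectPredictor.bin" ]
}

# Demo on a scratch directory; in practice pass the real AT_10 directory.
demo_dir="$(mktemp -d)"
if db_is_built "$demo_dir"; then before="built"; else before="missing"; fi

# A placeholder file stands in for the output of a real build.
echo "placeholder" > "$demo_dir/snpEffectPredictor.bin"
if db_is_built "$demo_dir"; then after="built"; else after="missing"; fi

echo "before: $before, after: $after"
```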

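One point worth stressing: building a database differs between 4.3t and the current 5.1. Version 5.1 additionally needs the genome's CDS sequences and protein sequences.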
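
As far as I understand the 5.x build command, it looks for cds.fa and protein.fa inside the genome folder and uses them to check the database, and each check can be skipped with -noCheckCds or -noCheckProtein. Those file names and flags come from my reading of the snpEff 5.x documentation, not from this post, so treat them as assumptions and confirm them against the build help of your version. The sketch below picks the flags from whichever files are present and only prints the resulting command. The function name build_check_flags is hypothetical.

```shell
#!/bin/sh
# Prints extra 'build' flags for a genome directory under snpEff 5.x:
# a check is skipped only when its sequence file is absent.
# (File names and flags are assumptions; verify them for your snpEff version.)
build_check_flags() {
  flags=""
  [ -f "$1/cds.fa" ]     || flags="$flags -noCheckCds"
  [ -f "$1/protein.fa" ] || flags="$flags -noCheckProtein"
  echo "$flags"
}

# Demo on a scratch directory that has a CDS file but no protein file.
demo_dir="$(mktemp -d)"
: > "$demo_dir/cds.fa"
extra="$(build_check_flags "$demo_dir")"

# Dry run: print the build command instead of executing it.
echo "java -jar snpEff.jar build -gff3 -v$extra AT_10"
```

Skipping a check only silences it. Supplying the real CDS and protein files is the better option whenever they are available, since the checks are what confirm that the annotation and the genome agree.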


Link to this post: https://www.haomeiwen.com/subject/gsvwhdtx.html
