里面包括了输入输出文件的格式选择,结果过滤,新老版本注释信息的选择项等等,比官方的-h文档内容要多,遂备份
Options:
-chr <string> : Prepend 'string' to chromosome name (e.g. 'chr1' instead of '1'). Only on TXT output.
-classic : Use old style annotations instead of Sequence Ontology and Hgvs.
-csvStats <file> : Create CSV summary file.
-download : Download reference genome if not available. Default: true
-i <format> : Input format [ vcf, bed ]. Default: VCF.
-fileList : Input actually contains a list of files to process.
-o <format> : Ouput format [ vcf, gatk, bed, bedAnn ]. Default: VCF.
-s , -stats, -htmlStats : Create HTML summary file. Default is 'snpEff_summary.html'
-noStats : Do not create stats (summary) file
Results filter options:
-fi , -filterInterval <file> : Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)
-no-downstream : Do not show DOWNSTREAM changes
-no-intergenic : Do not show INTERGENIC changes
-no-intron : Do not show INTRON changes
-no-upstream : Do not show UPSTREAM changes
-no-utr : Do not show 5_PRIME_UTR or 3_PRIME_UTR changes
-no <effectType> : Do not show 'EffectType'. This option can be used several times.
Annotations options:
-cancer : Perform 'cancer' comparisons (Somatic vs Germline). Default: false
-cancerSamples <file> : Two column TXT file defining 'oringinal \t derived' samples.
-fastaProt <file> : Create an output file containing the resulting protein sequences.
-formatEff : Use 'EFF' field compatible with older versions (instead of 'ANN').
-geneId : Use gene ID instead of gene name (VCF output). Default: false
-hgvs : Use HGVS annotations for amino acid sub-field. Default: true
-hgvsOld : Use old HGVS notation. Default: false
-hgvs1LetterAa : Use one letter Amino acid codes in HGVS notation. Default: false
-hgvsTrId : Use transcript ID in HGVS notation. Default: false
-lof : Add loss of function (LOF) and Nonsense mediated decay (NMD) tags.
-noHgvs : Do not add HGVS annotations.
-noLof : Do not add LOF and NMD annotations.
-noShiftHgvs : Do not shift variants according to HGVS notation (most 3prime end).
-oicr : Add OICR tag in VCF file. Default: false
-sequenceOntology : Use Sequence Ontology terms. Default: true
Generic options:
-c , -config : Specify config file
-configOption name=value : Override a config file option
-d , -debug : Debug mode (very verbose).
-dataDir <path> : Override data_dir parameter from config file.
-download : Download a SnpEff database, if not available locally. Default: true
-nodownload : Do not download a SnpEff database, if not available locally.
-h , -help : Show this help and exit
-noLog : Do not report usage statistics to server
-q , -quiet : Quiet mode (do not show any messages or errors)
-v , -verbose : Verbose mode
-version : Show version number and exit
Database options:
-canon : Only use canonical transcripts.
-canonList <file> : Only use canonical transcripts, replace some transcripts using the 'gene_id transcript_id' entries in <file>.
-interaction : Annotate using inteactions (requires interaciton database). Default: true
-interval <file> : Use a custom intervals in TXT/BED/BigBed/VCF/GFF file (you may use this option many times)
-maxTSL <TSL_number> : Only use transcripts having Transcript Support Level lower than <TSL_number>.
-motif : Annotate using motifs (requires Motif database). Default: true
-nextProt : Annotate using NextProt (requires NextProt database).
-noGenome : Do not load any genomic database (e.g. annotate using custom files).
-noExpandIUB : Disable IUB code expansion in input variants
-noInteraction : Disable inteaction annotations
-noMotif : Disable motif annotations.
-noNextProt : Disable NextProt annotations.
-onlyReg : Only use regulation tracks.
-onlyProtein : Only use protein coding transcripts. Default: false
-onlyTr <file.txt> : Only use the transcripts in this file. Format: One transcript ID per line.
-reg <name> : Regulation track to use (this option can be used add several times).
-ss , -spliceSiteSize <int> : Set size for splice sites (donor and acceptor) in bases. Default: 2
-spliceRegionExonSize <int> : Set size for splice site region within exons. Default: 3 bases
-spliceRegionIntronMin <int> : Set minimum number of bases for splice site region within intron. Default: 3 bases
-spliceRegionIntronMax <int> : Set maximum number of bases for splice site region within intron. Default: 8 bases
-strict : Only use 'validated' transcripts (i.e. sequence has been checked). Default: false
-ud , -upDownStreamLen <int> : Set upstream downstream interval length (in bases)
顺便说一下snpeff 4.3t的数据库建造过程
配置文件设置
1.在~/snpEff/目录中,创建一个文件夹:data
2.在~/snpEFF/data目录下,创建一个文件夹:AT_10/
3.将GFF文件和基因组文件更名为genes.gff sequences.fa 后放入AT_10 中
4.编辑~/snpEff/snpEff.config文件
添加一行 AT_10.genome: AT_10
5.将snpEff.jar和snpEff.config 这两个挪到data文件夹下面
构建数据库
在data文件下运行代码
java -jar snpEff.jar build -gff3 -v AT_10
最后在AT_10文件下看到snpEffectPredictor.bin文件就表示数据库构建成功了。
网友评论