Introduction to GATK

Author: Greatji | Published 2020-02-08 14:47

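    GATK (The Genome Analysis Toolkit) is a software package developed by the Broad Institute for analyzing next-generation resequencing data. It is a collection of tools for genome analysis. Since version 4.0, GATK includes the Picard toolkit, so every Picard tool can be run through GATK.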

    1. System requirements:

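    Most GATK4 tools have simple requirements: a Unix-style OS and Java 1.8. Some tools also need R or Python. Tools that generate plots use R (with the gsalib, ggplot2, reshape and gplots packages), and the gatk-launch wrapper script (named gatk in later releases) needs Python.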
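    The Java requirement can be checked from a script. The sketch below is only an illustration and is not part of GATK. The helpers java_major and installed_java_version are hypothetical names, and the sketch assumes that `java -version` prints a line such as openjdk version "1.8.0_242". Java 8 reports itself in the older "1.8" form, which is why the leading "1." is stripped. Newer GATK releases may require a newer Java, so check the notes for the release you download.

```bash
#!/usr/bin/env bash
# Sketch: check that the installed Java is version 8 (reported as "1.8.x"),
# which is what GATK4 releases of the time expect.

# java_major VERSION -> prints the major version number.
# "1.8.0_242" -> 8 (legacy "1.x" scheme), "11.0.2" -> 11 (newer scheme).
java_major() {
  local v="$1"
  if [[ "$v" == 1.* ]]; then
    v="${v#1.}"            # drop the leading "1." of the legacy scheme
  fi
  printf '%s\n' "${v%%[._]*}"   # keep everything before the first "." or "_"
}

# installed_java_version -> prints the quoted version from `java -version`
# (java writes it to stderr), e.g. 1.8.0_242.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

if command -v java >/dev/null 2>&1; then
  major="$(java_major "$(installed_java_version)")"
  if [[ "$major" == "8" ]]; then
    echo "Java 8 found"
  else
    echo "Java ${major:-unknown} found; this GATK4 setup expects Java 8" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```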

    2. Download:

    Download a release from https://github.com/broadinstitute/gatk/releases. After unzipping it, you get the following files and folders:

    gatk gatk-package-[version]-local.jar gatk-package-[version]-spark.jar README.md

    gatk-package-[version]-local.jar is the main jar used to run the tools. gatk-package-[version]-spark.jar is needed for running on Spark.

    3. Installation:

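    GATK does not need to be installed. Just add the directory that contains the gatk script to your PATH, e.g. export PATH=$PATH:/path/to/gatk-package, where /path/to/gatk-package is the directory holding gatk (PATH entries must be directories, not the script itself). The gatk script and the *.jar files must be kept together in that directory.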
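    Because gatk and the *.jar files must sit in the same directory, it is worth checking the directory before adding it to PATH. The following is a minimal sketch that assumes the release was unpacked into GATK_DIR. GATK_DIR is a placeholder variable, and the helpers check_gatk_dir and add_to_path are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: verify that the gatk wrapper sits next to its local jar, then
# append the directory to PATH. GATK_DIR is a placeholder path.

# check_gatk_dir DIR -> 0 if DIR has an executable `gatk` script and a
# gatk-package-*-local.jar next to it; otherwise prints why and returns 1.
check_gatk_dir() {
  local dir="$1"
  if [[ ! -x "$dir/gatk" ]]; then
    echo "no executable gatk script in $dir" >&2
    return 1
  fi
  local jars=("$dir"/gatk-package-*-local.jar)
  if [[ ! -e "${jars[0]}" ]]; then   # unmatched glob stays literal
    echo "no gatk-package-*-local.jar next to gatk in $dir" >&2
    return 1
  fi
  return 0
}

# add_to_path DIR -> appends DIR to PATH only if it is not already there.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already present: do nothing
    *) export PATH="$PATH:$1" ;;
  esac
}

GATK_DIR="${GATK_DIR:-/path/to/gatk-package}"
if check_gatk_dir "$GATK_DIR"; then
  add_to_path "$GATK_DIR"
fi
```

    An export only affects the current shell. Putting the export line in ~/.bashrc makes it apply to new shells as well.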

    4. Test run:

    ./gatk --help

    This should print the help information.

    5. Running GATK and Picard commands:

    Commands generally follow this syntax:

    gatk [--java-options "jvm args like -Xmx4G go here"] ToolName [GATK args go here]

    For example:

    gatk --java-options "-Xmx8G" HaplotypeCaller -R reference.fasta -I input.bam -O output.vcf

    A simple Picard command:

    gatk ValidateSamFile -I input.bam -MODE SUMMARY

    5.1 GATK command syntax in detail:

    5.1.1 The basic Java command

    GATK follows the standard Java invocation pattern:

    java -jar program.jar [program arguments]

    5.1.2 The gatk wrapper command

    If you use the gatk wrapper script, you do not need to type the java command yourself. Simply run:

    gatk [program arguments]

    5.1.3 Adding GATK arguments

    At a minimum, you specify the tool name followed by the tool's arguments:

    gatk ToolName [tool arguments]

    Tool arguments go after the tool name:

    gatk ToolName --argument-name value

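    Arguments can be written with -- (long names) or - (short names), and their order does not matter. Boolean arguments (flags) take an explicit TRUE or FALSE value, e.g. --create-output-variant-index FALSE.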

    5.1.4 Adding Java arguments

    When invoking java directly, JVM arguments go between java and -jar. For example, -Xmx sets the maximum memory (heap size):

    java -Xmx4G -jar program.jar [program arguments]

    With the gatk wrapper, JVM arguments such as the memory limit are passed through --java-options instead, for example:

    gatk --java-options "-Xmx4G" [program arguments]

    To pass more than one JVM argument, put them all inside the same pair of double quotes:

    gatk --java-options "-Xmx4G -XX:+PrintGCDetails" [program arguments]
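    In a script, the JVM options can be kept in a bash array and then joined into the single quoted string that --java-options expects. The following is a sketch under that assumption. join_java_opts is a hypothetical helper, and the command is only printed, not run.

```bash
#!/usr/bin/env bash
# Sketch: keep JVM options in an array and hand them to gatk as ONE argument.
# join_java_opts is a hypothetical helper; nothing is executed here.

# join_java_opts OPT... -> prints all options joined by single spaces.
join_java_opts() {
  local IFS=' '
  printf '%s\n' "$*"
}

java_opts=(-Xmx4G -XX:+PrintGCDetails)

cmd=(gatk --java-options "$(join_java_opts "${java_opts[@]}")"
     HaplotypeCaller -R reference.fasta -I input.bam -O output.vcf)

# Print the command in shell-quoted form; run it instead with: "${cmd[@]}"
printf '%q ' "${cmd[@]}"
echo
```

    The quoting matters. If the two JVM options reached gatk as separate words, the second one would be parsed as a GATK argument instead of being passed to the JVM.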

    5.1.5 Adding Spark arguments

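    Apache Spark is a fast, general-purpose compute engine designed for large-scale data processing. Like Hadoop, it is an open-source cluster computing environment, although the two differ in several ways.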

    I am not familiar with Spark, so I cannot give more details here.

    5.1.6 Real-world examples

    A command can be written on a single line, or split over several lines by ending each line with a backslash "\":

    gatk --java-options "-Xmx4G" HaplotypeCaller \
        -R reference.fasta \
        -I sample1.bam \
        -O variants.g.vcf \
        -ERC GVCF

    For exome sequencing you need to supply the target regions, which you can do by adding -L:

    gatk --java-options "-Xmx4G" HaplotypeCaller \
        -R reference.fasta \
        -I sample1.bam \
        -O variants.g.vcf \
        -ERC GVCF \
        -L exome_intervals.list

    If some reads are problematic, you can filter them out with --read-filter:

    gatk --java-options "-Xmx4G" HaplotypeCaller \
        -R reference.fasta \
        -I sample1.bam \
        -O variants.g.vcf \
        -ERC GVCF \
        -L exome_intervals.list \
        --read-filter OverclippedReadFilter

    To reduce the amount of log output, add --QUIET:

    gatk --java-options "-Xmx4G" HaplotypeCaller \
        -R reference.fasta \
        -I sample1.bam \
        -O variants.g.vcf \
        -ERC GVCF \
        -L exome_intervals.list \
        --read-filter OverclippedReadFilter \
        --QUIET

    To turn off automatic generation of the variant index:

    gatk --java-options "-Xmx4G" HaplotypeCaller \
        -R reference.fasta \
        -I sample1.bam \
        -O variants.g.vcf \
        -ERC GVCF \
        -L exome_intervals.list \
        --read-filter OverclippedReadFilter \
        --QUIET \
        --create-output-variant-index FALSE
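    In scripts, the same command can be assembled in a bash array. This avoids trailing backslashes (a stray space after a "\" silently breaks the continuation) and lets optional arguments be added conditionally. The sketch below assumes bash. INTERVALS and QUIET are hypothetical switches for this example only, and the command is printed rather than executed.

```bash
#!/usr/bin/env bash
# Sketch: build the HaplotypeCaller command from the examples above step by
# step. INTERVALS and QUIET are hypothetical switches for this sketch.

INTERVALS="${INTERVALS:-}"   # e.g. exome_intervals.list for exome data
QUIET="${QUIET:-no}"         # set to "yes" to add --QUIET

hc=(gatk --java-options "-Xmx4G" HaplotypeCaller
    -R reference.fasta
    -I sample1.bam
    -O variants.g.vcf
    -ERC GVCF)

# Exome data: restrict calling to the target regions.
if [[ -n "$INTERVALS" ]]; then
  hc+=(-L "$INTERVALS")
fi

# Drop over-soft-clipped reads.
hc+=(--read-filter OverclippedReadFilter)

# Less log output.
if [[ "$QUIET" == yes ]]; then
  hc+=(--QUIET)
fi

# Do not write an index next to the output GVCF.
hc+=(--create-output-variant-index FALSE)

# Dry run: print the command; run it instead with: "${hc[@]}"
printf '%q ' "${hc[@]}"
echo
```

    For example, INTERVALS=exome_intervals.list QUIET=yes bash build_hc.sh (the script name is arbitrary) prints the fully loaded command from the last example above.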


    Tool catalog

    Copy Number Variant Discovery

    Tools that analyze read coverage to detect copy number variants.

    AnnotateIntervals: Annotates intervals with GC content, mappability, and segmental-duplication content
    CallCopyRatioSegments: Calls copy-ratio segments as amplified, deleted, or copy-number neutral
    CreateReadCountPanelOfNormals: Creates a panel of normals for read-count denoising
    DenoiseReadCounts: Denoises read counts to produce denoised copy ratios
    DetermineGermlineContigPloidy: Determines the baseline contig ploidy for germline samples given counts data
    FilterIntervals: Filters intervals based on annotations and/or count statistics
    GermlineCNVCaller: Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy
    ModelSegments: Models segmented copy ratios from denoised read counts and segmented minor-allele fractions from allelic counts
    PlotDenoisedCopyRatios: Creates plots of denoised copy ratios
    PlotModeledSegments: Creates plots of denoised and segmented copy-ratio and minor-allele-fraction estimates
    PostprocessGermlineCNVCalls: Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios

    Coverage Analysis

    Tools that count coverage, e.g. depth per allele

    ASEReadCounter: Generates table of filtered base counts at het sites for allele specific expression
    AnalyzeSaturationMutagenesis: [BETA] (EXPERIMENTAL) Processes reads from a MITESeq or other saturation mutagenesis experiment.
    CollectAllelicCounts: Collects reference and alternate allele counts at specified sites
    CollectReadCounts: Collects read counts at specified intervals
    CountBases: Count bases in a SAM/BAM/CRAM file
    CountBasesSpark: Counts bases in the input SAM/BAM
    CountReads: Count reads in a SAM/BAM/CRAM file
    CountReadsSpark: Counts reads in the input SAM/BAM
    GetPileupSummaries: Tabulates pileup metrics for inferring contamination
    Pileup: Prints read alignments in samtools pileup format
    PileupSpark: [BETA] Prints read alignments in samtools pileup format

    Diagnostics and Quality Control

    Tools that collect sequencing quality related and comparative metrics

    AccumulateVariantCallingMetrics (Picard): Combines multiple Variant Calling Metrics files into a single file
    AnalyzeCovariates: Evaluate and compare base quality score recalibration (BQSR) tables
    BamIndexStats (Picard): Generate index statistics from a BAM file
    CalcMetadataSpark: [BETA] (Internal) Collects read metrics relevant to structural variant discovery
    CalculateContamination: Calculate the fraction of reads coming from cross-sample contamination
    CalculateFingerprintMetrics (Picard): Calculate statistics on fingerprints, checking their viability
    CalculateReadGroupChecksum (Picard): Creates a hash code based on the read groups (RG).
    CheckFingerprint (Picard): Computes a fingerprint from the supplied input (SAM/BAM or VCF) file and compares it to the provided genotypes
    CheckPileup: Compare GATK's internal pileup to a reference Samtools mpileup
    CheckTerminatorBlock (Picard): Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise
    ClusterCrosscheckMetrics (Picard): Clusters the results of a CrosscheckFingerprints run by LOD score
    CollectAlignmentSummaryMetrics (Picard): Produces a summary of alignment metrics from a SAM or BAM file.
    CollectArraysVariantCallingMetrics (Picard): Collects summary and per-sample from the provided arrays VCF file
    CollectBaseDistributionByCycle (Picard): Chart the nucleotide distribution per cycle in a SAM or BAM file
    CollectBaseDistributionByCycleSpark: [BETA] Collects base distribution per cycle in SAM/BAM/CRAM file(s).
    CollectGcBiasMetrics (Picard): Collect metrics regarding GC bias.
    CollectHiSeqXPfFailMetrics (Picard): Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.
    CollectHsMetrics (Picard): Collects hybrid-selection (HS) metrics for a SAM or BAM file.
    CollectIndependentReplicateMetrics (Picard): [EXPERIMENTAL] Estimates the rate of independent replication of reads within a bam.
    CollectInsertSizeMetrics (Picard): Collect metrics about the insert size distribution of a paired-end library.
    CollectInsertSizeMetricsSpark: [BETA] Collects insert size distribution information on alignment data
    CollectJumpingLibraryMetrics (Picard): Collect jumping library metrics.
    CollectMultipleMetrics (Picard): Collect multiple classes of metrics.
    CollectMultipleMetricsSpark: [BETA] Runs multiple metrics collection modules for a given alignment file
    CollectOxoGMetrics (Picard): Collect metrics to assess oxidative artifacts.
    CollectQualityYieldMetrics (Picard): Collect metrics about reads that pass quality thresholds and Illumina-specific filters.
    CollectQualityYieldMetricsSpark: [BETA] Collects quality yield metrics from SAM/BAM/CRAM file(s).
    CollectRawWgsMetrics (Picard): Collect whole genome sequencing-related metrics.
    CollectRnaSeqMetrics (Picard): Produces RNA alignment metrics for a SAM or BAM file.
    CollectRrbsMetrics (Picard): Collects metrics from reduced representation bisulfite sequencing (Rrbs) data.
    CollectSamErrorMetrics (Picard): Program to collect error metrics on bases stratified in various ways.
    CollectSequencingArtifactMetrics (Picard): Collect metrics to quantify single-base sequencing artifacts.
    CollectTargetedPcrMetrics (Picard): Calculate PCR-related metrics from targeted sequencing data.
    CollectVariantCallingMetrics (Picard): Collects per-sample and aggregate (spanning all samples) metrics from the provided VCF file
    CollectWgsMetrics (Picard): Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
    CollectWgsMetricsWithNonZeroCoverage (Picard): [EXPERIMENTAL] Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
    CompareBaseQualities: Compares the base qualities of two SAM/BAM/CRAM files
    CompareDuplicatesSpark: [BETA] Determine if two potentially identical BAMs have the same duplicate reads
    CompareMetrics (Picard): Compare two metrics files.
    CompareSAMs (Picard): Compare two input ".sam" or ".bam" files.
    ConvertSequencingArtifactToOxoG (Picard): Extract OxoG metrics from generalized artifacts metrics.
    CrosscheckFingerprints (Picard): Checks that all data in the input files appear to have come from the same individual
    EstimateLibraryComplexity (Picard): Estimates the numbers of unique molecules in a sequencing library.
    FlagStat: Accumulate flag statistics given a BAM file
    FlagStatSpark: Spark tool to accumulate flag statistics
    GetSampleName: Emit a single sample name
    MeanQualityByCycle (Picard): Collect mean quality by cycle.
    MeanQualityByCycleSpark: [BETA] MeanQualityByCycle on Spark
    QualityScoreDistribution (Picard): Chart the distribution of quality scores.
    QualityScoreDistributionSpark: [BETA] QualityScoreDistribution on Spark
    ValidateSamFile (Picard): Validates a SAM or BAM file.
    ViewSam (Picard): Prints a SAM or BAM file to the screen

    Intervals Manipulation

    Tools that process genomic intervals in various formats

    BedToIntervalList (Picard): Converts a BED file to a Picard Interval List.
    IntervalListToBed (Picard): Converts a Picard IntervalList file to a BED file.
    IntervalListTools (Picard): A tool for performing various IntervalList manipulations
    LiftOverIntervalList (Picard): Lifts over an interval list from one reference build to another.
    PreprocessIntervals: Prepares bins for coverage collection
    SplitIntervals: Split intervals into sub-interval files.
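    The main thing BedToIntervalList has to handle is a coordinate shift. BED intervals are 0-based and half-open, while GATK/Picard intervals (interval lists and "contig:start-end" strings passed to -L) are 1-based and closed. The minimal sketch below shows only that shift. bed_to_gatk is a hypothetical helper that ignores BED columns beyond the third, and for real use BedToIntervalList also writes a proper interval-list header.

```bash
#!/usr/bin/env bash
# Sketch: convert BED records (0-based, half-open) into GATK-style
# "contig:start-end" strings (1-based, closed). bed_to_gatk is hypothetical.

# bed_to_gatk < file.bed -> one "contig:start-end" line per BED record,
# skipping comment, track and browser lines.
bed_to_gatk() {
  awk 'BEGIN { OFS = "" }
       /^(#|track|browser)/ { next }
       NF >= 3 { print $1, ":", $2 + 1, "-", $3 }'
}

# BED "chr1 99 200" covers bases 100..200 in 1-based coordinates.
printf 'chr1\t99\t200\n' | bed_to_gatk
```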

    Metagenomics

    Tools that perform metagenomic analysis, e.g. microbial community composition and pathogen detection

    PathSeqBuildKmers: Builds set of host reference k-mers
    PathSeqBuildReferenceTaxonomy: Builds a taxonomy datafile of the microbe reference
    PathSeqBwaSpark: Step 2: Aligns reads to the microbe reference
    PathSeqFilterSpark: Step 1: Filters low quality, low complexity, duplicate, and host reads
    PathSeqPipelineSpark: Combined tool that performs all steps: read filtering, microbe reference alignment, and abundance scoring
    PathSeqScoreSpark: Step 3: Classifies pathogen-aligned reads and generates abundance scores

    Other

    Miscellaneous tools, e.g. those that aid in data streaming

    CreateHadoopBamSplittingIndex: [BETA] Create a Hadoop BAM splitting index
    FifoBuffer (Picard): Provides a large, FIFO buffer that can be used to buffer input and output streams between programs.
    FixCallSetSampleOrdering: [EXPERIMENTAL] fix sample names in a shuffled callset
    GatherBQSRReports: Gathers scattered BQSR recalibration reports into a single file
    GatherTranches: [BETA] Gathers scattered VQSLOD tranches into a single file
    IndexFeatureFile: Creates an index for a feature file, e.g. VCF or BED file.
    ParallelCopyGCSDirectoryIntoHDFSSpark: [BETA] Parallel copy a file or directory from Google Cloud Storage into the HDFS file system used by Spark
    ReblockGVCF: [BETA] Condenses homRef blocks in a single-sample GVCF

    Read Data Manipulation

    Tools that manipulate read data in SAM, BAM or CRAM format

    AddCommentsToBam (Picard): Adds comments to the header of a BAM file.
    AddOATag (Picard): Record current alignment information to OA tag.
    AddOrReplaceReadGroups (Picard): Assigns all the reads in a file to a single new read-group.
    ApplyBQSR: Apply base quality score recalibration
    ApplyBQSRSpark: [BETA] Apply base quality score recalibration on Spark
    BQSRPipelineSpark: [BETA] Both steps of BQSR (BaseRecalibrator and ApplyBQSR) on Spark
    BamToBfq (Picard): Converts a BAM file into a BFQ (binary fastq formatted) file
    BaseRecalibrator: Generates recalibration table for Base Quality Score Recalibration (BQSR)
    BaseRecalibratorSpark: [BETA] Generate recalibration table for Base Quality Score Recalibration (BQSR) on Spark
    BuildBamIndex (Picard): Generates a BAM index ".bai" file.
    BwaAndMarkDuplicatesPipelineSpark: [BETA] Takes name-sorted file and runs BWA and MarkDuplicates.
    BwaSpark: [BETA] Align reads to a given reference using BWA on Spark
    CleanSam (Picard): Cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
    ClipReads: Clip reads in a SAM/BAM/CRAM file
    ConvertHeaderlessHadoopBamShardToBam: [BETA] Convert a headerless BAM shard into a readable BAM
    DownsampleSam (Picard): Downsample a SAM or BAM file.
    ExtractOriginalAlignmentRecordsByNameSpark: [BETA] Subsets reads by name
    FastqToSam (Picard): Converts a FASTQ file to an unaligned BAM or SAM file
    FilterSamReads (Picard): Subsets reads from a SAM or BAM file by applying one of several filters.
    FixMateInformation (Picard): Verify mate-pair information between mates and fix if needed.
    FixMisencodedBaseQualityReads: Fix Illumina base quality scores in a SAM/BAM/CRAM file
    GatherBamFiles (Picard): Concatenate efficiently BAM files that resulted from a scattered parallel analysis
    LeftAlignIndels: Left-aligns indels from reads in a SAM/BAM/CRAM file
    MarkDuplicates (Picard): Identifies duplicate reads.
    MarkDuplicatesSpark: MarkDuplicates on Spark
    MarkDuplicatesWithMateCigar (Picard): Identifies duplicate reads, accounting for mate CIGAR.
    MergeBamAlignment (Picard): Merge alignment data from a SAM or BAM with data in an unmapped BAM file.
    MergeSamFiles (Picard): Merges multiple SAM and/or BAM files into a single file.
    PositionBasedDownsampleSam (Picard): Downsample a SAM or BAM file to retain a subset of the reads based on the reads location in each tile in the flowcell.
    PrintReads: Print reads in the SAM/BAM/CRAM file
    PrintReadsHeader: Print the header from a SAM/BAM/CRAM file
    PrintReadsSpark: PrintReads on Spark
    ReorderSam (Picard): Reorders reads in a SAM or BAM file to match ordering in a second reference file.
    ReplaceSamHeader (Picard): Replaces the SAMFileHeader in a SAM or BAM file.
    RevertBaseQualityScores: Revert Quality Scores in a SAM/BAM/CRAM file
    RevertOriginalBaseQualitiesAndAddMateCigar (Picard): Reverts the original base qualities and adds the mate cigar tag to read-group BAMs
    RevertSam (Picard): Reverts SAM or BAM files to a previous state.
    RevertSamSpark: [BETA] Reverts SAM or BAM files to a previous state.
    SamFormatConverter (Picard): Convert a BAM file to a SAM file, or a SAM to a BAM
    SamToFastq (Picard): Converts a SAM or BAM file to FASTQ.
    SetNmAndUqTags (Picard): DEPRECATED: Use SetNmMdAndUqTags instead.
    SetNmMdAndUqTags (Picard): Fixes the NM, MD, and UQ tags in a SAM file
    SimpleMarkDuplicatesWithMateCigar (Picard): [EXPERIMENTAL] Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules.
    SortSam (Picard): Sorts a SAM or BAM file
    SortSamSpark: [BETA] SortSam on Spark (works on SAM/BAM/CRAM)
    SplitNCigarReads: Split Reads with N in Cigar
    SplitReads: Outputs reads from a SAM/BAM/CRAM by read group, sample and library name
    SplitSamByLibrary (Picard): Splits a SAM or BAM file into individual files by library
    SplitSamByNumberOfReads (Picard): Splits a SAM or BAM file to multiple BAMs.
    UmiAwareMarkDuplicatesWithMateCigar (Picard): [EXPERIMENTAL] Identifies duplicate reads using information from read positions and UMIs.
    UnmarkDuplicates: Clears the 0x400 duplicate SAM flag

    Reference

    Tools that analyze and manipulate FASTA format references

    BaitDesigner (Picard): Designs oligonucleotide baits for hybrid selection reactions.
    BwaMemIndexImageCreator: Create a BWA-MEM index image file for use with GATK BWA tools
    CountBasesInReference: Count the numbers of each base in a reference file
    CreateSequenceDictionary (Picard): Creates a sequence dictionary for a reference sequence.
    ExtractSequences (Picard): Subsets intervals from a reference sequence to a new FASTA file.
    FastaAlternateReferenceMaker: Create an alternative reference by combining a fasta with a vcf.
    FastaReferenceMaker: Create snippets of a fasta file
    FindBadGenomicKmersSpark: [BETA] Identifies sequences that occur at high frequency in a reference
    NonNFastaSize (Picard): Counts the number of non-N bases in a fasta file.
    NormalizeFasta (Picard): Normalizes lines of sequence in a FASTA file to be of the same length.
    ScatterIntervalsByNs (Picard): Writes an interval list created by splitting a reference at Ns.

    Short Variant Discovery

    Tools that perform variant calling and genotyping for short variants (SNPs, SNVs and Indels)

    CombineGVCFs: Merges one or more HaplotypeCaller GVCF files into a single GVCF with appropriate annotations
    GenomicsDBImport: Import VCFs to GenomicsDB
    GenotypeGVCFs: Perform joint genotyping on one or more samples pre-called with HaplotypeCaller
    GnarlyGenotyper: [BETA] Perform "quick and dirty" joint genotyping on one or more samples pre-called with HaplotypeCaller
    HaplotypeCaller: Call germline SNPs and indels via local re-assembly of haplotypes
    HaplotypeCallerSpark: [BETA] HaplotypeCaller on Spark
    Mutect2: Call somatic SNVs and indels via local assembly of haplotypes
    ReadsPipelineSpark: [BETA] Runs BWA (if specified), MarkDuplicates, BQSR, and HaplotypeCaller on unaligned or aligned reads to generate a VCF.
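    Several of these tools are usually chained together. HaplotypeCaller runs per sample in -ERC GVCF mode, as in the real-world examples above. CombineGVCFs (or GenomicsDBImport for larger cohorts) then merges the samples, and GenotypeGVCFs performs joint genotyping. The sketch below only prints the commands. The sample and file names are hypothetical placeholders, and the arguments are kept to the minimum, so check each tool's documentation before running the commands.

```bash
#!/usr/bin/env bash
# Sketch of a per-sample GVCF -> joint genotyping flow (dry run: commands are
# printed, not executed). Sample and file names are hypothetical placeholders.

REF=reference.fasta
SAMPLES=(sample1 sample2 sample3)

# 1. One GVCF per sample.
for s in "${SAMPLES[@]}"; do
  echo gatk --java-options "-Xmx4G" HaplotypeCaller \
    -R "$REF" -I "$s.bam" -O "$s.g.vcf" -ERC GVCF
done

# 2. Merge the per-sample GVCFs into one cohort GVCF.
combine=(gatk CombineGVCFs -R "$REF")
for s in "${SAMPLES[@]}"; do
  combine+=(--variant "$s.g.vcf")
done
combine+=(-O cohort.g.vcf)
echo "${combine[@]}"

# 3. Joint genotyping on the merged GVCF.
echo gatk --java-options "-Xmx4G" GenotypeGVCFs \
  -R "$REF" -V cohort.g.vcf -O cohort.vcf
```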

    Structural Variant Discovery

    Tools that detect structural variants

    CpxVariantReInterpreterSpark: [BETA] (Internal) Tries to extract simple variants from a provided GATK-SV CPX.vcf
    DiscoverVariantsFromContigAlignmentsSAMSpark: [BETA] (Internal) Examines aligned contigs from local assemblies and calls structural variants
    ExtractSVEvidenceSpark: [BETA] (Internal) Extracts evidence of structural variations from reads
    FindBreakpointEvidenceSpark: [BETA] (Internal) Produces local assemblies of genomic regions that may harbor structural variants
    StructuralVariationDiscoveryPipelineSpark: [BETA] Runs the structural variation discovery workflow on a single sample
    SvDiscoverFromLocalAssemblyContigAlignmentsSpark: [BETA] (Internal) Examines aligned contigs from local assemblies and calls structural variants or their breakpoints

    Variant Evaluation and Refinement

    Tools that evaluate and refine variant calls, e.g. with annotations not offered by the engine

    AnnotatePairOrientation: [EXPERIMENTAL] Annotate a non-M2 VCF (using the associated tumor bam) with pair orientation fields (e.g. F1R2).
    AnnotateVcfWithBamDepth: (Internal) Annotate a vcf with a bam's read depth at each variant locus
    AnnotateVcfWithExpectedAlleleFraction: (Internal) Annotate a vcf with expected allele fractions in pooled sequencing
    CalculateGenotypePosteriors: Calculate genotype posterior probabilities given family and/or known population genotypes
    CalculateMixingFractions: (Internal) Calculate proportions of different samples in a pooled bam
    Concordance: [BETA] Evaluate concordance of an input VCF against a validated truth VCF
    CountFalsePositives: [BETA] Count PASS variants
    CountVariants: Counts variant records in a VCF file, regardless of filter status.
    CountVariantsSpark: CountVariants on Spark
    EvaluateInfoFieldConcordance: [BETA] Evaluate concordance of info fields in an input VCF against a validated truth VCF
    FilterFuncotations: [EXPERIMENTAL] Filter variants based on clinically-significant Funcotations.
    FindMendelianViolations (Picard): Finds mendelian violations of all types within a VCF
    FuncotateSegments: [BETA] Functional annotation for segment files. The output formats are not well-defined and subject to change.
    Funcotator: Functional Annotator
    FuncotatorDataSourceDownloader: Data source downloader for Funcotator.
    GenotypeConcordance (Picard): Calculates the concordance between genotype data of one sample in each of two VCFs - truth (or reference) vs. calls.
    ValidateBasicSomaticShortMutations: [EXPERIMENTAL] Check variants against tumor-normal bams representing the same samples, though not the ones from the actual calls.
    ValidateVariants: Validate VCF
    VariantEval: [BETA] General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
    VariantsToTable: Extract fields from a VCF file to a tab-delimited table

    Variant Filtering

    Tools that filter variants by annotating the FILTER column

    ApplyVQSR: Apply a score cutoff to filter variants based on a recalibration table
    CNNScoreVariants: Apply a Convolutional Neural Net to filter annotated variants
    CNNVariantTrain: [EXPERIMENTAL] Train a CNN model for filtering variants
    CNNVariantWriteTensors: [EXPERIMENTAL] Write variant tensors for training a CNN to filter variants
    CreateSomaticPanelOfNormals: [BETA] Make a panel of normals for use with Mutect2
    FilterAlignmentArtifacts: [EXPERIMENTAL] Filter alignment artifacts from a vcf callset.
    FilterByOrientationBias: [EXPERIMENTAL] Filter Mutect2 somatic variant calls using orientation bias
    FilterMutectCalls: Filter somatic SNVs and indels called by Mutect2
    FilterVariantTranches: Apply tranche filtering
    FilterVcf (Picard): Hard filters a VCF.
    VariantFiltration: Filter variant calls based on INFO and/or FORMAT annotations
    VariantRecalibrator: Build a recalibration model to score variant quality for filtering purposes

    Variant Manipulation

    Tools that manipulate variant call format (VCF) data

    FixVcfHeader (Picard): Replaces or fixes a VCF header.
    GatherVcfs (Picard): Gathers multiple VCF files from a scatter operation into a single VCF file
    GatherVcfsCloud: [BETA] Gathers multiple VCF files from a scatter operation into a single VCF file
    LeftAlignAndTrimVariants: Left align and trim variants
    LiftoverVcf (Picard): Lifts over a VCF file from one reference build to another.
    MakeSitesOnlyVcf (Picard): Creates a VCF that contains all the site-level information for all records in the input VCF but no genotype information.
    MergeVcfs (Picard): Combines multiple variant files into a single variant file
    PrintVariantsSpark: Prints out variants from the input VCF.
    RemoveNearbyIndels: (Internal) Remove indels from the VCF file that are close to each other.
    RenameSampleInVcf (Picard): Renames a sample within a VCF or BCF.
    SelectVariants: Select a subset of variants from a VCF file
    SortVcf (Picard): Sorts one or more VCF files.
    SplitVcfs (Picard): Splits SNPs and INDELs into separate files.
    UpdateVCFSequenceDictionary: Updates the sequence dictionary in a variant file.
    UpdateVcfSequenceDictionary (Picard): Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.
    VariantAnnotator: [BETA] Tool for adding annotations to VCF files
    VcfFormatConverter (Picard): Converts VCF to BCF or BCF to VCF.
    VcfToIntervalList (Picard): Converts a VCF or BCF file to a Picard Interval List

    Base Calling

    Tools that process sequencing machine data, e.g. Illumina base calls, and detect sequencing level attributes, e.g. adapters

    CheckIlluminaDirectory (Picard): Asserts the validity for specified Illumina basecalling data.
    CollectIlluminaBasecallingMetrics (Picard): Collects Illumina Basecalling metrics for a sequencing run.
    CollectIlluminaLaneMetrics (Picard): Collects Illumina lane metrics for the given BaseCalling analysis directory.
    ExtractIlluminaBarcodes (Picard): Tool determines the barcode for each read in an Illumina lane.
    IlluminaBasecallsToFastq (Picard): Generate FASTQ file(s) from Illumina basecall read data.
    IlluminaBasecallsToSam (Picard): Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.
    MarkIlluminaAdapters (Picard): Reads a SAM or BAM file and rewrites it with new adapter-trimming tags.

    Genotyping Arrays Manipulation

    Tools that manipulate data generated by Genotyping arrays

    CreateVerifyIDIntensityContaminationMetricsFile (Picard): Program to generate a picard metrics file from the output of the VerifyIDIntensity tool.
    GtcToVcf (Picard): Program to convert a GTC file to a VCF
    MergePedIntoVcf (Picard): Program to merge a single-sample ped file from zCall into a single-sample VCF.
    VcfToAdpc (Picard): Program to convert an Arrays VCF to an ADPC file.

    Methylation-Specific Tools

    Tools that perform methylation calling, processing bisulfite sequenced, methylation-aware aligned BAM

    MethylationTypeCaller: [EXPERIMENTAL] Identify methylated bases from bisulfite sequenced, methylation-aware BAMs

    tools

    AlleleFrequency: Stratify by eval RODs by the allele frequency of the alternate allele

    Read Filters

    Applied by engine to select reads for analysis

    AlignmentAgreesWithHeaderReadFilter: Filters out reads where the alignment does not match the contents of the header
    AllowAllReadsReadFilter: Do not filter out any read
    AmbiguousBaseReadFilter: Filters out reads that have greater than the threshold number of N bases
    CigarContainsNoNOperator: Filter out reads with CIGAR containing N operator
    FirstOfPairReadFilter: Keep only reads that are first of pair
    FragmentLengthReadFilter: Keep only read pairs with insert length less than or equal to the given value
    GoodCigarReadFilter: Keep only reads containing good CIGAR string
    HasReadGroupReadFilter: Filter out reads without Read Group
    IntervalOverlapReadFilter: Filters out reads that don't overlap the specified region. NOTE: This approach to extracting overlapping reads is very slow compared to using PrintReads and -L on an indexed bam file.
    LibraryReadFilter: Keep only reads from the specified library
    MappedReadFilter: Filter out unmapped reads
    MappingQualityAvailableReadFilter: Filter out reads without available mapping quality
    MappingQualityNotZeroReadFilter: Filter out reads with mapping quality equal to zero
    MappingQualityReadFilter: Keep only reads with mapping qualities within a specified range
    MatchingBasesAndQualsReadFilter: Filter out reads where the bases and qualities do not match
    MateDifferentStrandReadFilter: Keep only reads with mates mapped on the different strand
    MateOnSameContigOrNoMappedMateReadFilter: Keep only reads whose mate maps to the same contig or is unmapped
    MateUnmappedAndUnmappedReadFilter: Filters reads whose mate is unmapped as well as unmapped reads.
    MetricsReadFilter: Filter out reads that fail platform quality checks, are unmapped and represent secondary/supplementary alignments
    NonChimericOriginalAlignmentReadFilter: Filters reads whose original alignment was chimeric.
    NonZeroFragmentLengthReadFilter: Filter out reads with fragment length different from zero
    NonZeroReferenceLengthAlignmentReadFilter: Filter out reads that do not align to the reference
    NotDuplicateReadFilter: Filter out reads marked as duplicate
    NotSecondaryAlignmentReadFilter: Filter out reads representing secondary alignments
    NotSupplementaryAlignmentReadFilter: Filter out reads representing supplementary alignments
    OverclippedReadFilter: Filter out reads that are over-soft-clipped
    PairedReadFilter: Filter out unpaired reads
    PassesVendorQualityCheckReadFilter: Filter out reads failing platform/vendor quality checks
    PlatformReadFilter: Keep only reads with matching Read Group platform
    PlatformUnitReadFilter: Filter out reads with matching platform unit attribute
    PrimaryLineReadFilter: Keep only reads representing primary alignments (those that satisfy both the NotSecondaryAlignment and NotSupplementaryAlignment filters, or in terms of SAM flag values, must have neither of the 0x100 or 0x800 flags set).
    ProperlyPairedReadFilter: Keep only reads that are properly paired
    ReadGroupBlackListReadFilter: Keep records not matching the read group tag and exact match string.
    ReadGroupReadFilter: Keep only reads from the specified read group
    ReadLengthEqualsCigarLengthReadFilter: Filter out reads where the read and CIGAR do not match in length
    ReadLengthReadFilter: Keep only reads whose length is within a certain range
    ReadNameReadFilter: Keep only reads with this read name
    ReadStrandFilter: Keep only reads whose strand is as specified
    SampleReadFilter: Keep only reads for a given sample
    SecondOfPairReadFilter: Keep only paired reads that are second of pair
    SeqIsStoredReadFilter: Keep only reads with sequenced bases
    SoftClippedReadFilter: Filter out reads that are over-soft-clipped
    ValidAlignmentEndReadFilter: Keep only reads where the read end is properly aligned
    ValidAlignmentStartReadFilter: Keep only reads with a valid alignment start
    WellformedReadFilter: Keep only reads that are well-formed
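    Several of these filters are defined by bits of the SAM FLAG field. PrimaryLineReadFilter rejects reads with 0x100 (secondary) or 0x800 (supplementary) set. NotDuplicateReadFilter drops reads marked as duplicates (0x400), which is the bit that UnmarkDuplicates clears. The sketch below shows these bit tests with bash arithmetic. The helper names are hypothetical, and the flag values are just example numbers.

```bash
#!/usr/bin/env bash
# Sketch: SAM FLAG bit tests behind some of the read filters above.
# Helper names are hypothetical; flag values below are example numbers.

# Primary line: neither 0x100 (secondary) nor 0x800 (supplementary) is set.
is_primary()      { (( ($1 & 0x900) == 0 )); }   # 0x900 = 0x100 | 0x800
# Duplicate: 0x400 is set.
is_duplicate()    { (( ($1 & 0x400) != 0 )); }
# What UnmarkDuplicates does to a flag: clear the 0x400 bit.
clear_duplicate() { echo $(( $1 & ~0x400 )); }

for flag in 99 355 1123 2147; do
  printf '%5d primary=%s duplicate=%s\n' "$flag" \
    "$(is_primary "$flag" && echo yes || echo no)" \
    "$(is_duplicate "$flag" && echo yes || echo no)"
done
```

    Here 99 is a normal paired read. Adding 256 (0x100), 1024 (0x400) or 2048 (0x800) produces the secondary, duplicate and supplementary variants of that read.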

    Variant Annotations

    Available to HaplotypeCaller, Mutect2, VariantAnnotator and GenotypeGVCFs. See https://software.broadinstitute.org/gatk/documentation/article?id=10836

    AS_BaseQualityRankSumTest: Allele-specific rank sum test of REF versus ALT base quality scores (AS_BaseQRankSum)
    AS_FisherStrand: Allele-specific strand bias estimated using Fisher's exact test (AS_FS)
    AS_InbreedingCoeff: Allele-specific likelihood-based test for the consanguinity among samples (AS_InbreedingCoeff)
    AS_MappingQualityRankSumTest: Allele-specific rank sum test for mapping qualities of REF versus ALT reads (AS_MQRankSum)
    AS_QualByDepth: Allele-specific call confidence normalized by depth of sample reads supporting the allele (AS_QD)
    AS_RMSMappingQuality: Allele-specific root-mean-square of the mapping quality of reads across all samples (AS_MQ)
    AS_ReadPosRankSumTest: Allele-specific rank sum test for relative positioning of REF versus ALT allele within reads (AS_ReadPosRankSum)
    AS_StrandOddsRatio: Allele-specific strand bias estimated by the symmetric odds ratio test (AS_SOR)
    AlleleFraction: Variant allele fraction for a genotype
    BaseQuality: Median base quality of bases supporting each allele (MBQ)
    BaseQualityRankSumTest: Rank sum test of REF versus ALT base quality scores (BaseQRankSum)
    ChromosomeCounts: Counts and frequency of alleles in called genotypes (AC, AF, AN)
    ClippingRankSumTest: Rank sum test for hard-clipped bases on REF versus ALT reads (ClippingRankSum)
    CountNs: Number of Ns at the pileup
    Coverage: Total depth of coverage per sample and over all samples (DP)
    DepthPerAlleleBySample: Depth of coverage of each allele per sample (AD)
    DepthPerSampleHC: Depth of informative coverage for each sample (DP)
    ExcessHet: Phred-scaled p-value for exact test of excess heterozygosity (ExcessHet)
    FisherStrand: Strand bias estimated using Fisher's exact test (FS)
    FragmentLength: Median fragment length of reads supporting each allele (MFRL)
    GenotypeSummaries: Summary of genotype statistics from all samples (NCC, GQ_MEAN, GQ_STDDEV)
    InbreedingCoeff: Likelihood-based test for the consanguinity among samples (InbreedingCoeff)
    LikelihoodRankSumTest: Rank sum test of per-read likelihoods of REF versus ALT reads (LikelihoodRankSum)
    MappingQuality: Median mapping quality of reads supporting each allele (MMQ)
    MappingQualityRankSumTest: Rank sum test for mapping qualities of REF versus ALT reads (MQRankSum)
    MappingQualityZero: Count of all reads with MAPQ = 0 across all samples (MQ0)
    OrientationBiasReadCounts: Count of read pairs in the F1R2 and F2R1 configurations supporting REF and ALT alleles (F1R2, F2R1)
    OriginalAlignment: Number of alt reads with an OA tag that doesn't match the current alignment contig.
    PossibleDeNovo: Existence of a de novo mutation in at least one of the given families (hiConfDeNovo, loConfDeNovo)
    QualByDepth: Variant confidence normalized by unfiltered depth of variant samples (QD)
    RMSMappingQuality: Root mean square of the mapping quality of reads across all samples (MQ)
    ReadPosRankSumTest: Rank sum test for relative positioning of REF versus ALT alleles within reads (ReadPosRankSum)
    ReadPosition: Median distance of variant starts from ends of reads supporting each allele (MPOS)
    ReferenceBases: Annotate with local reference bases (REF_BASES)
    SampleList: List of samples that are not homozygous reference at a variant site (Samples)
    StrandBiasBySample: Number of forward and reverse reads that support REF and ALT alleles (SB)
    StrandOddsRatio: Strand bias estimated by the symmetric odds ratio test (SOR)
    TandemRepeat: Tandem repeat unit composition and counts per allele (STR, RU, RPA)
    UniqueAltReadCount: Number of non-duplicate-insert ALT reads (UNIQ_ALT_READ_COUNT)
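    Site-level annotations such as QD, FS, SOR and MQ are written as KEY=VALUE pairs in the INFO column, which is the 8th tab-separated field of a VCF record. The sketch below pulls a single key out of that column. info_value is a hypothetical helper, and the record is made up for illustration. For real work, VariantsToTable (listed above) extracts fields into a table.

```bash
#!/usr/bin/env bash
# Sketch: read one annotation from the INFO column (field 8) of VCF records.
# info_value is a hypothetical helper; the record below is made up.

# info_value KEY < vcf_lines -> prints KEY's value per record, "true" for a
# flag-type entry (key without value), or "NA" if the key is absent.
info_value() {
  awk -F'\t' -v key="$1" '
    /^#/ { next }                                   # skip header lines
    {
      val = "NA"
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] == key) { val = "true"; break }
        if (index(kv[i], key "=") == 1) {           # exact key match only
          val = substr(kv[i], length(key) + 2); break
        }
      }
      print val
    }'
}

record=$'chr1\t12345\t.\tA\tG\t512.77\t.\tAC=1;AF=0.5;AN=2;DP=30;FS=1.8;MQ=60.00;QD=17.09;SOR=0.6'
printf '%s\n' "$record" | info_value QD
```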


        Article link: https://www.haomeiwen.com/subject/kbcmxhtx.html