Variant Call Format

Author: 吴十三和小可爱的札记 | Published: 2021-10-19 18:21

    INFO fields

    Name  Brief description
    AC    allele count in genotypes, for each ALT allele, in the same order as listed
    AF    allele frequency for each ALT allele, in the same order as listed (use this when estimated from primary data, not called genotypes)
    AN    total number of alleles in called genotypes
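As a rough illustration of how AC and AN relate to the genotypes, the sketch below tallies both from a list of GT strings. The function name `allele_stats` and the sample genotypes are hypothetical. The ratio AC/AN it returns is the frequency among called genotypes only; per the table above, the AF field itself is meant for frequencies estimated from primary data.

```python
import re

def allele_stats(genotypes, n_alt):
    """Return (AC, AN, AC/AN) tallied from a list of GT strings."""
    ac = [0] * n_alt          # one counter per ALT allele, in ALT order
    an = 0                    # total number of called alleles
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":     # missing alleles are not counted
                continue
            an += 1
            index = int(allele)
            if index > 0:         # 0 is REF; 1..n_alt index into ALT
                ac[index - 1] += 1
    freq = [count / an for count in ac] if an else [None] * n_alt
    return ac, an, freq

# Hypothetical site with two ALT alleles and four diploid samples,
# one of which has no call.
ac, an, freq = allele_stats(["0/1", "1|1", "0/2", "./."], n_alt=2)
print(ac, an, freq)
```

The uncalled sample contributes nothing to AN, which is why AN can be smaller than twice the number of diploid samples.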

    example 1

    example 2

    Genotype fields

    GT : genotype, encoded as allele values separated by / (unphased) or | (phased). The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele listed in ALT, and so on.

    For diploid calls, examples could be 0/1, 1|0, or 1/2.

    - 0/0 : the sample is homozygous reference
    - 0/1 : the sample is heterozygous, carrying 1 copy of each of the REF and ALT alleles
    - 1/1 : the sample is homozygous alternate
    

    For haploid calls, e.g. on Y, male nonpseudoautosomal X, or mitochondrion, only one allele value should be given; a triploid call might look like 0/0/1.

    If a call cannot be made for a sample at a given locus, ‘.’ should be specified for each missing allele in the GT field (for example, ‘./.’ for a diploid genotype and ‘.’ for a haploid genotype).
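The GT rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: `parse_gt` and `describe` are hypothetical helper names, and the labels returned are just for display.

```python
import re

def parse_gt(gt):
    """Split a GT string into (alleles, phased); missing alleles become None."""
    phased = "|" in gt
    alleles = [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]
    return alleles, phased

def describe(gt):
    """Give a short label for a GT string."""
    alleles, _ = parse_gt(gt)
    if any(a is None for a in alleles):
        return "missing"
    if len(alleles) == 1:
        return "haploid"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == 0 else "homozygous alternate"

for gt in ["0/0", "0/1", "1/1", "1|0", "1/2", "0/0/1", "1", "./.", "."]:
    print(gt, parse_gt(gt), describe(gt))
```

Note that 1/2 is labeled heterozygous even though neither allele is the reference; the three-way list above only covers sites with a single ALT allele.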

    Subsetting a VCF

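    # keep only the samples listed in sample.txt (one sample name per line)
    # -Oz writes bgzip-compressed output, so the file is named .vcf.gz
    bcftools view -S sample.txt input.vcf -Oz -o sample.vcf.gz
    
    # remove the samples listed in sample.txt (the ^ prefix means "exclude")
    bcftools view -S ^sample.txt input.vcf -Oz -o sample.vcf.gz
    
    # or remove samples by name with vcftools, writing an uncompressed VCF
    vcftools --vcf input.vcf --recode --recode-INFO-all --stdout \
        --remove-indv sample1 --remove-indv sample2 --remove-indv sample3 > sample.vcf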
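To make clear what sample subsetting does to the file itself, here is a plain-Python sketch that keeps or drops sample columns. It is only an illustration of the column layout, not a replacement for the tools above: `subset_samples` is a hypothetical helper, the example lines are made up, and the INFO column is left untouched, so values such as AC and AN would no longer match the remaining samples.

```python
def subset_samples(lines, names, exclude=False):
    """Yield VCF lines keeping (or, with exclude=True, dropping) the named samples."""
    keep = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):       # meta-information lines pass through
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed; sample columns start at index 9.
            keep = list(range(9)) + [
                i for i, name in enumerate(fields[9:], start=9)
                if (name in names) != exclude
            ]
        if keep is None:
            raise ValueError("data line found before the #CHROM header line")
        yield "\t".join(fields[i] for i in keep)

# Hypothetical three-sample VCF.
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
]
kept = list(subset_samples(vcf_lines, {"s1", "s3"}))
dropped = list(subset_samples(vcf_lines, {"s1", "s3"}, exclude=True))
print("\n".join(kept))
print("\n".join(dropped))
```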
    
