美文网首页SV
2基因组间鉴定SV

2基因组间鉴定SV

作者: 斩毛毛 | 来源:发表于2020-12-22 21:25 被阅读0次

    本文学习费章军老师文章Genome of Solanum pimpinellifolium provides insights into structural variants during tomato breeding 如何鉴定SV。

    其流程见 https://github.com/GaoLei-bio/SV

    1 SV-calling

    1.1 基因组间比较

    简单思路: 2个基因组比较--》调取unique 比对--〉二代reads比对过滤

    软件准备:

    • minimap2 (v2.11 or higher)
    • sam2delta.py (RaGoo 下的一个脚本,将sam格式变为nucmer的格式)
    • Assemblytics
    • samtools (v1.5 or higher)
    • blast+
    • python 2.7
      -- operator
      -- pathlib
      -- argparse
      -- os
      -- re
      -- subprocess
      -- sys

    数据准备:

    • reference genome
    • query genome (二者质量较高,且相关物种;\color{red}{染色体命名规则;xxxchr01, xxxchr02, contig等合并为 xxxchr00})
    • 两个基因组所对应的illumine reads (因为组装不完整,导致没有组装的区域被认为是delete,因此用二代数据过滤SV)
    • QryRead2Ref.bam (sorted, rmdup; query reads 比对到reference;\color{red}{可以bwa-mem进行比对,名字必须是这个})
    • RefRead2Qry.bam (sorted, rmdup;ref reads比对到Query genome)
    • Ref_self.bam; 同上
    • Qry_self.bam; 同上

    \color{red} {上述4个bam文件必须sort,且去重复,index,且命名和上述统一,且均放于工作目录}

    简单操练一下:
    将下载的所有脚本置于一个文件即可

    ## 运行SV_calling.sh 即可
    Command:
          bash path_to_SV_calling_script/SV_calling.sh \
               path_to_SV_calling_script \
               <Reference_genome_file>   \
               <Query_genome_file>  \
               <Prefix_for_outputs> \
               <number of threads>  \
               <min_SV_size>   \
               <max_SV_size>  \
               <assemblytics_path>
    ### 具体数据
    bash path_to_SV_calling_script/SV_calling.sh \
              path_to_SV_calling_script \
              SL4.0.genome.fasta  \
               Pimp_v1.4.fasta  \
                SP2SL 24 10 1000000 \
                path_to_assemblytics_scripts
    
    ## 上述的4个bam文件必须放在当前工作目录下
    

    结果:

    • SP2SL.Master_list.tsv
    • SP2SL.NR.bed (用于下一步)

    1.2 pacbio reads进行鉴定SV (可选)

    数据准备:

    • Prefix_for_outputs.NR.bed: 上一步结果
    • Reference_genome_file: fasta
    • Query_genome_file: fasta
    • Ref_base_pbsv_vcf (利用pbsv 将query pacbio reads比对到reference genome 获得SV)
    • Qry_base_pbsv_vcf (利用pbsv 将reference pacbio reads比对到Query genome 获得SV)

    直接操练

      Command:
          bash path_to_SV_calling_script/SV_PacBio.sh \
               path_to_SV_calling_script \
               <Reference_genome_file>   \
               <Query_genome_file>  \
               <Prefix_for_outputs> \
               <number of threads> \
               <Ref_base_pbsv_vcf> \
               <Qry_base_pbsv_vcf>
    
    ## 具体例子
    For example:
          bash path_to_SV_calling_script/SV_PacBio.sh \
               path_to_SV_calling_script \
               SL4.0.genome.fasta  \
               Pimp_v1.4.fasta   \
               SP2SL  24  \
               PimpReads2SL4.0.var.vcf  \
               HeinzReads2Pimp.var.vcf
    

    结果:

    • SP2SL.Master_list.tsv

    可将上述两种方法得到的SV结果进行合并即可。

    2 SV-genotyping

    主要程序 SV_genotyping.py, 且在Example中有实例文件

    准备数据:

    • 重测序数据分别比对到Query 和Ref genome (bwa即可【<3 bp错配】,去重复,)并排序
    Parameters:
    INPUT=Example/Example_SV.tsv  # Path to your input SV file. A example of input SV file is provided.
    sample=Sample_name            # Sample name, prefix of outputs
    Reference=Reference_genome    # Path to Reference genome, fasta format, indexed by samtools faidx
    Query=Query_genome            # Path to Query genome , fasta format, indexed by samtools faidx
    Ref_bam=Reference_base_bam    # Sorted bam file with reads aligned on Reference genome
    Qry_bam=Query_base_bam        # Sorted bam file with reads aligned on Query genome
    Mismath=3                     # Allowed mismath percentage of aligned read
    
    ## 实例数据
     python SV_genotyping.py Example/Example_input.tsv \
                        Example Example/Reference.fasta \
                          Example/Query.fasta \ 
                            Example/Sample.Ref_base.bam \
                                Example/Sample.Qry_base.bam 3
    

    结果

    • Example.GT.txt.
    head Example.GT.txt.
    #Genotype: R for homozygous Reference genotype; Q for homozygous Query genotype; H for Heterozygous genotype; U for Undetermined.
    SV_ID   Example_1m
    SV_w_5206   Q
    SV_w_5207   Q
    SV_w_5209   R
    SV_w_5210   Q
    SV_w_5211   Q
    SV_w_5212   H
    SV_w_5213   Q
    SV_w_5214   Q
    

    参考

    相关文章

      网友评论

        本文标题:2基因组间鉴定SV

        本文链接:https://www.haomeiwen.com/subject/vkxenktx.html