SyRI: Variant Detection from Assembled Genomes

Author: 橙子_orange | Published 2022-02-28 15:12
    1. Installing SyRI

    GitHub page: https://github.com/schneebergerlab/syri
    Requirements:
    🔸 Python >= 3.8
    🔸 Python packages: Cython-0.29.23, numpy-1.20.2, scipy-1.6.2, pandas-1.2.4, python-igraph-0.9.1, psutil-5.8.0, pysam-0.16.0.1, and matplotlib-3.3.4
    🔸 C/C++ compiler: g++

    git clone https://github.com/schneebergerlab/syri.git
    cd syri
    python setup.py install
    chmod +x syri/bin/syri syri/bin/chroder
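Before building, a short Python check can show whether the current interpreter meets the requirements listed above. This is a sketch, not part of SyRI: `check_environment` is a hypothetical helper that only tests whether each package imports (python-igraph imports as `igraph`) and reports `__version__` where available; it does not enforce the exact pinned versions.

```python
import importlib
import sys

# Import names for the packages in the requirement list (python-igraph -> "igraph").
REQUIRED_MODULES = ["Cython", "numpy", "scipy", "pandas", "igraph", "psutil", "pysam", "matplotlib"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 8)):
    """Return (python_ok, missing_modules, found_versions) for the given module names."""
    python_ok = sys.version_info[:2] >= min_python
    missing, found = [], {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            missing.append(name)  # not installed in this interpreter
        else:
            # Many packages expose __version__; fall back to "unknown" otherwise.
            found[name] = getattr(mod, "__version__", "unknown")
    return python_ok, missing, found
```

Running `print(check_environment())` in the environment where you plan to install SyRI lists anything still missing.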
    
    2. Running SyRI
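    # nucmer writes out.delta (default prefix); --maxmatch: all anchor matches, -c: min cluster length, -b: break length, -l: min exact-match length
    nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome
    # Keep many-to-many (-m) alignments with identity >= 90% (-i 90) and length >= 100 bp (-l 100)
    delta-filter -m -i 90 -l 100 out.delta > out_m_i90_l100.delta
    # Tab-delimited (-T), no header (-H), sorted by reference (-r), with alignment direction (-d)
    show-coords -THrd out_m_i90_l100.delta > out_m_i90_l100.coords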
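The filtered coordinates are what SyRI reads through `-c`. To inspect them yourself, a minimal Python parser is sketched here. It assumes the 11 tab-separated fields that `show-coords -THrd` emits (reference start/end, query start/end, the two alignment lengths, percent identity, the two direction columns added by `-d`, then the reference and query sequence IDs); verify this layout against your MUMmer version. `CoordsRow` and `parse_coords` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CoordsRow:
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    ref_len: int
    qry_len: int
    identity: float  # percent identity of the alignment
    ref_dir: int     # 1 = forward, -1 = reverse (assumed -d encoding)
    qry_dir: int
    ref_chr: str
    qry_chr: str


def parse_coords(lines: Iterable[str]) -> List[CoordsRow]:
    """Parse tab-separated show-coords -THrd lines, skipping blank lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        rows.append(CoordsRow(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                              int(f[4]), int(f[5]), float(f[6]),
                              int(f[7]), int(f[8]), f[9], f[10]))
    return rows
```

For example, opening out_m_i90_l100.coords and keeping the rows with `qry_dir == -1` collects the alignments that map to the reverse strand of the query.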
    
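    # Run SyRI on the filtered coords (-c) and delta (-d), with the reference (-r) and query (-q) FASTA files
    /path/syri-1.5.1/syri/bin/syri -c out_m_i90_l100.coords -d out_m_i90_l100.delta -r refgenome -q qrygenome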
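The four commands can also be chained from Python, which helps when looping over several query genomes. A minimal sketch: `build_pipeline` and `run_pipeline` are hypothetical helpers and the flags are copied from the commands above. Two details are assumptions: nucmer's `-p` option is passed explicitly to pin the output prefix (its default is already `out`), and `syri` is assumed to be on `PATH` (otherwise pass the full path, e.g. `/path/syri-1.5.1/syri/bin/syri`).

```python
import subprocess
from typing import List, Optional, Tuple

# (argv, file that receives stdout or None)
Step = Tuple[List[str], Optional[str]]


def build_pipeline(ref: str, qry: str, prefix: str = "out", syri: str = "syri") -> List[Step]:
    delta = f"{prefix}.delta"
    fdelta = f"{prefix}_m_i90_l100.delta"
    coords = f"{prefix}_m_i90_l100.coords"
    return [
        (["nucmer", "--maxmatch", "-c", "100", "-b", "500", "-l", "50", "-p", prefix, ref, qry], None),
        (["delta-filter", "-m", "-i", "90", "-l", "100", delta], fdelta),
        (["show-coords", "-THrd", fdelta], coords),
        ([syri, "-c", coords, "-d", fdelta, "-r", ref, "-q", qry], None),
    ]


def run_pipeline(steps: List[Step]) -> None:
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            # delta-filter and show-coords print to stdout, so redirect like `>` in the shell.
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

`run_pipeline(build_pipeline("ref.fa", "qry.fa"))` runs the steps in order and stops at the first failing command, because `check=True` raises `subprocess.CalledProcessError`.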
    
    Plotting the genome structures predicted by SyRI with plotsr: https://github.com/schneebergerlab/plotsr

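    genomes.txt is a tab-separated file containing the path and name of each genome. An optional third column can hold custom visualization properties (tags) for each genome.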

    $ cat genomes.txt
    #file            name     tags
    /path/ref.fa     col-0    lw:1.5
    /path/query.fa   ler      lw:1.5
    
    plotsr \
        --sr syri.out \
        --genomes genomes.txt \
        -o plotsr_out.png
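Because plotsr expects a tab-separated genomes.txt, generating the file programmatically avoids tabs being turned into spaces by an editor. A sketch under assumptions: `genomes_tsv`, `write_genomes_file` and `plotsr_command` are hypothetical helpers; the paths, names and `lw:1.5` tags come from the example above, and the genomes are listed in the same order as the SyRI comparison (reference first, then query).

```python
from typing import List, Sequence, Tuple

Genome = Tuple[str, str, str]  # (FASTA path, display name, tags)


def genomes_tsv(genomes: Sequence[Genome]) -> str:
    """Render genomes.txt: a header line, then one tab-separated row per genome."""
    lines = ["#file\tname\ttags"]
    lines += [f"{fasta}\t{name}\t{tags}" for fasta, name, tags in genomes]
    return "\n".join(lines) + "\n"


def write_genomes_file(path: str, genomes: Sequence[Genome]) -> None:
    with open(path, "w") as fh:
        fh.write(genomes_tsv(genomes))


def plotsr_command(sr: str, genomes_file: str, out_png: str) -> List[str]:
    """argv equivalent of: plotsr --sr <sr> --genomes <genomes_file> -o <out_png>"""
    return ["plotsr", "--sr", sr, "--genomes", genomes_file, "-o", out_png]
```

After `write_genomes_file("genomes.txt", ...)`, passing `plotsr_command("syri.out", "genomes.txt", "plotsr_out.png")` to `subprocess.run(..., check=True)` mirrors the shell command above.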
    
    [Figure: example plotsr output]
    3. Output files
    [Figure: SyRI output files]
    3.1 syri.out (TSV format)
    [Figure: example syri.out]

    Column  Value                                          Type
    1       Chromosome ID in reference                     string
    2       Reference start position (1-based, inclusive)  int
    3       Reference end position (1-based, inclusive)    int
    4       Sequence in reference (SNPs and indels only)   string
    5       Sequence in query (SNPs and indels only)       string
    6       Chromosome ID in query                         string
    7       Query start position (1-based, inclusive)      int
    8       Query end position (1-based, inclusive)        int
    9       Unique ID (annotation type + number)           string
    10      Parent ID (annotation type + number)           string
    11      Annotation type                                string
    12      Copy status (duplications only)                string
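    🔸 Copy status: for duplications, indicates which genome carries the extra copy (copygain: the query has an additional copy; copyloss: the reference has an additional copy).
    🔸 Parent ID: the unique ID of the annotation block (syntenic region or structural rearrangement) that contains the alignment or local variant.
    For example, if an A->T SNP (unique ID SNP1) at reference Chr1:10 and query Chr2:542 lies inside a translocated region (unique ID TRANS1), the corresponding entry is: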

    Chr1 10 10 A T Chr2 542 542 SNP1 TRANS1 SNP -
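Since every row of syri.out has the same 12 columns, it can be loaded into typed records for downstream filtering. A minimal sketch: the field names in `SyriRecord` are our own labels for the columns in the table above, and it assumes SyRI writes `-` for fields that do not apply (as in the copy-status column of the example entry), which the reader turns into `None`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


def _opt_int(x: str) -> Optional[int]:
    return None if x == "-" else int(x)


def _opt_str(x: str) -> Optional[str]:
    return None if x == "-" else x


@dataclass
class SyriRecord:
    ref_chr: Optional[str]
    ref_start: Optional[int]    # 1-based, inclusive
    ref_end: Optional[int]      # 1-based, inclusive
    ref_seq: Optional[str]      # SNPs and indels only
    qry_seq: Optional[str]      # SNPs and indels only
    qry_chr: Optional[str]
    qry_start: Optional[int]
    qry_end: Optional[int]
    uid: str                    # e.g. SNP1
    parent: Optional[str]       # e.g. TRANS1
    annotation: str             # e.g. SNP, SYN, INV
    copy_status: Optional[str]  # copygain / copyloss for duplications


def read_syri_out(lines: Iterable[str]) -> Iterator[SyriRecord]:
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) != 12:
            continue  # skip blank or malformed lines
        yield SyriRecord(_opt_str(f[0]), _opt_int(f[1]), _opt_int(f[2]),
                         _opt_str(f[3]), _opt_str(f[4]), _opt_str(f[5]),
                         _opt_int(f[6]), _opt_int(f[7]), f[8], _opt_str(f[9]),
                         f[10], _opt_str(f[11]))
```

Grouping the records by `parent` (for instance with `collections.defaultdict(list)`) then shows which SNPs and alignments fall inside each syntenic or rearranged block, as in the SNP1/TRANS1 example.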

    Annotation  Meaning                       Annotation  Meaning
    SYN         Syntenic region               SYNAL       Alignment in syntenic region
    INV         Inverted region               INVAL       Alignment in inverted region
    TRANS       Translocated region           TRANSAL     Alignment in translocated region
    INVTR       Inverted translocated region  INVTRAL     Alignment in inverted translocated region
    DUP         Duplicated region             DUPAL       Alignment in duplicated region
    INVDP       Inverted duplicated region    INVDPAL     Alignment in inverted duplicated region
    NOTAL       Un-aligned region             SNP         Single nucleotide polymorphism
    CPG         Copy gain in query            CPL         Copy loss in query
    HDR         Highly diverged region        TDM         Tandem repeat
    INS         Insertion in query            DEL         Deletion in query
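A common first look at the results is how many records, and how many reference base pairs, fall under each annotation type in the table above. This sketch assumes the 12-column syri.out layout and computes spans as end - start + 1 because coordinates are 1-based and inclusive; `summarise` and `structural_only` are hypothetical helpers. Rows whose reference positions are `-` add to the count but not the span, and for query-side features such as INS the reference span says little, so a fuller version would also sum query spans.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Block-level codes from the left column of the table; each alignment-level
# code is the block code plus "AL" (SYN -> SYNAL, ..., INVDP -> INVDPAL).
BLOCK_TYPES = {"SYN", "INV", "TRANS", "INVTR", "DUP", "INVDP"}


def summarise(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map annotation type -> (number of records, summed reference span in bp)."""
    counts: Dict[str, int] = defaultdict(int)
    spans: Dict[str, int] = defaultdict(int)
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) != 12:
            continue  # skip blank or malformed lines
        ann = f[10]
        counts[ann] += 1
        if f[1] != "-" and f[2] != "-":
            spans[ann] += int(f[2]) - int(f[1]) + 1  # inclusive coordinates
    return {ann: (counts[ann], spans[ann]) for ann in counts}


def structural_only(summary: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Keep only block-level synteny/rearrangement annotations."""
    return {k: v for k, v in summary.items() if k in BLOCK_TYPES}
```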

    3.2 syri.vcf
    [Figure: example syri.vcf]
    3.3 syri.summary
    [Figure: example syri.summary]

        Article link: https://www.haomeiwen.com/subject/jvjlrrtx.html