Genome Alignment with BOWTIE2

Author: 尘世中一个迷途小书僮 | Published 2021-04-25 21:05

    A write-up of the tools I use for ChIP-seq / CUT&Tag analysis. This post only briefly introduces how each tool is used.

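    Bowtie 2 is a widely used aligner for mapping sequencing reads to a reference genome. I won't go into how it works here. If you are interested, see the official documentation and the paper describing it (https://doi.org/10.1038/nmeth.1923). Below is a short introduction to the Bowtie 2 indexing and alignment commands, together with the parameters I use most often.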

    Usage

    Index

    bowtie2-build [options]* <reference_in> <bt2_base>
    

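    <reference_in>: with -f, the reference FASTA file(s) to index, comma-separated if there are several. FASTA input is the default, so -f can be omitted. With -c, these are the reference sequences themselves, given directly on the command line as a comma-separated list, e.g. GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA.
    <bt2_base>: the prefix of the index files to be written. By default bowtie2-build produces NAME.1.bt2, NAME.2.bt2, NAME.3.bt2, NAME.4.bt2, NAME.rev.1.bt2 and NAME.rev.2.bt2, where NAME is <bt2_base>.
    --threads: number of threads to use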

    Example

    bowtie2-build -f --threads 24 /public/Reference/GRCh38.primary_assembly.genome.fa GRCh38
    

    The command above indexes the FASTA file /public/Reference/GRCh38.primary_assembly.genome.fa and writes index files with the prefix GRCh38 to the current directory.
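    Before starting a long alignment job, it can help to confirm that all six index files listed above were actually written. Below is a minimal bash sketch; the function name check_bt2_index is hypothetical. It assumes a standard-size index. For very large references (over roughly 4 billion nucleotides) bowtie2-build writes .bt2l files instead, and this sketch does not handle that case.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a Bowtie 2 index prefix has all six non-empty .bt2 files.
# Usage: check_bt2_index /path/to/GRCh38 && echo "index OK"
check_bt2_index() {
  local prefix=$1 missing=0 suffix
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [[ ! -s "${prefix}.${suffix}" ]]; then
      echo "missing or empty: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"   # 0 if every file is present, 1 otherwise
}
```

    The function only reports problems on stderr and signals the result through its exit status, so it can be chained with && in front of the bowtie2 command.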

    Alignment

    Single-end alignment

    bowtie2 [options]* -x <bt2-idx> -U <fq> -S <sam_output> -p <threads> 2>Align.summary
    

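    -x: prefix (including the path) of the reference genome index
    -U: FASTQ file of single-end reads
    -S: output SAM file containing the alignments
    -p: number of threads
    "2>Align.summary": redirects standard error (the alignment summary that would otherwise be printed to the screen) to the file "Align.summary". It typically looks like one of the following: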

    ## Single-end
    20000 reads; of these:
      20000 (100.00%) were unpaired; of these:
        1247 (6.24%) aligned 0 times
        18739 (93.69%) aligned exactly 1 time
        14 (0.07%) aligned >1 times
    93.77% overall alignment rate
    
    ## Paired-end
    10000 reads; of these:
      10000 (100.00%) were paired; of these:
        650 (6.50%) aligned concordantly 0 times
        8823 (88.23%) aligned concordantly exactly 1 time
        527 (5.27%) aligned concordantly >1 times
        ----
        650 pairs aligned concordantly 0 times; of these:
          34 (5.23%) aligned discordantly 1 time
        ----
        616 pairs aligned 0 times concordantly or discordantly; of these:
          1232 mates make up the pairs; of these:
            660 (53.57%) aligned 0 times
            571 (46.35%) aligned exactly 1 time
            1 (0.08%) aligned >1 times
    96.70% overall alignment rate
    The indentation indicates how subtotals relate to totals.
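    With many samples it is convenient to pull the final rate out of every summary file. Below is a minimal bash/awk sketch; the function name overall_rate is hypothetical. It prints the number on the "overall alignment rate" line. The second part reproduces the paired-end 96.70% from the counts above. It assumes the rate is computed over individual mates (2 × 10000 = 20000), which is consistent with those numbers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the overall alignment rate (without "%") from a bowtie2 summary file.
overall_rate() {
  awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

# Reproduce the paired-end rate from the example summary counts.
# Assumption: concordant and discordant pairs contribute 2 aligned mates each,
# and mates from the remaining pairs are counted individually.
pairs=10000
conc_once=8823 conc_multi=527 disc_once=34
mate_once=571 mate_multi=1
aligned_mates=$(( 2 * (conc_once + conc_multi + disc_once) + mate_once + mate_multi ))
awk -v a="$aligned_mates" -v t="$pairs" 'BEGIN { printf "%.2f%%\n", 100 * a / (2 * t) }'
```

    Here 2 × (8823 + 527 + 34) + 571 + 1 = 19340 aligned mates out of 20000, i.e. 96.70%. This is why the overall rate can be well above the fraction of pairs that aligned concordantly.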
    
    

    Paired-end alignment

    bowtie2 [options]* -x <bt2-idx> -1 <fq1> -2 <fq2> -S <sam_output> -p <threads> 2>Align.summary
    

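    Paired-end mode is essentially the same as single-end mode; only the options used to pass the FASTQ files change:
    -1: FASTQ file containing mate 1 (read 1) of each pair
    -2: FASTQ file containing mate 2 (read 2) of each pair, in the same order as the -1 file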
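    With several samples, the paired-end command is usually wrapped in a loop. Below is a minimal bash sketch. It assumes a hypothetical naming scheme, <sample>_R1.fq.gz / <sample>_R2.fq.gz, and the function name align_pe is also hypothetical. The mate-2 name is derived by replacing the first "_R1" in the path, so directories whose names contain "_R1" would break it. Setting DRY_RUN only prints the command, so the naming logic can be checked without running bowtie2.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the paired-end command shown above.
# Assumes mate files are named <sample>_R1.fq.gz and <sample>_R2.fq.gz.
align_pe() {
  local idx=$1 r1=$2 threads=${3:-8}
  local r2=${r1/_R1/_R2}                   # mate-2 file inferred from the mate-1 name
  local sample
  sample=$(basename "$r1" _R1.fq.gz)       # e.g. data/s1_R1.fq.gz -> s1
  local cmd=(bowtie2 -x "$idx" -1 "$r1" -2 "$r2" -S "${sample}.sam" -p "$threads")
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "${cmd[*]} 2> ${sample}.summary"  # only print the command
  else
    "${cmd[@]}" 2> "${sample}.summary"     # run it; the summary goes to <sample>.summary
  fi
}

# Example loop (paths hypothetical):
# for r1 in data/*_R1.fq.gz; do align_pe GRCh38 "$r1" 24; done
```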

    Bowtie 2 has many more alignment parameters that can be tuned, but I won't go through them all here. Next, the meaning of each column in the SAM output.

    SAM OUTPUT

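    Apart from header lines (which start with @), each line of the SAM file describes the alignment of one read. Each line has at least 12 tab-separated fields, which from left to right are: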

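    1. Read name
    2. FLAG: the sum of all applicable flag bits. For example, 99 = 1 + 2 + 32 + 64 is a paired read that is mate 1, is part of a proper pair, and has its mate aligned to the reverse strand.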

    In Bowtie 2 the flag bits mean:

    Flag   Meaning
    1      The read is one of a pair
    2      The alignment is one end of a proper paired-end alignment
    4      The read has no reported alignments
    8      The read is one of a pair and has no reported alignments
    16     The alignment is to the reverse reference strand
    32     The other mate in the paired-end alignment is aligned to the reverse reference strand
    64     The read is mate 1 in a pair
    128    The read is mate 2 in a pair

    Note that aligners can differ in exactly how they set and interpret these flags.

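    3. Name of the reference sequence (chromosome) the read aligned to
    4. 1-based leftmost position of the alignment on the forward reference strand. For a read aligned to the reverse strand this is therefore not the read's 5' end.
    5. Mapping quality (MAPQ)
    6. CIGAR string describing the alignment (matches/mismatches, insertions, deletions, clipping)
    7. For paired-end reads, the name of the reference sequence the mate aligned to. This is "=" if it is the same as this read's and "*" if there is no mate.
    8. For paired-end reads, the 1-based leftmost position of the mate's alignment on the forward reference strand, or 0 if there is no mate
    9. Inferred fragment length. It is negative if the mate's alignment lies upstream (to the left) of this alignment, and 0 if the mates did not align concordantly. However, it is non-zero if the mates aligned discordantly to the same chromosome.
    10. Read sequence (reverse-complemented if the read aligned to the reverse strand)
    11. ASCII-encoded base qualities of the read (reversed if the read aligned to the reverse strand)
    12. Optional fields, including the following: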
    AS:i:<N> Alignment score. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if SAM record is for an aligned read. 
    XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i. 
    YS:i:<N> Alignment score for opposite mate in the paired-end alignment. Only present if the SAM record is for a read that aligned as part of a paired-end alignment. 
    XN:i:<N> The number of ambiguous bases in the reference covering this alignment. Only present if SAM record is for an aligned read. 
    XM:i:<N> The number of mismatches in the alignment. Only present if SAM record is for an aligned read. 
    XO:i:<N> The number of gap opens, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
    XG:i:<N> The number of gap extensions, for both read and reference gaps, in the alignment. Only present if SAM record is for an aligned read. 
    NM:i:<N> The edit distance; that is, the minimal number of one-nucleotide edits (substitutions, insertions and deletions) needed to transform the read string into the reference string. Only present if SAM record is for an aligned read. 
    YF:Z:<S> String indicating the reason why the read was filtered out. See also the Filtering section of the Bowtie 2 manual. Only appears for reads that were filtered out.
    YT:Z:<S> Value of UU indicates the read was not part of a pair. Value of CP indicates the read was part of a pair and the pair aligned concordantly. Value of DP indicates the read was part of a pair and the pair aligned discordantly. Value of UP indicates the read was part of a pair but the pair failed to align either concordantly or discordantly.
    MD:Z:<S> A string representation of the mismatched reference bases in the alignment.
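    The FLAG, position and CIGAR columns together give the strand and the true 5' end of each aligned read, and the optional fields carry the alignment score. Below is a minimal bash/awk sketch; the function name sam_5prime and its five-column output are hypothetical. It does three things:

    - It tests flag bits 4 and 16 with portable integer arithmetic.
    - It computes the reference span from the CIGAR string, counting the M, D, N, = and X operations that consume reference bases.
    - It reads the AS:i tag from the optional fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each aligned record of a bowtie2 SAM file, print
# read name, strand, 5' end (1-based), MAPQ and the AS:i score (NA if absent).
# Reads the files given as arguments, or stdin if none are given.
sam_5prime() {
  awk -F'\t' '
    /^@/ { next }                          # skip header lines
    {
      flag = $2
      if (int(flag / 4) % 2 == 1) next     # bit 4: no reported alignment
      reverse = int(flag / 16) % 2         # bit 16: aligned to the reverse strand

      # Reference span = sum of CIGAR operations that consume the reference (M, D, N, =, X)
      span = 0; cigar = $6
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += len
        cigar = substr(cigar, RLENGTH + 1)
      }

      # POS is the leftmost aligned base; on the reverse strand the 5-prime end is the rightmost one
      five_prime = reverse ? $4 + span - 1 : $4 + 0

      as = "NA"                            # optional fields start at column 12
      for (i = 12; i <= NF; i++)
        if ($i ~ /^AS:i:/) as = substr($i, 6)

      printf "%s\t%s\t%d\t%s\t%s\n", $1, (reverse ? "-" : "+"), five_prime, $5, as
    }' "$@"
}

# Example (file names hypothetical): sam_5prime sample.sam > sample.5prime.tsv
```

    Soft-clipped bases (S) are not counted in the span, so the reported 5' end is that of the aligned part of the read.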
    

    That's my summary of genome alignment with Bowtie 2. I'll add to it as I learn more.

    Reference (Bowtie 2 manual):
    http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#how-is-bowtie-2-different-from-bowtie-1

    The end.
