美文网首页RNA-seqrice related analysisRNASeq 数据分析
估计很多人都像我一样一直没弄懂关于samtools flag 0

估计很多人都像我一样一直没弄懂关于samtools flag 0

作者: 热衷组培的二货潜 | 来源:发表于2019-03-15 20:50 被阅读17次

    Question: In Sam Format, Clarify The Meaning Of The "0" Flag.

    @Jts, can you comment on Devon Ryan's comment in the answer to my question? I actually realize I agree with him on it that FLAG 0 only mean it's not reverse complemented, not necessarily mapped to forward/+ strand.

    A flag field of 0 may or may not mean that a read is actually mapped to the + strand. All it means is that it wasn't reverse complemented. Whether that means it should be assigned to one strand or the other or none at all is dependent on other factors and isn't encoded in BAM files. As an example, most common single-end RNAseq experiments use a stranded protocol wherein alignments with a flag of 0 come from the "-" strand.

    Link is here: SAM flag and select reads that map uniquely

    Question: SAM flag and select reads that map uniquely

    • Hi Devon, I've spend a lot of time this week trying to REALLY understand this strand, flag, mapping uniqueness issue. I think I'm finally getting my head around it and I have to say I disagree with your first point here on flag 0 meaning. I will outline my thoughts below and please feel free to comment on them. Thanks so much for all the discussions so far.

    • I think a flag field of 0 does mean a read is mapped to the +/forward strand. And flag field indicates where the read itself maps to, between + and - strand of the reference genome.

    • For dUTP-based stranded RNA-seq experiments (Illumina TruSeq stranded), alignments with flag of 0does mean the transcript/gene is on the -/reverse strand as you said, but at the same time, the read itself indeed maps to the +/forward strand, which is the antisense strand for a gene on the -/reverse strand. The "-"strand you mentioned is on the XS:A:-field (in Tophat or hisat2 output) and it's for which strand a transcript/gene (from which the read is derived) is on.It means the read is derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome).

    • Lastly, flag value doesn't tell if a read maps uniquely or not. I need to look for NH:i:field to determine if a read is uniquely mapped.

    samtools view bamfile.bam | awk '$2==0'  #will output all the primary alignments mapped to the + strand 
    samtools view bamfile.bam | awk '$2==16'  #will output all the primary alignments mapped to the - strand
    
    • all the primary alignments contain uniquely mapped and primary alignments from multi-mapping reads.

    • uniquely mapped in the above sentence is determined by the aligner and I'm not sure what MAPQ value they have yet.

    • I think I did here in the end of the 3rd paragraph:

    • It means the read is derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome).

    • I agree that for this particular RNA-seq strand thing, we care about where the transcript is located.

    陈连福的生信博客 : samtools常用命令详解

    Samtools 官方参数

    最终我选择了手动提取

    samtools view -h $bam | awk 'length($6) == 3 && $$2~/0/ && $5~/255/ || $0 ~/@/' \
      | samtools view -bS | samtools sort -@ 8 > ${sample}_uniq_sorted.bam
    

    相关文章

      网友评论

        本文标题:估计很多人都像我一样一直没弄懂关于samtools flag 0

        本文链接:https://www.haomeiwen.com/subject/cxfdmqtx.html