Question: In Sam Format, Clarify The Meaning Of The "0" Flag.
@Jts, can you comment on Devon Ryan's comment in the answer to my question? I actually realize I agree with him on it that
FLAG 0 only mean it's not reverse complemented
, not necessarily mapped to forward/+ strand.
A flag field of 0 may or may not mean that a read is actually mapped to the + strand. All it means is that it wasn't reverse complemented. Whether that means it should be assigned to one strand or the other or none at all is dependent on other factors and isn't encoded in BAM files. As an example,
most common single-end RNAseq experiments use a stranded protocol wherein alignments with a flag of 0 come from the "-" strand
.
Link is here: SAM flag and select reads that map uniquely
Question: SAM flag and select reads that map uniquely
-
Hi Devon, I've spend a lot of time this week trying to REALLY understand this strand, flag, mapping uniqueness issue. I think I'm finally getting my head around it and I have to say I disagree with your first point here on flag 0 meaning. I will outline my thoughts below and please feel free to comment on them. Thanks so much for all the discussions so far.
-
I think a flag field of 0 does mean a read is mapped to the +/forward strand. And flag field indicates where the read itself maps to, between + and - strand of the reference genome.
-
For
dUTP-based stranded RNA-seq
experiments (Illumina TruSeq stranded), alignments withflag of 0
does mean thetranscript/gene is on the -/reverse strand
as you said, but at the same time,the read itself indeed maps to the +/forward strand
, which is the antisense strand for a gene on the -/reverse strand. The"-"
strand you mentioned is on theXS:A:-
field (in Tophat or hisat2 output) and it's for whichstrand a transcript/gene
(from which the read is derived) is on.It means the read is derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome)
. -
Lastly,
flag value
doesn't tell if a read maps uniquely or not. I need to look forNH:i:
field to determine if a read is uniquely mapped.
samtools view bamfile.bam | awk '$2==0' #will output all the primary alignments mapped to the + strand
samtools view bamfile.bam | awk '$2==16' #will output all the primary alignments mapped to the - strand
-
all the primary alignments contain uniquely mapped and primary alignments from multi-mapping reads.
-
uniquely mapped in the above sentence is determined by the aligner and I'm not sure what MAPQ value they have yet.
-
I think I did here in the end of the 3rd paragraph:
-
It means the read is
derived from a transcript/gene on the - strand (although it still maps to the + strand of the genome)
. -
I agree that for this particular RNA-seq strand thing, we care about where the transcript is located.
最终我选择了手动提取
samtools view -h $bam | awk 'length($6) == 3 && $$2~/0/ && $5~/255/ || $0 ~/@/' \
| samtools view -bS | samtools sort -@ 8 > ${sample}_uniq_sorted.bam
网友评论