NT064_S43_L001_R1_001.fastq.gz
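Part 1: the sample name (here NT064)
Part 2: the sample number, matching the numbering in Illumina Experiment Manager: S1 .. S*, where the number after S follows the order of the samples in the Sample Sheet, starting from 1. Reads that cannot be assigned to a specific sample go to S0 (Undetermined_S0)
Part 3: the lane number (here L001)
Part 4: R1 means read 1, R2 means read 2. R1 and R2 are the two reads of a paired-end run
Part 5: usually 001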
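The five parts listed above can be pulled out of a file name with a regular expression. This is only a minimal sketch: the pattern and the helper name `parse_fastq_name` are assumptions made for illustration, based on the single example name shown here, and other naming schemes will not match.

```python
import re

# Assumed layout: <sample>_S<number>_L<lane>_R<read>_<last part>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>R[12])_(?P<chunk>\d{3})\.fastq\.gz$"
)

def parse_fastq_name(filename):
    """Split an Illumina-style FASTQ file name into its five parts."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected FASTQ file name: {filename}")
    parts = match.groupdict()
    # The sample number and the lane are numeric; the rest stay as text.
    parts["sample_number"] = int(parts["sample_number"])
    parts["lane"] = int(parts["lane"])
    return parts

print(parse_fastq_name("NT064_S43_L001_R1_001.fastq.gz"))
# {'sample': 'NT064', 'sample_number': 43, 'lane': 1, 'read': 'R1', 'chunk': '001'}
```

A sample number of 0 then marks the file holding the undetermined reads.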
Viewing the FASTQ file format
[xmxjy@xmxjy filter]$ less allhpv.fastq.gz | head
@TPNB500301:48:HHTKNAFXY:1:11101:11715:1039 1:N:0:TCCGGAGA+NGGATAGG
TGACGNTCTCAATATATGTGTGCTTTTTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTTAATTGATACATAATCATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTAAACATATTGACCAAATCAGGGT
+
AA6AA#EEEEE6EEEEEEEEEEEEE6EEEAEEEAAEEEA/E6/EE/EE6EAAEAEEE6EAEEE6EEEEE/EEEEEAEE/EEEEEEAAEAEEEAEE<EEEE/EEAAE/EEEEEEE/EEEEAEEEEEEEEE/AEAEEAEE<<A/EE<<E//EE
@TPNB500301:48:HHTKNAFXY:1:11101:14066:1039 1:N:0:TCCGGAGA+NGGATAGG
AGACTNTCGTAATATATGTGTGCTTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTTAATTGATACATAATCATTATACATATTTATGGGTTAAAGTGTAATGTTTTAATATGTGTACACATATTGACCAAATCAGGGT
+
6AAAA#EEEEEEEEEEEEAAEE/E/EEEEEAEE6EAA/EAEEEE</EEEA/EEEEEEEEEE6EEEE/E/EAEEAE/EAAEE6EEEEEEEAEEAEEEEEEEEEAEEAEAEEE/EEAEEEE/EEA/EEEE/EE<EAE/E/<E<E/EA<EAAEE
The records above are read as follows:
Each entry in a FASTQ file consists of four lines:
• Sequence identifier
• Sequence
• Quality score identifier line (consisting of a +)
• Quality score
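The four-line layout can be read in Python by taking lines in groups of four. The sketch below assumes that every record occupies exactly four lines (no wrapped sequences) and that the file has no trailing blank lines; the function names `parse_fastq` and `read_fastq` are made up for this example.

```python
import gzip
from itertools import islice

def parse_fastq(lines):
    """Yield (identifier, sequence, quality) from an iterable of FASTQ lines."""
    lines = iter(lines)
    while True:
        # Take the next four lines; an empty list means the input is exhausted.
        record = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        identifier, sequence, separator, quality = record
        if not identifier.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield identifier[1:], sequence, quality

def read_fastq(path):
    """Open a plain or gzip-compressed FASTQ file and yield its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fastq(handle)

example = ["@read1\n", "ACGTN\n", "+\n", "AA6A#\n"]
print(list(parse_fastq(example)))  # [('read1', 'ACGTN', 'AA6A#')]
```

For a real file, `read_fastq("allhpv.fastq.gz")` would yield one tuple per read without loading the whole file into memory.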
@TPNB500301:48:HHTKNAFXY:1:11101:11715:1039 1:N:0:TCCGGAGA+NGGATAGG
The fields are separated by colons, with a single space between <y-pos> and <read>:
@<instrument>
<run number>
<flowcell ID>
<lane>
<tile>
<x-pos>
<y-pos>
<read>
<is filtered>
<control number>
<index sequence>
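The field list above maps directly onto a small parser. This is a sketch that assumes the header always has seven colon-separated fields, one space, and then four more fields, as in the example line; the name `parse_header` and the dictionary keys are chosen here for illustration.

```python
def parse_header(header):
    """Split an Illumina-style sequence identifier into named fields."""
    # The part before the space locates the cluster; the part after describes the read.
    location, _, description = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x_pos, y_pos = location.split(":")
    read, is_filtered, control_number, index_sequence = description.split(":")
    return {
        "instrument": instrument,
        "run_number": int(run),
        "flowcell_id": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x_pos": int(x_pos),
        "y_pos": int(y_pos),
        "read": int(read),
        # "Y" means the read was filtered out (did not pass), "N" means it passed.
        "is_filtered": is_filtered == "Y",
        "control_number": int(control_number),
        "index_sequence": index_sequence,
    }

header = "@TPNB500301:48:HHTKNAFXY:1:11101:11715:1039 1:N:0:TCCGGAGA+NGGATAGG"
fields = parse_header(header)
print(fields["flowcell_id"], fields["lane"], fields["read"], fields["index_sequence"])
# HHTKNAFXY 1 1 TCCGGAGA+NGGATAGG
```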
Quality score
The character '!' represents the lowest quality while '~' is the highest. Here are the quality value characters in left-to-right increasing order of quality (ASCII):
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
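Because the characters are ordered by ASCII code, a quality string can be turned into numbers by subtracting an offset from each character code. The sketch below assumes the Phred+33 encoding, in which '!' (ASCII 33) stands for quality 0; older Illumina pipelines used an offset of 64, so the offset is left as a parameter. The name `decode_quality` is made up for this example.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of integer scores."""
    return [ord(char) - offset for char in quality]

# The first characters of the first quality line shown earlier.
print(decode_quality("AA6AA#EEEEE"))
# [32, 32, 21, 32, 32, 2, 36, 36, 36, 36, 36]
```

The low score of 2 at the '#' position lines up with the N in the sixth base of that read.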
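Q value
The quality value Q is an integer mapping of p, the probability that the corresponding base call is incorrect. Two different formulas have been in use. The first is the standard Sanger variant for assessing the reliability of a base call, also known as the Phred quality score:
$$Q_\text{sanger} = -10 \log_{10} p$$
The Solexa pipeline (the software delivered with the Illumina Genome Analyzer) earlier used a different mapping, encoding the odds p/(1-p) instead of the probability p:
$$Q_\text{solexa} = -10 \log_{10} \frac{p}{1-p}$$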
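The two mappings can be compared numerically. The sketch below assumes the usual definitions from the Wikipedia article linked at the end, Q = -10 log10(p) for Sanger and Q = -10 log10(p/(1-p)) for Solexa, and returns unrounded values; the function names are chosen for illustration.

```python
import math

def sanger_quality(p):
    """Phred/Sanger score: -10 * log10 of the error probability p."""
    return -10 * math.log10(p)

def solexa_quality(p):
    """Early Solexa score: -10 * log10 of the odds p / (1 - p)."""
    return -10 * math.log10(p / (1 - p))

# Print both scores side by side for a few error probabilities.
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, round(sanger_quality(p), 2), round(solexa_quality(p), 2))
```

The two scores differ noticeably when p is large and become almost identical as p gets small, since p/(1-p) approaches p.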
Sequencing quality scores and accuracy
Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
10 | 1 in 10 | 90% |
20 | 1 in 100 | 99% |
30 | 1 in 1000 | 99.9% |
40 | 1 in 10000 | 99.99% |
50 | 1 in 100000 | 99.999% |
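The rows of this table follow from inverting the Phred formula, p = 10^(-Q/10). A minimal sketch, with `error_probability` as a name made up for this example:

```python
def error_probability(q):
    """Probability of an incorrect base call for a Phred score q."""
    return 10 ** (-q / 10)

# Rebuild the table rows: score, error rate as "1 in N", and accuracy in percent.
for q in (10, 20, 30, 40, 50):
    p = error_probability(q)
    print(f"{q} | 1 in {round(1 / p)} | {100 * (1 - p):.5g}% |")
```

Every 10 points of Q therefore corresponds to a tenfold drop in the error probability.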
Wikipedia
https://en.wikipedia.org/wiki/FASTQ_format#File_extension
Chinese translation
http://www.cnblogs.com/yahengwang/p/8973948.html
Shell operations
https://www.jianshu.com/p/bc1fe435879c