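Taking the human genome as an example
1. Rough calculation of sequencing depth:
The human genome has about 3000 M nt (3000 Mb, i.e. 3 Gb), of which roughly 1/30, about 100 M nt, is used for protein-coding genes. With single-end reads 100 nt (bp) long, 1 M reads yield 100 M nt of sequence data, which corresponds to 1x coverage of that coding fraction (3000 M nt x 1/30 = 100 M nt; 1 M reads x 100 nt / 100 M nt = 1x coverage).
(Original English text: human genome has 3000 M nt of which approximately 1/30 is used for protein coding genes, single reads 100 nt in length, 1 M reads gives 100 M nt of sequence data equals 1 x coverage. )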
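The arithmetic above can be written out as a small Python sketch. The constant and function names are illustrative and not from the original text; the numbers (3000 M nt genome, 1/30 coding, 100 nt reads) are the rough figures quoted above, not measured values.

```python
# Rough figures from the text; all names here are hypothetical helpers.
GENOME_NT = 3_000_000_000      # about 3000 M nt in the human genome
CODING_DIVISOR = 30            # about 1/30 of the genome is protein coding
READ_LENGTH_NT = 100           # single-end read length


def coding_target_nt(genome_nt=GENOME_NT, coding_divisor=CODING_DIVISOR):
    """Size of the protein-coding target in nt (about 100 M nt)."""
    return genome_nt / coding_divisor


def coverage(n_reads, read_length_nt=READ_LENGTH_NT, target_nt=None):
    """Fold coverage = total sequenced nt / target size."""
    if target_nt is None:
        target_nt = coding_target_nt()
    return n_reads * read_length_nt / target_nt


def reads_for_coverage(fold, read_length_nt=READ_LENGTH_NT, target_nt=None):
    """Number of reads needed to reach a given fold coverage."""
    if target_nt is None:
        target_nt = coding_target_nt()
    return fold * target_nt / read_length_nt


if __name__ == "__main__":
    print(coverage(1_000_000))       # 1 M reads of 100 nt over 100 M nt
    print(reads_for_coverage(30))    # reads needed for 30x coverage
```

Under these assumptions, 1 M reads give 1x coverage and 30 M reads give 30x, matching the figures used in the next section.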
2. What is the probability that a read maps to a given gene?
Assume an average gene size of 4000 nt (100 M nt divided by 25000 genes). 30 M reads correspond to 30x coverage, so a gene of average length and average expression is expected to receive 4000 nt x 30 coverage / 100 nt = 1200 reads. For a gene expressed at 1/1200 of that level, the expected count drops to about one read, and the source puts the chance of detecting such a gene at roughly 50%.
(Original English text: to calculate the probability that a read will map to a specific gene, we can assume an average gene size is 4000 nt (100 M nt divided by 25000 genes). 30 M reads equivalent to 30 x coverage, we can expect a single read to map to the average expressed and length gene 4000 nt x 30 coverage/100 nt 1200 times. if the gene expressed 1/1200, we have 50% to get expressed gene.)
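The per-gene estimate can be sketched in Python as follows. The function names are illustrative. The Poisson model for read counts is an assumption added here, not something the original text states; under it, a gene with an expected count of one read is seen at least once with probability 1 - e^-1, about 63%, which is in the same range as the rough 50% quoted above.

```python
import math


def average_gene_size_nt(coding_nt=100_000_000, n_genes=25_000):
    """Average gene size: 100 M nt divided by 25000 genes."""
    return coding_nt / n_genes


def expected_reads_per_gene(gene_length_nt=4000, fold_coverage=30,
                            read_length_nt=100):
    """Expected reads on a gene of average expression and the given length."""
    return gene_length_nt * fold_coverage / read_length_nt


def detection_probability(expected_reads):
    """P(at least one read), assuming Poisson-distributed read counts.

    The Poisson model is an assumption of this sketch, not of the source.
    """
    return 1.0 - math.exp(-expected_reads)


if __name__ == "__main__":
    average = expected_reads_per_gene()      # 1200.0
    low_expressed = average / 1200           # gene at 1/1200 of average level
    print(average_gene_size_nt())
    print(average)
    print(round(detection_probability(low_expressed), 3))
```

Changing `fold_coverage` shows how deeper sequencing raises the expected read count, and with it the detection probability, for weakly expressed genes.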
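Single-end or paired-end sequencing?
With single-end sequencing, the library preparation steps (RNA fragmentation, adaptor ligation, strand orientation) can each introduce bias. One way to reduce this bias is to increase the randomization of the fragments being sequenced by sequencing both ends of each library clone, i.e. paired-end sequencing.
(Original English text: Single (library preparation: fragmentation of RNA, ligation of adaptor, orientation of strands), to avoid biases in these library preparation steps, one way to increasing the randomization of fragments to be sequenced is sequencing both ends of a library clone.)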