RNA-Seq Primer: Parsing FASTQ Files


Author: jlyq617 | Published 2017-11-20 16:22

    The FASTQ format stores short-read sequences and their Phred quality scores from an NGS platform in a single file.
    Every 4 lines represent one short read.

    Figure: Four lines per FASTQ record

    1. A line starting with @ gives the sequence ID (in the example above, the ID is longer than the sequence itself). This is the description line.

    Typically, an instrument run count (the run number in the ID line) between 200 and 9999 is considered suitable.

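    2. The sequence of the read: the called bases A/G/T/C/N, where N marks a base that could not be determined.
    3. A line starting with +, optionally followed by the sequence ID again (usually left empty).
    4. The quality string: one quality character for each base in line 2.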
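The four-line layout maps directly onto a small reader. The sketch below is a minimal Python version, assuming every record occupies exactly four lines (no wrapped sequence or quality lines). `FastqRecord`, `read_fastq` and `illumina_run_number` are hypothetical names; the run-number helper additionally assumes the Illumina 1.8+ ID layout `instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`, which other platforms do not follow. The example record is illustrative, not real data.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    id: str    # line 1 without the leading '@'
    seq: str   # line 2: called bases (A/G/T/C/N)
    qual: str  # line 4: one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield FASTQ records by reading 4 lines at a time."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        header = header.rstrip("\n")
        # Lines 1 and 3 must start with '@' and '+' respectively
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        # Every base needs exactly one quality character
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def illumina_run_number(record_id: str) -> int:
    """Hypothetical helper: assumes the Illumina 1.8+ ID layout, where the
    run number is the second ':'-separated field before the space."""
    fields = record_id.split(" ")[0].split(":")
    return int(fields[1])


# Illustrative single-record input
example = (
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG\n"
    "GATTTGGGGT\n"
    "+\n"
    "!''*((((**\n"
)
records = list(read_fastq(io.StringIO(example)))
print(records[0].seq, illumina_run_number(records[0].id))
```

For real data, established parsers such as Biopython's `SeqIO` also cover layouts this sketch rejects.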

    A quality score is a number.
    One character encodes a number using the ASCII table.
    A quality score represents an error probability.
    Quality scores are used to represent base calling accuracy, alignment accuracy and other probabilities.
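    If qualities were written as plain numbers, any score with two or more digits would break the one-character-per-base correspondence. Each quality number therefore has to be converted into a single character, and FASTQ quality strings use the ASCII table for this.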
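The single-character encoding can be seen with Python's built-in `ord` and `chr`, which perform the ASCII lookup. This is a minimal sketch assuming the offset-33 (Phred+33) encoding; `encode_quality` and `decode_quality` are hypothetical helper names. The two-digit scores 40 and 35 each still occupy exactly one character.

```python
OFFSET = 33  # Phred+33: the scale starts at ASCII character 33 ('!')


def encode_quality(scores: list[int]) -> str:
    """Turn numeric quality scores into a one-character-per-base string."""
    for q in scores:
        # Only scores that land on printable characters '!'..'~' are allowed
        if not 0 <= q <= 126 - OFFSET:
            raise ValueError(f"score {q} cannot be encoded as a printable character")
    return "".join(chr(q + OFFSET) for q in scores)


def decode_quality(qual: str) -> list[int]:
    """Recover the numeric scores from a quality string."""
    return [ord(ch) - OFFSET for ch in qual]


qual = encode_quality([40, 35, 12, 2, 0])
print(qual, decode_quality(qual))
```

Here the five scores become the five-character string `ID-#!`, and decoding returns the original list.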

    The number can be converted to an error probability with the following formula:
    P=10^[-(Q-33)/10]
    Here Q is the ASCII code of the quality character. The scale starts at character 33, so 33 is subtracted from Q (this offset is known as Phred+33).
    Quality values (Q) range from 33 to 126,
    so the characters range from ‘!’ to ‘~’.
    Currently, most NGS platforms only produce quality values (Q) from 33 to 73 (‘!’ to ‘I’).
    The corresponding error probability P therefore ranges from 10^0 to 10^-4 (1 to 0.0001).
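The formula can be checked with a few lines of Python. In this sketch the hypothetical `error_probability` helper takes Q as the ASCII code of the character, as in the formula above, and rejects anything outside ‘!’ to ‘~’.

```python
def error_probability(ch: str) -> float:
    """P = 10^[-(Q-33)/10], where Q is the ASCII code of the quality character."""
    q = ord(ch)
    if not 33 <= q <= 126:
        raise ValueError(f"{ch!r} is outside the '!'..'~' range")
    return 10 ** (-(q - 33) / 10)


# One character per order of magnitude across the common '!'..'I' range
for ch in "!+5?I":
    print(ch, ord(ch), f"{error_probability(ch):g}")
```

‘!’ gives 1 and ‘I’ gives 0.0001, matching the range stated above; each step of 10 in Q divides P by 10.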
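    For example:
    suppose the quality string contains the character ‘!’.
    Looking it up in the ASCII table, ‘!’ corresponds to 33. Substituting this into the formula gives P=10^[-(33-33)/10]=10^0=1, i.e. the base call at that position is entirely unreliable.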
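The same arithmetic applies position by position along a read. The hypothetical `per_base_error` function below pairs each base with the error probability of its quality character; the 5-base read is made up for illustration, with ‘!’ in the first position as in the worked example.

```python
def per_base_error(seq: str, qual: str) -> list[tuple[str, float]]:
    """Pair each called base with its error probability P = 10^[-(Q-33)/10]."""
    if len(seq) != len(qual):
        # FASTQ requires exactly one quality character per base
        raise ValueError("sequence and quality strings must have the same length")
    return [(base, 10 ** (-(ord(ch) - 33) / 10)) for base, ch in zip(seq, qual)]


# Made-up 5-base read; the first base carries the '!' from the worked example.
for base, p in per_base_error("NACGT", "!+5?I"):
    print(base, f"{p:g}")
```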
    Various formats for NGS data:

    Input data (raw data): .fasta, .fastq (.SRA)
    Annotation data: .gff, .gtf, .bed
    Alignment result: .sam, .bam, .wig, .bed
    Variant call result: .vcf


          Article link: https://www.haomeiwen.com/subject/uuvvvxtx.html