2018-04-25 (Meaning of each column in a GTF file)

Author: 天秤座的机器狗 | Published 2018-04-25 16:05

    Reposted from 马疾香幽's blog

    Fields must be tab-separated. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'

    seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly.

    source - name of the program that generated this feature, or the data source (database or project name)

    feature - feature type name, e.g. Gene, Variation, Similarity

    start - Start position of the feature, with sequence numbering starting at 1.

    end - End position of the feature, with sequence numbering starting at 1.

    score - A floating point value.

    strand - defined as + (forward) or - (reverse).

    frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.

    attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature.
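    The nine fields above map directly onto a tab split. The following is a minimal sketch, not an official parser: the names `Feature`, `parse_feature_line`, `feature_length` and `as_python_slice` are made up for illustration, the sample line is invented, and the code assumes the usual GFF/GTF convention that `end` is inclusive.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    """One feature line, with the nine columns in file order."""
    seqname: str
    source: str
    feature: str
    start: int              # 1-based
    end: int                # 1-based, assumed inclusive
    score: Optional[float]  # None when the column is "."
    strand: str
    frame: Optional[int]    # None when the column is "."
    attribute: str          # left unparsed here


def parse_feature_line(line: str) -> Feature:
    # Strip only the line ending: the final field is allowed to be empty,
    # so a trailing tab must survive.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    return Feature(
        seqname, source, feature,
        int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if frame == "." else int(frame),
        attribute,
    )


def feature_length(f: Feature) -> int:
    # Both ends are counted, so add 1.
    return f.end - f.start + 1


def as_python_slice(f: Feature) -> slice:
    # Convert 1-based inclusive coordinates to a 0-based half-open slice.
    return slice(f.start - 1, f.end)


# Invented sample line, for illustration only.
sample = '1\texample_source\tgene\t100\t250\t.\t+\t.\tgene_id "G1";\n'
record = parse_feature_line(sample)
print(record.seqname, record.start, record.end, feature_length(record))
```

    Keeping `score` and `frame` as `None` for "." avoids confusing an empty column with a real value of 0.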

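    1. Chromosome name.

    2. Source of the annotation, e.g. "Genescan" or "Genbank". It may be empty, in which case write "." instead.

    3. Type of the annotation, e.g. Gene, cDNA or mRNA, or the corresponding Sequence Ontology (SO) accession.

    4, 5. Start and end positions.

    6. Score: a number indicating how reliable the annotation is. It can be the E-value from a sequence similarity alignment or the P-value from a gene prediction. "." means empty.

    7. Strand of the sequence: + for the sense (forward) strand, - for the antisense (reverse) strand, ? for unknown.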

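    8. Reading frame: one of 0, 1 or 2. 0 means the first base of the feature is the first base of a codon, 1 means the second base of the feature is the first base of a codon, and 2 means the third base is.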

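    9. Annotation details, given as a list of key-value pairs. A key and its value are joined with "=", different pairs are separated by ";", and one key may have several values separated by ",". Note that if a description contains a tab or any of the characters ",=;", they must be escaped using URL escaping rules; for example, a tab is written as %09. Keys are case-sensitive. Keys that start with an uppercase letter are predefined and may be referenced by other annotation entries later on. (This key=value form is the GFF3 convention; in GTF files each attribute is written as key "value"; instead.)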
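    The key=value rules in item 9 can be sketched as below. This is only an illustration under stated assumptions: `parse_attributes` is a hypothetical helper, the sample string is invented, and the code handles only the "=", ";" and "," style described in item 9, not the GTF `key "value";` style. URL unescaping uses `urllib.parse.unquote` from the Python standard library.

```python
from typing import Dict, List
from urllib.parse import unquote


def parse_attributes(column: str) -> Dict[str, List[str]]:
    """Parse a key=value attribute column into {key: [values]}."""
    attributes: Dict[str, List[str]] = {}
    column = column.strip()
    if column in ("", "."):
        # "." marks an empty column.
        return attributes
    for pair in column.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ";"
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"attribute without '=': {pair!r}")
        # Split on the raw separators first, then undo the URL escaping,
        # so an escaped "," or ";" inside a value is not treated as a separator.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Invented sample column, for illustration only.
sample = "ID=gene1;Name=my%3Bgene;Alias=a,b%2Cc;Note=col1%09col2"
for key, values in parse_attributes(sample).items():
    print(key, values)
```

    Keys are stored exactly as written, so `ID` and `id` stay distinct, which matches the case-sensitivity rule above.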

