2023-02-09 | Interpreting the annotation information in GFF files

Author: 汪大山 | Published: 2023-02-08 16:32
1       ensembl gene    339070  350389  .       -       .       ID=gene:ENSBTAG00000006648;biotype=protein_coding;gene_id=ENSBTAG00000006648;logic_name=ensembl;version=6
1       ensembl mRNA    339070  346959  .       -       .       ID=transcript:ENSBTAT00000007786;Parent=gene:ENSBTAG00000006648;biotype=protein_coding;transcript_id=ENSBTAT00000007786;version=5
1       ensembl exon    339070  339312  .       -       .       Parent=transcript:ENSBTAT00000007786;Name=ENSBTAE00000545140;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSBTAE00000545140;rank=4;version=1
1       ensembl CDS     339070  339312  .       -       0       ID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786;protein_id=ENSBTAP00000007786
1       ensembl exon    342547  342721  .       -       .       Parent=transcript:ENSBTAT00000007786;Name=ENSBTAE00000470282;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=ENSBTAE00000470282;rank=3;version=2
1       ensembl CDS     342547  342721  .       -       1       ID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786;protein_id=ENSBTAP00000007786
1       ensembl exon    346602  346889  .       -       .       Parent=transcript:ENSBTAT00000007786;Name=ENSBTAE00000461318;constitutive=0;ensembl_end_phase=2;ensembl_phase=2;exon_id=ENSBTAE00000461318;rank=2;version=2
1       ensembl CDS     346602  346889  .       -       1       ID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786;protein_id=ENSBTAP00000007786
1       ensembl exon    346925  346959  .       -       .       Parent=transcript:ENSBTAT00000007786;Name=ENSBTAE00000510779;constitutive=0;ensembl_end_phase=2;ensembl_phase=0;exon_id=ENSBTAE00000510779;rank=1;version=1
1       ensembl CDS     346925  346959  .       -       0       ID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786;protein_id=ENSBTAP00000007786
1       ensembl mRNA    339070  350389  .       -       .       ID=transcript:ENSBTAT00000008737;Parent=gene:ENSBTAG00000006648;biotype=protein_coding;tag=Ensembl_canonical;transcript_id=ENSBTAT00000008737;version=6
1       ensembl exon    339070  339312  .       -       .       Parent=transcript:ENSBTAT00000008737;Name=ENSBTAE00000545140;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSBTAE00000545140;rank=4;version=1
1       ensembl CDS     339070  339312  .       -       0       ID=CDS:ENSBTAP00000008737;Parent=transcript:ENSBTAT00000008737;protein_id=ENSBTAP00000008737
1       ensembl exon    342547  342721  .       -       .       Parent=transcript:ENSBTAT00000008737;Name=ENSBTAE00000470282;constitutive=1;ensembl_end_phase=0;ensembl_phase=2;exon_id=ENSBTAE00000470282;rank=3;version=2
1       ensembl CDS     342547  342721  .       -       1       ID=CDS:ENSBTAP00000008737;Parent=transcript:ENSBTAT00000008737;protein_id=ENSBTAP00000008737
1       ensembl exon    346602  346924  .       -       .       Parent=transcript:ENSBTAT00000008737;Name=ENSBTAE00000062541;constitutive=0;ensembl_end_phase=2;ensembl_phase=0;exon_id=ENSBTAE00000062541;rank=2;version=4
1       ensembl CDS     346602  346924  .       -       0       ID=CDS:ENSBTAP00000008737;Parent=transcript:ENSBTAT00000008737;protein_id=ENSBTAP00000008737
1       ensembl exon    350267  350389  .       -       .       Parent=transcript:ENSBTAT00000008737;Name=ENSBTAE00000512015;constitutive=0;ensembl_end_phase=0;ensembl_phase=0;exon_id=ENSBTAE00000512015;rank=1;version=1
1       ensembl CDS     350267  350389  .       -       0       ID=CDS:ENSBTAP00000008737;Parent=transcript:ENSBTAT00000008737;protein_id=ENSBTAP00000008737

GFF and GTF files downloaded from the same source describe the same positional information.

Unlike the GTF file, the GFF file has no start_codon or stop_codon records, and the transcript feature type is written as mRNA instead.
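One quick way to check which feature types a file contains is to tally column 3. The sketch below is a minimal illustration, not part of the original post: the function name `count_feature_types` is made up, the input is assumed to be tab-separated as in a standard GFF3 file, and the sample rows are taken from the excerpt above with the attribute column shortened.

```python
from collections import Counter

def count_feature_types(lines):
    """Count the values found in column 3 (feature type) of GFF/GTF lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column record
        counts[fields[2]] += 1
    return counts

# Rows from the excerpt above; attributes shortened for readability (assumption: tab-separated).
sample = [
    "1\tensembl\tgene\t339070\t350389\t.\t-\t.\tID=gene:ENSBTAG00000006648",
    "1\tensembl\tmRNA\t339070\t346959\t.\t-\t.\tID=transcript:ENSBTAT00000007786;Parent=gene:ENSBTAG00000006648",
    "1\tensembl\texon\t339070\t339312\t.\t-\t.\tParent=transcript:ENSBTAT00000007786",
    "1\tensembl\tCDS\t339070\t339312\t.\t-\t0\tID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786",
]

counts = count_feature_types(sample)
print(dict(counts))
print("start_codon" in counts, "stop_codon" in counts)
```

Running the same function over a real file (for example by passing an open file handle) would show whether start_codon, stop_codon or transcript rows are present in it.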

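The genome annotation is described through four feature types arranged as gene-->mRNA-->exon-->CDS. As the Parent attributes in the records above show, both exon and CDS rows point to the transcript, so the actual links are gene-->mRNA-->exon and gene-->mRNA-->CDS.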
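The parent/child relationships can be read directly from the ID and Parent attributes in column 9. The following is a rough sketch under a few assumptions: the helper names `parse_attributes` and `parent_child_types` are hypothetical, the input is tab-separated, and the sample rows are shortened versions of records from the excerpt above.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute string 'key=value;key=value' into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs

def parent_child_types(lines):
    """Return the set of (parent_type, child_type) pairs implied by Parent=."""
    id_to_type = {}
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        attrs = parse_attributes(fields[8])
        rows.append((fields[2], attrs))
        if "ID" in attrs:
            id_to_type[attrs["ID"]] = fields[2]
    edges = set()
    for feature_type, attrs in rows:
        # GFF3 allows several comma-separated parents in one Parent attribute.
        for parent in attrs.get("Parent", "").split(","):
            if parent in id_to_type:
                edges.add((id_to_type[parent], feature_type))
    return edges

# Rows from the excerpt above; attributes shortened (assumption: tab-separated).
sample = [
    "1\tensembl\tgene\t339070\t350389\t.\t-\t.\tID=gene:ENSBTAG00000006648",
    "1\tensembl\tmRNA\t339070\t346959\t.\t-\t.\tID=transcript:ENSBTAT00000007786;Parent=gene:ENSBTAG00000006648",
    "1\tensembl\texon\t339070\t339312\t.\t-\t.\tParent=transcript:ENSBTAT00000007786",
    "1\tensembl\tCDS\t339070\t339312\t.\t-\t0\tID=CDS:ENSBTAP00000007786;Parent=transcript:ENSBTAT00000007786",
]

edges = parent_child_types(sample)
for parent_type, child_type in sorted(edges):
    print(parent_type, "->", child_type)
```

On these rows the pairs found are gene to mRNA, mRNA to exon, and mRNA to CDS.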

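The attribute column shows that records are still distinguished by transcript ID: each mRNA row carries ID=transcript:..., and its exon and CDS rows refer back to it through Parent=transcript:....

So the real unit of organization is the transcript, and every transcript is recorded separately, even when two transcripts share the same exon.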
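To make the per-transcript organization concrete, the exon rows can be grouped by their Parent transcript. This is only an illustrative sketch: `exons_by_transcript` is a hypothetical helper, tab-separated input is assumed, and the four sample rows are exon records from the excerpt above with most attributes dropped.

```python
from collections import defaultdict

def exons_by_transcript(lines):
    """Group exon (start, end) coordinates by the transcript named in Parent=."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # GFF3 allows several comma-separated parents in one Parent attribute.
        for parent in attrs.get("Parent", "").split(","):
            if parent.startswith("transcript:"):
                grouped[parent].append((int(fields[3]), int(fields[4])))
    return dict(grouped)

# Exon rows from the excerpt above; attributes shortened (assumption: tab-separated).
sample = [
    "1\tensembl\texon\t339070\t339312\t.\t-\t.\tParent=transcript:ENSBTAT00000007786;exon_id=ENSBTAE00000545140",
    "1\tensembl\texon\t346602\t346889\t.\t-\t.\tParent=transcript:ENSBTAT00000007786;exon_id=ENSBTAE00000461318",
    "1\tensembl\texon\t339070\t339312\t.\t-\t.\tParent=transcript:ENSBTAT00000008737;exon_id=ENSBTAE00000545140",
    "1\tensembl\texon\t346602\t346924\t.\t-\t.\tParent=transcript:ENSBTAT00000008737;exon_id=ENSBTAE00000062541",
]

grouped = exons_by_transcript(sample)
for transcript_id, exons in grouped.items():
    print(transcript_id, exons)
```

The exon ENSBTAE00000545140 (339070 to 339312) ends up in both groups, because the file repeats it once under each transcript.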

Article link: https://www.haomeiwen.com/subject/tbvqkdtx.html