How exactly would you count the number of exons and introns in a gene?
Method 1. We can filter the transcripts in the knownGene table down to the subset contained in the knownCanonical table. The knownCanonical table generally includes one transcript per gene, so it is sometimes used as a non-redundant set. You can read about the knownCanonical set on the GENCODE v24 track description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=knownGene
knownCanonical: This set identifies the canonical isoform of each cluster ID or gene using the ENSEMBL gene IDs to define each cluster. The canonical transcript is chosen using the APPRIS principal transcript when available. If no APPRIS tag exists for any transcript associated with the cluster, then a transcript in the BASIC set is chosen. If no BASIC transcript exists, then the longest isoform is used.
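The fallback order described above (APPRIS principal, then BASIC, then longest isoform) can be sketched roughly as follows. This is only an illustration, not UCSC's actual pipeline: the dictionary keys `name`, `length`, `appris` and `basic` are hypothetical, the transcripts are invented, and breaking ties inside a tier by length is an assumption, since the description does not say how one transcript is chosen when several share a tag.

```python
def pick_canonical(transcripts):
    """Pick one transcript per cluster: APPRIS principal, else BASIC, else longest.

    transcripts: list of dicts with hypothetical keys
    'name', 'length', 'appris' (bool) and 'basic' (bool).
    """
    if not transcripts:
        raise ValueError("empty cluster")
    for tag in ("appris", "basic"):
        tagged = [t for t in transcripts if t.get(tag)]
        if tagged:
            # Assumption: ties within a tier are broken by length.
            return max(tagged, key=lambda t: t["length"])
    # No APPRIS or BASIC transcript: fall back to the longest isoform.
    return max(transcripts, key=lambda t: t["length"])


# Invented cluster, for illustration only.
cluster = [
    {"name": "tx_a", "length": 5000, "appris": False, "basic": True},
    {"name": "tx_b", "length": 3000, "appris": True, "basic": True},
    {"name": "tx_c", "length": 9000, "appris": False, "basic": False},
]
print(pick_canonical(cluster)["name"])  # prints tx_b
```

Here tx_b wins despite being the shortest, because an APPRIS tag outranks both BASIC membership and length.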
For example, when counting exons and introns for the knownCanonical set, we would count only the canonical transcript (KC) and ignore all other transcripts:
KC ########------------##############---------------#######
Total exons = 3
Total introns = 2
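As a rough illustration of Method 1, the sketch below counts exons and introns for canonical transcripts only. It assumes the knownGene rows have already been loaded as (name, exonStarts, exonEnds) tuples, with coordinates stored as comma-separated strings with a trailing comma, and that the knownCanonical transcript IDs are available as a set. The transcript IDs and coordinates are made up for the example.

```python
def parse_coords(field):
    """Turn a comma-separated list such as '0,20,49,' into a list of ints."""
    return [int(x) for x in field.split(",") if x]


def count_canonical(known_gene_rows, canonical_ids):
    """Count exons and introns over canonical transcripts only.

    known_gene_rows: iterable of (name, exonStarts, exonEnds) tuples.
    canonical_ids: set of transcript names taken from knownCanonical.
    """
    total_exons = 0
    total_introns = 0
    for name, exon_starts, exon_ends in known_gene_rows:
        if name not in canonical_ids:
            continue  # ignore every non-canonical transcript
        starts = parse_coords(exon_starts)
        ends = parse_coords(exon_ends)
        if len(starts) != len(ends):
            raise ValueError(f"mismatched exon coordinates for {name}")
        total_exons += len(starts)
        # A transcript with n exons has n - 1 introns.
        total_introns += max(len(starts) - 1, 0)
    return total_exons, total_introns


# Hypothetical rows: the IDs and coordinates are invented for illustration.
rows = [
    ("tx_canonical", "0,20,49,", "8,34,56,"),
    ("tx_other", "0,20,39,49,", "8,34,43,56,"),
]
exons, introns = count_canonical(rows, {"tx_canonical"})
print(exons, introns)  # prints 3 2
```

Passing both IDs in the set instead would sum the two transcripts separately and count their shared exons twice.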
Method 2. Another approach is to include all transcripts in knownGene but not count overlapping regions more than once.
For example, if counting exons and introns for the knownGene set (excluding overlapping regions), we would count each exon or intron region once, even when it is shared with another transcript (TX):
TX1 ########------------##############---------------#######
TX2 ########------------##############-----####------#######
Total exons = 4
Total introns = 3
(In your original query, you would have counted 7 exons and 5 introns for the example above).
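The Method 2 counts can be reproduced with a small interval-merging sketch. It reads exon positions straight from the two diagram strings above (each run of `#` is treated as an exon), merges overlapping exons across transcripts, and takes introns to be the gaps between merged exon blocks. Treating touching exons as one block is an assumption of this sketch, and real data would also need grouping by gene and chromosome first.

```python
import re


def exons_from_diagram(diagram):
    """Return half-open (start, end) spans for each run of '#' in a diagram."""
    return [m.span() for m in re.finditer(r"#+", diagram)]


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


tx1 = "########------------##############---------------#######"
tx2 = "########------------##############-----####------#######"

all_exons = exons_from_diagram(tx1) + exons_from_diagram(tx2)
merged_exons = merge_intervals(all_exons)

# Introns are taken here as the gaps between consecutive merged exon blocks.
merged_introns = [
    (left_end, right_start)
    for (_, left_end), (right_start, _) in zip(merged_exons, merged_exons[1:])
]

# Naive totals: sum each transcript separately, double-counting shared regions.
naive_exons = len(all_exons)
naive_introns = sum(len(exons_from_diagram(tx)) - 1 for tx in (tx1, tx2))

print(len(merged_exons), len(merged_introns))  # prints 4 3
print(naive_exons, naive_introns)              # prints 7 5
```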