Ontology
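Concept: as I understand it, an ontology is a common, standardized vocabulary that the bioinformatics community agrees on for important biological information, such as sequence and gene information. It gives concepts and terms precise definitions instead of complicated, ambiguous explanations.
In the paper The Sequence Ontology: a tool for the unification of genome annotations, the authors stress the importance of consistency as follows: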
Unfortunately, biological terminology is notoriously ambiguous; the same word is often used to describe more than one thing and there are many dialects. For example, does a coding sequence (CDS) contain the stop codon or is the stop codon part of the 3'-untranslated region (3' UTR)?
There really is no right or wrong answer to such questions, but consistency is crucial when attempting to compare annotations from different sources, or even when comparing annotations performed by the same group over an extended period of time.
An ontology mainly covers two aspects:
- what a piece of DNA is: annotation and classification.
- what a piece of DNA does: functional analyses.
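Sequence Ontology
The Sequence Ontology covers the annotation and classification of a stretch of sequence, i.e. its genetic features.
The Sequence Ontology Browser classifies and defines sequence features in detail.
For example, the precise definition of a CDS is: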
A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon.
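You can download the Sequence Ontology data and explore it:
URL=https://raw.githubusercontent.com/The-Sequence-Ontology/SO-Ontologies/master/so-simple.obo
wget "$URL"
# Show the stanza of the term named exactly "gene"
grep 'name: gene$' -B 1 -A 6 so-simple.obo
# Show every line that mentions PCR, with two lines of context
grep 'PCR' -B 2 -A 2 so-simple.obo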
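The grep calls above depend on a fixed number of context lines, so they can cut a term short or spill into the next one. As a rough sketch, assuming the OBO flat-file layout in which each term is a `[Term]` stanza separated from its neighbours by a blank line, awk's paragraph mode can print a whole stanza at once. The helper name `so_term` and the two-term file `toy.obo` are made up for illustration; the gene definition in it is a placeholder, not the real Sequence Ontology text.

```shell
# Hypothetical helper: so_term NAME FILE
# Prints the whole stanza whose "name:" line matches NAME exactly.
# RS = "" puts awk in paragraph mode: each blank-line-separated stanza is one record,
# and FS = "\n" makes every line of the stanza one field.
so_term() {
    awk -v name="$1" 'BEGIN { RS = ""; FS = "\n" }
        {
            for (i = 1; i <= NF; i++)
                if ($i == "name: " name) { print; next }
        }' "$2"
}

# Toy file in OBO layout (illustration only, not a copy of so-simple.obo).
cat > toy.obo <<'EOF'
[Term]
id: SO:0000704
name: gene
def: "placeholder definition for the toy example"

[Term]
id: SO:0000316
name: CDS
def: "A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon."
EOF

so_term CDS toy.obo
```

With the downloaded file, the same call would be `so_term gene so-simple.obo`.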
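Gene Ontology
The Gene Ontology annotates and classifies gene function. It classifies gene products, and a single gene may carry several functional annotations.
Two important websites: Gene Ontology and QuickGO.
GO consists of three sub-ontologies:
- Cellular component (CC): where a gene product is located, e.g. the nucleus or the mitochondrial matrix
- Molecular function (MF): the activity of a gene product, e.g. catalytic activity or binding activity
- Biological process (BP): a process with a defined beginning and end, e.g. pyrimidine metabolism or glycoside transport
Some exploration of the GO data: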
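wget http://geneontology.org/gene-associations/goa_human.gaf.gz
gunzip goa_human.gaf.gz
# Annotations per gene product (column 2 is the DB object ID), most annotated first
grep -v '^!' goa_human.gaf | cut -f 2 | sort | uniq -c \
| sort -k1nr | less -S
# Annotations per year (column 14 is the annotation date, YYYYMMDD), newest year first
grep -v '^!' goa_human.gaf \
| cut -f 14 \
| perl -alne 'print substr($_,0,4)' \
| sort | uniq -c \
| sort -k2nr \
| perl -alne 'print "$F[1]\t$F[0]"'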
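The three sub-ontologies can also be counted straight from the annotation file. A minimal sketch, assuming the GAF 2.x column layout in which column 9 holds the aspect code (P for biological process, F for molecular function, C for cellular component). The helper name `count_aspects`, the file `toy.gaf`, and the gene identifiers in it are invented for illustration and are not real annotations; only the first nine columns are filled in.

```shell
# Hypothetical helper: count_aspects FILE
# Prints "aspect<TAB>count" for each aspect code found in column 9.
count_aspects() {
    grep -v '^!' "$1" | cut -f 9 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Toy GAF-like file: one header comment plus four made-up annotation lines.
printf '!gaf-version: 2.2\n' > toy.gaf
printf 'UniProtKB\tID1\tGENE1\t\tGO:0008150\tREF:1\tND\t\tP\n' >> toy.gaf
printf 'UniProtKB\tID1\tGENE1\t\tGO:0003674\tREF:1\tND\t\tF\n' >> toy.gaf
printf 'UniProtKB\tID2\tGENE2\t\tGO:0005575\tREF:1\tND\t\tC\n' >> toy.gaf
printf 'UniProtKB\tID2\tGENE2\t\tGO:0008150\tREF:1\tND\t\tP\n' >> toy.gaf

count_aspects toy.gaf
```

On the real data the call would be `count_aspects goa_human.gaf`, run on the uncompressed file.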
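Functional analysis of bioinformatics data
When processing biological data, scientists want to be able to interpret the results in biologically meaningful terms.
Once you have obtained a list of genes or proteins (genes/sequences), the next step is pathway analysis, also called functional analysis.
Functional pathway analysis comes in three main levels: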
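- Over-Representation Analysis (ORA)
  Checks whether a function shows up more often than expected. ORA attempts to find representative functions of a list of genes by comparing the number of times a function is observed to a baseline.
- Functional Class Scoring (FCS)
  The emphasis is not on large effects of individual genes: small changes across functionally related genes can add up to a significant effect at the level of the pathway they belong to.
  FCS methods use this information to detect coordinated changes in the expression of genes in the same pathway. Finally, by considering the coordinated changes in gene expression, FCS methods account for dependence between genes in a pathway, which ORA does not.
  The basic steps are:
  1. Compute a gene-level statistic for each gene.
  2. Aggregate the gene-level statistics of all genes in a pathway into a single pathway-level statistic.
  3. Assess the statistical significance of the pathway-level statistic.
- Pathway Topology (PT)
  Topology-based methods additionally need information about how the members of a given pathway interact.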
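To make the ORA idea of "comparing to a baseline" concrete, here is a rough sketch. The text above does not say which test is used; a one-sided hypergeometric test is a common choice and is assumed here. Given a background of N genes of which K carry a function, and a gene list of n genes of which k carry it, the p-value is the probability of seeing k or more annotated genes by chance. The helper name `ora_pvalue` and the numbers in the example call are made up for illustration.

```shell
# Hypothetical helper: ora_pvalue N K n k
#   N = genes in the background     K = background genes annotated with the function
#   n = genes in the list           k = list genes annotated with the function
# Prints P(X >= k) for X following a hypergeometric distribution.
ora_pvalue() {
    awk -v N="$1" -v K="$2" -v n="$3" -v k="$4" '
        # log of the binomial coefficient C(a, b), built up term by term
        function lchoose(a, b,    i, s) {
            s = 0
            for (i = 1; i <= b; i++) s += log(a - b + i) - log(i)
            return s
        }
        BEGIN {
            # feasible range of annotated genes in the list
            lo = k + 0
            if (n - (N - K) > lo) lo = n - (N - K)
            hi = (n + 0 < K + 0) ? n + 0 : K + 0
            p = 0
            for (i = lo; i <= hi; i++) {
                t = lchoose(K, i) + lchoose(N - K, n - i) - lchoose(N, n)
                if (t > -700) p += exp(t)   # skip terms too small to matter
            }
            printf "%.6g\n", p
        }'
}

# Toy numbers: 20 background genes, 5 annotated; list of 4 genes, 2 annotated.
# The exact value is 1205/4845, about 0.2487.
ora_pvalue 20 5 4 2
```

A small p-value would suggest the function is over-represented in the list. In practice the test is repeated for many functions, so the p-values also need a multiple-testing correction, which this sketch leaves out.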