2015-Human genomics-A survey of

Author: 英天 | Published: 2017-08-19 17:29

1. The computational tools fall into three categories

(1) basic traditional statistical analysis,
(2) machine learning approaches, and
(3) assignment of functional and biological information to describe and understand protein interaction networks.

2. Guideline for analyzing big data

Step one: Observe your data and perform quality control

Step two: Traditional statistics

Groups identified by the researcher, either during experimental design or during the data observation step, can be compared here using Student’s t test, analysis of variance (ANOVA), and their nonparametric equivalents such as Kruskal-Wallis, in addition to regression modeling and other tests of traditional statistics. When many tests are run simultaneously, the resulting p-values should be adjusted with a multiple-testing correction such as the Benjamini-Hochberg procedure.
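The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines of plain Python. This is only an illustration: the function name `benjamini_hochberg` and the example p-values are made up for this note and do not come from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-value decreases
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five simultaneous tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
```

Note that 0.039 and 0.041 are both below 0.05 before correction but are no longer significant at a 5% false discovery rate after it.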

Step three: Dimension reduction with machine learning

Use the classification and clustering methods listed in Table 1 to reduce the number of features. These methods fall into two groups: unsupervised and supervised.

   (1) Unsupervised

Principal component analysis (PCA)
Independent component analysis (ICA)
K-means
Hierarchical clustering

   (2) Supervised

Partial least squares (PLS)
Random forests (RF)
Support vector machine (SVM)

Software tools that support these methods include Weka [14], Scikit-learn (Machine Learning in Python) [15], and SHOGUN [16].
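To make the idea of dimension reduction concrete, here is a minimal sketch of PCA that finds only the first principal component by power iteration, written in plain Python so it runs without extra packages. The toy data matrix and the function name `first_principal_component` are assumptions made for this note; in practice a library such as Scikit-learn would normally be used instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration."""
    n, d = len(rows), len(rows[0])
    # Center every feature (column) at zero mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the features (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Repeatedly multiply by the covariance matrix and renormalize;
    # the vector converges to the direction of largest variance
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Project each sample onto the component: one score per sample
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Hypothetical data: 5 samples x 3 features; features 0 and 1 move together,
# feature 2 is almost constant
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.2, 0.5],
    [5.0, 9.9, 0.5],
]
component, scores = first_principal_component(data)
```

Here three features are summarized by a single score per sample. The component puts most of its weight on the two correlated features and almost none on the near-constant one.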


Table 1 Summary and comparison of classification and clustering methods

Step four: Pathway and network analysis

For pathway analysis, we refer to data analysis that aims to identify activated pathways or pathway modules from functional proteomic data.

For network analysis, we refer to data analysis that builds, overlays, visualizes, and infers protein interaction networks from functional proteomics and other systems biology data.
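One common way to ask whether a pathway is over-represented in a list of selected proteins is a one-sided hypergeometric test. The notes above do not say which test any particular tool uses, so the sketch below is only an example of the general idea. The function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, pathway_size, selected, overlap):
    """Probability of seeing at least `overlap` pathway members among
    `selected` proteins drawn at random from `population` proteins."""
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., upper
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 measured proteins, 5 of them in the pathway,
# 4 proteins selected as changed, 3 of those belong to the pathway
p = enrichment_p_value(population=20, pathway_size=5, selected=4, overlap=3)
```

When many pathways are tested this way, the resulting p-values need a multiple-testing correction, as in Step two.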


Table 2 Summary of functional and network tools

3. Longitudinal or time-series data

Several software tools are available that specifically address the problems associated with time-series data.

TimeClust is a stand-alone tool that is available for different platforms and allows the clustering of gene expression data collected over time with distance-based, model-based, and template-based methods [61].

There are also several other packages available in R, such as maSigPro [62], timecourse [63], BAT [64], betr [65], fpca [66], timeclip [67], rnits [68], and STEM [69].

In Python, the probabilistic graphical query language (pGQL) [70] lets its user interactively define linear HMM queries on time-course data using rectangular graphical widgets called probabilistic time boxes. The analysis is fully interactive, and the graphical display shows the time courses along with the graphical query.

In Java, PESTS [71] and OPTricluster [72], both of which are stand-alone tools with a GUI, are useful for the clustering of short time-series data.

In MATLAB, DynamiteC is a dynamic modeling and clustering algorithm that interleaves clustering of time-course gene expression data with estimation of dynamic models of their response by biologically meaningful parameters [73].
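As a rough illustration of what template-based clustering of time courses means, the sketch below assigns each profile to the template shape it correlates with best. This is not the algorithm of TimeClust or of any other tool cited above. The templates, the protein names, and the measurements are all invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def assign_to_templates(profiles, templates):
    """Map each profile name to the template with the highest correlation."""
    assignment = {}
    for name, series in profiles.items():
        best = max(templates, key=lambda t: pearson(series, templates[t]))
        assignment[name] = best
    return assignment


# Hypothetical template shapes over four time points
templates = {
    "rising": [0, 1, 2, 3],
    "falling": [3, 2, 1, 0],
    "transient": [0, 2, 2, 0],
}
# Hypothetical measurements for three proteins
profiles = {
    "protein_a": [1.0, 1.8, 3.1, 4.2],
    "protein_b": [5.0, 3.9, 2.2, 1.1],
    "protein_c": [0.9, 2.5, 2.4, 1.0],
}
assignment = assign_to_templates(profiles, templates)
```

Correlation compares the shape of a profile and ignores its absolute level, which is usually what matters when grouping time courses. A constant profile has zero variance and would need separate handling.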
